2025-12-04T08:52:49.9367515Z Current runner version: '2.329.0'
2025-12-04T08:52:49.9370662Z Runner name: 'linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr'
2025-12-04T08:52:49.9371081Z Runner group name: 'default'
2025-12-04T08:52:49.9371513Z Machine name: 'linux'
2025-12-04T08:52:49.9372628Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T08:52:49.9373794Z Contents: read
2025-12-04T08:52:49.9374055Z Metadata: read
2025-12-04T08:52:49.9374297Z ##[endgroup]
2025-12-04T08:52:49.9375313Z Secret source: Actions
2025-12-04T08:52:49.9375637Z Prepare workflow directory
2025-12-04T08:52:49.9613013Z Prepare all required actions
2025-12-04T08:52:49.9632576Z Getting action download info
2025-12-04T08:52:50.4382981Z Download action repository 'pytorch/pytorch@main' (SHA:ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T08:52:54.6062250Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T08:52:55.9114815Z Download action repository 'actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T08:52:57.0910036Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T08:52:58.1507090Z Getting action download info
2025-12-04T08:52:58.3542795Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T08:52:59.3126097Z Getting action download info
2025-12-04T08:52:59.5377845Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T08:53:00.4301214Z Getting action download info
2025-12-04T08:53:00.6397604Z Uses: pytorch/pytorch/.github/workflows/_rocm-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T08:53:00.6399472Z ##[group] Inputs
2025-12-04T08:53:00.6399633Z build-environment: linux-noble-rocm-py3.12-mi300
2025-12-04T08:53:00.6401373Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}]}
2025-12-04T08:53:00.6403383Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T08:53:00.6403681Z sync-tag:
2025-12-04T08:53:00.6404054Z timeout-minutes: 300
2025-12-04T08:53:00.6404165Z tests-to-include:
2025-12-04T08:53:00.6404277Z dashboard-tag:
2025-12-04T08:53:00.6404505Z disable-monitor: true
2025-12-04T08:53:00.6404853Z monitor-log-interval: 5
2025-12-04T08:53:00.6405062Z monitor-data-collect-interval: 1
2025-12-04T08:53:00.6405194Z ##[endgroup]
2025-12-04T08:53:00.6405399Z Complete job name: linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check)
2025-12-04T08:53:00.6635755Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T08:53:00.6636019Z with:
2025-12-04T08:53:00.6636109Z no-sudo: true
2025-12-04T08:53:00.6636202Z submodules: recursive
2025-12-04T08:53:00.6636300Z fetch-depth: 0
2025-12-04T08:53:00.6636427Z env:
2025-12-04T08:53:00.6636521Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:53:00.6636633Z ##[endgroup]
2025-12-04T08:53:00.6694149Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:53:00.6694520Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T08:53:00.6701013Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T08:53:00.6701174Z env:
2025-12-04T08:53:00.6701262Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:53:00.6701364Z ##[endgroup]
2025-12-04T08:53:00.6858809Z ##[group]Run actions/checkout@v4
2025-12-04T08:53:00.6859002Z with:
2025-12-04T08:53:00.6859124Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T08:53:00.6859264Z fetch-depth: 0
2025-12-04T08:53:00.6859369Z submodules: recursive
2025-12-04T08:53:00.6859600Z show-progress: false
2025-12-04T08:53:00.6859715Z repository: pytorch/pytorch
2025-12-04T08:53:00.6859891Z token: ***
2025-12-04T08:53:00.6859988Z ssh-strict: true
2025-12-04T08:53:00.6860082Z ssh-user: git
2025-12-04T08:53:00.6860186Z persist-credentials: true
2025-12-04T08:53:00.6860299Z clean: true
2025-12-04T08:53:00.6860402Z sparse-checkout-cone-mode: true
2025-12-04T08:53:00.6860525Z fetch-tags: false
2025-12-04T08:53:00.6860623Z lfs: false
2025-12-04T08:53:00.6860717Z set-safe-directory: true
2025-12-04T08:53:00.6860835Z env:
2025-12-04T08:53:00.6860924Z GIT_DEFAULT_BRANCH: main
2025-12-04T08:53:00.6861032Z ##[endgroup]
2025-12-04T08:53:00.7386017Z Syncing repository: pytorch/pytorch
2025-12-04T08:53:00.7386620Z ##[group]Getting Git version info
2025-12-04T08:53:00.7386793Z Working directory is '/home/runner/_work/pytorch/pytorch'
2025-12-04T08:53:00.7387053Z [command]/usr/bin/git version
2025-12-04T08:53:00.7387170Z git version 2.52.0
2025-12-04T08:53:00.7403074Z ##[endgroup]
2025-12-04T08:53:00.7407833Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/9301bd53-8760-43ae-82ef-6e6500e674ad/.gitconfig'
2025-12-04T08:53:00.7413602Z Temporarily overriding HOME='/home/runner/_work/_temp/9301bd53-8760-43ae-82ef-6e6500e674ad' before making global git config changes
2025-12-04T08:53:00.7413943Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T08:53:00.7416400Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch
2025-12-04T08:53:00.7447170Z [command]/usr/bin/git config --local --get remote.origin.url
2025-12-04T08:53:00.7464400Z https://github.com/pytorch/pytorch
2025-12-04T08:53:00.7472388Z ##[group]Removing previously created refs, to avoid conflicts
2025-12-04T08:53:00.7474709Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD
2025-12-04T08:53:00.7496069Z refs/heads/main
2025-12-04T08:53:00.7507685Z [command]/usr/bin/git checkout --detach
2025-12-04T08:53:02.3349269Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T08:53:02.3404662Z [command]/usr/bin/git branch --delete --force main
2025-12-04T08:53:02.3607118Z Deleted branch main (was ffd9b0fb4355).
2025-12-04T08:53:02.3614704Z ##[endgroup]
2025-12-04T08:53:02.3619993Z [command]/usr/bin/git submodule status
2025-12-04T08:53:02.3895437Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe)
2025-12-04T08:53:02.3968636Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081)
2025-12-04T08:53:02.4019601Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327)
2025-12-04T08:53:02.4079296Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0)
2025-12-04T08:53:02.4119698Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93)
2025-12-04T08:53:02.4170064Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600)
2025-12-04T08:53:02.4463787Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656)
2025-12-04T08:53:02.4494279Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101)
2025-12-04T08:53:02.4515596Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3)
2025-12-04T08:53:02.4568297Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d)
2025-12-04T08:53:02.4659041Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0)
2025-12-04T08:53:02.4752788Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30)
2025-12-04T08:53:02.4775259Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c)
2025-12-04T08:53:02.4842536Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1)
2025-12-04T08:53:02.4865653Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39)
2025-12-04T08:53:02.4917277Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4)
2025-12-04T08:53:02.4937105Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23)
2025-12-04T08:53:02.5172094Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0)
2025-12-04T08:53:02.5249862Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17)
2025-12-04T08:53:02.5326258Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0)
2025-12-04T08:53:02.5461716Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108)
2025-12-04T08:53:02.5520593Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1)
2025-12-04T08:53:02.5562424Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5)
2025-12-04T08:53:02.5699575Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main)
2025-12-04T08:53:02.5717894Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0)
2025-12-04T08:53:02.5731491Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4)
2025-12-04T08:53:02.5746200Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0)
2025-12-04T08:53:02.5938674Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0)
2025-12-04T08:53:02.5952402Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2)
2025-12-04T08:53:02.5974390Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5)
2025-12-04T08:53:02.6183879Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b)
2025-12-04T08:53:02.6235509Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master)
2025-12-04T08:53:02.6274673Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e)
2025-12-04T08:53:02.6290690Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1)
2025-12-04T08:53:02.6344275Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated)
2025-12-04T08:53:02.6393902Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8)
2025-12-04T08:53:02.6447184Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main)
2025-12-04T08:53:02.6457254Z ##[group]Cleaning the repository
2025-12-04T08:53:02.6462508Z [command]/usr/bin/git clean -ffdx
2025-12-04T08:53:02.6586330Z [command]/usr/bin/git reset --hard HEAD
2025-12-04T08:53:03.2632475Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T08:53:03.2713941Z ##[endgroup]
2025-12-04T08:53:03.2715629Z ##[group]Disabling automatic garbage collection
2025-12-04T08:53:03.2727514Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T08:53:03.2756385Z ##[endgroup]
2025-12-04T08:53:03.2756549Z ##[group]Setting up auth
2025-12-04T08:53:03.2760684Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T08:53:03.2784967Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T08:53:03.2986051Z Entering 'android/libs/fbjni'
2025-12-04T08:53:03.3014584Z Entering 'third_party/FP16'
2025-12-04T08:53:03.3041725Z Entering 'third_party/FXdiv'
2025-12-04T08:53:03.3070669Z Entering 'third_party/NNPACK'
2025-12-04T08:53:03.3096578Z Entering 'third_party/NVTX'
2025-12-04T08:53:03.3122646Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:53:03.3152586Z Entering 'third_party/XNNPACK'
2025-12-04T08:53:03.3186252Z Entering 'third_party/aiter'
2025-12-04T08:53:03.3215557Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:53:03.3244855Z Entering 'third_party/benchmark'
2025-12-04T08:53:03.3268731Z Entering 'third_party/composable_kernel'
2025-12-04T08:53:03.3297241Z Entering 'third_party/cpp-httplib'
2025-12-04T08:53:03.3321909Z Entering 'third_party/cpuinfo'
2025-12-04T08:53:03.3346753Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:53:03.3368855Z Entering 'third_party/cutlass'
2025-12-04T08:53:03.3395578Z Entering 'third_party/fbgemm'
2025-12-04T08:53:03.3420468Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:53:03.3442698Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:53:03.3476238Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:53:03.3499878Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:53:03.3527137Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:53:03.3554536Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:53:03.3576677Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:53:03.3602362Z Entering 'third_party/flash-attention'
2025-12-04T08:53:03.3624170Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:53:03.3650283Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:53:03.3690813Z Entering 'third_party/flatbuffers'
2025-12-04T08:53:03.3733157Z Entering 'third_party/fmt'
2025-12-04T08:53:03.3762556Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:53:03.3797440Z Entering 'third_party/gloo'
2025-12-04T08:53:03.3820586Z Entering 'third_party/googletest'
2025-12-04T08:53:03.3841244Z Entering 'third_party/ideep'
2025-12-04T08:53:03.3866062Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:53:03.3897377Z Entering 'third_party/ittapi'
2025-12-04T08:53:03.3927021Z Entering 'third_party/kineto'
2025-12-04T08:53:03.3949555Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:53:03.3971970Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:53:03.3999481Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:53:03.4024156Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:53:03.4049376Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:53:03.4075362Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:53:03.4098559Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:53:03.4119836Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:53:03.4148242Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:53:03.4176073Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:53:03.4201041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:53:03.4229830Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:53:03.4256415Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:53:03.4287332Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:53:03.4309119Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:53:03.4331459Z Entering 'third_party/kleidiai'
2025-12-04T08:53:03.4354985Z Entering 'third_party/mimalloc'
2025-12-04T08:53:03.4380237Z Entering 'third_party/nlohmann'
2025-12-04T08:53:03.4403039Z Entering 'third_party/onnx'
2025-12-04T08:53:03.4434255Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:53:03.4462323Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:53:03.4489204Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:53:03.4511745Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:53:03.4535641Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:53:03.4557253Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:53:03.4578726Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:53:03.4599230Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:53:03.4623398Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:53:03.4645522Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:53:03.4668951Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:53:03.4696124Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:53:03.4724414Z Entering 'third_party/pocketfft'
2025-12-04T08:53:03.4747048Z Entering 'third_party/protobuf'
2025-12-04T08:53:03.4768494Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:53:03.4798249Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:53:03.4824056Z Entering 'third_party/psimd'
2025-12-04T08:53:03.4852181Z Entering 'third_party/pthreadpool'
2025-12-04T08:53:03.4877792Z Entering 'third_party/pybind11'
2025-12-04T08:53:03.4901682Z Entering 'third_party/python-peachpy'
2025-12-04T08:53:03.4924421Z Entering 'third_party/sleef'
2025-12-04T08:53:03.4946775Z Entering 'third_party/tensorpipe'
2025-12-04T08:53:03.4971049Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:53:03.4999174Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:53:03.5037422Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:53:03.5064435Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:53:03.5084510Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:53:03.5131879Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T08:53:03.5156875Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T08:53:03.5332345Z Entering 'android/libs/fbjni'
2025-12-04T08:53:03.5361751Z Entering 'third_party/FP16'
2025-12-04T08:53:03.5395157Z Entering 'third_party/FXdiv'
2025-12-04T08:53:03.5422870Z Entering 'third_party/NNPACK'
2025-12-04T08:53:03.5453157Z Entering 'third_party/NVTX'
2025-12-04T08:53:03.5479948Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:53:03.5504541Z Entering 'third_party/XNNPACK'
2025-12-04T08:53:03.5546181Z Entering 'third_party/aiter'
2025-12-04T08:53:03.5576438Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:53:03.5609439Z Entering 'third_party/benchmark'
2025-12-04T08:53:03.5638267Z Entering 'third_party/composable_kernel'
2025-12-04T08:53:03.5675808Z Entering 'third_party/cpp-httplib'
2025-12-04T08:53:03.5705411Z Entering 'third_party/cpuinfo'
2025-12-04T08:53:03.5737296Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:53:03.5764660Z Entering 'third_party/cutlass'
2025-12-04T08:53:03.5795846Z Entering 'third_party/fbgemm'
2025-12-04T08:53:03.5828787Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:53:03.5858516Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:53:03.5887111Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:53:03.5915542Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:53:03.5947688Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:53:03.5977926Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:53:03.6002430Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:53:03.6027690Z Entering 'third_party/flash-attention'
2025-12-04T08:53:03.6055692Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:53:03.6086917Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:53:03.6113551Z Entering 'third_party/flatbuffers'
2025-12-04T08:53:03.6145201Z Entering 'third_party/fmt'
2025-12-04T08:53:03.6172411Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:53:03.6198349Z Entering 'third_party/gloo'
2025-12-04T08:53:03.6224772Z Entering 'third_party/googletest'
2025-12-04T08:53:03.6253237Z Entering 'third_party/ideep'
2025-12-04T08:53:03.6294246Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:53:03.6325496Z Entering 'third_party/ittapi'
2025-12-04T08:53:03.6349331Z Entering 'third_party/kineto'
2025-12-04T08:53:03.6372147Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:53:03.6395054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:53:03.6418178Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:53:03.6444442Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:53:03.6466724Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:53:03.6494190Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:53:03.6516523Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:53:03.6536845Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:53:03.6557658Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:53:03.6584702Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:53:03.6609825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:53:03.6633909Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:53:03.6662440Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:53:03.6689079Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T08:53:03.6715895Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T08:53:03.6746291Z Entering 'third_party/kleidiai'
2025-12-04T08:53:03.6775206Z Entering 'third_party/mimalloc'
2025-12-04T08:53:03.6807036Z Entering 'third_party/nlohmann'
2025-12-04T08:53:03.6838192Z Entering 'third_party/onnx'
2025-12-04T08:53:03.6877879Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T08:53:03.6910821Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T08:53:03.6939068Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T08:53:03.6969316Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T08:53:03.6999017Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T08:53:03.7023165Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T08:53:03.7054226Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T08:53:03.7080125Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T08:53:03.7109484Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T08:53:03.7143238Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:53:03.7170442Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:53:03.7202434Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T08:53:03.7238320Z Entering 'third_party/pocketfft'
2025-12-04T08:53:03.7267489Z Entering 'third_party/protobuf'
2025-12-04T08:53:03.7297111Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T08:53:03.7322919Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T08:53:03.7352419Z Entering 'third_party/psimd'
2025-12-04T08:53:03.7383688Z Entering 'third_party/pthreadpool'
2025-12-04T08:53:03.7410738Z Entering 'third_party/pybind11'
2025-12-04T08:53:03.7439109Z Entering 'third_party/python-peachpy'
2025-12-04T08:53:03.7463582Z Entering 'third_party/sleef'
2025-12-04T08:53:03.7492136Z Entering 'third_party/tensorpipe'
2025-12-04T08:53:03.7530576Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T08:53:03.7563879Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T08:53:03.7592531Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T08:53:03.7623532Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T08:53:03.7655090Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T08:53:03.7699515Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T08:53:03.7718722Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T08:53:03.7907588Z Entering 'android/libs/fbjni'
2025-12-04T08:53:03.7919998Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url
2025-12-04T08:53:03.7932503Z Entering 'third_party/FP16'
2025-12-04T08:53:03.7946140Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url
2025-12-04T08:53:03.7957764Z Entering 'third_party/FXdiv'
2025-12-04T08:53:03.7969040Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url
2025-12-04T08:53:03.7985259Z Entering 'third_party/NNPACK'
2025-12-04T08:53:03.7996522Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url
2025-12-04T08:53:03.8005653Z Entering 'third_party/NVTX'
2025-12-04T08:53:03.8015997Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url
2025-12-04T08:53:03.8025117Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T08:53:03.8042538Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url
2025-12-04T08:53:03.8055203Z Entering 'third_party/XNNPACK'
2025-12-04T08:53:03.8073077Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url
2025-12-04T08:53:03.8091161Z Entering 'third_party/aiter'
2025-12-04T08:53:03.8104326Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url
2025-12-04T08:53:03.8113193Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T08:53:03.8126554Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url
2025-12-04T08:53:03.8145126Z Entering 'third_party/benchmark'
2025-12-04T08:53:03.8156399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url
2025-12-04T08:53:03.8174001Z Entering 'third_party/composable_kernel'
2025-12-04T08:53:03.8189709Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url
2025-12-04T08:53:03.8204535Z Entering 'third_party/cpp-httplib'
2025-12-04T08:53:03.8216505Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url
2025-12-04T08:53:03.8226970Z Entering 'third_party/cpuinfo'
2025-12-04T08:53:03.8243704Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url
2025-12-04T08:53:03.8253565Z Entering 'third_party/cudnn_frontend'
2025-12-04T08:53:03.8267804Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url
2025-12-04T08:53:03.8283764Z Entering 'third_party/cutlass'
2025-12-04T08:53:03.8298577Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url
2025-12-04T08:53:03.8312028Z Entering 'third_party/fbgemm'
2025-12-04T08:53:03.8324246Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url
2025-12-04T08:53:03.8335046Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T08:53:03.8347924Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url
2025-12-04T08:53:03.8359267Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T08:53:03.8374157Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url
2025-12-04T08:53:03.8386028Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T08:53:03.8396639Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url
2025-12-04T08:53:03.8404856Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T08:53:03.8420620Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url
2025-12-04T08:53:03.8432766Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T08:53:03.8443730Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url
2025-12-04T08:53:03.8455239Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T08:53:03.8471655Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url
2025-12-04T08:53:03.8481728Z Entering 'third_party/fbgemm/external/json'
2025-12-04T08:53:03.8493351Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url
2025-12-04T08:53:03.8504832Z Entering 'third_party/flash-attention'
2025-12-04T08:53:03.8523228Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url
2025-12-04T08:53:03.8533586Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T08:53:03.8546440Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url
2025-12-04T08:53:03.8561940Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T08:53:03.8570400Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url
2025-12-04T08:53:03.8588540Z Entering 'third_party/flatbuffers'
2025-12-04T08:53:03.8603695Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url
2025-12-04T08:53:03.8616179Z Entering 'third_party/fmt'
2025-12-04T08:53:03.8637217Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url
2025-12-04T08:53:03.8649887Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T08:53:03.8663861Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url
2025-12-04T08:53:03.8675618Z Entering 'third_party/gloo'
2025-12-04T08:53:03.8690170Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url
2025-12-04T08:53:03.8700723Z Entering 'third_party/googletest'
2025-12-04T08:53:03.8720072Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url
2025-12-04T08:53:03.8732036Z Entering 'third_party/ideep'
2025-12-04T08:53:03.8744918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url
2025-12-04T08:53:03.8755029Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T08:53:03.8767994Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url
2025-12-04T08:53:03.8781802Z Entering 'third_party/ittapi'
2025-12-04T08:53:03.8794853Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url
2025-12-04T08:53:03.8804911Z Entering 'third_party/kineto'
2025-12-04T08:53:03.8817861Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url
2025-12-04T08:53:03.8829177Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T08:53:03.8845202Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url
2025-12-04T08:53:03.8855374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T08:53:03.8867139Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url
2025-12-04T08:53:03.8881029Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T08:53:03.8897083Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url
2025-12-04T08:53:03.8910174Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T08:53:03.8921520Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url
2025-12-04T08:53:03.8930917Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T08:53:03.8943085Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url
2025-12-04T08:53:03.8953199Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T08:53:03.8969053Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url
2025-12-04T08:53:03.8981449Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T08:53:03.8992662Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url
2025-12-04T08:53:03.9008684Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T08:53:03.9019705Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url
2025-12-04T08:53:03.9029164Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T08:53:03.9041231Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url
2025-12-04T08:53:03.9054007Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T08:53:03.9068113Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url
2025-12-04T08:53:03.9076075Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T08:53:03.9086125Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url
2025-12-04T08:53:03.9098481Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T08:53:03.9109066Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url
2025-12-04T08:53:03.9117944Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T08:53:03.9129094Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url
2025-12-04T08:53:03.9141204Z Entering
'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:03.9152781Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:53:03.9162811Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:03.9174169Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:53:03.9187031Z Entering 'third_party/kleidiai' 2025-12-04T08:53:03.9201024Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:53:03.9209786Z Entering 'third_party/mimalloc' 2025-12-04T08:53:03.9220255Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:53:03.9228373Z Entering 'third_party/nlohmann' 2025-12-04T08:53:03.9242136Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:53:03.9257042Z Entering 'third_party/onnx' 2025-12-04T08:53:03.9267728Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:53:03.9283995Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:03.9293957Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:03.9308726Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:03.9319893Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:53:03.9328638Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:03.9338048Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:03.9347098Z Entering 
'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:03.9356224Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:03.9365499Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:03.9386630Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:53:03.9394828Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:03.9405793Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:53:03.9418758Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:03.9430714Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:53:03.9440777Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:03.9450706Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:53:03.9458840Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:03.9468128Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:03.9477980Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:03.9488559Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:03.9498634Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T08:53:03.9511483Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:03.9524812Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:03.9535467Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:53:03.9552091Z Entering 'third_party/pocketfft' 2025-12-04T08:53:03.9562984Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:53:03.9572573Z Entering 'third_party/protobuf' 2025-12-04T08:53:03.9583036Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:53:03.9594265Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:03.9605336Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:03.9615003Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:03.9625022Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:03.9636476Z Entering 'third_party/psimd' 2025-12-04T08:53:03.9647676Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:53:03.9661523Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:03.9672135Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:53:03.9681882Z Entering 'third_party/pybind11' 2025-12-04T08:53:03.9693159Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:03.9703017Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:03.9713478Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:53:03.9722850Z Entering 'third_party/sleef' 2025-12-04T08:53:03.9733430Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:53:03.9743005Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:03.9753109Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:53:03.9769293Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:03.9782490Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:03.9795002Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:03.9809797Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:53:03.9820522Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:03.9834266Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:53:03.9845395Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:03.9859944Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:03.9869284Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:03.9886882Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:53:03.9917695Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:03.9946217Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:03.9969855Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:03.9993307Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0012471Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0033845Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0055879Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0076438Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0092660Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0110366Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0131559Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0147411Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0167500Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0185958Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0200024Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0216246Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0231101Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0250526Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0266271Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0280765Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0296276Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0312987Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0334931Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0351133Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0367806Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0385774Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0403330Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0419166Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0434978Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0457668Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0478694Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0494208Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0509891Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0526968Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0550439Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0568616Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0585425Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0601435Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0618499Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0636729Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0652814Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0669383Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0685740Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0701829Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0718766Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0736100Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0752357Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0773501Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T08:53:04.0793417Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0810868Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0827418Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0844607Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0861233Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0878082Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0898329Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0916894Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0940107Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0957280Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0973145Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.0988581Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1011623Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1034272Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1053388Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1070494Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1089041Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1109268Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1133763Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1153388Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1174689Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1191544Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1216738Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1239159Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1257341Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1275973Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1294360Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1310436Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1331761Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1351000Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1369757Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1388220Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1412188Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:04.1440700Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:53:04.1477702Z ##[endgroup] 2025-12-04T08:53:04.1477905Z ##[group]Fetching the repository 2025-12-04T08:53:04.1481647Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T08:53:08.3817589Z From https://github.com/pytorch/pytorch 2025-12-04T08:53:08.3818180Z * [new branch] 2.6.0.dev20241004+ -> origin/2.6.0.dev20241004+ 2025-12-04T08:53:08.3818685Z * [new branch] 2.9.1 -> origin/2.9.1 2025-12-04T08:53:08.3819278Z * [new branch] AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest 2025-12-04T08:53:08.3819912Z * [new branch] Flamefire-patch-1 -> origin/Flamefire-patch-1 2025-12-04T08:53:08.3820507Z * [new branch] HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes 
2025-12-04T08:53:08.3820985Z * [new branch] HOPrintFunc -> origin/HOPrintFunc 2025-12-04T08:53:08.3821444Z * [new branch] IvanKobzarev/stack/1 -> origin/IvanKobzarev/stack/1 2025-12-04T08:53:08.3821880Z * [new branch] NicoshevSVE128 -> origin/NicoshevSVE128 2025-12-04T08:53:08.3822387Z * [new branch] PR-AOTInductorNoneBug -> origin/PR-AOTInductorNoneBug 2025-12-04T08:53:08.3822862Z * [new branch] PR-AOTInductorNoneBugFix -> origin/PR-AOTInductorNoneBugFix 2025-12-04T08:53:08.3823388Z * [new branch] PR-FixConfigsIssue -> origin/PR-FixConfigsIssue 2025-12-04T08:53:08.3823828Z * [new branch] PR-NoneBugFix-viable -> origin/PR-NoneBugFix-viable 2025-12-04T08:53:08.3824245Z * [new branch] PR-ResetToZero -> origin/PR-ResetToZero 2025-12-04T08:53:08.3824674Z * [new branch] Update-Flash-Packaging -> origin/Update-Flash-Packaging 2025-12-04T08:53:08.3825104Z * [new branch] VLA_exp -> origin/VLA_exp 2025-12-04T08:53:08.3825499Z * [new branch] activation_bench -> origin/activation_bench 2025-12-04T08:53:08.3825912Z * [new branch] addmm-heuristic -> origin/addmm-heuristic 2025-12-04T08:53:08.3826320Z * [new branch] adi/onednn_aarch64 -> origin/adi/onednn_aarch64 2025-12-04T08:53:08.3826720Z * [new branch] adi/test -> origin/adi/test 2025-12-04T08:53:08.3827093Z * [new branch] adi/test_bgemm -> origin/adi/test_bgemm 2025-12-04T08:53:08.3827480Z * [new branch] adi/test_m8g -> origin/adi/test_m8g 2025-12-04T08:53:08.3827859Z * [new branch] adi/test_onednn -> origin/adi/test_onednn 2025-12-04T08:53:08.3828256Z * [new branch] adi/test_onednn_v3.9 -> origin/adi/test_onednn_v3.9 2025-12-04T08:53:08.3828685Z * [new branch] adi/test_presve_change -> origin/adi/test_presve_change 2025-12-04T08:53:08.3829736Z * [new branch] adi/test_timm -> origin/adi/test_timm 2025-12-04T08:53:08.3830292Z * [new branch] adi/testpresve_change -> origin/adi/testpresve_change 2025-12-04T08:53:08.3830674Z * [new branch] aditew01/test/vec_bf16 -> origin/aditew01/test/vec_bf16 2025-12-04T08:53:08.3830989Z * 
[new branch] ah-globalfeedback-hook -> origin/ah-globalfeedback-hook 2025-12-04T08:53:08.3831339Z * [new branch] albanD-patch-1 -> origin/albanD-patch-1 2025-12-04T08:53:08.3831643Z * [new branch] also-surround-shimh -> origin/also-surround-shimh 2025-12-04T08:53:08.3831932Z * [new branch] angelayi/aot_compile -> origin/angelayi/aot_compile 2025-12-04T08:53:08.3832279Z * [new branch] angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files 2025-12-04T08:53:08.3832611Z * [new branch] angelayi/benchmark -> origin/angelayi/benchmark 2025-12-04T08:53:08.3832960Z * [new branch] angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization 2025-12-04T08:53:08.3833389Z * [new branch] angelayi/cpp_loader -> origin/angelayi/cpp_loader 2025-12-04T08:53:08.3833694Z * [new branch] angelayi/inductor_const -> origin/angelayi/inductor_const 2025-12-04T08:53:08.3833979Z * [new branch] angelayi/lstm -> origin/angelayi/lstm 2025-12-04T08:53:08.3834268Z * [new branch] angelayi/no_so_weight -> origin/angelayi/no_so_weight 2025-12-04T08:53:08.3834561Z * [new branch] angelayi/scan_layers -> origin/angelayi/scan_layers 2025-12-04T08:53:08.3834845Z * [new branch] angelayi/side_eff -> origin/angelayi/side_eff 2025-12-04T08:53:08.3835137Z * [new branch] angelayi/state_dict -> origin/angelayi/state_dict 2025-12-04T08:53:08.3835428Z * [new branch] angelayi/symint_input -> origin/angelayi/symint_input 2025-12-04T08:53:08.3835719Z * [new branch] angelayi/symm_mem -> origin/angelayi/symm_mem 2025-12-04T08:53:08.3836003Z * [new branch] angelayi/test_cpp -> origin/angelayi/test_cpp 2025-12-04T08:53:08.3836284Z * [new branch] angelayi/torch_size -> origin/angelayi/torch_size 2025-12-04T08:53:08.3836571Z * [new branch] annotate_assert -> origin/annotate_assert 2025-12-04T08:53:08.3836870Z * [new branch] annotate_fallback_kernel -> origin/annotate_fallback_kernel 2025-12-04T08:53:08.3837171Z * [new branch] annotation_deepcopy -> origin/annotation_deepcopy 
2025-12-04T08:53:08.3837456Z * [new branch] annotation_dynamo -> origin/annotation_dynamo 2025-12-04T08:53:08.3837737Z * [new branch] aot_eager_stack_trace -> origin/aot_eager_stack_trace 2025-12-04T08:53:08.3838023Z * [new branch] aoti-cuda-alloc -> origin/aoti-cuda-alloc 2025-12-04T08:53:08.3838308Z * [new branch] aoti_const_device -> origin/aoti_const_device 2025-12-04T08:53:08.3838597Z * [new branch] aoti_fqn_name_interface -> origin/aoti_fqn_name_interface 2025-12-04T08:53:08.3838919Z * [new branch] aoti_package_weights_binary -> origin/aoti_package_weights_binary 2025-12-04T08:53:08.3839235Z * [new branch] aoti_target_windows -> origin/aoti_target_windows 2025-12-04T08:53:08.3839574Z * [new branch] arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling 2025-12-04T08:53:08.3839911Z * [new branch] async_tp -> origin/async_tp 2025-12-04T08:53:08.3840228Z * [new branch] atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124 2025-12-04T08:53:08.3840638Z * [new branch] atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1 2025-12-04T08:53:08.3841034Z * [new branch] atalman-patch-2 -> origin/atalman-patch-2 2025-12-04T08:53:08.3841246Z * [new branch] atalman-patch-3 -> origin/atalman-patch-3 2025-12-04T08:53:08.3841508Z * [new branch] atalman-patch-4 -> origin/atalman-patch-4 2025-12-04T08:53:08.3841718Z * [new branch] atalman-patch-5 -> origin/atalman-patch-5 2025-12-04T08:53:08.3841928Z * [new branch] atalman-patch-6 -> origin/atalman-patch-6 2025-12-04T08:53:08.3842144Z * [new branch] atalman-patch-7 -> origin/atalman-patch-7 2025-12-04T08:53:08.3842356Z * [new branch] atalman-patch-8 -> origin/atalman-patch-8 2025-12-04T08:53:08.3842585Z * [new branch] atalman_inductor_2.3.1 -> origin/atalman_inductor_2.3.1 2025-12-04T08:53:08.3842821Z * [new branch] atalman_inductor_2.4.0 -> origin/atalman_inductor_2.4.0 2025-12-04T08:53:08.3843054Z * [new branch] atalman_inductor_2.4.x -> origin/atalman_inductor_2.4.x 
2025-12-04T08:53:08.3843364Z * [new branch] attention_benchmarking_clean -> origin/attention_benchmarking_clean 2025-12-04T08:53:08.3843635Z * [new branch] bahuang/dt_fix_scalar_add -> origin/bahuang/dt_fix_scalar_add 2025-12-04T08:53:08.3843875Z * [new branch] bahuang/fix_debug_mode -> origin/bahuang/fix_debug_mode 2025-12-04T08:53:08.3844108Z * [new branch] bahuang/fix_expand -> origin/bahuang/fix_expand 2025-12-04T08:53:08.3844321Z * [new branch] bahuang/test -> origin/bahuang/test 2025-12-04T08:53:08.3844527Z * [new branch] base/1.5 -> origin/base/1.5 2025-12-04T08:53:08.3844781Z * [new branch] batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention 2025-12-04T08:53:08.3845042Z * [new branch] bench_scaled_mm_ops -> origin/bench_scaled_mm_ops 2025-12-04T08:53:08.3845274Z * [new branch] benchmark-updates -> origin/benchmark-updates 2025-12-04T08:53:08.3845513Z * [new branch] benchmarking-script -> origin/benchmarking-script 2025-12-04T08:53:08.3845745Z * [new branch] bertmaher/pinbump26 -> origin/bertmaher/pinbump26 2025-12-04T08:53:08.3845967Z * [new branch] bertrand/cutlass -> origin/bertrand/cutlass 2025-12-04T08:53:08.3846194Z * [new branch] bf/bug-static-input -> origin/bf/bug-static-input 2025-12-04T08:53:08.3846419Z * [new branch] bf/cg-backend -> origin/bf/cg-backend 2025-12-04T08:53:08.3846634Z * [new branch] bf/cg-nccl-test -> origin/bf/cg-nccl-test 2025-12-04T08:53:08.3846850Z * [new branch] bf/cg-remove-check -> origin/bf/cg-remove-check 2025-12-04T08:53:08.3847084Z * [new branch] bf/clean-torchbench-hf -> origin/bf/clean-torchbench-hf 2025-12-04T08:53:08.3847318Z * [new branch] bf/combo-debug-log -> origin/bf/combo-debug-log 2025-12-04T08:53:08.3847536Z * [new branch] bf/cudagraph -> origin/bf/cudagraph 2025-12-04T08:53:08.3847810Z * [new branch] bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation 2025-12-04T08:53:08.3848244Z * [new branch] bf/cudagraph-enable-input-mutation-support-benchmark -> 
origin/bf/cudagraph-enable-input-mutation-support-benchmark 2025-12-04T08:53:08.3848623Z * [new branch] bf/cudagraph-partition -> origin/bf/cudagraph-partition 2025-12-04T08:53:08.3848869Z * [new branch] bf/donated-buffer-bench -> origin/bf/donated-buffer-bench 2025-12-04T08:53:08.3849108Z * [new branch] bf/dynamo-partition -> origin/bf/dynamo-partition 2025-12-04T08:53:08.3849324Z * [new branch] bf/lite -> origin/bf/lite 2025-12-04T08:53:08.3849583Z * [new branch] bf/pa-non-divisible -> origin/bf/pa-non-divisible 2025-12-04T08:53:08.3849851Z * [new branch] bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols 2025-12-04T08:53:08.3850178Z * [new branch] bf/partition-memory-plan -> origin/bf/partition-memory-plan 2025-12-04T08:53:08.3850440Z * [new branch] bf/partition-move-cpu -> origin/bf/partition-move-cpu 2025-12-04T08:53:08.3850657Z * [new branch] bf/partition-view-fallback -> origin/bf/partition-view-fallback 2025-12-04T08:53:08.3850870Z * [new branch] bf/remove-check-55b0c39d -> origin/bf/remove-check-55b0c39d 2025-12-04T08:53:08.3851072Z * [new branch] bf/timm-nov-26-2025 -> origin/bf/timm-nov-26-2025 2025-12-04T08:53:08.3851281Z * [new branch] bf/transformer-pin-4-57-3 -> origin/bf/transformer-pin-4-57-3 2025-12-04T08:53:08.3851506Z * [new branch] bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492 2025-12-04T08:53:08.3851736Z * [new branch] bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb 2025-12-04T08:53:08.3851958Z * [new branch] bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129 2025-12-04T08:53:08.3852169Z * [new branch] bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d 2025-12-04T08:53:08.3852381Z * [new branch] bisect_perf_hf_T5_5268754e -> origin/bisect_perf_hf_T5_5268754e 2025-12-04T08:53:08.3852598Z * [new branch] bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c 2025-12-04T08:53:08.3852811Z * [new branch] 
bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c 2025-12-04T08:53:08.3853028Z * [new branch] bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f 2025-12-04T08:53:08.3853244Z * [new branch] bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0 2025-12-04T08:53:08.3853508Z * [new branch] bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149 2025-12-04T08:53:08.3853728Z * [new branch] bisect_perf_hf_T5_d65f194a -> origin/bisect_perf_hf_T5_d65f194a 2025-12-04T08:53:08.3853935Z * [new branch] bisect_perf_hf_T5_da94ab0b -> origin/bisect_perf_hf_T5_da94ab0b 2025-12-04T08:53:08.3854144Z * [new branch] bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new 2025-12-04T08:53:08.3854367Z * [new branch] bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8 2025-12-04T08:53:08.3854583Z * [new branch] bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2 2025-12-04T08:53:08.3854791Z * [new branch] bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563 2025-12-04T08:53:08.3854998Z * [new branch] brister/fx_device_type -> origin/brister/fx_device_type 2025-12-04T08:53:08.3855220Z * [new branch] brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx 2025-12-04T08:53:08.3855468Z * [new branch] brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check 2025-12-04T08:53:08.3855694Z * [new branch] bwd-backup -> origin/bwd-backup 2025-12-04T08:53:08.3855864Z * [new branch] c57382a49 -> origin/c57382a49 2025-12-04T08:53:08.3856032Z * [new branch] ca_0431d47eaa -> origin/ca_0431d47eaa 2025-12-04T08:53:08.3856212Z * [new branch] ca_fix_0431d47eaa -> origin/ca_fix_0431d47eaa 2025-12-04T08:53:08.3856418Z * [new branch] camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push 2025-12-04T08:53:08.3856621Z * [new branch] cccclai-patch-1 -> origin/cccclai-patch-1 2025-12-04T08:53:08.3856927Z * [new branch] 
cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3857274Z * [new branch] cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3857551Z * [new branch] cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3857836Z * [new branch] cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3858123Z * [new branch] cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3858396Z * [new branch] cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3858673Z * [new branch] cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3858953Z * [new branch] cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3859237Z * [new branch] cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3859514Z * [new branch] cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3859789Z * [new branch] cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3860064Z * [new branch] cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3860340Z * [new branch] cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3860614Z * [new branch] cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3860890Z * [new branch] cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3861172Z * [new branch] 
cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3861445Z * [new branch] cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3861716Z * [new branch] cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3861995Z * [new branch] cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_ 2025-12-04T08:53:08.3862234Z * [new branch] cherry_pick_166036_166040 -> origin/cherry_pick_166036_166040 2025-12-04T08:53:08.3862423Z * [new branch] cherry_pick_166457 -> origin/cherry_pick_166457 2025-12-04T08:53:08.3862608Z * [new branch] cherrypick_166338 -> origin/cherrypick_166338 2025-12-04T08:53:08.3862795Z * [new branch] cherrypick_166458 -> origin/cherrypick_166458 2025-12-04T08:53:08.3862972Z * [new branch] cherrypick_166586 -> origin/cherrypick_166586 2025-12-04T08:53:08.3863151Z * [new branch] cherrypick_166956 -> origin/cherrypick_166956 2025-12-04T08:53:08.3863374Z * [new branch] ci_attn -> origin/ci_attn 2025-12-04T08:53:08.3863654Z * [new branch] codex-testing -> origin/codex-testing 2025-12-04T08:53:08.3863923Z * [new branch] codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions 2025-12-04T08:53:08.3864228Z * [new branch] codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch 2025-12-04T08:53:08.3864544Z * [new branch] codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id 2025-12-04T08:53:08.3864973Z * [new branch] codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run 2025-12-04T08:53:08.3865273Z * [new branch] compatiblpy39util -> origin/compatiblpy39util 2025-12-04T08:53:08.3865454Z * [new branch] cond_hop_device -> origin/cond_hop_device 2025-12-04T08:53:08.3865630Z * [new branch] 
context_test -> origin/context_test 2025-12-04T08:53:08.3865876Z * [new branch] copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip 2025-12-04T08:53:08.3866118Z * [new branch] cpio/fix_new_ami_tests -> origin/cpio/fix_new_ami_tests 2025-12-04T08:53:08.3866338Z * [new branch] cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade 2025-12-04T08:53:08.3866556Z * [new branch] csl/always_produce_xml -> origin/csl/always_produce_xml 2025-12-04T08:53:08.3866762Z * [new branch] csl/build_test_more_procs -> origin/csl/build_test_more_procs 2025-12-04T08:53:08.3866977Z * [new branch] csl/build_test_more_procs2 -> origin/csl/build_test_more_procs2 2025-12-04T08:53:08.3867169Z * [new branch] csl/clean_up -> origin/csl/clean_up 2025-12-04T08:53:08.3867361Z * [new branch] csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit 2025-12-04T08:53:08.3867573Z * [new branch] csl/katex -> origin/csl/katex 2025-12-04T08:53:08.3867750Z * [new branch] csl/larger_runner -> origin/csl/larger_runner 2025-12-04T08:53:08.3867933Z * [new branch] csl/lint_testing -> origin/csl/lint_testing 2025-12-04T08:53:08.3868106Z * [new branch] csl/lint_thing -> origin/csl/lint_thing 2025-12-04T08:53:08.3868293Z * [new branch] csl/lintrunner_stuff -> origin/csl/lintrunner_stuff 2025-12-04T08:53:08.3868489Z * [new branch] csl/manually_gen_json -> origin/csl/manually_gen_json 2025-12-04T08:53:08.3868677Z * [new branch] csl/mps_sharding -> origin/csl/mps_sharding 2025-12-04T08:53:08.3868865Z * [new branch] csl/multistage_docker -> origin/csl/multistage_docker 2025-12-04T08:53:08.3869053Z * [new branch] csl/print_timing -> origin/csl/print_timing 2025-12-04T08:53:08.3869236Z * [new branch] csl/remove_experiment -> origin/csl/remove_experiment 2025-12-04T08:53:08.3869442Z * [new branch] csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var 2025-12-04T08:53:08.3869689Z * [new branch] csl/remove_repo_specific_autolabel -> 
origin/csl/remove_repo_specific_autolabel 2025-12-04T08:53:08.3869914Z * [new branch] csl/remove_run_parallel -> origin/csl/remove_run_parallel 2025-12-04T08:53:08.3870114Z * [new branch] csl/remove_unused_vars -> origin/csl/remove_unused_vars 2025-12-04T08:53:08.3870301Z * [new branch] csl/revert_open -> origin/csl/revert_open 2025-12-04T08:53:08.3870479Z * [new branch] csl/skip_build -> origin/csl/skip_build 2025-12-04T08:53:08.3870678Z * [new branch] csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs 2025-12-04T08:53:08.3870879Z * [new branch] csl/td_job_level -> origin/csl/td_job_level 2025-12-04T08:53:08.3871085Z * [new branch] csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner 2025-12-04T08:53:08.3871338Z * [new branch] csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn 2025-12-04T08:53:08.3871593Z * [new branch] csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence 2025-12-04T08:53:08.3871814Z * [new branch] csl/upload_json_running -> origin/csl/upload_json_running 2025-12-04T08:53:08.3872043Z * [new branch] csl/win_sccache -> origin/csl/win_sccache 2025-12-04T08:53:08.3872240Z * [new branch] csl/xml_stuff -> origin/csl/xml_stuff 2025-12-04T08:53:08.3872416Z * [new branch] cublasrelax2 -> origin/cublasrelax2 2025-12-04T08:53:08.3872591Z * [new branch] cuda_mempool -> origin/cuda_mempool 2025-12-04T08:53:08.3872769Z * [new branch] custom_lowering_dict -> origin/custom_lowering_dict 2025-12-04T08:53:08.3872974Z * [new branch] d4l3k/debug_plane_frtrace -> origin/d4l3k/debug_plane_frtrace 2025-12-04T08:53:08.3873169Z * [new branch] daxia6/2.8o3 -> origin/daxia6/2.8o3 2025-12-04T08:53:08.3873435Z * [new branch] debug-guard -> origin/debug-guard 2025-12-04T08:53:08.3873618Z * [new branch] delete-quant-docs -> origin/delete-quant-docs 2025-12-04T08:53:08.3873951Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> 
origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 2025-12-04T08:53:08.3874411Z * [new branch] dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 2025-12-04T08:53:08.3874750Z * [new branch] desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper 2025-12-04T08:53:08.3874999Z * [new branch] desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64 2025-12-04T08:53:08.3875235Z * [new branch] dev/dhruva/flex_attn_opt -> origin/dev/dhruva/flex_attn_opt 2025-12-04T08:53:08.3875441Z * [new branch] dev/joona/MPSNDArrayAdd -> origin/dev/joona/MPSNDArrayAdd 2025-12-04T08:53:08.3875637Z * [new branch] dev/joona/Unranked -> origin/dev/joona/Unranked 2025-12-04T08:53:08.3875816Z * [new branch] dev/joona/cat -> origin/dev/joona/cat 2025-12-04T08:53:08.3876007Z * [new branch] dev/joona/embeddingbag -> origin/dev/joona/embeddingbag 2025-12-04T08:53:08.3876215Z * [new branch] dev/joona/fix_sdpa_memtest -> origin/dev/joona/fix_sdpa_memtest 2025-12-04T08:53:08.3876436Z * [new branch] dev/joona/getTensorsString -> origin/dev/joona/getTensorsString 2025-12-04T08:53:08.3876661Z * [new branch] dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14 2025-12-04T08:53:08.3876874Z * [new branch] dev/joona/scalar_clamp -> origin/dev/joona/scalar_clamp 2025-12-04T08:53:08.3877057Z * [new branch] dev/joona/sdpa -> origin/dev/joona/sdpa 2025-12-04T08:53:08.3877240Z * [new branch] dev/joona/sdpa_api -> origin/dev/joona/sdpa_api 2025-12-04T08:53:08.3877428Z * [new branch] dev/joona/type_inf -> origin/dev/joona/type_inf 2025-12-04T08:53:08.3877630Z * [new branch] dev/joona/ulpAssertClose -> origin/dev/joona/ulpAssertClose 2025-12-04T08:53:08.3877832Z * [new branch] dev/joona/upsize3d -> origin/dev/joona/upsize3d 2025-12-04T08:53:08.3878014Z * [new branch] disp_counter -> origin/disp_counter 2025-12-04T08:53:08.3878193Z * [new branch] 
divyanshk-patch-1 -> origin/divyanshk-patch-1 2025-12-04T08:53:08.3878370Z * [new branch] docs -> origin/docs 2025-12-04T08:53:08.3878543Z * [new branch] documentation -> origin/documentation 2025-12-04T08:53:08.3878728Z * [new branch] eager_model_benchmarks -> origin/eager_model_benchmarks 2025-12-04T08:53:08.3878947Z * [new branch] embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control 2025-12-04T08:53:08.3879176Z * [new branch] embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B 2025-12-04T08:53:08.3879433Z * [new branch] embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B 2025-12-04T08:53:08.3879665Z * [new branch] eqy-patch-1 -> origin/eqy-patch-1 2025-12-04T08:53:08.3879840Z * [new branch] eqy-patch-2 -> origin/eqy-patch-2 2025-12-04T08:53:08.3880014Z * [new branch] eqy-patch-3 -> origin/eqy-patch-3 2025-12-04T08:53:08.3880184Z * [new branch] eqy-patch-4 -> origin/eqy-patch-4 2025-12-04T08:53:08.3880352Z * [new branch] eqy-patch-5 -> origin/eqy-patch-5 2025-12-04T08:53:08.3880522Z * [new branch] eqy-patch-6 -> origin/eqy-patch-6 2025-12-04T08:53:08.3880712Z * [new branch] exclamaforte/amd-ma -> origin/exclamaforte/amd-ma 2025-12-04T08:53:08.3880949Z * [new branch] exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run 2025-12-04T08:53:08.3881214Z * [new branch] exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor 2025-12-04T08:53:08.3881472Z * [new branch] exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion 2025-12-04T08:53:08.3881759Z * [new branch] exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning 2025-12-04T08:53:08.3882059Z * [new branch] exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg 2025-12-04T08:53:08.3882369Z * [new branch] exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run 2025-12-04T08:53:08.3882634Z * [new 
branch] exclamaforte/fusion-data -> origin/exclamaforte/fusion-data 2025-12-04T08:53:08.3882867Z * [new branch] exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run 2025-12-04T08:53:08.3883127Z * [new branch] exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model 2025-12-04T08:53:08.3883391Z * [new branch] exclamaforte/gemm-model -> origin/exclamaforte/gemm-model 2025-12-04T08:53:08.3883661Z * [new branch] exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection 2025-12-04T08:53:08.3883930Z * [new branch] exclamaforte/gemm-to-amd -> origin/exclamaforte/gemm-to-amd 2025-12-04T08:53:08.3884151Z * [new branch] exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model 2025-12-04T08:53:08.3884425Z * [new branch] exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor 2025-12-04T08:53:08.3884702Z * [new branch] exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo 2025-12-04T08:53:08.3884966Z * [new branch] exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization 2025-12-04T08:53:08.3885242Z * [new branch] exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode 2025-12-04T08:53:08.3885517Z * [new branch] exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs 2025-12-04T08:53:08.3885805Z * [new branch] exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2 2025-12-04T08:53:08.3886041Z * [new branch] exec -> origin/exec 2025-12-04T08:53:08.3886223Z * [new branch] experimental-mosaic -> origin/experimental-mosaic 2025-12-04T08:53:08.3886409Z * [new branch] export-D61047529 -> origin/export-D61047529 2025-12-04T08:53:08.3886589Z * [new branch] export-D71412006 -> origin/export-D71412006 2025-12-04T08:53:08.3886808Z * [new branch] export-D73042989 -> origin/export-D73042989 2025-12-04T08:53:08.3887037Z * [new branch] 
export-D78957093 -> origin/export-D78957093 2025-12-04T08:53:08.3887247Z * [new branch] export-D78996107 -> origin/export-D78996107 2025-12-04T08:53:08.3887423Z * [new branch] export-D80823877 -> origin/export-D80823877 2025-12-04T08:53:08.3887593Z * [new branch] export-D80958642 -> origin/export-D80958642 2025-12-04T08:53:08.3887774Z * [new branch] export-D81054193 -> origin/export-D81054193 2025-12-04T08:53:08.3887949Z * [new branch] export-D81204584 -> origin/export-D81204584 2025-12-04T08:53:08.3888121Z * [new branch] export-D81429090 -> origin/export-D81429090 2025-12-04T08:53:08.3888296Z * [new branch] export-D82250826 -> origin/export-D82250826 2025-12-04T08:53:08.3888475Z * [new branch] export-D82253817 -> origin/export-D82253817 2025-12-04T08:53:08.3888648Z * [new branch] export-D83541846 -> origin/export-D83541846 2025-12-04T08:53:08.3888822Z * [new branch] export-D83627170 -> origin/export-D83627170 2025-12-04T08:53:08.3888997Z * [new branch] export-D83766701 -> origin/export-D83766701 2025-12-04T08:53:08.3889171Z * [new branch] export-D83768878 -> origin/export-D83768878 2025-12-04T08:53:08.3889347Z * [new branch] export-D83769447 -> origin/export-D83769447 2025-12-04T08:53:08.3889518Z * [new branch] export-D84089824 -> origin/export-D84089824 2025-12-04T08:53:08.3889693Z * [new branch] export-D84213020 -> origin/export-D84213020 2025-12-04T08:53:08.3889868Z * [new branch] export-D84373821 -> origin/export-D84373821 2025-12-04T08:53:08.3890040Z * [new branch] export-D84612194 -> origin/export-D84612194 2025-12-04T08:53:08.3890221Z * [new branch] export-D84890985 -> origin/export-D84890985 2025-12-04T08:53:08.3890395Z * [new branch] export-D85122326 -> origin/export-D85122326 2025-12-04T08:53:08.3890568Z * [new branch] export-D86256198 -> origin/export-D86256198 2025-12-04T08:53:08.3890743Z * [new branch] export-D86460608 -> origin/export-D86460608 2025-12-04T08:53:08.3890918Z * [new branch] export-D86474796 -> origin/export-D86474796 
2025-12-04T08:53:08.3891092Z * [new branch] export-D86712396 -> origin/export-D86712396 2025-12-04T08:53:08.3891267Z * [new branch] export-D87022129 -> origin/export-D87022129 2025-12-04T08:53:08.3891439Z * [new branch] export-D87838959 -> origin/export-D87838959 2025-12-04T08:53:08.3891609Z * [new branch] export-D88319437 -> origin/export-D88319437 2025-12-04T08:53:08.3891832Z * [new branch] exported-model-train-idempotent -> origin/exported-model-train-idempotent 2025-12-04T08:53:08.3892072Z * [new branch] ezyang-titan-october -> origin/ezyang-titan-october 2025-12-04T08:53:08.3892274Z * [new branch] ezyang-titan-october2 -> origin/ezyang-titan-october2 2025-12-04T08:53:08.3892462Z * [new branch] ezyang-war -> origin/ezyang-war 2025-12-04T08:53:08.3892661Z * [new branch] ezyang/wip-aot-descriptors -> origin/ezyang/wip-aot-descriptors 2025-12-04T08:53:08.3892862Z * [new branch] fa_u8_brgemm -> origin/fa_u8_brgemm 2025-12-04T08:53:08.3893055Z * [new branch] fadeputr/sequence_fbgemm -> origin/fadeputr/sequence_fbgemm 2025-12-04T08:53:08.3893297Z * [new branch] fastmath_baseline -> origin/fastmath_baseline 2025-12-04T08:53:08.3893474Z * [new branch] fbcode/warm -> origin/fbcode/warm 2025-12-04T08:53:08.3893679Z * [new branch] fca -> origin/fca 2025-12-04T08:53:08.3893842Z * [new branch] fca2_ca5984c -> origin/fca2_ca5984c 2025-12-04T08:53:08.3894038Z * [new branch] fca5 -> origin/fca5 2025-12-04T08:53:08.3894219Z * [new branch] feature/justknobs-cpp -> origin/feature/justknobs-cpp 2025-12-04T08:53:08.3894417Z * [new branch] feature/numa-forkserver -> origin/feature/numa-forkserver 2025-12-04T08:53:08.3894611Z * [new branch] ffast_math_baseline -> origin/ffast_math_baseline 2025-12-04T08:53:08.3894804Z * [new branch] ffast_math_target -> origin/ffast_math_target 2025-12-04T08:53:08.3894984Z * [new branch] findhao/base_commit -> origin/findhao/base_commit 2025-12-04T08:53:08.3895173Z * [new branch] findhao/base_commit1 -> origin/findhao/base_commit1 
2025-12-04T08:53:08.3895366Z * [new branch] findhao/multistream2 -> origin/findhao/multistream2 2025-12-04T08:53:08.3895555Z * [new branch] findhao/multistream5 -> origin/findhao/multistream5 2025-12-04T08:53:08.3895758Z * [new branch] findhao/multistream6 -> origin/findhao/multistream6 2025-12-04T08:53:08.3895958Z * [new branch] findhao/operatorbench3 -> origin/findhao/operatorbench3 2025-12-04T08:53:08.3896159Z * [new branch] findhao/operatorbench5 -> origin/findhao/operatorbench5 2025-12-04T08:53:08.3896355Z * [new branch] findhao/tritonparse -> origin/findhao/tritonparse 2025-12-04T08:53:08.3896576Z * [new branch] fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format 2025-12-04T08:53:08.3896784Z * [new branch] fix-config-ignore -> origin/fix-config-ignore 2025-12-04T08:53:08.3896973Z * [new branch] fix-dict-guard -> origin/fix-dict-guard 2025-12-04T08:53:08.3897151Z * [new branch] fix_addmm_issue -> origin/fix_addmm_issue 2025-12-04T08:53:08.3897352Z * [new branch] fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims 2025-12-04T08:53:08.3897559Z * [new branch] fix_bench_bwd_pass -> origin/fix_bench_bwd_pass 2025-12-04T08:53:08.3897746Z * [new branch] fix_mem_profiler_config -> origin/fix_mem_profiler_config 2025-12-04T08:53:08.3897939Z * [new branch] fix_nvrtc_discovery -> origin/fix_nvrtc_discovery 2025-12-04T08:53:08.3898118Z * [new branch] fix_op_runner -> origin/fix_op_runner 2025-12-04T08:53:08.3898286Z * [new branch] fix_ubn_159469 -> origin/fix_ubn_159469 2025-12-04T08:53:08.3898458Z * [new branch] fixes-triage -> origin/fixes-triage 2025-12-04T08:53:08.3898633Z * [new branch] fixflashinfer -> origin/fixflashinfer 2025-12-04T08:53:08.3898818Z * [new branch] flash_decoding_cpu -> origin/flash_decoding_cpu 2025-12-04T08:53:08.3899000Z * [new branch] flex-flash -> origin/flex-flash 2025-12-04T08:53:08.3899204Z * [new branch] flex_attention_functorch_grad -> origin/flex_attention_functorch_grad 2025-12-04T08:53:08.3899400Z * [new 
branch] flex_flash -> origin/flex_flash 2025-12-04T08:53:08.3899608Z * [new branch] fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule 2025-12-04T08:53:08.3899858Z * [new branch] fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler 2025-12-04T08:53:08.3900075Z * [new branch] forkserver_fix -> origin/forkserver_fix 2025-12-04T08:53:08.3900253Z * [new branch] fsdp2_trace_rules -> origin/fsdp2_trace_rules 2025-12-04T08:53:08.3900433Z * [new branch] fx_cpp -> origin/fx_cpp 2025-12-04T08:53:08.3900628Z * [new branch] fy/fix-win -> origin/fy/fix-win 2025-12-04T08:53:08.3900802Z * [new branch] galv-patch-1 -> origin/galv-patch-1 2025-12-04T08:53:08.3901061Z * [new branch] galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4 2025-12-04T08:53:08.3901323Z * [new branch] georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch 2025-12-04T08:53:08.3901534Z * [new branch] gh/AlnisM/1/base -> origin/gh/AlnisM/1/base 2025-12-04T08:53:08.3901711Z * [new branch] gh/AlnisM/1/head -> origin/gh/AlnisM/1/head 2025-12-04T08:53:08.3901899Z * [new branch] gh/EikanWang/67/base -> origin/gh/EikanWang/67/base 2025-12-04T08:53:08.3902095Z * [new branch] gh/EikanWang/67/head -> origin/gh/EikanWang/67/head 2025-12-04T08:53:08.3902281Z * [new branch] gh/Gasoonjia/1/base -> origin/gh/Gasoonjia/1/base 2025-12-04T08:53:08.3902471Z * [new branch] gh/Gasoonjia/1/head -> origin/gh/Gasoonjia/1/head 2025-12-04T08:53:08.3902656Z * [new branch] gh/H-Huang/131/base -> origin/gh/H-Huang/131/base 2025-12-04T08:53:08.3902837Z * [new branch] gh/H-Huang/131/head -> origin/gh/H-Huang/131/head 2025-12-04T08:53:08.3903021Z * [new branch] gh/H-Huang/131/orig -> origin/gh/H-Huang/131/orig 2025-12-04T08:53:08.3903202Z * [new branch] gh/H-Huang/132/base -> origin/gh/H-Huang/132/base 2025-12-04T08:53:08.3903419Z * [new branch] gh/H-Huang/132/head -> origin/gh/H-Huang/132/head 2025-12-04T08:53:08.3903601Z * [new branch] 
gh/H-Huang/132/orig -> origin/gh/H-Huang/132/orig 2025-12-04T08:53:08.3903790Z * [new branch] gh/H-Huang/180/base -> origin/gh/H-Huang/180/base 2025-12-04T08:53:08.3903973Z * [new branch] gh/H-Huang/180/head -> origin/gh/H-Huang/180/head 2025-12-04T08:53:08.3904156Z * [new branch] gh/H-Huang/180/orig -> origin/gh/H-Huang/180/orig 2025-12-04T08:53:08.3904340Z * [new branch] gh/H-Huang/182/base -> origin/gh/H-Huang/182/base 2025-12-04T08:53:08.3904519Z * [new branch] gh/H-Huang/182/head -> origin/gh/H-Huang/182/head 2025-12-04T08:53:08.3904706Z * [new branch] gh/H-Huang/182/orig -> origin/gh/H-Huang/182/orig 2025-12-04T08:53:08.3904887Z * [new branch] gh/H-Huang/226/base -> origin/gh/H-Huang/226/base 2025-12-04T08:53:08.3905064Z * [new branch] gh/H-Huang/226/head -> origin/gh/H-Huang/226/head 2025-12-04T08:53:08.3905244Z * [new branch] gh/H-Huang/226/orig -> origin/gh/H-Huang/226/orig 2025-12-04T08:53:08.3905421Z * [new branch] gh/H-Huang/228/base -> origin/gh/H-Huang/228/base 2025-12-04T08:53:08.3905608Z * [new branch] gh/H-Huang/228/head -> origin/gh/H-Huang/228/head 2025-12-04T08:53:08.3905790Z * [new branch] gh/H-Huang/228/orig -> origin/gh/H-Huang/228/orig 2025-12-04T08:53:08.3905983Z * [new branch] gh/IvanKobzarev/150/base -> origin/gh/IvanKobzarev/150/base 2025-12-04T08:53:08.3906192Z * [new branch] gh/IvanKobzarev/150/head -> origin/gh/IvanKobzarev/150/head 2025-12-04T08:53:08.3906396Z * [new branch] gh/IvanKobzarev/150/orig -> origin/gh/IvanKobzarev/150/orig 2025-12-04T08:53:08.3906601Z * [new branch] gh/IvanKobzarev/157/base -> origin/gh/IvanKobzarev/157/base 2025-12-04T08:53:08.3906805Z * [new branch] gh/IvanKobzarev/157/head -> origin/gh/IvanKobzarev/157/head 2025-12-04T08:53:08.3907008Z * [new branch] gh/IvanKobzarev/157/orig -> origin/gh/IvanKobzarev/157/orig 2025-12-04T08:53:08.3907206Z * [new branch] gh/IvanKobzarev/159/base -> origin/gh/IvanKobzarev/159/base 2025-12-04T08:53:08.3907411Z * [new branch] gh/IvanKobzarev/159/head -> 
origin/gh/IvanKobzarev/159/head 2025-12-04T08:53:08.3907655Z * [new branch] gh/IvanKobzarev/159/orig -> origin/gh/IvanKobzarev/159/orig 2025-12-04T08:53:08.3907884Z * [new branch] gh/IvanKobzarev/162/base -> origin/gh/IvanKobzarev/162/base 2025-12-04T08:53:08.3908086Z * [new branch] gh/IvanKobzarev/162/head -> origin/gh/IvanKobzarev/162/head 2025-12-04T08:53:08.3908287Z * [new branch] gh/IvanKobzarev/162/orig -> origin/gh/IvanKobzarev/162/orig 2025-12-04T08:53:08.3908484Z * [new branch] gh/IvanKobzarev/163/base -> origin/gh/IvanKobzarev/163/base 2025-12-04T08:53:08.3908684Z * [new branch] gh/IvanKobzarev/163/head -> origin/gh/IvanKobzarev/163/head 2025-12-04T08:53:08.3908886Z * [new branch] gh/IvanKobzarev/163/orig -> origin/gh/IvanKobzarev/163/orig 2025-12-04T08:53:08.3909085Z * [new branch] gh/IvanKobzarev/166/base -> origin/gh/IvanKobzarev/166/base 2025-12-04T08:53:08.3909293Z * [new branch] gh/IvanKobzarev/166/head -> origin/gh/IvanKobzarev/166/head 2025-12-04T08:53:08.3909496Z * [new branch] gh/IvanKobzarev/166/orig -> origin/gh/IvanKobzarev/166/orig 2025-12-04T08:53:08.3909698Z * [new branch] gh/IvanKobzarev/167/base -> origin/gh/IvanKobzarev/167/base 2025-12-04T08:53:08.3909909Z * [new branch] gh/IvanKobzarev/167/head -> origin/gh/IvanKobzarev/167/head 2025-12-04T08:53:08.3910109Z * [new branch] gh/IvanKobzarev/167/orig -> origin/gh/IvanKobzarev/167/orig 2025-12-04T08:53:08.3910309Z * [new branch] gh/IvanKobzarev/168/base -> origin/gh/IvanKobzarev/168/base 2025-12-04T08:53:08.3910511Z * [new branch] gh/IvanKobzarev/168/head -> origin/gh/IvanKobzarev/168/head 2025-12-04T08:53:08.3910717Z * [new branch] gh/IvanKobzarev/168/orig -> origin/gh/IvanKobzarev/168/orig 2025-12-04T08:53:08.3910915Z * [new branch] gh/IvanKobzarev/169/base -> origin/gh/IvanKobzarev/169/base 2025-12-04T08:53:08.3911118Z * [new branch] gh/IvanKobzarev/169/head -> origin/gh/IvanKobzarev/169/head 2025-12-04T08:53:08.3911319Z * [new branch] gh/IvanKobzarev/169/orig -> 
origin/gh/IvanKobzarev/169/orig 2025-12-04T08:53:08.3911520Z * [new branch] gh/IvanKobzarev/170/base -> origin/gh/IvanKobzarev/170/base 2025-12-04T08:53:08.3911721Z * [new branch] gh/IvanKobzarev/170/head -> origin/gh/IvanKobzarev/170/head 2025-12-04T08:53:08.3911917Z * [new branch] gh/IvanKobzarev/170/orig -> origin/gh/IvanKobzarev/170/orig 2025-12-04T08:53:08.3912121Z * [new branch] gh/IvanKobzarev/171/base -> origin/gh/IvanKobzarev/171/base 2025-12-04T08:53:08.3912327Z * [new branch] gh/IvanKobzarev/171/head -> origin/gh/IvanKobzarev/171/head 2025-12-04T08:53:08.3912525Z * [new branch] gh/IvanKobzarev/171/orig -> origin/gh/IvanKobzarev/171/orig 2025-12-04T08:53:08.3912732Z * [new branch] gh/IvanKobzarev/172/base -> origin/gh/IvanKobzarev/172/base 2025-12-04T08:53:08.3912934Z * [new branch] gh/IvanKobzarev/172/head -> origin/gh/IvanKobzarev/172/head 2025-12-04T08:53:08.3913141Z * [new branch] gh/IvanKobzarev/172/orig -> origin/gh/IvanKobzarev/172/orig 2025-12-04T08:53:08.3913410Z * [new branch] gh/IvanKobzarev/173/base -> origin/gh/IvanKobzarev/173/base 2025-12-04T08:53:08.3913614Z * [new branch] gh/IvanKobzarev/173/head -> origin/gh/IvanKobzarev/173/head 2025-12-04T08:53:08.3913812Z * [new branch] gh/IvanKobzarev/173/orig -> origin/gh/IvanKobzarev/173/orig 2025-12-04T08:53:08.3914016Z * [new branch] gh/IvanKobzarev/174/base -> origin/gh/IvanKobzarev/174/base 2025-12-04T08:53:08.3914225Z * [new branch] gh/IvanKobzarev/174/head -> origin/gh/IvanKobzarev/174/head 2025-12-04T08:53:08.3914424Z * [new branch] gh/IvanKobzarev/174/orig -> origin/gh/IvanKobzarev/174/orig 2025-12-04T08:53:08.3914679Z * [new branch] gh/IvanKobzarev/175/base -> origin/gh/IvanKobzarev/175/base 2025-12-04T08:53:08.3914914Z * [new branch] gh/IvanKobzarev/175/head -> origin/gh/IvanKobzarev/175/head 2025-12-04T08:53:08.3915114Z * [new branch] gh/IvanKobzarev/175/orig -> origin/gh/IvanKobzarev/175/orig 2025-12-04T08:53:08.3915311Z * [new branch] gh/IvanKobzarev/176/base -> 
origin/gh/IvanKobzarev/176/base 2025-12-04T08:53:08.3915516Z * [new branch] gh/IvanKobzarev/176/head -> origin/gh/IvanKobzarev/176/head 2025-12-04T08:53:08.3915715Z * [new branch] gh/IvanKobzarev/176/orig -> origin/gh/IvanKobzarev/176/orig 2025-12-04T08:53:08.3915918Z * [new branch] gh/IvanKobzarev/177/base -> origin/gh/IvanKobzarev/177/base 2025-12-04T08:53:08.3916120Z * [new branch] gh/IvanKobzarev/177/head -> origin/gh/IvanKobzarev/177/head 2025-12-04T08:53:08.3916327Z * [new branch] gh/IvanKobzarev/177/orig -> origin/gh/IvanKobzarev/177/orig 2025-12-04T08:53:08.3916525Z * [new branch] gh/IvanKobzarev/178/base -> origin/gh/IvanKobzarev/178/base 2025-12-04T08:53:08.3916729Z * [new branch] gh/IvanKobzarev/178/head -> origin/gh/IvanKobzarev/178/head 2025-12-04T08:53:08.3916923Z * [new branch] gh/IvanKobzarev/178/orig -> origin/gh/IvanKobzarev/178/orig 2025-12-04T08:53:08.3917121Z * [new branch] gh/IvanKobzarev/179/base -> origin/gh/IvanKobzarev/179/base 2025-12-04T08:53:08.3917321Z * [new branch] gh/IvanKobzarev/179/head -> origin/gh/IvanKobzarev/179/head 2025-12-04T08:53:08.3917524Z * [new branch] gh/IvanKobzarev/179/orig -> origin/gh/IvanKobzarev/179/orig 2025-12-04T08:53:08.3917727Z * [new branch] gh/IvanKobzarev/180/base -> origin/gh/IvanKobzarev/180/base 2025-12-04T08:53:08.3917931Z * [new branch] gh/IvanKobzarev/180/head -> origin/gh/IvanKobzarev/180/head 2025-12-04T08:53:08.3918130Z * [new branch] gh/IvanKobzarev/180/orig -> origin/gh/IvanKobzarev/180/orig 2025-12-04T08:53:08.3918343Z * [new branch] gh/IvanKobzarev/181/base -> origin/gh/IvanKobzarev/181/base 2025-12-04T08:53:08.3918542Z * [new branch] gh/IvanKobzarev/181/head -> origin/gh/IvanKobzarev/181/head 2025-12-04T08:53:08.3918741Z * [new branch] gh/IvanKobzarev/181/orig -> origin/gh/IvanKobzarev/181/orig 2025-12-04T08:53:08.3918942Z * [new branch] gh/IvanKobzarev/182/base -> origin/gh/IvanKobzarev/182/base 2025-12-04T08:53:08.3919140Z * [new branch] gh/IvanKobzarev/182/head -> 
origin/gh/IvanKobzarev/182/head 2025-12-04T08:53:08.3919346Z * [new branch] gh/IvanKobzarev/182/orig -> origin/gh/IvanKobzarev/182/orig 2025-12-04T08:53:08.3919547Z * [new branch] gh/IvanKobzarev/183/base -> origin/gh/IvanKobzarev/183/base 2025-12-04T08:53:08.3919746Z * [new branch] gh/IvanKobzarev/183/head -> origin/gh/IvanKobzarev/183/head 2025-12-04T08:53:08.3919951Z * [new branch] gh/IvanKobzarev/183/orig -> origin/gh/IvanKobzarev/183/orig 2025-12-04T08:53:08.3920159Z * [new branch] gh/IvanKobzarev/184/base -> origin/gh/IvanKobzarev/184/base 2025-12-04T08:53:08.3920355Z * [new branch] gh/IvanKobzarev/184/head -> origin/gh/IvanKobzarev/184/head 2025-12-04T08:53:08.3920555Z * [new branch] gh/IvanKobzarev/184/orig -> origin/gh/IvanKobzarev/184/orig 2025-12-04T08:53:08.3920761Z * [new branch] gh/NikhilAPatel/1/base -> origin/gh/NikhilAPatel/1/base 2025-12-04T08:53:08.3920959Z * [new branch] gh/NikhilAPatel/1/head -> origin/gh/NikhilAPatel/1/head 2025-12-04T08:53:08.3921165Z * [new branch] gh/NikhilAPatel/2/base -> origin/gh/NikhilAPatel/2/base 2025-12-04T08:53:08.3921362Z * [new branch] gh/NikhilAPatel/2/head -> origin/gh/NikhilAPatel/2/head 2025-12-04T08:53:08.3921584Z * [new branch] gh/NikhilAPatel/4/base -> origin/gh/NikhilAPatel/4/base 2025-12-04T08:53:08.3921803Z * [new branch] gh/NikhilAPatel/4/head -> origin/gh/NikhilAPatel/4/head 2025-12-04T08:53:08.3921998Z * [new branch] gh/NikhilAPatel/5/base -> origin/gh/NikhilAPatel/5/base 2025-12-04T08:53:08.3922188Z * [new branch] gh/NikhilAPatel/5/head -> origin/gh/NikhilAPatel/5/head 2025-12-04T08:53:08.3922381Z * [new branch] gh/NikhilAPatel/5/orig -> origin/gh/NikhilAPatel/5/orig 2025-12-04T08:53:08.3922567Z * [new branch] gh/PaliC/17/base -> origin/gh/PaliC/17/base 2025-12-04T08:53:08.3922742Z * [new branch] gh/PaliC/17/head -> origin/gh/PaliC/17/head 2025-12-04T08:53:08.3922926Z * [new branch] gh/PaliC/17/orig -> origin/gh/PaliC/17/orig 2025-12-04T08:53:08.3923099Z * [new branch] gh/PaliC/18/base -> 
origin/gh/PaliC/18/base 2025-12-04T08:53:08.3923339Z * [new branch] gh/PaliC/18/head -> origin/gh/PaliC/18/head 2025-12-04T08:53:08.3923513Z * [new branch] gh/PaliC/18/orig -> origin/gh/PaliC/18/orig 2025-12-04T08:53:08.3923685Z * [new branch] gh/PaliC/20/base -> origin/gh/PaliC/20/base 2025-12-04T08:53:08.3923857Z * [new branch] gh/PaliC/20/head -> origin/gh/PaliC/20/head 2025-12-04T08:53:08.3924042Z * [new branch] gh/PaliC/20/orig -> origin/gh/PaliC/20/orig 2025-12-04T08:53:08.3924209Z * [new branch] gh/PaliC/21/base -> origin/gh/PaliC/21/base 2025-12-04T08:53:08.3924381Z * [new branch] gh/PaliC/21/head -> origin/gh/PaliC/21/head 2025-12-04T08:53:08.3924552Z * [new branch] gh/PaliC/21/orig -> origin/gh/PaliC/21/orig 2025-12-04T08:53:08.3924719Z * [new branch] gh/PaliC/23/base -> origin/gh/PaliC/23/base 2025-12-04T08:53:08.3924896Z * [new branch] gh/PaliC/23/head -> origin/gh/PaliC/23/head 2025-12-04T08:53:08.3925066Z * [new branch] gh/PaliC/23/orig -> origin/gh/PaliC/23/orig 2025-12-04T08:53:08.3925235Z * [new branch] gh/PaliC/24/base -> origin/gh/PaliC/24/base 2025-12-04T08:53:08.3925406Z * [new branch] gh/PaliC/24/head -> origin/gh/PaliC/24/head 2025-12-04T08:53:08.3925578Z * [new branch] gh/PaliC/24/orig -> origin/gh/PaliC/24/orig 2025-12-04T08:53:08.3925751Z * [new branch] gh/PaliC/25/head -> origin/gh/PaliC/25/head 2025-12-04T08:53:08.3925920Z * [new branch] gh/PaliC/25/next -> origin/gh/PaliC/25/next 2025-12-04T08:53:08.3926091Z * [new branch] gh/PaliC/25/orig -> origin/gh/PaliC/25/orig 2025-12-04T08:53:08.3926258Z * [new branch] gh/PaliC/26/head -> origin/gh/PaliC/26/head 2025-12-04T08:53:08.3926430Z * [new branch] gh/PaliC/26/next -> origin/gh/PaliC/26/next 2025-12-04T08:53:08.3926608Z * [new branch] gh/PaliC/26/orig -> origin/gh/PaliC/26/orig 2025-12-04T08:53:08.3926777Z * [new branch] gh/PaliC/27/next -> origin/gh/PaliC/27/next 2025-12-04T08:53:08.3926947Z * [new branch] gh/PaliC/28/head -> origin/gh/PaliC/28/head 2025-12-04T08:53:08.3927114Z * [new 
branch] gh/PaliC/28/next -> origin/gh/PaliC/28/next 2025-12-04T08:53:08.3927284Z * [new branch] gh/PaliC/28/orig -> origin/gh/PaliC/28/orig 2025-12-04T08:53:08.3927461Z * [new branch] gh/PaliC/29/head -> origin/gh/PaliC/29/head 2025-12-04T08:53:08.3927628Z * [new branch] gh/PaliC/29/next -> origin/gh/PaliC/29/next 2025-12-04T08:53:08.3927798Z * [new branch] gh/PaliC/29/orig -> origin/gh/PaliC/29/orig 2025-12-04T08:53:08.3927968Z * [new branch] gh/PaliC/30/head -> origin/gh/PaliC/30/head 2025-12-04T08:53:08.3928175Z * [new branch] gh/PaliC/30/next -> origin/gh/PaliC/30/next 2025-12-04T08:53:08.3928383Z * [new branch] gh/PaliC/30/orig -> origin/gh/PaliC/30/orig 2025-12-04T08:53:08.3928558Z * [new branch] gh/PaliC/31/head -> origin/gh/PaliC/31/head 2025-12-04T08:53:08.3928728Z * [new branch] gh/PaliC/31/next -> origin/gh/PaliC/31/next 2025-12-04T08:53:08.3928898Z * [new branch] gh/PaliC/31/orig -> origin/gh/PaliC/31/orig 2025-12-04T08:53:08.3929084Z * [new branch] gh/PaulZhang12/25/base -> origin/gh/PaulZhang12/25/base 2025-12-04T08:53:08.3929278Z * [new branch] gh/PaulZhang12/25/head -> origin/gh/PaulZhang12/25/head 2025-12-04T08:53:08.3929470Z * [new branch] gh/PaulZhang12/25/orig -> origin/gh/PaulZhang12/25/orig 2025-12-04T08:53:08.3929659Z * [new branch] gh/PaulZhang12/28/base -> origin/gh/PaulZhang12/28/base 2025-12-04T08:53:08.3929849Z * [new branch] gh/PaulZhang12/28/head -> origin/gh/PaulZhang12/28/head 2025-12-04T08:53:08.3930038Z * [new branch] gh/PaulZhang12/28/orig -> origin/gh/PaulZhang12/28/orig 2025-12-04T08:53:08.3930232Z * [new branch] gh/PaulZhang12/31/base -> origin/gh/PaulZhang12/31/base 2025-12-04T08:53:08.3930416Z * [new branch] gh/PaulZhang12/31/head -> origin/gh/PaulZhang12/31/head 2025-12-04T08:53:08.3930606Z * [new branch] gh/PaulZhang12/31/orig -> origin/gh/PaulZhang12/31/orig 2025-12-04T08:53:08.3930790Z * [new branch] gh/PaulZhang12/37/base -> origin/gh/PaulZhang12/37/base 2025-12-04T08:53:08.3930977Z * [new branch] gh/PaulZhang12/37/head 
-> origin/gh/PaulZhang12/37/head 2025-12-04T08:53:08.3931172Z * [new branch] gh/PaulZhang12/37/orig -> origin/gh/PaulZhang12/37/orig 2025-12-04T08:53:08.3931357Z * [new branch] gh/PaulZhang12/40/base -> origin/gh/PaulZhang12/40/base 2025-12-04T08:53:08.3931548Z * [new branch] gh/PaulZhang12/40/head -> origin/gh/PaulZhang12/40/head 2025-12-04T08:53:08.3931738Z * [new branch] gh/PaulZhang12/40/orig -> origin/gh/PaulZhang12/40/orig 2025-12-04T08:53:08.3931927Z * [new branch] gh/PaulZhang12/42/base -> origin/gh/PaulZhang12/42/base 2025-12-04T08:53:08.3932116Z * [new branch] gh/PaulZhang12/42/head -> origin/gh/PaulZhang12/42/head 2025-12-04T08:53:08.3932303Z * [new branch] gh/PaulZhang12/43/base -> origin/gh/PaulZhang12/43/base 2025-12-04T08:53:08.3932488Z * [new branch] gh/PaulZhang12/43/head -> origin/gh/PaulZhang12/43/head 2025-12-04T08:53:08.3932680Z * [new branch] gh/PaulZhang12/43/orig -> origin/gh/PaulZhang12/43/orig 2025-12-04T08:53:08.3932872Z * [new branch] gh/PaulZhang12/44/base -> origin/gh/PaulZhang12/44/base 2025-12-04T08:53:08.3933059Z * [new branch] gh/PaulZhang12/44/head -> origin/gh/PaulZhang12/44/head 2025-12-04T08:53:08.3933246Z * [new branch] gh/PaulZhang12/45/base -> origin/gh/PaulZhang12/45/base 2025-12-04T08:53:08.3933491Z * [new branch] gh/PaulZhang12/45/head -> origin/gh/PaulZhang12/45/head 2025-12-04T08:53:08.3933677Z * [new branch] gh/PaulZhang12/45/orig -> origin/gh/PaulZhang12/45/orig 2025-12-04T08:53:08.3933869Z * [new branch] gh/PaulZhang12/46/base -> origin/gh/PaulZhang12/46/base 2025-12-04T08:53:08.3934058Z * [new branch] gh/PaulZhang12/46/head -> origin/gh/PaulZhang12/46/head 2025-12-04T08:53:08.3934243Z * [new branch] gh/PaulZhang12/46/orig -> origin/gh/PaulZhang12/46/orig 2025-12-04T08:53:08.3934430Z * [new branch] gh/PaulZhang12/47/base -> origin/gh/PaulZhang12/47/base 2025-12-04T08:53:08.3934625Z * [new branch] gh/PaulZhang12/47/head -> origin/gh/PaulZhang12/47/head 2025-12-04T08:53:08.3934850Z * [new branch] gh/PaulZhang12/47/orig 
-> origin/gh/PaulZhang12/47/orig 2025-12-04T08:53:08.3935074Z * [new branch] gh/PaulZhang12/48/base -> origin/gh/PaulZhang12/48/base 2025-12-04T08:53:08.3935263Z * [new branch] gh/PaulZhang12/48/head -> origin/gh/PaulZhang12/48/head 2025-12-04T08:53:08.3935449Z * [new branch] gh/PaulZhang12/48/orig -> origin/gh/PaulZhang12/48/orig 2025-12-04T08:53:08.3935643Z * [new branch] gh/SamGinzburg/11/base -> origin/gh/SamGinzburg/11/base 2025-12-04T08:53:08.3935828Z * [new branch] gh/SamGinzburg/11/head -> origin/gh/SamGinzburg/11/head 2025-12-04T08:53:08.3936024Z * [new branch] gh/SherlockNoMad/1/base -> origin/gh/SherlockNoMad/1/base 2025-12-04T08:53:08.3936220Z * [new branch] gh/SherlockNoMad/1/head -> origin/gh/SherlockNoMad/1/head 2025-12-04T08:53:08.3936415Z * [new branch] gh/SherlockNoMad/10/base -> origin/gh/SherlockNoMad/10/base 2025-12-04T08:53:08.3936626Z * [new branch] gh/SherlockNoMad/10/head -> origin/gh/SherlockNoMad/10/head 2025-12-04T08:53:08.3936829Z * [new branch] gh/SherlockNoMad/10/orig -> origin/gh/SherlockNoMad/10/orig 2025-12-04T08:53:08.3937024Z * [new branch] gh/SherlockNoMad/11/base -> origin/gh/SherlockNoMad/11/base 2025-12-04T08:53:08.3937221Z * [new branch] gh/SherlockNoMad/11/head -> origin/gh/SherlockNoMad/11/head 2025-12-04T08:53:08.3937423Z * [new branch] gh/SherlockNoMad/11/orig -> origin/gh/SherlockNoMad/11/orig 2025-12-04T08:53:08.3937617Z * [new branch] gh/SherlockNoMad/12/base -> origin/gh/SherlockNoMad/12/base 2025-12-04T08:53:08.3937815Z * [new branch] gh/SherlockNoMad/12/head -> origin/gh/SherlockNoMad/12/head 2025-12-04T08:53:08.3938011Z * [new branch] gh/SherlockNoMad/12/orig -> origin/gh/SherlockNoMad/12/orig 2025-12-04T08:53:08.3938214Z * [new branch] gh/SherlockNoMad/15/base -> origin/gh/SherlockNoMad/15/base 2025-12-04T08:53:08.3938420Z * [new branch] gh/SherlockNoMad/15/head -> origin/gh/SherlockNoMad/15/head 2025-12-04T08:53:08.3938618Z * [new branch] gh/SherlockNoMad/15/orig -> origin/gh/SherlockNoMad/15/orig 
2025-12-04T08:53:08.3938812Z * [new branch] gh/SherlockNoMad/17/base -> origin/gh/SherlockNoMad/17/base 2025-12-04T08:53:08.3939009Z * [new branch] gh/SherlockNoMad/17/head -> origin/gh/SherlockNoMad/17/head 2025-12-04T08:53:08.3939216Z * [new branch] gh/SherlockNoMad/17/orig -> origin/gh/SherlockNoMad/17/orig 2025-12-04T08:53:08.3939410Z * [new branch] gh/SherlockNoMad/18/base -> origin/gh/SherlockNoMad/18/base 2025-12-04T08:53:08.3939608Z * [new branch] gh/SherlockNoMad/18/head -> origin/gh/SherlockNoMad/18/head 2025-12-04T08:53:08.3939806Z * [new branch] gh/SherlockNoMad/18/orig -> origin/gh/SherlockNoMad/18/orig 2025-12-04T08:53:08.3940008Z * [new branch] gh/SherlockNoMad/19/base -> origin/gh/SherlockNoMad/19/base 2025-12-04T08:53:08.3940209Z * [new branch] gh/SherlockNoMad/19/head -> origin/gh/SherlockNoMad/19/head 2025-12-04T08:53:08.3940406Z * [new branch] gh/SherlockNoMad/19/orig -> origin/gh/SherlockNoMad/19/orig 2025-12-04T08:53:08.3940600Z * [new branch] gh/SherlockNoMad/2/base -> origin/gh/SherlockNoMad/2/base 2025-12-04T08:53:08.3940803Z * [new branch] gh/SherlockNoMad/2/head -> origin/gh/SherlockNoMad/2/head 2025-12-04T08:53:08.3940999Z * [new branch] gh/SherlockNoMad/20/base -> origin/gh/SherlockNoMad/20/base 2025-12-04T08:53:08.3941196Z * [new branch] gh/SherlockNoMad/20/head -> origin/gh/SherlockNoMad/20/head 2025-12-04T08:53:08.3941395Z * [new branch] gh/SherlockNoMad/20/orig -> origin/gh/SherlockNoMad/20/orig 2025-12-04T08:53:08.3941618Z * [new branch] gh/SherlockNoMad/21/base -> origin/gh/SherlockNoMad/21/base 2025-12-04T08:53:08.3941817Z * [new branch] gh/SherlockNoMad/21/head -> origin/gh/SherlockNoMad/21/head 2025-12-04T08:53:08.3942042Z * [new branch] gh/SherlockNoMad/21/orig -> origin/gh/SherlockNoMad/21/orig 2025-12-04T08:53:08.3942242Z * [new branch] gh/SherlockNoMad/3/base -> origin/gh/SherlockNoMad/3/base 2025-12-04T08:53:08.3942436Z * [new branch] gh/SherlockNoMad/3/head -> origin/gh/SherlockNoMad/3/head 2025-12-04T08:53:08.3942629Z * 
[new branch] gh/SherlockNoMad/4/base -> origin/gh/SherlockNoMad/4/base 2025-12-04T08:53:08.3942819Z * [new branch] gh/SherlockNoMad/4/head -> origin/gh/SherlockNoMad/4/head 2025-12-04T08:53:08.3943013Z * [new branch] gh/SherlockNoMad/5/base -> origin/gh/SherlockNoMad/5/base 2025-12-04T08:53:08.3943205Z * [new branch] gh/SherlockNoMad/5/head -> origin/gh/SherlockNoMad/5/head 2025-12-04T08:53:08.3943455Z * [new branch] gh/Sidharth123-cpu/24/base -> origin/gh/Sidharth123-cpu/24/base 2025-12-04T08:53:08.3943675Z * [new branch] gh/Sidharth123-cpu/25/base -> origin/gh/Sidharth123-cpu/25/base 2025-12-04T08:53:08.3943886Z * [new branch] gh/Sidharth123-cpu/26/base -> origin/gh/Sidharth123-cpu/26/base 2025-12-04T08:53:08.3944092Z * [new branch] gh/Sidharth123-cpu/27/base -> origin/gh/Sidharth123-cpu/27/base 2025-12-04T08:53:08.3944292Z * [new branch] gh/StrongerXi/1/base -> origin/gh/StrongerXi/1/base 2025-12-04T08:53:08.3944481Z * [new branch] gh/StrongerXi/1/head -> origin/gh/StrongerXi/1/head 2025-12-04T08:53:08.3944671Z * [new branch] gh/StrongerXi/71/base -> origin/gh/StrongerXi/71/base 2025-12-04T08:53:08.3944860Z * [new branch] gh/StrongerXi/71/head -> origin/gh/StrongerXi/71/head 2025-12-04T08:53:08.3945048Z * [new branch] gh/StrongerXi/72/base -> origin/gh/StrongerXi/72/base 2025-12-04T08:53:08.3945237Z * [new branch] gh/StrongerXi/72/head -> origin/gh/StrongerXi/72/head 2025-12-04T08:53:08.3945510Z * [new branch] gh/StrongerXi/73/base -> origin/gh/StrongerXi/73/base 2025-12-04T08:53:08.3945749Z * [new branch] gh/StrongerXi/73/head -> origin/gh/StrongerXi/73/head 2025-12-04T08:53:08.3967105Z * [new branch] gh/StrongerXi/73/orig -> origin/gh/StrongerXi/73/orig 2025-12-04T08:53:08.3967320Z * [new branch] gh/XilunWu/160/base -> origin/gh/XilunWu/160/base 2025-12-04T08:53:08.3967540Z * [new branch] gh/XilunWu/160/head -> origin/gh/XilunWu/160/head 2025-12-04T08:53:08.3967739Z * [new branch] gh/XilunWu/160/orig -> origin/gh/XilunWu/160/orig 2025-12-04T08:53:08.3967959Z * 
[new branch] gh/XilunWu/163/base -> origin/gh/XilunWu/163/base 2025-12-04T08:53:08.3968156Z * [new branch] gh/XilunWu/163/head -> origin/gh/XilunWu/163/head 2025-12-04T08:53:08.3968362Z * [new branch] gh/XilunWu/163/orig -> origin/gh/XilunWu/163/orig 2025-12-04T08:53:08.3968601Z * [new branch] gh/XilunWu/168/base -> origin/gh/XilunWu/168/base 2025-12-04T08:53:08.3968818Z * [new branch] gh/XilunWu/168/head -> origin/gh/XilunWu/168/head 2025-12-04T08:53:08.3969049Z * [new branch] gh/XilunWu/168/orig -> origin/gh/XilunWu/168/orig 2025-12-04T08:53:08.3969301Z * [new branch] gh/XilunWu/169/base -> origin/gh/XilunWu/169/base 2025-12-04T08:53:08.3969505Z * [new branch] gh/XilunWu/169/head -> origin/gh/XilunWu/169/head 2025-12-04T08:53:08.3969690Z * [new branch] gh/XilunWu/169/orig -> origin/gh/XilunWu/169/orig 2025-12-04T08:53:08.3969878Z * [new branch] gh/XilunWu/170/base -> origin/gh/XilunWu/170/base 2025-12-04T08:53:08.3970070Z * [new branch] gh/XilunWu/170/head -> origin/gh/XilunWu/170/head 2025-12-04T08:53:08.3970331Z * [new branch] gh/XilunWu/170/orig -> origin/gh/XilunWu/170/orig 2025-12-04T08:53:08.3970553Z * [new branch] gh/XilunWu/171/base -> origin/gh/XilunWu/171/base 2025-12-04T08:53:08.3970743Z * [new branch] gh/XilunWu/171/head -> origin/gh/XilunWu/171/head 2025-12-04T08:53:08.3970926Z * [new branch] gh/XilunWu/171/orig -> origin/gh/XilunWu/171/orig 2025-12-04T08:53:08.3971117Z * [new branch] gh/XilunWu/173/base -> origin/gh/XilunWu/173/base 2025-12-04T08:53:08.3971307Z * [new branch] gh/XilunWu/173/head -> origin/gh/XilunWu/173/head 2025-12-04T08:53:08.3971491Z * [new branch] gh/XilunWu/173/orig -> origin/gh/XilunWu/173/orig 2025-12-04T08:53:08.3971679Z * [new branch] gh/XilunWu/175/base -> origin/gh/XilunWu/175/base 2025-12-04T08:53:08.3971875Z * [new branch] gh/XilunWu/175/head -> origin/gh/XilunWu/175/head 2025-12-04T08:53:08.3972060Z * [new branch] gh/XilunWu/175/orig -> origin/gh/XilunWu/175/orig 2025-12-04T08:53:08.3972256Z * [new branch] 
gh/XilunWu/176/base -> origin/gh/XilunWu/176/base 2025-12-04T08:53:08.3972448Z * [new branch] gh/XilunWu/176/head -> origin/gh/XilunWu/176/head 2025-12-04T08:53:08.3972634Z * [new branch] gh/XilunWu/176/orig -> origin/gh/XilunWu/176/orig 2025-12-04T08:53:08.3972831Z * [new branch] gh/XuehaiPan/14/base -> origin/gh/XuehaiPan/14/base 2025-12-04T08:53:08.3973032Z * [new branch] gh/XuehaiPan/14/head -> origin/gh/XuehaiPan/14/head 2025-12-04T08:53:08.3973224Z * [new branch] gh/XuehaiPan/14/orig -> origin/gh/XuehaiPan/14/orig 2025-12-04T08:53:08.3973504Z * [new branch] gh/XuehaiPan/179/base -> origin/gh/XuehaiPan/179/base 2025-12-04T08:53:08.3973701Z * [new branch] gh/XuehaiPan/179/head -> origin/gh/XuehaiPan/179/head 2025-12-04T08:53:08.3973901Z * [new branch] gh/XuehaiPan/179/orig -> origin/gh/XuehaiPan/179/orig 2025-12-04T08:53:08.3974096Z * [new branch] gh/XuehaiPan/249/base -> origin/gh/XuehaiPan/249/base 2025-12-04T08:53:08.3974288Z * [new branch] gh/XuehaiPan/249/head -> origin/gh/XuehaiPan/249/head 2025-12-04T08:53:08.3974471Z * [new branch] gh/XuehaiPan/249/orig -> origin/gh/XuehaiPan/249/orig 2025-12-04T08:53:08.3974652Z * [new branch] gh/XuehaiPan/253/base -> origin/gh/XuehaiPan/253/base 2025-12-04T08:53:08.3974834Z * [new branch] gh/XuehaiPan/253/head -> origin/gh/XuehaiPan/253/head 2025-12-04T08:53:08.3975017Z * [new branch] gh/XuehaiPan/253/orig -> origin/gh/XuehaiPan/253/orig 2025-12-04T08:53:08.3975208Z * [new branch] gh/XuehaiPan/254/base -> origin/gh/XuehaiPan/254/base 2025-12-04T08:53:08.3975409Z * [new branch] gh/XuehaiPan/254/head -> origin/gh/XuehaiPan/254/head 2025-12-04T08:53:08.3975598Z * [new branch] gh/XuehaiPan/254/orig -> origin/gh/XuehaiPan/254/orig 2025-12-04T08:53:08.3975785Z * [new branch] gh/XuehaiPan/255/base -> origin/gh/XuehaiPan/255/base 2025-12-04T08:53:08.3975970Z * [new branch] gh/XuehaiPan/255/head -> origin/gh/XuehaiPan/255/head 2025-12-04T08:53:08.3976155Z * [new branch] gh/XuehaiPan/255/orig -> origin/gh/XuehaiPan/255/orig 
2025-12-04T08:53:08.3976336Z * [new branch] gh/XuehaiPan/271/base -> origin/gh/XuehaiPan/271/base 2025-12-04T08:53:08.3976518Z * [new branch] gh/XuehaiPan/271/head -> origin/gh/XuehaiPan/271/head 2025-12-04T08:53:08.3976701Z * [new branch] gh/XuehaiPan/271/orig -> origin/gh/XuehaiPan/271/orig 2025-12-04T08:53:08.3976884Z * [new branch] gh/XuehaiPan/343/base -> origin/gh/XuehaiPan/343/base 2025-12-04T08:53:08.3977127Z * [new branch] gh/XuehaiPan/343/head -> origin/gh/XuehaiPan/343/head 2025-12-04T08:53:08.3977338Z * [new branch] gh/XuehaiPan/343/orig -> origin/gh/XuehaiPan/343/orig 2025-12-04T08:53:08.3977520Z * [new branch] gh/XuehaiPan/347/base -> origin/gh/XuehaiPan/347/base 2025-12-04T08:53:08.3977701Z * [new branch] gh/XuehaiPan/347/head -> origin/gh/XuehaiPan/347/head 2025-12-04T08:53:08.3977882Z * [new branch] gh/XuehaiPan/347/orig -> origin/gh/XuehaiPan/347/orig 2025-12-04T08:53:08.3978064Z * [new branch] gh/XuehaiPan/348/base -> origin/gh/XuehaiPan/348/base 2025-12-04T08:53:08.3978246Z * [new branch] gh/XuehaiPan/348/head -> origin/gh/XuehaiPan/348/head 2025-12-04T08:53:08.3978427Z * [new branch] gh/XuehaiPan/348/orig -> origin/gh/XuehaiPan/348/orig 2025-12-04T08:53:08.3978612Z * [new branch] gh/XuehaiPan/350/base -> origin/gh/XuehaiPan/350/base 2025-12-04T08:53:08.3978798Z * [new branch] gh/XuehaiPan/350/head -> origin/gh/XuehaiPan/350/head 2025-12-04T08:53:08.3978981Z * [new branch] gh/XuehaiPan/350/orig -> origin/gh/XuehaiPan/350/orig 2025-12-04T08:53:08.3979163Z * [new branch] gh/XuehaiPan/365/base -> origin/gh/XuehaiPan/365/base 2025-12-04T08:53:08.3979346Z * [new branch] gh/XuehaiPan/365/head -> origin/gh/XuehaiPan/365/head 2025-12-04T08:53:08.3979529Z * [new branch] gh/XuehaiPan/365/orig -> origin/gh/XuehaiPan/365/orig 2025-12-04T08:53:08.3979710Z * [new branch] gh/XuehaiPan/366/base -> origin/gh/XuehaiPan/366/base 2025-12-04T08:53:08.3979898Z * [new branch] gh/XuehaiPan/366/head -> origin/gh/XuehaiPan/366/head 2025-12-04T08:53:08.3980082Z * [new 
branch] gh/XuehaiPan/370/base -> origin/gh/XuehaiPan/370/base 2025-12-04T08:53:08.3980266Z * [new branch] gh/XuehaiPan/370/head -> origin/gh/XuehaiPan/370/head 2025-12-04T08:53:08.3980451Z * [new branch] gh/XuehaiPan/370/orig -> origin/gh/XuehaiPan/370/orig 2025-12-04T08:53:08.3980641Z * [new branch] gh/XuehaiPan/390/base -> origin/gh/XuehaiPan/390/base 2025-12-04T08:53:08.3980834Z * [new branch] gh/XuehaiPan/390/head -> origin/gh/XuehaiPan/390/head 2025-12-04T08:53:08.3981022Z * [new branch] gh/XuehaiPan/390/orig -> origin/gh/XuehaiPan/390/orig 2025-12-04T08:53:08.3981214Z * [new branch] gh/XuehaiPan/391/base -> origin/gh/XuehaiPan/391/base 2025-12-04T08:53:08.3981406Z * [new branch] gh/XuehaiPan/391/head -> origin/gh/XuehaiPan/391/head 2025-12-04T08:53:08.3981594Z * [new branch] gh/XuehaiPan/391/orig -> origin/gh/XuehaiPan/391/orig 2025-12-04T08:53:08.3981785Z * [new branch] gh/XuehaiPan/392/base -> origin/gh/XuehaiPan/392/base 2025-12-04T08:53:08.3981975Z * [new branch] gh/XuehaiPan/392/head -> origin/gh/XuehaiPan/392/head 2025-12-04T08:53:08.3982163Z * [new branch] gh/XuehaiPan/392/orig -> origin/gh/XuehaiPan/392/orig 2025-12-04T08:53:08.3982356Z * [new branch] gh/XuehaiPan/394/base -> origin/gh/XuehaiPan/394/base 2025-12-04T08:53:08.3982546Z * [new branch] gh/XuehaiPan/394/head -> origin/gh/XuehaiPan/394/head 2025-12-04T08:53:08.3982734Z * [new branch] gh/XuehaiPan/394/orig -> origin/gh/XuehaiPan/394/orig 2025-12-04T08:53:08.3982927Z * [new branch] gh/XuehaiPan/397/base -> origin/gh/XuehaiPan/397/base 2025-12-04T08:53:08.3983117Z * [new branch] gh/XuehaiPan/397/head -> origin/gh/XuehaiPan/397/head 2025-12-04T08:53:08.3983360Z * [new branch] gh/XuehaiPan/397/orig -> origin/gh/XuehaiPan/397/orig 2025-12-04T08:53:08.3983556Z * [new branch] gh/XuehaiPan/398/base -> origin/gh/XuehaiPan/398/base 2025-12-04T08:53:08.3983797Z * [new branch] gh/XuehaiPan/398/head -> origin/gh/XuehaiPan/398/head 2025-12-04T08:53:08.3983987Z * [new branch] gh/XuehaiPan/398/orig -> 
origin/gh/XuehaiPan/398/orig 2025-12-04T08:53:08.3984217Z * [new branch] gh/XuehaiPan/399/base -> origin/gh/XuehaiPan/399/base 2025-12-04T08:53:08.3984409Z * [new branch] gh/XuehaiPan/399/head -> origin/gh/XuehaiPan/399/head 2025-12-04T08:53:08.3984598Z * [new branch] gh/XuehaiPan/399/orig -> origin/gh/XuehaiPan/399/orig 2025-12-04T08:53:08.3984789Z * [new branch] gh/XuehaiPan/400/base -> origin/gh/XuehaiPan/400/base 2025-12-04T08:53:08.3984977Z * [new branch] gh/XuehaiPan/400/head -> origin/gh/XuehaiPan/400/head 2025-12-04T08:53:08.3985168Z * [new branch] gh/XuehaiPan/400/orig -> origin/gh/XuehaiPan/400/orig 2025-12-04T08:53:08.3985369Z * [new branch] gh/ZhiweiYan-96/39/base -> origin/gh/ZhiweiYan-96/39/base 2025-12-04T08:53:08.3985567Z * [new branch] gh/ZhiweiYan-96/39/head -> origin/gh/ZhiweiYan-96/39/head 2025-12-04T08:53:08.3985763Z * [new branch] gh/ZhiweiYan-96/39/orig -> origin/gh/ZhiweiYan-96/39/orig 2025-12-04T08:53:08.3985959Z * [new branch] gh/ZhiweiYan-96/44/base -> origin/gh/ZhiweiYan-96/44/base 2025-12-04T08:53:08.3986149Z * [new branch] gh/ZhiweiYan-96/44/head -> origin/gh/ZhiweiYan-96/44/head 2025-12-04T08:53:08.3986344Z * [new branch] gh/ZhiweiYan-96/45/base -> origin/gh/ZhiweiYan-96/45/base 2025-12-04T08:53:08.3986539Z * [new branch] gh/ZhiweiYan-96/45/head -> origin/gh/ZhiweiYan-96/45/head 2025-12-04T08:53:08.3986729Z * [new branch] gh/ZhiweiYan-96/49/base -> origin/gh/ZhiweiYan-96/49/base 2025-12-04T08:53:08.3986923Z * [new branch] gh/ZhiweiYan-96/49/head -> origin/gh/ZhiweiYan-96/49/head 2025-12-04T08:53:08.3987116Z * [new branch] gh/ZhiweiYan-96/62/base -> origin/gh/ZhiweiYan-96/62/base 2025-12-04T08:53:08.3987306Z * [new branch] gh/ZhiweiYan-96/62/head -> origin/gh/ZhiweiYan-96/62/head 2025-12-04T08:53:08.3987502Z * [new branch] gh/ZhiweiYan-96/66/base -> origin/gh/ZhiweiYan-96/66/base 2025-12-04T08:53:08.3987695Z * [new branch] gh/ZhiweiYan-96/66/head -> origin/gh/ZhiweiYan-96/66/head 2025-12-04T08:53:08.3987887Z * [new branch] 
gh/ZhiweiYan-96/67/base -> origin/gh/ZhiweiYan-96/67/base 2025-12-04T08:53:08.3988081Z * [new branch] gh/ZhiweiYan-96/67/head -> origin/gh/ZhiweiYan-96/67/head 2025-12-04T08:53:08.3988272Z * [new branch] gh/ZhiweiYan-96/68/base -> origin/gh/ZhiweiYan-96/68/base 2025-12-04T08:53:08.3988461Z * [new branch] gh/ZhiweiYan-96/68/head -> origin/gh/ZhiweiYan-96/68/head 2025-12-04T08:53:08.3988653Z * [new branch] gh/ZhiweiYan-96/68/orig -> origin/gh/ZhiweiYan-96/68/orig 2025-12-04T08:53:08.3988847Z * [new branch] gh/aakhundov/1/base -> origin/gh/aakhundov/1/base 2025-12-04T08:53:08.3989035Z * [new branch] gh/aakhundov/1/head -> origin/gh/aakhundov/1/head 2025-12-04T08:53:08.3989228Z * [new branch] gh/aakhundov/2/base -> origin/gh/aakhundov/2/base 2025-12-04T08:53:08.3989409Z * [new branch] gh/aakhundov/2/head -> origin/gh/aakhundov/2/head 2025-12-04T08:53:08.3989601Z * [new branch] gh/aditew01/openblas -> origin/gh/aditew01/openblas 2025-12-04T08:53:08.3989796Z * [new branch] gh/aditew01/sbgemm -> origin/gh/aditew01/sbgemm 2025-12-04T08:53:08.3989980Z * [new branch] gh/aditew01/vecbf16 -> origin/gh/aditew01/vecbf16 2025-12-04T08:53:08.3990166Z * [new branch] gh/albanD/4/base -> origin/gh/albanD/4/base 2025-12-04T08:53:08.3990348Z * [new branch] gh/albanD/4/head -> origin/gh/albanD/4/head 2025-12-04T08:53:08.3990522Z * [new branch] gh/albanD/4/orig -> origin/gh/albanD/4/orig 2025-12-04T08:53:08.3990828Z * [new branch] gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init 2025-12-04T08:53:08.3991136Z * [new branch] gh/alexsamardzic/12/base -> origin/gh/alexsamardzic/12/base 2025-12-04T08:53:08.3991341Z * [new branch] gh/alexsamardzic/12/head -> origin/gh/alexsamardzic/12/head 2025-12-04T08:53:08.3991546Z * [new branch] gh/alexsamardzic/12/orig -> origin/gh/alexsamardzic/12/orig 2025-12-04T08:53:08.3991749Z * [new branch] gh/alexsamardzic/14/base -> origin/gh/alexsamardzic/14/base 2025-12-04T08:53:08.3991949Z * [new branch] 
gh/alexsamardzic/14/head -> origin/gh/alexsamardzic/14/head 2025-12-04T08:53:08.3992151Z * [new branch] gh/alexsamardzic/14/orig -> origin/gh/alexsamardzic/14/orig 2025-12-04T08:53:08.3992354Z * [new branch] gh/alexsamardzic/15/base -> origin/gh/alexsamardzic/15/base 2025-12-04T08:53:08.3992555Z * [new branch] gh/alexsamardzic/15/head -> origin/gh/alexsamardzic/15/head 2025-12-04T08:53:08.3992763Z * [new branch] gh/alexsamardzic/15/orig -> origin/gh/alexsamardzic/15/orig 2025-12-04T08:53:08.3992960Z * [new branch] gh/amjames/18/base -> origin/gh/amjames/18/base 2025-12-04T08:53:08.3993143Z * [new branch] gh/amjames/18/head -> origin/gh/amjames/18/head 2025-12-04T08:53:08.3993399Z * [new branch] gh/amjames/18/orig -> origin/gh/amjames/18/orig 2025-12-04T08:53:08.3993592Z * [new branch] gh/andrewor14/35/base -> origin/gh/andrewor14/35/base 2025-12-04T08:53:08.3993782Z * [new branch] gh/andrewor14/35/head -> origin/gh/andrewor14/35/head 2025-12-04T08:53:08.3993974Z * [new branch] gh/andrewor14/35/orig -> origin/gh/andrewor14/35/orig 2025-12-04T08:53:08.3994164Z * [new branch] gh/andrewor14/50/base -> origin/gh/andrewor14/50/base 2025-12-04T08:53:08.3994353Z * [new branch] gh/andrewor14/50/head -> origin/gh/andrewor14/50/head 2025-12-04T08:53:08.3994545Z * [new branch] gh/andrewor14/50/orig -> origin/gh/andrewor14/50/orig 2025-12-04T08:53:08.3994732Z * [new branch] gh/andyanwang/30/base -> origin/gh/andyanwang/30/base 2025-12-04T08:53:08.3994927Z * [new branch] gh/andyanwang/30/orig -> origin/gh/andyanwang/30/orig 2025-12-04T08:53:08.3995118Z * [new branch] gh/andyanwang/31/base -> origin/gh/andyanwang/31/base 2025-12-04T08:53:08.3995307Z * [new branch] gh/andyanwang/31/orig -> origin/gh/andyanwang/31/orig 2025-12-04T08:53:08.3995500Z * [new branch] gh/andyanwang/39/base -> origin/gh/andyanwang/39/base 2025-12-04T08:53:08.3995691Z * [new branch] gh/andyanwang/39/head -> origin/gh/andyanwang/39/head 2025-12-04T08:53:08.3995881Z * [new branch] gh/andyanwang/39/orig -> 
origin/gh/andyanwang/39/orig 2025-12-04T08:53:08.3996072Z * [new branch] gh/andyanwang/42/base -> origin/gh/andyanwang/42/base 2025-12-04T08:53:08.3996266Z * [new branch] gh/andyanwang/42/head -> origin/gh/andyanwang/42/head 2025-12-04T08:53:08.3996455Z * [new branch] gh/andyanwang/42/orig -> origin/gh/andyanwang/42/orig 2025-12-04T08:53:08.3996648Z * [new branch] gh/andyanwang/45/base -> origin/gh/andyanwang/45/base 2025-12-04T08:53:08.3996839Z * [new branch] gh/andyanwang/45/head -> origin/gh/andyanwang/45/head 2025-12-04T08:53:08.3997027Z * [new branch] gh/andyanwang/45/orig -> origin/gh/andyanwang/45/orig 2025-12-04T08:53:08.3997217Z * [new branch] gh/angelayi/107/base -> origin/gh/angelayi/107/base 2025-12-04T08:53:08.3997409Z * [new branch] gh/angelayi/107/head -> origin/gh/angelayi/107/head 2025-12-04T08:53:08.3997638Z * [new branch] gh/angelayi/114/base -> origin/gh/angelayi/114/base 2025-12-04T08:53:08.3997824Z * [new branch] gh/angelayi/114/head -> origin/gh/angelayi/114/head 2025-12-04T08:53:08.3998042Z * [new branch] gh/angelayi/114/orig -> origin/gh/angelayi/114/orig 2025-12-04T08:53:08.3998225Z * [new branch] gh/angelayi/116/base -> origin/gh/angelayi/116/base 2025-12-04T08:53:08.3998411Z * [new branch] gh/angelayi/116/head -> origin/gh/angelayi/116/head 2025-12-04T08:53:08.3998595Z * [new branch] gh/angelayi/116/orig -> origin/gh/angelayi/116/orig 2025-12-04T08:53:08.3998777Z * [new branch] gh/angelayi/122/base -> origin/gh/angelayi/122/base 2025-12-04T08:53:08.3998963Z * [new branch] gh/angelayi/122/head -> origin/gh/angelayi/122/head 2025-12-04T08:53:08.3999145Z * [new branch] gh/angelayi/122/orig -> origin/gh/angelayi/122/orig 2025-12-04T08:53:08.3999334Z * [new branch] gh/angelayi/124/base -> origin/gh/angelayi/124/base 2025-12-04T08:53:08.3999520Z * [new branch] gh/angelayi/124/head -> origin/gh/angelayi/124/head 2025-12-04T08:53:08.3999706Z * [new branch] gh/angelayi/124/orig -> origin/gh/angelayi/124/orig 2025-12-04T08:53:08.3999894Z * [new 
branch] gh/angelayi/128/base -> origin/gh/angelayi/128/base 2025-12-04T08:53:08.4000080Z * [new branch] gh/angelayi/128/head -> origin/gh/angelayi/128/head 2025-12-04T08:53:08.4000263Z * [new branch] gh/angelayi/128/orig -> origin/gh/angelayi/128/orig 2025-12-04T08:53:08.4000451Z * [new branch] gh/angelayi/131/base -> origin/gh/angelayi/131/base 2025-12-04T08:53:08.4000638Z * [new branch] gh/angelayi/131/head -> origin/gh/angelayi/131/head 2025-12-04T08:53:08.4000821Z * [new branch] gh/angelayi/131/orig -> origin/gh/angelayi/131/orig 2025-12-04T08:53:08.4001009Z * [new branch] gh/angelayi/132/base -> origin/gh/angelayi/132/base 2025-12-04T08:53:08.4001195Z * [new branch] gh/angelayi/132/head -> origin/gh/angelayi/132/head 2025-12-04T08:53:08.4001373Z * [new branch] gh/angelayi/132/orig -> origin/gh/angelayi/132/orig 2025-12-04T08:53:08.4001553Z * [new branch] gh/angelayi/133/base -> origin/gh/angelayi/133/base 2025-12-04T08:53:08.4001740Z * [new branch] gh/angelayi/133/head -> origin/gh/angelayi/133/head 2025-12-04T08:53:08.4001920Z * [new branch] gh/angelayi/133/orig -> origin/gh/angelayi/133/orig 2025-12-04T08:53:08.4002106Z * [new branch] gh/angelayi/134/base -> origin/gh/angelayi/134/base 2025-12-04T08:53:08.4002293Z * [new branch] gh/angelayi/134/head -> origin/gh/angelayi/134/head 2025-12-04T08:53:08.4002477Z * [new branch] gh/angelayi/134/orig -> origin/gh/angelayi/134/orig 2025-12-04T08:53:08.4002667Z * [new branch] gh/angelayi/135/base -> origin/gh/angelayi/135/base 2025-12-04T08:53:08.4002855Z * [new branch] gh/angelayi/135/head -> origin/gh/angelayi/135/head 2025-12-04T08:53:08.4003036Z * [new branch] gh/angelayi/135/orig -> origin/gh/angelayi/135/orig 2025-12-04T08:53:08.4003220Z * [new branch] gh/angelayi/136/base -> origin/gh/angelayi/136/base 2025-12-04T08:53:08.4003453Z * [new branch] gh/angelayi/136/head -> origin/gh/angelayi/136/head 2025-12-04T08:53:08.4003638Z * [new branch] gh/angelayi/136/orig -> origin/gh/angelayi/136/orig 
2025-12-04T08:53:08.4003823Z * [new branch] gh/angelayi/137/base -> origin/gh/angelayi/137/base 2025-12-04T08:53:08.4004002Z * [new branch] gh/angelayi/137/head -> origin/gh/angelayi/137/head 2025-12-04T08:53:08.4004190Z * [new branch] gh/angelayi/137/orig -> origin/gh/angelayi/137/orig 2025-12-04T08:53:08.4004408Z * [new branch] gh/angelayi/138/base -> origin/gh/angelayi/138/base 2025-12-04T08:53:08.4004624Z * [new branch] gh/angelayi/138/head -> origin/gh/angelayi/138/head 2025-12-04T08:53:08.4004808Z * [new branch] gh/angelayi/138/orig -> origin/gh/angelayi/138/orig 2025-12-04T08:53:08.4004993Z * [new branch] gh/angelayi/139/base -> origin/gh/angelayi/139/base 2025-12-04T08:53:08.4005175Z * [new branch] gh/angelayi/139/head -> origin/gh/angelayi/139/head 2025-12-04T08:53:08.4005361Z * [new branch] gh/angelayi/139/orig -> origin/gh/angelayi/139/orig 2025-12-04T08:53:08.4005544Z * [new branch] gh/angelayi/140/base -> origin/gh/angelayi/140/base 2025-12-04T08:53:08.4005724Z * [new branch] gh/angelayi/140/head -> origin/gh/angelayi/140/head 2025-12-04T08:53:08.4005912Z * [new branch] gh/angelayi/140/orig -> origin/gh/angelayi/140/orig 2025-12-04T08:53:08.4006098Z * [new branch] gh/angelayi/141/base -> origin/gh/angelayi/141/base 2025-12-04T08:53:08.4006282Z * [new branch] gh/angelayi/141/head -> origin/gh/angelayi/141/head 2025-12-04T08:53:08.4006466Z * [new branch] gh/angelayi/141/orig -> origin/gh/angelayi/141/orig 2025-12-04T08:53:08.4006649Z * [new branch] gh/angelayi/142/base -> origin/gh/angelayi/142/base 2025-12-04T08:53:08.4006829Z * [new branch] gh/angelayi/142/head -> origin/gh/angelayi/142/head 2025-12-04T08:53:08.4007012Z * [new branch] gh/angelayi/142/orig -> origin/gh/angelayi/142/orig 2025-12-04T08:53:08.4007195Z * [new branch] gh/angelayi/143/base -> origin/gh/angelayi/143/base 2025-12-04T08:53:08.4007373Z * [new branch] gh/angelayi/143/head -> origin/gh/angelayi/143/head 2025-12-04T08:53:08.4007637Z * [new branch] gh/angelayi/143/orig -> 
origin/gh/angelayi/143/orig 2025-12-04T08:53:08.4007820Z * [new branch] gh/angelayi/144/base -> origin/gh/angelayi/144/base 2025-12-04T08:53:08.4008008Z * [new branch] gh/angelayi/144/head -> origin/gh/angelayi/144/head 2025-12-04T08:53:08.4008190Z * [new branch] gh/angelayi/144/orig -> origin/gh/angelayi/144/orig 2025-12-04T08:53:08.4008378Z * [new branch] gh/anijain2305/753/base -> origin/gh/anijain2305/753/base 2025-12-04T08:53:08.4008578Z * [new branch] gh/anijain2305/753/head -> origin/gh/anijain2305/753/head 2025-12-04T08:53:08.4008769Z * [new branch] gh/anijain2305/753/orig -> origin/gh/anijain2305/753/orig 2025-12-04T08:53:08.4008956Z * [new branch] gh/anijain2305/810/base -> origin/gh/anijain2305/810/base 2025-12-04T08:53:08.4009144Z * [new branch] gh/anijain2305/810/head -> origin/gh/anijain2305/810/head 2025-12-04T08:53:08.4009333Z * [new branch] gh/anijain2305/810/orig -> origin/gh/anijain2305/810/orig 2025-12-04T08:53:08.4009520Z * [new branch] gh/anijain2305/854/base -> origin/gh/anijain2305/854/base 2025-12-04T08:53:08.4009714Z * [new branch] gh/anijain2305/854/head -> origin/gh/anijain2305/854/head 2025-12-04T08:53:08.4009904Z * [new branch] gh/anijain2305/854/orig -> origin/gh/anijain2305/854/orig 2025-12-04T08:53:08.4010089Z * [new branch] gh/anijain2305/864/base -> origin/gh/anijain2305/864/base 2025-12-04T08:53:08.4010280Z * [new branch] gh/anijain2305/864/head -> origin/gh/anijain2305/864/head 2025-12-04T08:53:08.4010471Z * [new branch] gh/anijain2305/864/orig -> origin/gh/anijain2305/864/orig 2025-12-04T08:53:08.4010656Z * [new branch] gh/anijain2305/870/base -> origin/gh/anijain2305/870/base 2025-12-04T08:53:08.4010849Z * [new branch] gh/anijain2305/870/head -> origin/gh/anijain2305/870/head 2025-12-04T08:53:08.4011068Z * [new branch] gh/anijain2305/870/orig -> origin/gh/anijain2305/870/orig 2025-12-04T08:53:08.4011277Z * [new branch] gh/anijain2305/873/base -> origin/gh/anijain2305/873/base 2025-12-04T08:53:08.4011466Z * [new branch] 
gh/anijain2305/873/head -> origin/gh/anijain2305/873/head 2025-12-04T08:53:08.4011655Z * [new branch] gh/anijain2305/873/orig -> origin/gh/anijain2305/873/orig 2025-12-04T08:53:08.4011842Z * [new branch] gh/anijain2305/894/base -> origin/gh/anijain2305/894/base 2025-12-04T08:53:08.4012030Z * [new branch] gh/anijain2305/894/head -> origin/gh/anijain2305/894/head 2025-12-04T08:53:08.4012219Z * [new branch] gh/anijain2305/894/orig -> origin/gh/anijain2305/894/orig 2025-12-04T08:53:08.4012405Z * [new branch] gh/anijain2305/895/base -> origin/gh/anijain2305/895/base 2025-12-04T08:53:08.4012595Z * [new branch] gh/anijain2305/895/head -> origin/gh/anijain2305/895/head 2025-12-04T08:53:08.4012788Z * [new branch] gh/anijain2305/895/orig -> origin/gh/anijain2305/895/orig 2025-12-04T08:53:08.4012975Z * [new branch] gh/anijain2305/910/base -> origin/gh/anijain2305/910/base 2025-12-04T08:53:08.4013165Z * [new branch] gh/anijain2305/910/head -> origin/gh/anijain2305/910/head 2025-12-04T08:53:08.4013412Z * [new branch] gh/anijain2305/910/orig -> origin/gh/anijain2305/910/orig 2025-12-04T08:53:08.4013602Z * [new branch] gh/anijain2305/919/base -> origin/gh/anijain2305/919/base 2025-12-04T08:53:08.4013789Z * [new branch] gh/anijain2305/919/head -> origin/gh/anijain2305/919/head 2025-12-04T08:53:08.4013975Z * [new branch] gh/anijain2305/919/orig -> origin/gh/anijain2305/919/orig 2025-12-04T08:53:08.4014167Z * [new branch] gh/anijain2305/922/base -> origin/gh/anijain2305/922/base 2025-12-04T08:53:08.4014355Z * [new branch] gh/anijain2305/922/head -> origin/gh/anijain2305/922/head 2025-12-04T08:53:08.4014543Z * [new branch] gh/anijain2305/922/orig -> origin/gh/anijain2305/922/orig 2025-12-04T08:53:08.4014735Z * [new branch] gh/anijain2305/932/base -> origin/gh/anijain2305/932/base 2025-12-04T08:53:08.4014923Z * [new branch] gh/anijain2305/932/head -> origin/gh/anijain2305/932/head 2025-12-04T08:53:08.4015109Z * [new branch] gh/anijain2305/932/orig -> origin/gh/anijain2305/932/orig 
2025-12-04T08:53:08.4015299Z * [new branch] gh/anijain2305/940/base -> origin/gh/anijain2305/940/base 2025-12-04T08:53:08.4015489Z * [new branch] gh/anijain2305/940/head -> origin/gh/anijain2305/940/head 2025-12-04T08:53:08.4015679Z * [new branch] gh/anijain2305/940/orig -> origin/gh/anijain2305/940/orig 2025-12-04T08:53:08.4015868Z * [new branch] gh/anijain2305/941/base -> origin/gh/anijain2305/941/base 2025-12-04T08:53:08.4016060Z * [new branch] gh/anijain2305/941/head -> origin/gh/anijain2305/941/head 2025-12-04T08:53:08.4016246Z * [new branch] gh/anijain2305/941/orig -> origin/gh/anijain2305/941/orig 2025-12-04T08:53:08.4016439Z * [new branch] gh/anijain2305/942/base -> origin/gh/anijain2305/942/base 2025-12-04T08:53:08.4016630Z * [new branch] gh/anijain2305/942/head -> origin/gh/anijain2305/942/head 2025-12-04T08:53:08.4016817Z * [new branch] gh/anijain2305/942/orig -> origin/gh/anijain2305/942/orig 2025-12-04T08:53:08.4017006Z * [new branch] gh/anijain2305/943/base -> origin/gh/anijain2305/943/base 2025-12-04T08:53:08.4017194Z * [new branch] gh/anijain2305/943/head -> origin/gh/anijain2305/943/head 2025-12-04T08:53:08.4017381Z * [new branch] gh/anijain2305/943/orig -> origin/gh/anijain2305/943/orig 2025-12-04T08:53:08.4017571Z * [new branch] gh/anijain2305/944/base -> origin/gh/anijain2305/944/base 2025-12-04T08:53:08.4017815Z * [new branch] gh/anijain2305/944/head -> origin/gh/anijain2305/944/head 2025-12-04T08:53:08.4018028Z * [new branch] gh/anijain2305/944/orig -> origin/gh/anijain2305/944/orig 2025-12-04T08:53:08.4018217Z * [new branch] gh/anijain2305/945/base -> origin/gh/anijain2305/945/base 2025-12-04T08:53:08.4018405Z * [new branch] gh/anijain2305/945/head -> origin/gh/anijain2305/945/head 2025-12-04T08:53:08.4018592Z * [new branch] gh/anijain2305/945/orig -> origin/gh/anijain2305/945/orig 2025-12-04T08:53:08.4018783Z * [new branch] gh/anijain2305/946/base -> origin/gh/anijain2305/946/base 2025-12-04T08:53:08.4018969Z * [new branch] 
gh/anijain2305/946/head -> origin/gh/anijain2305/946/head 2025-12-04T08:53:08.4019162Z * [new branch] gh/anijain2305/946/orig -> origin/gh/anijain2305/946/orig 2025-12-04T08:53:08.4019354Z * [new branch] gh/anijain2305/947/base -> origin/gh/anijain2305/947/base 2025-12-04T08:53:08.4019546Z * [new branch] gh/anijain2305/947/head -> origin/gh/anijain2305/947/head 2025-12-04T08:53:08.4019738Z * [new branch] gh/anijain2305/947/orig -> origin/gh/anijain2305/947/orig 2025-12-04T08:53:08.4019927Z * [new branch] gh/anijain2305/948/base -> origin/gh/anijain2305/948/base 2025-12-04T08:53:08.4020112Z * [new branch] gh/anijain2305/948/head -> origin/gh/anijain2305/948/head 2025-12-04T08:53:08.4020301Z * [new branch] gh/anijain2305/948/orig -> origin/gh/anijain2305/948/orig 2025-12-04T08:53:08.4020492Z * [new branch] gh/anijain2305/949/base -> origin/gh/anijain2305/949/base 2025-12-04T08:53:08.4020678Z * [new branch] gh/anijain2305/949/head -> origin/gh/anijain2305/949/head 2025-12-04T08:53:08.4020869Z * [new branch] gh/anijain2305/949/orig -> origin/gh/anijain2305/949/orig 2025-12-04T08:53:08.4021058Z * [new branch] gh/anijain2305/950/base -> origin/gh/anijain2305/950/base 2025-12-04T08:53:08.4021245Z * [new branch] gh/anijain2305/950/head -> origin/gh/anijain2305/950/head 2025-12-04T08:53:08.4021435Z * [new branch] gh/anijain2305/950/orig -> origin/gh/anijain2305/950/orig 2025-12-04T08:53:08.4021624Z * [new branch] gh/anijain2305/951/base -> origin/gh/anijain2305/951/base 2025-12-04T08:53:08.4021808Z * [new branch] gh/anijain2305/951/head -> origin/gh/anijain2305/951/head 2025-12-04T08:53:08.4021997Z * [new branch] gh/anijain2305/951/orig -> origin/gh/anijain2305/951/orig 2025-12-04T08:53:08.4022188Z * [new branch] gh/anijain2305/952/base -> origin/gh/anijain2305/952/base 2025-12-04T08:53:08.4022374Z * [new branch] gh/anijain2305/952/head -> origin/gh/anijain2305/952/head 2025-12-04T08:53:08.4022562Z * [new branch] gh/anijain2305/952/orig -> origin/gh/anijain2305/952/orig 
2025-12-04T08:53:08.4022750Z * [new branch] gh/anijain2305/953/base -> origin/gh/anijain2305/953/base 2025-12-04T08:53:08.4022941Z * [new branch] gh/anijain2305/953/head -> origin/gh/anijain2305/953/head 2025-12-04T08:53:08.4023131Z * [new branch] gh/anijain2305/953/orig -> origin/gh/anijain2305/953/orig 2025-12-04T08:53:08.4023372Z * [new branch] gh/anijain2305/954/base -> origin/gh/anijain2305/954/base 2025-12-04T08:53:08.4023562Z * [new branch] gh/anijain2305/954/head -> origin/gh/anijain2305/954/head 2025-12-04T08:53:08.4023750Z * [new branch] gh/anijain2305/954/orig -> origin/gh/anijain2305/954/orig 2025-12-04T08:53:08.4023935Z * [new branch] gh/anijain2305/955/base -> origin/gh/anijain2305/955/base 2025-12-04T08:53:08.4024126Z * [new branch] gh/anijain2305/955/head -> origin/gh/anijain2305/955/head 2025-12-04T08:53:08.4024318Z * [new branch] gh/anijain2305/955/orig -> origin/gh/anijain2305/955/orig 2025-12-04T08:53:08.4024549Z * [new branch] gh/anijain2305/956/base -> origin/gh/anijain2305/956/base 2025-12-04T08:53:08.4024781Z * [new branch] gh/anijain2305/956/head -> origin/gh/anijain2305/956/head 2025-12-04T08:53:08.4024969Z * [new branch] gh/anijain2305/956/orig -> origin/gh/anijain2305/956/orig 2025-12-04T08:53:08.4025154Z * [new branch] gh/anijain2305/957/base -> origin/gh/anijain2305/957/base 2025-12-04T08:53:08.4025344Z * [new branch] gh/anijain2305/957/head -> origin/gh/anijain2305/957/head 2025-12-04T08:53:08.4025532Z * [new branch] gh/anijain2305/957/orig -> origin/gh/anijain2305/957/orig 2025-12-04T08:53:08.4025718Z * [new branch] gh/anijain2305/958/base -> origin/gh/anijain2305/958/base 2025-12-04T08:53:08.4025911Z * [new branch] gh/anijain2305/958/head -> origin/gh/anijain2305/958/head 2025-12-04T08:53:08.4026102Z * [new branch] gh/anijain2305/958/orig -> origin/gh/anijain2305/958/orig 2025-12-04T08:53:08.4026287Z * [new branch] gh/anijain2305/959/base -> origin/gh/anijain2305/959/base 2025-12-04T08:53:08.4026476Z * [new branch] 
gh/anijain2305/959/head -> origin/gh/anijain2305/959/head 2025-12-04T08:53:08.4026667Z * [new branch] gh/anijain2305/959/orig -> origin/gh/anijain2305/959/orig 2025-12-04T08:53:08.4026854Z * [new branch] gh/anijain2305/960/base -> origin/gh/anijain2305/960/base 2025-12-04T08:53:08.4027045Z * [new branch] gh/anijain2305/960/head -> origin/gh/anijain2305/960/head 2025-12-04T08:53:08.4027231Z * [new branch] gh/anijain2305/960/orig -> origin/gh/anijain2305/960/orig 2025-12-04T08:53:08.4027418Z * [new branch] gh/anijain2305/961/base -> origin/gh/anijain2305/961/base 2025-12-04T08:53:08.4027610Z * [new branch] gh/anijain2305/961/head -> origin/gh/anijain2305/961/head 2025-12-04T08:53:08.4027798Z * [new branch] gh/anijain2305/961/orig -> origin/gh/anijain2305/961/orig 2025-12-04T08:53:08.4027987Z * [new branch] gh/anijain2305/962/base -> origin/gh/anijain2305/962/base 2025-12-04T08:53:08.4028175Z * [new branch] gh/anijain2305/962/head -> origin/gh/anijain2305/962/head 2025-12-04T08:53:08.4028360Z * [new branch] gh/anijain2305/962/orig -> origin/gh/anijain2305/962/orig 2025-12-04T08:53:08.4028550Z * [new branch] gh/anijain2305/963/base -> origin/gh/anijain2305/963/base 2025-12-04T08:53:08.4028739Z * [new branch] gh/anijain2305/963/head -> origin/gh/anijain2305/963/head 2025-12-04T08:53:08.4028924Z * [new branch] gh/anijain2305/963/orig -> origin/gh/anijain2305/963/orig 2025-12-04T08:53:08.4029114Z * [new branch] gh/anijain2305/964/base -> origin/gh/anijain2305/964/base 2025-12-04T08:53:08.4029304Z * [new branch] gh/anijain2305/964/head -> origin/gh/anijain2305/964/head 2025-12-04T08:53:08.4029491Z * [new branch] gh/anijain2305/964/orig -> origin/gh/anijain2305/964/orig 2025-12-04T08:53:08.4029679Z * [new branch] gh/anijain2305/965/base -> origin/gh/anijain2305/965/base 2025-12-04T08:53:08.4029868Z * [new branch] gh/anijain2305/965/head -> origin/gh/anijain2305/965/head 2025-12-04T08:53:08.4030055Z * [new branch] gh/anijain2305/965/orig -> origin/gh/anijain2305/965/orig 
2025-12-04T08:53:08.4030248Z * [new branch] gh/anijain2305/966/base -> origin/gh/anijain2305/966/base 2025-12-04T08:53:08.4030437Z * [new branch] gh/anijain2305/966/head -> origin/gh/anijain2305/966/head 2025-12-04T08:53:08.4030622Z * [new branch] gh/anijain2305/966/orig -> origin/gh/anijain2305/966/orig 2025-12-04T08:53:08.4030813Z * [new branch] gh/anijain2305/967/base -> origin/gh/anijain2305/967/base 2025-12-04T08:53:08.4031037Z * [new branch] gh/anijain2305/967/head -> origin/gh/anijain2305/967/head 2025-12-04T08:53:08.4031221Z * [new branch] gh/anijain2305/967/orig -> origin/gh/anijain2305/967/orig 2025-12-04T08:53:08.4031450Z * [new branch] gh/anijain2305/968/base -> origin/gh/anijain2305/968/base 2025-12-04T08:53:08.4031636Z * [new branch] gh/anijain2305/968/head -> origin/gh/anijain2305/968/head 2025-12-04T08:53:08.4031823Z * [new branch] gh/anijain2305/968/orig -> origin/gh/anijain2305/968/orig 2025-12-04T08:53:08.4032012Z * [new branch] gh/anijain2305/969/base -> origin/gh/anijain2305/969/base 2025-12-04T08:53:08.4032197Z * [new branch] gh/anijain2305/969/head -> origin/gh/anijain2305/969/head 2025-12-04T08:53:08.4032385Z * [new branch] gh/anijain2305/969/orig -> origin/gh/anijain2305/969/orig 2025-12-04T08:53:08.4032575Z * [new branch] gh/anijain2305/970/base -> origin/gh/anijain2305/970/base 2025-12-04T08:53:08.4032763Z * [new branch] gh/anijain2305/970/head -> origin/gh/anijain2305/970/head 2025-12-04T08:53:08.4032958Z * [new branch] gh/anijain2305/970/orig -> origin/gh/anijain2305/970/orig 2025-12-04T08:53:08.4033145Z * [new branch] gh/anjali411/216/base -> origin/gh/anjali411/216/base 2025-12-04T08:53:08.4033374Z * [new branch] gh/anjali411/216/head -> origin/gh/anjali411/216/head 2025-12-04T08:53:08.4033561Z * [new branch] gh/anjali411/216/orig -> origin/gh/anjali411/216/orig 2025-12-04T08:53:08.4033749Z * [new branch] gh/anshul-si/1/base -> origin/gh/anshul-si/1/base 2025-12-04T08:53:08.4033932Z * [new branch] gh/anshul-si/1/head -> 
origin/gh/anshul-si/1/head 2025-12-04T08:53:08.4034113Z * [new branch] gh/anshul-si/2/base -> origin/gh/anshul-si/2/base 2025-12-04T08:53:08.4034294Z * [new branch] gh/anshul-si/2/head -> origin/gh/anshul-si/2/head 2025-12-04T08:53:08.4034475Z * [new branch] gh/anshul-si/3/base -> origin/gh/anshul-si/3/base 2025-12-04T08:53:08.4034654Z * [new branch] gh/anshul-si/3/head -> origin/gh/anshul-si/3/head 2025-12-04T08:53:08.4034833Z * [new branch] gh/anshul-si/4/base -> origin/gh/anshul-si/4/base 2025-12-04T08:53:08.4035010Z * [new branch] gh/anshul-si/4/head -> origin/gh/anshul-si/4/head 2025-12-04T08:53:08.4035191Z * [new branch] gh/anshul-si/5/base -> origin/gh/anshul-si/5/base 2025-12-04T08:53:08.4035371Z * [new branch] gh/anshul-si/5/head -> origin/gh/anshul-si/5/head 2025-12-04T08:53:08.4035551Z * [new branch] gh/anshul-si/53/base -> origin/gh/anshul-si/53/base 2025-12-04T08:53:08.4035738Z * [new branch] gh/anshul-si/53/head -> origin/gh/anshul-si/53/head 2025-12-04T08:53:08.4035919Z * [new branch] gh/anshul-si/58/base -> origin/gh/anshul-si/58/base 2025-12-04T08:53:08.4036103Z * [new branch] gh/anshul-si/58/head -> origin/gh/anshul-si/58/head 2025-12-04T08:53:08.4036285Z * [new branch] gh/anshul-si/66/base -> origin/gh/anshul-si/66/base 2025-12-04T08:53:08.4036463Z * [new branch] gh/anshul-si/66/head -> origin/gh/anshul-si/66/head 2025-12-04T08:53:08.4036644Z * [new branch] gh/anshul-si/66/orig -> origin/gh/anshul-si/66/orig 2025-12-04T08:53:08.4036826Z * [new branch] gh/anshul-si/67/base -> origin/gh/anshul-si/67/base 2025-12-04T08:53:08.4037004Z * [new branch] gh/anshul-si/67/head -> origin/gh/anshul-si/67/head 2025-12-04T08:53:08.4037187Z * [new branch] gh/anshul-si/67/orig -> origin/gh/anshul-si/67/orig 2025-12-04T08:53:08.4037369Z * [new branch] gh/anshul-si/68/base -> origin/gh/anshul-si/68/base 2025-12-04T08:53:08.4037555Z * [new branch] gh/anshul-si/68/head -> origin/gh/anshul-si/68/head 2025-12-04T08:53:08.4037792Z * [new branch] gh/anshul-si/68/orig -> 
origin/gh/anshul-si/68/orig 2025-12-04T08:53:08.4038015Z * [new branch] gh/anshul-si/69/base -> origin/gh/anshul-si/69/base 2025-12-04T08:53:08.4038194Z * [new branch] gh/anshul-si/69/head -> origin/gh/anshul-si/69/head 2025-12-04T08:53:08.4038372Z * [new branch] gh/anshul-si/69/orig -> origin/gh/anshul-si/69/orig 2025-12-04T08:53:08.4038550Z * [new branch] gh/anshul-si/70/base -> origin/gh/anshul-si/70/base 2025-12-04T08:53:08.4038733Z * [new branch] gh/anshul-si/70/head -> origin/gh/anshul-si/70/head 2025-12-04T08:53:08.4038908Z * [new branch] gh/anshul-si/70/orig -> origin/gh/anshul-si/70/orig 2025-12-04T08:53:08.4039086Z * [new branch] gh/anshul-si/71/base -> origin/gh/anshul-si/71/base 2025-12-04T08:53:08.4039266Z * [new branch] gh/anshul-si/71/head -> origin/gh/anshul-si/71/head 2025-12-04T08:53:08.4039445Z * [new branch] gh/anshul-si/71/orig -> origin/gh/anshul-si/71/orig 2025-12-04T08:53:08.4039632Z * [new branch] gh/anshul-si/72/base -> origin/gh/anshul-si/72/base 2025-12-04T08:53:08.4039809Z * [new branch] gh/anshul-si/72/head -> origin/gh/anshul-si/72/head 2025-12-04T08:53:08.4039987Z * [new branch] gh/anshul-si/72/orig -> origin/gh/anshul-si/72/orig 2025-12-04T08:53:08.4040172Z * [new branch] gh/anshul-si/73/base -> origin/gh/anshul-si/73/base 2025-12-04T08:53:08.4040350Z * [new branch] gh/anshul-si/73/head -> origin/gh/anshul-si/73/head 2025-12-04T08:53:08.4040529Z * [new branch] gh/anshul-si/73/orig -> origin/gh/anshul-si/73/orig 2025-12-04T08:53:08.4040710Z * [new branch] gh/aorenste/132/base -> origin/gh/aorenste/132/base 2025-12-04T08:53:08.4040893Z * [new branch] gh/aorenste/132/head -> origin/gh/aorenste/132/head 2025-12-04T08:53:08.4041074Z * [new branch] gh/aorenste/134/base -> origin/gh/aorenste/134/base 2025-12-04T08:53:08.4041259Z * [new branch] gh/aorenste/134/head -> origin/gh/aorenste/134/head 2025-12-04T08:53:08.4041444Z * [new branch] gh/aorenste/134/orig -> origin/gh/aorenste/134/orig 2025-12-04T08:53:08.4041624Z * [new branch] 
gh/aorenste/139/base -> origin/gh/aorenste/139/base 2025-12-04T08:53:08.4041803Z * [new branch] gh/aorenste/139/head -> origin/gh/aorenste/139/head 2025-12-04T08:53:08.4041980Z * [new branch] gh/aorenste/139/orig -> origin/gh/aorenste/139/orig 2025-12-04T08:53:08.4042163Z * [new branch] gh/aorenste/141/base -> origin/gh/aorenste/141/base 2025-12-04T08:53:08.4042343Z * [new branch] gh/aorenste/141/head -> origin/gh/aorenste/141/head 2025-12-04T08:53:08.4042521Z * [new branch] gh/aorenste/145/base -> origin/gh/aorenste/145/base 2025-12-04T08:53:08.4042703Z * [new branch] gh/aorenste/145/head -> origin/gh/aorenste/145/head 2025-12-04T08:53:08.4042886Z * [new branch] gh/aorenste/145/orig -> origin/gh/aorenste/145/orig 2025-12-04T08:53:08.4043067Z * [new branch] gh/aorenste/146/base -> origin/gh/aorenste/146/base 2025-12-04T08:53:08.4043293Z * [new branch] gh/aorenste/146/head -> origin/gh/aorenste/146/head 2025-12-04T08:53:08.4043477Z * [new branch] gh/aorenste/146/orig -> origin/gh/aorenste/146/orig 2025-12-04T08:53:08.4043655Z * [new branch] gh/aorenste/147/base -> origin/gh/aorenste/147/base 2025-12-04T08:53:08.4043839Z * [new branch] gh/aorenste/147/head -> origin/gh/aorenste/147/head 2025-12-04T08:53:08.4044021Z * [new branch] gh/aorenste/147/orig -> origin/gh/aorenste/147/orig 2025-12-04T08:53:08.4044271Z * [new branch] gh/aorenste/148/base -> origin/gh/aorenste/148/base 2025-12-04T08:53:08.4044450Z * [new branch] gh/aorenste/148/head -> origin/gh/aorenste/148/head 2025-12-04T08:53:08.4044679Z * [new branch] gh/aorenste/148/orig -> origin/gh/aorenste/148/orig 2025-12-04T08:53:08.4044859Z * [new branch] gh/aorenste/149/base -> origin/gh/aorenste/149/base 2025-12-04T08:53:08.4045039Z * [new branch] gh/aorenste/149/head -> origin/gh/aorenste/149/head 2025-12-04T08:53:08.4045221Z * [new branch] gh/aorenste/149/orig -> origin/gh/aorenste/149/orig 2025-12-04T08:53:08.4045403Z * [new branch] gh/aorenste/150/base -> origin/gh/aorenste/150/base 
2025-12-04T08:53:08.4045586Z * [new branch] gh/aorenste/150/head -> origin/gh/aorenste/150/head 2025-12-04T08:53:08.4045766Z * [new branch] gh/aorenste/150/orig -> origin/gh/aorenste/150/orig 2025-12-04T08:53:08.4045948Z * [new branch] gh/aorenste/151/base -> origin/gh/aorenste/151/base 2025-12-04T08:53:08.4046131Z * [new branch] gh/aorenste/151/head -> origin/gh/aorenste/151/head 2025-12-04T08:53:08.4046313Z * [new branch] gh/aorenste/151/orig -> origin/gh/aorenste/151/orig 2025-12-04T08:53:08.4046495Z * [new branch] gh/aorenste/152/base -> origin/gh/aorenste/152/base 2025-12-04T08:53:08.4046674Z * [new branch] gh/aorenste/152/head -> origin/gh/aorenste/152/head 2025-12-04T08:53:08.4046852Z * [new branch] gh/aorenste/152/orig -> origin/gh/aorenste/152/orig 2025-12-04T08:53:08.4047038Z * [new branch] gh/aorenste/153/base -> origin/gh/aorenste/153/base 2025-12-04T08:53:08.4047222Z * [new branch] gh/aorenste/153/head -> origin/gh/aorenste/153/head 2025-12-04T08:53:08.4047401Z * [new branch] gh/aorenste/153/orig -> origin/gh/aorenste/153/orig 2025-12-04T08:53:08.4047586Z * [new branch] gh/aorenste/154/base -> origin/gh/aorenste/154/base 2025-12-04T08:53:08.4047766Z * [new branch] gh/aorenste/154/head -> origin/gh/aorenste/154/head 2025-12-04T08:53:08.4047950Z * [new branch] gh/aorenste/154/orig -> origin/gh/aorenste/154/orig 2025-12-04T08:53:08.4048133Z * [new branch] gh/aorenste/155/base -> origin/gh/aorenste/155/base 2025-12-04T08:53:08.4048315Z * [new branch] gh/aorenste/155/head -> origin/gh/aorenste/155/head 2025-12-04T08:53:08.4048497Z * [new branch] gh/aorenste/155/orig -> origin/gh/aorenste/155/orig 2025-12-04T08:53:08.4048677Z * [new branch] gh/aorenste/156/base -> origin/gh/aorenste/156/base 2025-12-04T08:53:08.4048859Z * [new branch] gh/aorenste/156/head -> origin/gh/aorenste/156/head 2025-12-04T08:53:08.4049035Z * [new branch] gh/aorenste/156/orig -> origin/gh/aorenste/156/orig 2025-12-04T08:53:08.4049220Z * [new branch] gh/aorenste/157/base -> 
origin/gh/aorenste/157/base 2025-12-04T08:53:08.4049402Z * [new branch] gh/aorenste/157/head -> origin/gh/aorenste/157/head 2025-12-04T08:53:08.4049579Z * [new branch] gh/aorenste/157/orig -> origin/gh/aorenste/157/orig 2025-12-04T08:53:08.4049758Z * [new branch] gh/aorenste/158/base -> origin/gh/aorenste/158/base 2025-12-04T08:53:08.4049938Z * [new branch] gh/aorenste/158/head -> origin/gh/aorenste/158/head 2025-12-04T08:53:08.4050119Z * [new branch] gh/aorenste/158/orig -> origin/gh/aorenste/158/orig 2025-12-04T08:53:08.4050297Z * [new branch] gh/aorenste/159/base -> origin/gh/aorenste/159/base 2025-12-04T08:53:08.4050474Z * [new branch] gh/aorenste/159/head -> origin/gh/aorenste/159/head 2025-12-04T08:53:08.4050654Z * [new branch] gh/aorenste/159/orig -> origin/gh/aorenste/159/orig 2025-12-04T08:53:08.4050887Z * [new branch] gh/avikchaudhuri/1/base -> origin/gh/avikchaudhuri/1/base 2025-12-04T08:53:08.4051113Z * [new branch] gh/avikchaudhuri/1/head -> origin/gh/avikchaudhuri/1/head 2025-12-04T08:53:08.4051311Z * [new branch] gh/avikchaudhuri/2/base -> origin/gh/avikchaudhuri/2/base 2025-12-04T08:53:08.4051503Z * [new branch] gh/avikchaudhuri/2/head -> origin/gh/avikchaudhuri/2/head 2025-12-04T08:53:08.4051693Z * [new branch] gh/avikchaudhuri/2/orig -> origin/gh/avikchaudhuri/2/orig 2025-12-04T08:53:08.4051880Z * [new branch] gh/bdhirsh/666/base -> origin/gh/bdhirsh/666/base 2025-12-04T08:53:08.4052064Z * [new branch] gh/bdhirsh/666/head -> origin/gh/bdhirsh/666/head 2025-12-04T08:53:08.4052244Z * [new branch] gh/bdhirsh/666/orig -> origin/gh/bdhirsh/666/orig 2025-12-04T08:53:08.4052422Z * [new branch] gh/bdhirsh/668/base -> origin/gh/bdhirsh/668/base 2025-12-04T08:53:08.4052600Z * [new branch] gh/bdhirsh/668/head -> origin/gh/bdhirsh/668/head 2025-12-04T08:53:08.4052779Z * [new branch] gh/bdhirsh/668/orig -> origin/gh/bdhirsh/668/orig 2025-12-04T08:53:08.4052961Z * [new branch] gh/bdhirsh/669/base -> origin/gh/bdhirsh/669/base 2025-12-04T08:53:08.4053143Z * [new 
branch] gh/bdhirsh/669/head -> origin/gh/bdhirsh/669/head 2025-12-04T08:53:08.4053381Z * [new branch] gh/bdhirsh/669/orig -> origin/gh/bdhirsh/669/orig 2025-12-04T08:53:08.4053558Z * [new branch] gh/bdhirsh/670/base -> origin/gh/bdhirsh/670/base 2025-12-04T08:53:08.4053735Z * [new branch] gh/bdhirsh/670/head -> origin/gh/bdhirsh/670/head 2025-12-04T08:53:08.4053909Z * [new branch] gh/bdhirsh/670/orig -> origin/gh/bdhirsh/670/orig 2025-12-04T08:53:08.4054090Z * [new branch] gh/bdhirsh/672/base -> origin/gh/bdhirsh/672/base 2025-12-04T08:53:08.4054267Z * [new branch] gh/bdhirsh/672/head -> origin/gh/bdhirsh/672/head 2025-12-04T08:53:08.4054449Z * [new branch] gh/bdhirsh/672/orig -> origin/gh/bdhirsh/672/orig 2025-12-04T08:53:08.4054626Z * [new branch] gh/bdhirsh/675/base -> origin/gh/bdhirsh/675/base 2025-12-04T08:53:08.4054803Z * [new branch] gh/bdhirsh/675/head -> origin/gh/bdhirsh/675/head 2025-12-04T08:53:08.4054982Z * [new branch] gh/bdhirsh/675/orig -> origin/gh/bdhirsh/675/orig 2025-12-04T08:53:08.4055157Z * [new branch] gh/bdhirsh/676/base -> origin/gh/bdhirsh/676/base 2025-12-04T08:53:08.4055330Z * [new branch] gh/bdhirsh/676/head -> origin/gh/bdhirsh/676/head 2025-12-04T08:53:08.4055505Z * [new branch] gh/bdhirsh/676/orig -> origin/gh/bdhirsh/676/orig 2025-12-04T08:53:08.4055682Z * [new branch] gh/bdhirsh/677/base -> origin/gh/bdhirsh/677/base 2025-12-04T08:53:08.4055752Z * [new branch] gh/bdhirsh/677/head -> origin/gh/bdhirsh/677/head 2025-12-04T08:53:08.4055822Z * [new branch] gh/bdhirsh/677/orig -> origin/gh/bdhirsh/677/orig 2025-12-04T08:53:08.4055891Z * [new branch] gh/bdhirsh/678/base -> origin/gh/bdhirsh/678/base 2025-12-04T08:53:08.4055959Z * [new branch] gh/bdhirsh/678/head -> origin/gh/bdhirsh/678/head 2025-12-04T08:53:08.4056027Z * [new branch] gh/bdhirsh/678/orig -> origin/gh/bdhirsh/678/orig 2025-12-04T08:53:08.4056098Z * [new branch] gh/bdhirsh/679/base -> origin/gh/bdhirsh/679/base 2025-12-04T08:53:08.4056165Z * [new branch] 
gh/bdhirsh/679/head -> origin/gh/bdhirsh/679/head 2025-12-04T08:53:08.4056232Z * [new branch] gh/bdhirsh/679/orig -> origin/gh/bdhirsh/679/orig 2025-12-04T08:53:08.4056305Z * [new branch] gh/bdhirsh/680/base -> origin/gh/bdhirsh/680/base 2025-12-04T08:53:08.4056421Z * [new branch] gh/bdhirsh/680/head -> origin/gh/bdhirsh/680/head 2025-12-04T08:53:08.4056541Z * [new branch] gh/bdhirsh/680/orig -> origin/gh/bdhirsh/680/orig 2025-12-04T08:53:08.4056613Z * [new branch] gh/bdhirsh/681/base -> origin/gh/bdhirsh/681/base 2025-12-04T08:53:08.4056681Z * [new branch] gh/bdhirsh/681/head -> origin/gh/bdhirsh/681/head 2025-12-04T08:53:08.4056750Z * [new branch] gh/bdhirsh/681/orig -> origin/gh/bdhirsh/681/orig 2025-12-04T08:53:08.4056842Z * [new branch] gh/benjaminglass1/101/base -> origin/gh/benjaminglass1/101/base 2025-12-04T08:53:08.4056930Z * [new branch] gh/benjaminglass1/101/head -> origin/gh/benjaminglass1/101/head 2025-12-04T08:53:08.4057018Z * [new branch] gh/benjaminglass1/101/orig -> origin/gh/benjaminglass1/101/orig 2025-12-04T08:53:08.4057104Z * [new branch] gh/benjaminglass1/102/base -> origin/gh/benjaminglass1/102/base 2025-12-04T08:53:08.4057188Z * [new branch] gh/benjaminglass1/102/head -> origin/gh/benjaminglass1/102/head 2025-12-04T08:53:08.4057275Z * [new branch] gh/benjaminglass1/102/orig -> origin/gh/benjaminglass1/102/orig 2025-12-04T08:53:08.4057359Z * [new branch] gh/benjaminglass1/106/base -> origin/gh/benjaminglass1/106/base 2025-12-04T08:53:08.4057443Z * [new branch] gh/benjaminglass1/106/head -> origin/gh/benjaminglass1/106/head 2025-12-04T08:53:08.4057531Z * [new branch] gh/benjaminglass1/106/orig -> origin/gh/benjaminglass1/106/orig 2025-12-04T08:53:08.4057616Z * [new branch] gh/benjaminglass1/107/base -> origin/gh/benjaminglass1/107/base 2025-12-04T08:53:08.4057699Z * [new branch] gh/benjaminglass1/107/head -> origin/gh/benjaminglass1/107/head 2025-12-04T08:53:08.4057784Z * [new branch] gh/benjaminglass1/107/orig -> 
origin/gh/benjaminglass1/107/orig 2025-12-04T08:53:08.4057868Z * [new branch] gh/benjaminglass1/108/base -> origin/gh/benjaminglass1/108/base 2025-12-04T08:53:08.4057953Z * [new branch] gh/benjaminglass1/108/head -> origin/gh/benjaminglass1/108/head 2025-12-04T08:53:08.4058038Z * [new branch] gh/benjaminglass1/108/orig -> origin/gh/benjaminglass1/108/orig 2025-12-04T08:53:08.4058121Z * [new branch] gh/benjaminglass1/109/base -> origin/gh/benjaminglass1/109/base 2025-12-04T08:53:08.4058205Z * [new branch] gh/benjaminglass1/109/head -> origin/gh/benjaminglass1/109/head 2025-12-04T08:53:08.4058288Z * [new branch] gh/benjaminglass1/109/orig -> origin/gh/benjaminglass1/109/orig 2025-12-04T08:53:08.4058373Z * [new branch] gh/benjaminglass1/97/base -> origin/gh/benjaminglass1/97/base 2025-12-04T08:53:08.4058458Z * [new branch] gh/benjaminglass1/97/head -> origin/gh/benjaminglass1/97/head 2025-12-04T08:53:08.4058541Z * [new branch] gh/benjaminglass1/97/orig -> origin/gh/benjaminglass1/97/orig 2025-12-04T08:53:08.4058620Z * [new branch] gh/bobrenjc93/570/base -> origin/gh/bobrenjc93/570/base 2025-12-04T08:53:08.4058696Z * [new branch] gh/bobrenjc93/570/head -> origin/gh/bobrenjc93/570/head 2025-12-04T08:53:08.4058769Z * [new branch] gh/bobrenjc93/570/orig -> origin/gh/bobrenjc93/570/orig 2025-12-04T08:53:08.4058844Z * [new branch] gh/bobrenjc93/604/base -> origin/gh/bobrenjc93/604/base 2025-12-04T08:53:08.4058918Z * [new branch] gh/bobrenjc93/604/head -> origin/gh/bobrenjc93/604/head 2025-12-04T08:53:08.4058992Z * [new branch] gh/bobrenjc93/604/orig -> origin/gh/bobrenjc93/604/orig 2025-12-04T08:53:08.4059063Z * [new branch] gh/bobrenjc93/638/base -> origin/gh/bobrenjc93/638/base 2025-12-04T08:53:08.4059137Z * [new branch] gh/bobrenjc93/638/head -> origin/gh/bobrenjc93/638/head 2025-12-04T08:53:08.4059244Z * [new branch] gh/bobrenjc93/638/orig -> origin/gh/bobrenjc93/638/orig 2025-12-04T08:53:08.4059351Z * [new branch] gh/bobrenjc93/653/base -> origin/gh/bobrenjc93/653/base 
2025-12-04T08:53:08.4059425Z * [new branch] gh/bobrenjc93/653/head -> origin/gh/bobrenjc93/653/head 2025-12-04T08:53:08.4059496Z * [new branch] gh/bobrenjc93/653/orig -> origin/gh/bobrenjc93/653/orig 2025-12-04T08:53:08.4059568Z * [new branch] gh/bobrenjc93/654/base -> origin/gh/bobrenjc93/654/base 2025-12-04T08:53:08.4059641Z * [new branch] gh/bobrenjc93/654/head -> origin/gh/bobrenjc93/654/head 2025-12-04T08:53:08.4059713Z * [new branch] gh/bobrenjc93/654/orig -> origin/gh/bobrenjc93/654/orig 2025-12-04T08:53:08.4059785Z * [new branch] gh/bobrenjc93/657/base -> origin/gh/bobrenjc93/657/base 2025-12-04T08:53:08.4059859Z * [new branch] gh/bobrenjc93/657/head -> origin/gh/bobrenjc93/657/head 2025-12-04T08:53:08.4059931Z * [new branch] gh/bobrenjc93/657/orig -> origin/gh/bobrenjc93/657/orig 2025-12-04T08:53:08.4060009Z * [new branch] gh/bobrenjc93/672/base -> origin/gh/bobrenjc93/672/base 2025-12-04T08:53:08.4060080Z * [new branch] gh/bobrenjc93/672/head -> origin/gh/bobrenjc93/672/head 2025-12-04T08:53:08.4060151Z * [new branch] gh/bobrenjc93/672/orig -> origin/gh/bobrenjc93/672/orig 2025-12-04T08:53:08.4060224Z * [new branch] gh/bobrenjc93/679/base -> origin/gh/bobrenjc93/679/base 2025-12-04T08:53:08.4060296Z * [new branch] gh/bobrenjc93/679/head -> origin/gh/bobrenjc93/679/head 2025-12-04T08:53:08.4060368Z * [new branch] gh/bobrenjc93/679/orig -> origin/gh/bobrenjc93/679/orig 2025-12-04T08:53:08.4060441Z * [new branch] gh/bobrenjc93/680/base -> origin/gh/bobrenjc93/680/base 2025-12-04T08:53:08.4060514Z * [new branch] gh/bobrenjc93/680/head -> origin/gh/bobrenjc93/680/head 2025-12-04T08:53:08.4060586Z * [new branch] gh/bobrenjc93/680/orig -> origin/gh/bobrenjc93/680/orig 2025-12-04T08:53:08.4060659Z * [new branch] gh/bobrenjc93/681/base -> origin/gh/bobrenjc93/681/base 2025-12-04T08:53:08.4060730Z * [new branch] gh/bobrenjc93/681/head -> origin/gh/bobrenjc93/681/head 2025-12-04T08:53:08.4060803Z * [new branch] gh/bobrenjc93/681/orig -> origin/gh/bobrenjc93/681/orig 
2025-12-04T08:53:08.4060877Z * [new branch] gh/bobrenjc93/682/base -> origin/gh/bobrenjc93/682/base 2025-12-04T08:53:08.4060948Z * [new branch] gh/bobrenjc93/682/head -> origin/gh/bobrenjc93/682/head 2025-12-04T08:53:08.4061020Z * [new branch] gh/bobrenjc93/682/orig -> origin/gh/bobrenjc93/682/orig 2025-12-04T08:53:08.4061093Z * [new branch] gh/bobrenjc93/683/base -> origin/gh/bobrenjc93/683/base 2025-12-04T08:53:08.4061166Z * [new branch] gh/bobrenjc93/683/head -> origin/gh/bobrenjc93/683/head 2025-12-04T08:53:08.4061240Z * [new branch] gh/bobrenjc93/683/orig -> origin/gh/bobrenjc93/683/orig 2025-12-04T08:53:08.4061312Z * [new branch] gh/bobrenjc93/684/base -> origin/gh/bobrenjc93/684/base 2025-12-04T08:53:08.4061383Z * [new branch] gh/bobrenjc93/684/head -> origin/gh/bobrenjc93/684/head 2025-12-04T08:53:08.4061456Z * [new branch] gh/bobrenjc93/684/orig -> origin/gh/bobrenjc93/684/orig 2025-12-04T08:53:08.4061526Z * [new branch] gh/bobrenjc93/685/base -> origin/gh/bobrenjc93/685/base 2025-12-04T08:53:08.4061599Z * [new branch] gh/bobrenjc93/685/head -> origin/gh/bobrenjc93/685/head 2025-12-04T08:53:08.4061673Z * [new branch] gh/bobrenjc93/685/orig -> origin/gh/bobrenjc93/685/orig 2025-12-04T08:53:08.4061745Z * [new branch] gh/bobrenjc93/686/base -> origin/gh/bobrenjc93/686/base 2025-12-04T08:53:08.4061855Z * [new branch] gh/bobrenjc93/686/head -> origin/gh/bobrenjc93/686/head 2025-12-04T08:53:08.4061956Z * [new branch] gh/bobrenjc93/686/orig -> origin/gh/bobrenjc93/686/orig 2025-12-04T08:53:08.4062029Z * [new branch] gh/bobrenjc93/687/base -> origin/gh/bobrenjc93/687/base 2025-12-04T08:53:08.4062100Z * [new branch] gh/bobrenjc93/687/head -> origin/gh/bobrenjc93/687/head 2025-12-04T08:53:08.4062173Z * [new branch] gh/bobrenjc93/687/orig -> origin/gh/bobrenjc93/687/orig 2025-12-04T08:53:08.4062243Z * [new branch] gh/bobrenjc93/688/base -> origin/gh/bobrenjc93/688/base 2025-12-04T08:53:08.4062315Z * [new branch] gh/bobrenjc93/688/head -> origin/gh/bobrenjc93/688/head 
2025-12-04T08:53:08.4062390Z * [new branch] gh/bobrenjc93/688/orig -> origin/gh/bobrenjc93/688/orig 2025-12-04T08:53:08.4062465Z * [new branch] gh/bobrenjc93/689/base -> origin/gh/bobrenjc93/689/base 2025-12-04T08:53:08.4062538Z * [new branch] gh/bobrenjc93/689/head -> origin/gh/bobrenjc93/689/head 2025-12-04T08:53:08.4062613Z * [new branch] gh/bobrenjc93/689/orig -> origin/gh/bobrenjc93/689/orig 2025-12-04T08:53:08.4062684Z * [new branch] gh/bobrenjc93/690/base -> origin/gh/bobrenjc93/690/base 2025-12-04T08:53:08.4062757Z * [new branch] gh/bobrenjc93/690/head -> origin/gh/bobrenjc93/690/head 2025-12-04T08:53:08.4062828Z * [new branch] gh/bobrenjc93/690/orig -> origin/gh/bobrenjc93/690/orig 2025-12-04T08:53:08.4062899Z * [new branch] gh/bobrenjc93/691/base -> origin/gh/bobrenjc93/691/base 2025-12-04T08:53:08.4062971Z * [new branch] gh/bobrenjc93/691/head -> origin/gh/bobrenjc93/691/head 2025-12-04T08:53:08.4063042Z * [new branch] gh/bobrenjc93/691/orig -> origin/gh/bobrenjc93/691/orig 2025-12-04T08:53:08.4063117Z * [new branch] gh/bobrenjc93/692/base -> origin/gh/bobrenjc93/692/base 2025-12-04T08:53:08.4063194Z * [new branch] gh/bobrenjc93/692/head -> origin/gh/bobrenjc93/692/head 2025-12-04T08:53:08.4063306Z * [new branch] gh/bobrenjc93/692/orig -> origin/gh/bobrenjc93/692/orig 2025-12-04T08:53:08.4063380Z * [new branch] gh/bobrenjc93/693/base -> origin/gh/bobrenjc93/693/base 2025-12-04T08:53:08.4063453Z * [new branch] gh/bobrenjc93/693/head -> origin/gh/bobrenjc93/693/head 2025-12-04T08:53:08.4063524Z * [new branch] gh/bobrenjc93/693/orig -> origin/gh/bobrenjc93/693/orig 2025-12-04T08:53:08.4063595Z * [new branch] gh/bobrenjc93/694/base -> origin/gh/bobrenjc93/694/base 2025-12-04T08:53:08.4063668Z * [new branch] gh/bobrenjc93/694/head -> origin/gh/bobrenjc93/694/head 2025-12-04T08:53:08.4063740Z * [new branch] gh/bobrenjc93/694/orig -> origin/gh/bobrenjc93/694/orig 2025-12-04T08:53:08.4063813Z * [new branch] gh/bobrenjc93/695/base -> origin/gh/bobrenjc93/695/base 
2025-12-04T08:53:08.4063890Z * [new branch] gh/bobrenjc93/695/head -> origin/gh/bobrenjc93/695/head 2025-12-04T08:53:08.4063963Z * [new branch] gh/bobrenjc93/695/orig -> origin/gh/bobrenjc93/695/orig 2025-12-04T08:53:08.4064030Z * [new branch] gh/c00w/23/base -> origin/gh/c00w/23/base 2025-12-04T08:53:08.4064097Z * [new branch] gh/c00w/23/head -> origin/gh/c00w/23/head 2025-12-04T08:53:08.4064160Z * [new branch] gh/c00w/53/base -> origin/gh/c00w/53/base 2025-12-04T08:53:08.4064223Z * [new branch] gh/c00w/53/head -> origin/gh/c00w/53/head 2025-12-04T08:53:08.4064287Z * [new branch] gh/c00w/53/orig -> origin/gh/c00w/53/orig 2025-12-04T08:53:08.4064348Z * [new branch] gh/c00w/54/base -> origin/gh/c00w/54/base 2025-12-04T08:53:08.4064708Z * [new branch] gh/c00w/54/head -> origin/gh/c00w/54/head 2025-12-04T08:53:08.4064813Z * [new branch] gh/c00w/54/orig -> origin/gh/c00w/54/orig 2025-12-04T08:53:08.4064875Z * [new branch] gh/c00w/56/base -> origin/gh/c00w/56/base 2025-12-04T08:53:08.4064940Z * [new branch] gh/c00w/56/head -> origin/gh/c00w/56/head 2025-12-04T08:53:08.4065002Z * [new branch] gh/c00w/56/orig -> origin/gh/c00w/56/orig 2025-12-04T08:53:08.4065064Z * [new branch] gh/c00w/57/base -> origin/gh/c00w/57/base 2025-12-04T08:53:08.4065127Z * [new branch] gh/c00w/57/head -> origin/gh/c00w/57/head 2025-12-04T08:53:08.4065190Z * [new branch] gh/c00w/57/orig -> origin/gh/c00w/57/orig 2025-12-04T08:53:08.4065254Z * [new branch] gh/c00w/58/base -> origin/gh/c00w/58/base 2025-12-04T08:53:08.4065321Z * [new branch] gh/c00w/58/head -> origin/gh/c00w/58/head 2025-12-04T08:53:08.4065383Z * [new branch] gh/c00w/58/orig -> origin/gh/c00w/58/orig 2025-12-04T08:53:08.4065455Z * [new branch] gh/clee2000/1/base -> origin/gh/clee2000/1/base 2025-12-04T08:53:08.4065527Z * [new branch] gh/clee2000/1/head -> origin/gh/clee2000/1/head 2025-12-04T08:53:08.4065595Z * [new branch] gh/clee2000/1/orig -> origin/gh/clee2000/1/orig 2025-12-04T08:53:08.4065672Z * [new branch] 
gh/coconutruben/1/base -> origin/gh/coconutruben/1/base 2025-12-04T08:53:08.4065749Z * [new branch] gh/coconutruben/1/head -> origin/gh/coconutruben/1/head 2025-12-04T08:53:08.4065829Z * [new branch] gh/coconutruben/55/base -> origin/gh/coconutruben/55/base 2025-12-04T08:53:08.4065908Z * [new branch] gh/coconutruben/55/head -> origin/gh/coconutruben/55/head 2025-12-04T08:53:08.4065989Z * [new branch] gh/coconutruben/55/orig -> origin/gh/coconutruben/55/orig 2025-12-04T08:53:08.4066064Z * [new branch] gh/coconutruben/57/base -> origin/gh/coconutruben/57/base 2025-12-04T08:53:08.4066141Z * [new branch] gh/coconutruben/57/head -> origin/gh/coconutruben/57/head 2025-12-04T08:53:08.4066218Z * [new branch] gh/coconutruben/57/orig -> origin/gh/coconutruben/57/orig 2025-12-04T08:53:08.4066293Z * [new branch] gh/coconutruben/70/base -> origin/gh/coconutruben/70/base 2025-12-04T08:53:08.4066369Z * [new branch] gh/coconutruben/70/head -> origin/gh/coconutruben/70/head 2025-12-04T08:53:08.4066444Z * [new branch] gh/coconutruben/70/orig -> origin/gh/coconutruben/70/orig 2025-12-04T08:53:08.4066518Z * [new branch] gh/coconutruben/71/base -> origin/gh/coconutruben/71/base 2025-12-04T08:53:08.4066598Z * [new branch] gh/coconutruben/71/head -> origin/gh/coconutruben/71/head 2025-12-04T08:53:08.4066676Z * [new branch] gh/coconutruben/71/orig -> origin/gh/coconutruben/71/orig 2025-12-04T08:53:08.4066751Z * [new branch] gh/coconutruben/72/base -> origin/gh/coconutruben/72/base 2025-12-04T08:53:08.4066828Z * [new branch] gh/coconutruben/72/head -> origin/gh/coconutruben/72/head 2025-12-04T08:53:08.4066903Z * [new branch] gh/coconutruben/72/orig -> origin/gh/coconutruben/72/orig 2025-12-04T08:53:08.4066977Z * [new branch] gh/coconutruben/73/base -> origin/gh/coconutruben/73/base 2025-12-04T08:53:08.4067054Z * [new branch] gh/coconutruben/73/head -> origin/gh/coconutruben/73/head 2025-12-04T08:53:08.4067128Z * [new branch] gh/coconutruben/73/orig -> origin/gh/coconutruben/73/orig 
2025-12-04T08:53:08.4067202Z * [new branch] gh/coconutruben/74/base -> origin/gh/coconutruben/74/base 2025-12-04T08:53:08.4067278Z * [new branch] gh/coconutruben/74/head -> origin/gh/coconutruben/74/head 2025-12-04T08:53:08.4067389Z * [new branch] gh/coconutruben/74/orig -> origin/gh/coconutruben/74/orig 2025-12-04T08:53:08.4067492Z * [new branch] gh/coconutruben/79/base -> origin/gh/coconutruben/79/base 2025-12-04T08:53:08.4067571Z * [new branch] gh/coconutruben/79/head -> origin/gh/coconutruben/79/head 2025-12-04T08:53:08.4067647Z * [new branch] gh/coconutruben/79/orig -> origin/gh/coconutruben/79/orig 2025-12-04T08:53:08.4067723Z * [new branch] gh/coconutruben/80/base -> origin/gh/coconutruben/80/base 2025-12-04T08:53:08.4067801Z * [new branch] gh/coconutruben/80/head -> origin/gh/coconutruben/80/head 2025-12-04T08:53:08.4067876Z * [new branch] gh/coconutruben/80/orig -> origin/gh/coconutruben/80/orig 2025-12-04T08:53:08.4067953Z * [new branch] gh/coconutruben/82/base -> origin/gh/coconutruben/82/base 2025-12-04T08:53:08.4068030Z * [new branch] gh/coconutruben/82/head -> origin/gh/coconutruben/82/head 2025-12-04T08:53:08.4068105Z * [new branch] gh/coconutruben/82/orig -> origin/gh/coconutruben/82/orig 2025-12-04T08:53:08.4068185Z * [new branch] gh/coconutruben/83/base -> origin/gh/coconutruben/83/base 2025-12-04T08:53:08.4068261Z * [new branch] gh/coconutruben/83/head -> origin/gh/coconutruben/83/head 2025-12-04T08:53:08.4068336Z * [new branch] gh/coconutruben/83/orig -> origin/gh/coconutruben/83/orig 2025-12-04T08:53:08.4068411Z * [new branch] gh/coconutruben/84/base -> origin/gh/coconutruben/84/base 2025-12-04T08:53:08.4068486Z * [new branch] gh/coconutruben/84/head -> origin/gh/coconutruben/84/head 2025-12-04T08:53:08.4068561Z * [new branch] gh/coconutruben/84/orig -> origin/gh/coconutruben/84/orig 2025-12-04T08:53:08.4068637Z * [new branch] gh/coconutruben/85/base -> origin/gh/coconutruben/85/base 2025-12-04T08:53:08.4068714Z * [new branch] 
gh/coconutruben/85/head -> origin/gh/coconutruben/85/head 2025-12-04T08:53:08.4068791Z * [new branch] gh/coconutruben/85/orig -> origin/gh/coconutruben/85/orig 2025-12-04T08:53:08.4068868Z * [new branch] gh/coconutruben/86/base -> origin/gh/coconutruben/86/base 2025-12-04T08:53:08.4068945Z * [new branch] gh/coconutruben/86/head -> origin/gh/coconutruben/86/head 2025-12-04T08:53:08.4069022Z * [new branch] gh/coconutruben/86/orig -> origin/gh/coconutruben/86/orig 2025-12-04T08:53:08.4069099Z * [new branch] gh/colinchan15/1/base -> origin/gh/colinchan15/1/base 2025-12-04T08:53:08.4069174Z * [new branch] gh/colinchan15/1/head -> origin/gh/colinchan15/1/head 2025-12-04T08:53:08.4069248Z * [new branch] gh/colinchan15/2/base -> origin/gh/colinchan15/2/base 2025-12-04T08:53:08.4069324Z * [new branch] gh/colinchan15/2/head -> origin/gh/colinchan15/2/head 2025-12-04T08:53:08.4069398Z * [new branch] gh/colinchan15/3/base -> origin/gh/colinchan15/3/base 2025-12-04T08:53:08.4069473Z * [new branch] gh/colinchan15/3/head -> origin/gh/colinchan15/3/head 2025-12-04T08:53:08.4069548Z * [new branch] gh/colinchan15/6/base -> origin/gh/colinchan15/6/base 2025-12-04T08:53:08.4069622Z * [new branch] gh/colinchan15/6/head -> origin/gh/colinchan15/6/head 2025-12-04T08:53:08.4069695Z * [new branch] gh/d4l3k/1/base -> origin/gh/d4l3k/1/base 2025-12-04T08:53:08.4069762Z * [new branch] gh/d4l3k/1/head -> origin/gh/d4l3k/1/head 2025-12-04T08:53:08.4069827Z * [new branch] gh/d4l3k/2/base -> origin/gh/d4l3k/2/base 2025-12-04T08:53:08.4069894Z * [new branch] gh/d4l3k/2/head -> origin/gh/d4l3k/2/head 2025-12-04T08:53:08.4069959Z * [new branch] gh/d4l3k/2/orig -> origin/gh/d4l3k/2/orig 2025-12-04T08:53:08.4070059Z * [new branch] gh/d4l3k/3/base -> origin/gh/d4l3k/3/base 2025-12-04T08:53:08.4070171Z * [new branch] gh/d4l3k/3/head -> origin/gh/d4l3k/3/head 2025-12-04T08:53:08.4070235Z * [new branch] gh/d4l3k/3/orig -> origin/gh/d4l3k/3/orig 2025-12-04T08:53:08.4070298Z * [new branch] gh/d4l3k/4/base 
-> origin/gh/d4l3k/4/base 2025-12-04T08:53:08.4070366Z * [new branch] gh/d4l3k/4/head -> origin/gh/d4l3k/4/head 2025-12-04T08:53:08.4070430Z * [new branch] gh/d4l3k/4/orig -> origin/gh/d4l3k/4/orig 2025-12-04T08:53:08.4070496Z * [new branch] gh/d4l3k/5/base -> origin/gh/d4l3k/5/base 2025-12-04T08:53:08.4070563Z * [new branch] gh/d4l3k/5/orig -> origin/gh/d4l3k/5/orig 2025-12-04T08:53:08.4070650Z * [new branch] gh/davidberard98/392/base -> origin/gh/davidberard98/392/base 2025-12-04T08:53:08.4070738Z * [new branch] gh/davidberard98/392/head -> origin/gh/davidberard98/392/head 2025-12-04T08:53:08.4070827Z * [new branch] gh/davidberard98/392/orig -> origin/gh/davidberard98/392/orig 2025-12-04T08:53:08.4070910Z * [new branch] gh/davidberard98/399/base -> origin/gh/davidberard98/399/base 2025-12-04T08:53:08.4070992Z * [new branch] gh/davidberard98/399/head -> origin/gh/davidberard98/399/head 2025-12-04T08:53:08.4071078Z * [new branch] gh/davidberard98/399/orig -> origin/gh/davidberard98/399/orig 2025-12-04T08:53:08.4071156Z * [new branch] gh/desertfire/605/base -> origin/gh/desertfire/605/base 2025-12-04T08:53:08.4071235Z * [new branch] gh/desertfire/605/head -> origin/gh/desertfire/605/head 2025-12-04T08:53:08.4071310Z * [new branch] gh/desertfire/605/orig -> origin/gh/desertfire/605/orig 2025-12-04T08:53:08.4071385Z * [new branch] gh/desertfire/606/base -> origin/gh/desertfire/606/base 2025-12-04T08:53:08.4071467Z * [new branch] gh/desertfire/606/head -> origin/gh/desertfire/606/head 2025-12-04T08:53:08.4071543Z * [new branch] gh/desertfire/606/orig -> origin/gh/desertfire/606/orig 2025-12-04T08:53:08.4071617Z * [new branch] gh/desertfire/607/base -> origin/gh/desertfire/607/base 2025-12-04T08:53:08.4071694Z * [new branch] gh/desertfire/607/head -> origin/gh/desertfire/607/head 2025-12-04T08:53:08.4071769Z * [new branch] gh/desertfire/607/orig -> origin/gh/desertfire/607/orig 2025-12-04T08:53:08.4071842Z * [new branch] gh/desertfire/608/base -> 
origin/gh/desertfire/608/base 2025-12-04T08:53:08.4071917Z * [new branch] gh/desertfire/608/head -> origin/gh/desertfire/608/head 2025-12-04T08:53:08.4071990Z * [new branch] gh/desertfire/608/orig -> origin/gh/desertfire/608/orig 2025-12-04T08:53:08.4072065Z * [new branch] gh/desertfire/609/base -> origin/gh/desertfire/609/base 2025-12-04T08:53:08.4072140Z * [new branch] gh/desertfire/609/head -> origin/gh/desertfire/609/head 2025-12-04T08:53:08.4072215Z * [new branch] gh/desertfire/609/orig -> origin/gh/desertfire/609/orig 2025-12-04T08:53:08.4072291Z * [new branch] gh/desertfire/610/base -> origin/gh/desertfire/610/base 2025-12-04T08:53:08.4072369Z * [new branch] gh/desertfire/610/head -> origin/gh/desertfire/610/head 2025-12-04T08:53:08.4072442Z * [new branch] gh/desertfire/610/orig -> origin/gh/desertfire/610/orig 2025-12-04T08:53:08.4072516Z * [new branch] gh/desertfire/611/base -> origin/gh/desertfire/611/base 2025-12-04T08:53:08.4072593Z * [new branch] gh/desertfire/611/head -> origin/gh/desertfire/611/head 2025-12-04T08:53:08.4072667Z * [new branch] gh/desertfire/611/orig -> origin/gh/desertfire/611/orig 2025-12-04T08:53:08.4072771Z * [new branch] gh/desertfire/612/base -> origin/gh/desertfire/612/base 2025-12-04T08:53:08.4072874Z * [new branch] gh/desertfire/612/head -> origin/gh/desertfire/612/head 2025-12-04T08:53:08.4072951Z * [new branch] gh/desertfire/612/orig -> origin/gh/desertfire/612/orig 2025-12-04T08:53:08.4073030Z * [new branch] gh/desertfire/613/base -> origin/gh/desertfire/613/base 2025-12-04T08:53:08.4073105Z * [new branch] gh/desertfire/613/head -> origin/gh/desertfire/613/head 2025-12-04T08:53:08.4073180Z * [new branch] gh/desertfire/613/orig -> origin/gh/desertfire/613/orig 2025-12-04T08:53:08.4073309Z * [new branch] gh/desertfire/614/base -> origin/gh/desertfire/614/base 2025-12-04T08:53:08.4073384Z * [new branch] gh/desertfire/614/head -> origin/gh/desertfire/614/head 2025-12-04T08:53:08.4073460Z * [new branch] gh/desertfire/614/orig -> 
origin/gh/desertfire/614/orig 2025-12-04T08:53:08.4073538Z * [new branch] gh/desertfire/615/base -> origin/gh/desertfire/615/base 2025-12-04T08:53:08.4073613Z * [new branch] gh/desertfire/615/head -> origin/gh/desertfire/615/head 2025-12-04T08:53:08.4073686Z * [new branch] gh/desertfire/615/orig -> origin/gh/desertfire/615/orig 2025-12-04T08:53:08.4073763Z * [new branch] gh/desertfire/616/base -> origin/gh/desertfire/616/base 2025-12-04T08:53:08.4073836Z * [new branch] gh/desertfire/616/head -> origin/gh/desertfire/616/head 2025-12-04T08:53:08.4073911Z * [new branch] gh/desertfire/616/orig -> origin/gh/desertfire/616/orig 2025-12-04T08:53:08.4073989Z * [new branch] gh/desertfire/617/base -> origin/gh/desertfire/617/base 2025-12-04T08:53:08.4074064Z * [new branch] gh/desertfire/617/head -> origin/gh/desertfire/617/head 2025-12-04T08:53:08.4074140Z * [new branch] gh/desertfire/617/orig -> origin/gh/desertfire/617/orig 2025-12-04T08:53:08.4074214Z * [new branch] gh/dharakk/1/base -> origin/gh/dharakk/1/base 2025-12-04T08:53:08.4074286Z * [new branch] gh/dharakk/1/head -> origin/gh/dharakk/1/head 2025-12-04T08:53:08.4074357Z * [new branch] gh/drisspg/170/base -> origin/gh/drisspg/170/base 2025-12-04T08:53:08.4074431Z * [new branch] gh/drisspg/170/head -> origin/gh/drisspg/170/head 2025-12-04T08:53:08.4074501Z * [new branch] gh/drisspg/170/orig -> origin/gh/drisspg/170/orig 2025-12-04T08:53:08.4074575Z * [new branch] gh/drisspg/182/base -> origin/gh/drisspg/182/base 2025-12-04T08:53:08.4074645Z * [new branch] gh/drisspg/182/head -> origin/gh/drisspg/182/head 2025-12-04T08:53:08.4074714Z * [new branch] gh/drisspg/183/base -> origin/gh/drisspg/183/base 2025-12-04T08:53:08.4074789Z * [new branch] gh/drisspg/183/head -> origin/gh/drisspg/183/head 2025-12-04T08:53:08.4074860Z * [new branch] gh/drisspg/184/base -> origin/gh/drisspg/184/base 2025-12-04T08:53:08.4074930Z * [new branch] gh/drisspg/184/head -> origin/gh/drisspg/184/head 2025-12-04T08:53:08.4075002Z * [new branch] 
gh/drisspg/185/base -> origin/gh/drisspg/185/base 2025-12-04T08:53:08.4075069Z * [new branch] gh/drisspg/185/head -> origin/gh/drisspg/185/head 2025-12-04T08:53:08.4075138Z * [new branch] gh/drisspg/194/base -> origin/gh/drisspg/194/base 2025-12-04T08:53:08.4075210Z * [new branch] gh/drisspg/194/head -> origin/gh/drisspg/194/head 2025-12-04T08:53:08.4075282Z * [new branch] gh/drisspg/194/orig -> origin/gh/drisspg/194/orig 2025-12-04T08:53:08.4075354Z * [new branch] gh/drisspg/200/base -> origin/gh/drisspg/200/base 2025-12-04T08:53:08.4075428Z * [new branch] gh/drisspg/200/head -> origin/gh/drisspg/200/head 2025-12-04T08:53:08.4075545Z * [new branch] gh/drisspg/200/orig -> origin/gh/drisspg/200/orig 2025-12-04T08:53:08.4075657Z * [new branch] gh/drisspg/218/base -> origin/gh/drisspg/218/base 2025-12-04T08:53:08.4075731Z * [new branch] gh/drisspg/218/head -> origin/gh/drisspg/218/head 2025-12-04T08:53:08.4075800Z * [new branch] gh/drisspg/218/orig -> origin/gh/drisspg/218/orig 2025-12-04T08:53:08.4075869Z * [new branch] gh/drisspg/219/base -> origin/gh/drisspg/219/base 2025-12-04T08:53:08.4075940Z * [new branch] gh/drisspg/219/head -> origin/gh/drisspg/219/head 2025-12-04T08:53:08.4076012Z * [new branch] gh/drisspg/219/orig -> origin/gh/drisspg/219/orig 2025-12-04T08:53:08.4076080Z * [new branch] gh/drisspg/220/base -> origin/gh/drisspg/220/base 2025-12-04T08:53:08.4076153Z * [new branch] gh/drisspg/220/head -> origin/gh/drisspg/220/head 2025-12-04T08:53:08.4076225Z * [new branch] gh/drisspg/220/orig -> origin/gh/drisspg/220/orig 2025-12-04T08:53:08.4076299Z * [new branch] gh/drisspg/221/base -> origin/gh/drisspg/221/base 2025-12-04T08:53:08.4076369Z * [new branch] gh/drisspg/221/head -> origin/gh/drisspg/221/head 2025-12-04T08:53:08.4076438Z * [new branch] gh/drisspg/221/orig -> origin/gh/drisspg/221/orig 2025-12-04T08:53:08.4076511Z * [new branch] gh/drisspg/222/base -> origin/gh/drisspg/222/base 2025-12-04T08:53:08.4076580Z * [new branch] gh/drisspg/222/head -> 
origin/gh/drisspg/222/head 2025-12-04T08:53:08.4076650Z * [new branch] gh/drisspg/222/orig -> origin/gh/drisspg/222/orig 2025-12-04T08:53:08.4076722Z * [new branch] gh/drisspg/223/base -> origin/gh/drisspg/223/base 2025-12-04T08:53:08.4076793Z * [new branch] gh/drisspg/223/head -> origin/gh/drisspg/223/head 2025-12-04T08:53:08.4076863Z * [new branch] gh/drisspg/223/orig -> origin/gh/drisspg/223/orig 2025-12-04T08:53:08.4076937Z * [new branch] gh/drisspg/224/base -> origin/gh/drisspg/224/base 2025-12-04T08:53:08.4077006Z * [new branch] gh/drisspg/224/head -> origin/gh/drisspg/224/head 2025-12-04T08:53:08.4077077Z * [new branch] gh/drisspg/224/orig -> origin/gh/drisspg/224/orig 2025-12-04T08:53:08.4077154Z * [new branch] gh/drisspg/225/base -> origin/gh/drisspg/225/base 2025-12-04T08:53:08.4077223Z * [new branch] gh/drisspg/225/head -> origin/gh/drisspg/225/head 2025-12-04T08:53:08.4077292Z * [new branch] gh/drisspg/225/orig -> origin/gh/drisspg/225/orig 2025-12-04T08:53:08.4077364Z * [new branch] gh/drisspg/226/base -> origin/gh/drisspg/226/base 2025-12-04T08:53:08.4077434Z * [new branch] gh/drisspg/226/head -> origin/gh/drisspg/226/head 2025-12-04T08:53:08.4077503Z * [new branch] gh/drisspg/226/orig -> origin/gh/drisspg/226/orig 2025-12-04T08:53:08.4077579Z * [new branch] gh/drisspg/227/base -> origin/gh/drisspg/227/base 2025-12-04T08:53:08.4077649Z * [new branch] gh/drisspg/227/head -> origin/gh/drisspg/227/head 2025-12-04T08:53:08.4077719Z * [new branch] gh/drisspg/227/orig -> origin/gh/drisspg/227/orig 2025-12-04T08:53:08.4077791Z * [new branch] gh/drisspg/228/base -> origin/gh/drisspg/228/base 2025-12-04T08:53:08.4077862Z * [new branch] gh/drisspg/228/head -> origin/gh/drisspg/228/head 2025-12-04T08:53:08.4077933Z * [new branch] gh/drisspg/228/orig -> origin/gh/drisspg/228/orig 2025-12-04T08:53:08.4078004Z * [new branch] gh/drisspg/229/base -> origin/gh/drisspg/229/base 2025-12-04T08:53:08.4078101Z * [new branch] gh/drisspg/229/head -> 
origin/gh/drisspg/229/head 2025-12-04T08:53:08.4078173Z * [new branch] gh/drisspg/229/orig -> origin/gh/drisspg/229/orig 2025-12-04T08:53:08.4078278Z * [new branch] gh/drisspg/230/base -> origin/gh/drisspg/230/base 2025-12-04T08:53:08.4078348Z * [new branch] gh/drisspg/230/head -> origin/gh/drisspg/230/head 2025-12-04T08:53:08.4078420Z * [new branch] gh/drisspg/230/orig -> origin/gh/drisspg/230/orig 2025-12-04T08:53:08.4078493Z * [new branch] gh/dsjohns2/1/base -> origin/gh/dsjohns2/1/base 2025-12-04T08:53:08.4078564Z * [new branch] gh/dsjohns2/1/head -> origin/gh/dsjohns2/1/head 2025-12-04T08:53:08.4078647Z * [new branch] gh/dzmitry-huba/1/base -> origin/gh/dzmitry-huba/1/base 2025-12-04T08:53:08.4078723Z * [new branch] gh/dzmitry-huba/1/head -> origin/gh/dzmitry-huba/1/head 2025-12-04T08:53:08.4078802Z * [new branch] gh/dzmitry-huba/12/base -> origin/gh/dzmitry-huba/12/base 2025-12-04T08:53:08.4078885Z * [new branch] gh/dzmitry-huba/12/head -> origin/gh/dzmitry-huba/12/head 2025-12-04T08:53:08.4078962Z * [new branch] gh/dzmitry-huba/12/orig -> origin/gh/dzmitry-huba/12/orig 2025-12-04T08:53:08.4079037Z * [new branch] gh/dzmitry-huba/13/base -> origin/gh/dzmitry-huba/13/base 2025-12-04T08:53:08.4079116Z * [new branch] gh/dzmitry-huba/13/head -> origin/gh/dzmitry-huba/13/head 2025-12-04T08:53:08.4079191Z * [new branch] gh/dzmitry-huba/13/orig -> origin/gh/dzmitry-huba/13/orig 2025-12-04T08:53:08.4079267Z * [new branch] gh/dzmitry-huba/14/base -> origin/gh/dzmitry-huba/14/base 2025-12-04T08:53:08.4079345Z * [new branch] gh/dzmitry-huba/14/head -> origin/gh/dzmitry-huba/14/head 2025-12-04T08:53:08.4079421Z * [new branch] gh/dzmitry-huba/14/orig -> origin/gh/dzmitry-huba/14/orig 2025-12-04T08:53:08.4079497Z * [new branch] gh/dzmitry-huba/15/base -> origin/gh/dzmitry-huba/15/base 2025-12-04T08:53:08.4079576Z * [new branch] gh/dzmitry-huba/15/head -> origin/gh/dzmitry-huba/15/head 2025-12-04T08:53:08.4079651Z * [new branch] gh/dzmitry-huba/15/orig -> 
origin/gh/dzmitry-huba/15/orig 2025-12-04T08:53:08.4079733Z * [new branch] gh/dzmitry-huba/16/base -> origin/gh/dzmitry-huba/16/base 2025-12-04T08:53:08.4079809Z * [new branch] gh/dzmitry-huba/16/head -> origin/gh/dzmitry-huba/16/head 2025-12-04T08:53:08.4079885Z * [new branch] gh/dzmitry-huba/16/orig -> origin/gh/dzmitry-huba/16/orig 2025-12-04T08:53:08.4079963Z * [new branch] gh/dzmitry-huba/17/base -> origin/gh/dzmitry-huba/17/base 2025-12-04T08:53:08.4080037Z * [new branch] gh/dzmitry-huba/17/head -> origin/gh/dzmitry-huba/17/head 2025-12-04T08:53:08.4080111Z * [new branch] gh/dzmitry-huba/17/orig -> origin/gh/dzmitry-huba/17/orig 2025-12-04T08:53:08.4080194Z * [new branch] gh/dzmitry-huba/2/base -> origin/gh/dzmitry-huba/2/base 2025-12-04T08:53:08.4080271Z * [new branch] gh/dzmitry-huba/2/head -> origin/gh/dzmitry-huba/2/head 2025-12-04T08:53:08.4080346Z * [new branch] gh/dzmitry-huba/3/base -> origin/gh/dzmitry-huba/3/base 2025-12-04T08:53:08.4080428Z * [new branch] gh/dzmitry-huba/3/head -> origin/gh/dzmitry-huba/3/head 2025-12-04T08:53:08.4080505Z * [new branch] gh/eellison/808/base -> origin/gh/eellison/808/base 2025-12-04T08:53:08.4080581Z * [new branch] gh/eellison/808/head -> origin/gh/eellison/808/head 2025-12-04T08:53:08.4080657Z * [new branch] gh/eellison/808/orig -> origin/gh/eellison/808/orig 2025-12-04T08:53:08.4080729Z * [new branch] gh/eellison/822/base -> origin/gh/eellison/822/base 2025-12-04T08:53:08.4080801Z * [new branch] gh/eellison/822/head -> origin/gh/eellison/822/head 2025-12-04T08:53:08.4080907Z * [new branch] gh/eellison/822/orig -> origin/gh/eellison/822/orig 2025-12-04T08:53:08.4081006Z * [new branch] gh/eellison/823/base -> origin/gh/eellison/823/base 2025-12-04T08:53:08.4081077Z * [new branch] gh/eellison/823/head -> origin/gh/eellison/823/head 2025-12-04T08:53:08.4081152Z * [new branch] gh/eellison/823/orig -> origin/gh/eellison/823/orig 2025-12-04T08:53:08.4081225Z * [new branch] gh/eellison/862/base -> 
origin/gh/eellison/862/base 2025-12-04T08:53:08.4081297Z * [new branch] gh/eellison/862/head -> origin/gh/eellison/862/head 2025-12-04T08:53:08.4081370Z * [new branch] gh/eellison/862/orig -> origin/gh/eellison/862/orig 2025-12-04T08:53:08.4081440Z * [new branch] gh/eellison/863/base -> origin/gh/eellison/863/base 2025-12-04T08:53:08.4081516Z * [new branch] gh/eellison/863/head -> origin/gh/eellison/863/head 2025-12-04T08:53:08.4081586Z * [new branch] gh/eellison/863/orig -> origin/gh/eellison/863/orig 2025-12-04T08:53:08.4081658Z * [new branch] gh/eellison/864/base -> origin/gh/eellison/864/base 2025-12-04T08:53:08.4081734Z * [new branch] gh/eellison/864/head -> origin/gh/eellison/864/head 2025-12-04T08:53:08.4081806Z * [new branch] gh/eellison/864/orig -> origin/gh/eellison/864/orig 2025-12-04T08:53:08.4081877Z * [new branch] gh/eellison/865/base -> origin/gh/eellison/865/base 2025-12-04T08:53:08.4081950Z * [new branch] gh/eellison/865/head -> origin/gh/eellison/865/head 2025-12-04T08:53:08.4082022Z * [new branch] gh/eellison/865/orig -> origin/gh/eellison/865/orig 2025-12-04T08:53:08.4082092Z * [new branch] gh/eellison/866/base -> origin/gh/eellison/866/base 2025-12-04T08:53:08.4082166Z * [new branch] gh/eellison/866/head -> origin/gh/eellison/866/head 2025-12-04T08:53:08.4082236Z * [new branch] gh/eellison/866/orig -> origin/gh/eellison/866/orig 2025-12-04T08:53:08.4082308Z * [new branch] gh/eellison/867/base -> origin/gh/eellison/867/base 2025-12-04T08:53:08.4082384Z * [new branch] gh/eellison/867/head -> origin/gh/eellison/867/head 2025-12-04T08:53:08.4082455Z * [new branch] gh/eellison/867/orig -> origin/gh/eellison/867/orig 2025-12-04T08:53:08.4082525Z * [new branch] gh/eellison/868/base -> origin/gh/eellison/868/base 2025-12-04T08:53:08.4082597Z * [new branch] gh/eellison/868/head -> origin/gh/eellison/868/head 2025-12-04T08:53:08.4082668Z * [new branch] gh/eellison/868/orig -> origin/gh/eellison/868/orig 2025-12-04T08:53:08.4082739Z * [new branch] 
gh/eellison/869/base -> origin/gh/eellison/869/base 2025-12-04T08:53:08.4082813Z * [new branch] gh/eellison/869/head -> origin/gh/eellison/869/head 2025-12-04T08:53:08.4082883Z * [new branch] gh/eellison/869/orig -> origin/gh/eellison/869/orig 2025-12-04T08:53:08.4082960Z * [new branch] gh/eellison/870/base -> origin/gh/eellison/870/base 2025-12-04T08:53:08.4083031Z * [new branch] gh/eellison/870/head -> origin/gh/eellison/870/head 2025-12-04T08:53:08.4083101Z * [new branch] gh/eellison/870/orig -> origin/gh/eellison/870/orig 2025-12-04T08:53:08.4083174Z * [new branch] gh/eellison/871/base -> origin/gh/eellison/871/base 2025-12-04T08:53:08.4083243Z * [new branch] gh/eellison/871/head -> origin/gh/eellison/871/head 2025-12-04T08:53:08.4083354Z * [new branch] gh/eellison/871/orig -> origin/gh/eellison/871/orig 2025-12-04T08:53:08.4083429Z * [new branch] gh/eellison/872/base -> origin/gh/eellison/872/base 2025-12-04T08:53:08.4083543Z * [new branch] gh/eellison/872/head -> origin/gh/eellison/872/head 2025-12-04T08:53:08.4083613Z * [new branch] gh/eellison/872/orig -> origin/gh/eellison/872/orig 2025-12-04T08:53:08.4083721Z * [new branch] gh/eellison/873/base -> origin/gh/eellison/873/base 2025-12-04T08:53:08.4083793Z * [new branch] gh/eellison/873/head -> origin/gh/eellison/873/head 2025-12-04T08:53:08.4083865Z * [new branch] gh/eellison/873/orig -> origin/gh/eellison/873/orig 2025-12-04T08:53:08.4083938Z * [new branch] gh/eellison/874/base -> origin/gh/eellison/874/base 2025-12-04T08:53:08.4084010Z * [new branch] gh/eellison/874/head -> origin/gh/eellison/874/head 2025-12-04T08:53:08.4084080Z * [new branch] gh/eellison/874/orig -> origin/gh/eellison/874/orig 2025-12-04T08:53:08.4084155Z * [new branch] gh/eellison/875/base -> origin/gh/eellison/875/base 2025-12-04T08:53:08.4084229Z * [new branch] gh/eellison/875/head -> origin/gh/eellison/875/head 2025-12-04T08:53:08.4084302Z * [new branch] gh/eellison/875/orig -> origin/gh/eellison/875/orig 
2025-12-04T08:53:08.4084376Z * [new branch] gh/eellison/876/base -> origin/gh/eellison/876/base 2025-12-04T08:53:08.4084448Z * [new branch] gh/eellison/876/head -> origin/gh/eellison/876/head 2025-12-04T08:53:08.4084521Z * [new branch] gh/eellison/876/orig -> origin/gh/eellison/876/orig 2025-12-04T08:53:08.4084591Z * [new branch] gh/eellison/877/base -> origin/gh/eellison/877/base 2025-12-04T08:53:08.4084662Z * [new branch] gh/eellison/877/head -> origin/gh/eellison/877/head 2025-12-04T08:53:08.4084737Z * [new branch] gh/eellison/877/orig -> origin/gh/eellison/877/orig 2025-12-04T08:53:08.4084808Z * [new branch] gh/eellison/878/base -> origin/gh/eellison/878/base 2025-12-04T08:53:08.4084882Z * [new branch] gh/eellison/878/head -> origin/gh/eellison/878/head 2025-12-04T08:53:08.4084957Z * [new branch] gh/eellison/878/orig -> origin/gh/eellison/878/orig 2025-12-04T08:53:08.4085029Z * [new branch] gh/eellison/879/base -> origin/gh/eellison/879/base 2025-12-04T08:53:08.4085101Z * [new branch] gh/eellison/879/head -> origin/gh/eellison/879/head 2025-12-04T08:53:08.4085177Z * [new branch] gh/eellison/879/orig -> origin/gh/eellison/879/orig 2025-12-04T08:53:08.4085249Z * [new branch] gh/eellison/880/base -> origin/gh/eellison/880/base 2025-12-04T08:53:08.4085320Z * [new branch] gh/eellison/880/head -> origin/gh/eellison/880/head 2025-12-04T08:53:08.4085394Z * [new branch] gh/eellison/880/orig -> origin/gh/eellison/880/orig 2025-12-04T08:53:08.4085465Z * [new branch] gh/eellison/881/base -> origin/gh/eellison/881/base 2025-12-04T08:53:08.4085537Z * [new branch] gh/eellison/881/head -> origin/gh/eellison/881/head 2025-12-04T08:53:08.4085613Z * [new branch] gh/eellison/881/orig -> origin/gh/eellison/881/orig 2025-12-04T08:53:08.4085685Z * [new branch] gh/eellison/882/base -> origin/gh/eellison/882/base 2025-12-04T08:53:08.4085755Z * [new branch] gh/eellison/882/head -> origin/gh/eellison/882/head 2025-12-04T08:53:08.4085831Z * [new branch] gh/eellison/882/orig -> 
origin/gh/eellison/882/orig 2025-12-04T08:53:08.4085903Z * [new branch] gh/eellison/883/base -> origin/gh/eellison/883/base 2025-12-04T08:53:08.4085976Z * [new branch] gh/eellison/883/head -> origin/gh/eellison/883/head 2025-12-04T08:53:08.4086046Z * [new branch] gh/eellison/883/orig -> origin/gh/eellison/883/orig 2025-12-04T08:53:08.4086116Z * [new branch] gh/eellison/884/base -> origin/gh/eellison/884/base 2025-12-04T08:53:08.4086222Z * [new branch] gh/eellison/884/head -> origin/gh/eellison/884/head 2025-12-04T08:53:08.4086325Z * [new branch] gh/eellison/884/orig -> origin/gh/eellison/884/orig 2025-12-04T08:53:08.4086394Z * [new branch] gh/etaf/147/base -> origin/gh/etaf/147/base 2025-12-04T08:53:08.4086464Z * [new branch] gh/etaf/147/head -> origin/gh/etaf/147/head 2025-12-04T08:53:08.4086529Z * [new branch] gh/etaf/154/base -> origin/gh/etaf/154/base 2025-12-04T08:53:08.4086596Z * [new branch] gh/etaf/154/head -> origin/gh/etaf/154/head 2025-12-04T08:53:08.4086665Z * [new branch] gh/etaf/154/orig -> origin/gh/etaf/154/orig 2025-12-04T08:53:08.4086732Z * [new branch] gh/etaf/156/base -> origin/gh/etaf/156/base 2025-12-04T08:53:08.4086798Z * [new branch] gh/etaf/156/head -> origin/gh/etaf/156/head 2025-12-04T08:53:08.4086868Z * [new branch] gh/etaf/156/orig -> origin/gh/etaf/156/orig 2025-12-04T08:53:08.4086933Z * [new branch] gh/etaf/157/base -> origin/gh/etaf/157/base 2025-12-04T08:53:08.4086998Z * [new branch] gh/etaf/157/head -> origin/gh/etaf/157/head 2025-12-04T08:53:08.4087065Z * [new branch] gh/etaf/157/orig -> origin/gh/etaf/157/orig 2025-12-04T08:53:08.4087130Z * [new branch] gh/etaf/158/base -> origin/gh/etaf/158/base 2025-12-04T08:53:08.4087194Z * [new branch] gh/etaf/158/head -> origin/gh/etaf/158/head 2025-12-04T08:53:08.4087262Z * [new branch] gh/etaf/158/orig -> origin/gh/etaf/158/orig 2025-12-04T08:53:08.4087326Z * [new branch] gh/etaf/159/base -> origin/gh/etaf/159/base 2025-12-04T08:53:08.4087389Z * [new branch] gh/etaf/159/head -> 
origin/gh/etaf/159/head 2025-12-04T08:53:08.4087461Z * [new branch] gh/etaf/159/orig -> origin/gh/etaf/159/orig 2025-12-04T08:53:08.4087525Z * [new branch] gh/etaf/160/base -> origin/gh/etaf/160/base 2025-12-04T08:53:08.4087596Z * [new branch] gh/etaf/160/head -> origin/gh/etaf/160/head 2025-12-04T08:53:08.4087661Z * [new branch] gh/etaf/160/orig -> origin/gh/etaf/160/orig 2025-12-04T08:53:08.4087726Z * [new branch] gh/etaf/161/base -> origin/gh/etaf/161/base 2025-12-04T08:53:08.4087793Z * [new branch] gh/etaf/161/head -> origin/gh/etaf/161/head 2025-12-04T08:53:08.4087858Z * [new branch] gh/etaf/161/orig -> origin/gh/etaf/161/orig 2025-12-04T08:53:08.4087922Z * [new branch] gh/etaf/166/base -> origin/gh/etaf/166/base 2025-12-04T08:53:08.4087990Z * [new branch] gh/etaf/166/head -> origin/gh/etaf/166/head 2025-12-04T08:53:08.4088058Z * [new branch] gh/etaf/166/orig -> origin/gh/etaf/166/orig 2025-12-04T08:53:08.4088125Z * [new branch] gh/etaf/167/base -> origin/gh/etaf/167/base 2025-12-04T08:53:08.4088193Z * [new branch] gh/etaf/167/head -> origin/gh/etaf/167/head 2025-12-04T08:53:08.4088259Z * [new branch] gh/etaf/167/orig -> origin/gh/etaf/167/orig 2025-12-04T08:53:08.4088323Z * [new branch] gh/etaf/168/base -> origin/gh/etaf/168/base 2025-12-04T08:53:08.4088390Z * [new branch] gh/etaf/168/head -> origin/gh/etaf/168/head 2025-12-04T08:53:08.4088455Z * [new branch] gh/etaf/168/orig -> origin/gh/etaf/168/orig 2025-12-04T08:53:08.4088521Z * [new branch] gh/etaf/172/base -> origin/gh/etaf/172/base 2025-12-04T08:53:08.4088592Z * [new branch] gh/etaf/172/head -> origin/gh/etaf/172/head 2025-12-04T08:53:08.4088656Z * [new branch] gh/etaf/172/orig -> origin/gh/etaf/172/orig 2025-12-04T08:53:08.4088755Z * [new branch] gh/etaf/173/base -> origin/gh/etaf/173/base 2025-12-04T08:53:08.4088854Z * [new branch] gh/etaf/173/head -> origin/gh/etaf/173/head 2025-12-04T08:53:08.4088919Z * [new branch] gh/etaf/173/orig -> origin/gh/etaf/173/orig 2025-12-04T08:53:08.4088984Z * [new 
branch] gh/etaf/174/base -> origin/gh/etaf/174/base 2025-12-04T08:53:08.4089051Z * [new branch] gh/etaf/174/head -> origin/gh/etaf/174/head 2025-12-04T08:53:08.4089116Z * [new branch] gh/etaf/175/base -> origin/gh/etaf/175/base 2025-12-04T08:53:08.4089180Z * [new branch] gh/etaf/175/head -> origin/gh/etaf/175/head 2025-12-04T08:53:08.4089248Z * [new branch] gh/etaf/175/orig -> origin/gh/etaf/175/orig 2025-12-04T08:53:08.4089313Z * [new branch] gh/etaf/176/base -> origin/gh/etaf/176/base 2025-12-04T08:53:08.4089384Z * [new branch] gh/etaf/176/head -> origin/gh/etaf/176/head 2025-12-04T08:53:08.4089453Z * [new branch] gh/etaf/176/orig -> origin/gh/etaf/176/orig 2025-12-04T08:53:08.4089516Z * [new branch] gh/etaf/177/base -> origin/gh/etaf/177/base 2025-12-04T08:53:08.4089585Z * [new branch] gh/etaf/177/head -> origin/gh/etaf/177/head 2025-12-04T08:53:08.4089649Z * [new branch] gh/etaf/177/orig -> origin/gh/etaf/177/orig 2025-12-04T08:53:08.4089713Z * [new branch] gh/etaf/178/base -> origin/gh/etaf/178/base 2025-12-04T08:53:08.4089780Z * [new branch] gh/etaf/178/head -> origin/gh/etaf/178/head 2025-12-04T08:53:08.4089846Z * [new branch] gh/etaf/178/orig -> origin/gh/etaf/178/orig 2025-12-04T08:53:08.4089911Z * [new branch] gh/etaf/179/base -> origin/gh/etaf/179/base 2025-12-04T08:53:08.4089979Z * [new branch] gh/etaf/179/head -> origin/gh/etaf/179/head 2025-12-04T08:53:08.4090047Z * [new branch] gh/etaf/179/orig -> origin/gh/etaf/179/orig 2025-12-04T08:53:08.4090115Z * [new branch] gh/etaf/180/base -> origin/gh/etaf/180/base 2025-12-04T08:53:08.4090185Z * [new branch] gh/etaf/180/head -> origin/gh/etaf/180/head 2025-12-04T08:53:08.4090249Z * [new branch] gh/etaf/180/orig -> origin/gh/etaf/180/orig 2025-12-04T08:53:08.4090330Z * [new branch] gh/exclamaforte/1/base -> origin/gh/exclamaforte/1/base 2025-12-04T08:53:08.4090410Z * [new branch] gh/exclamaforte/1/head -> origin/gh/exclamaforte/1/head 2025-12-04T08:53:08.4090486Z * [new branch] gh/exclamaforte/2/base -> 
origin/gh/exclamaforte/2/base 2025-12-04T08:53:08.4090562Z * [new branch] gh/exclamaforte/2/head -> origin/gh/exclamaforte/2/head 2025-12-04T08:53:08.4090641Z * [new branch] gh/exclamaforte/3/base -> origin/gh/exclamaforte/3/base 2025-12-04T08:53:08.4090718Z * [new branch] gh/exclamaforte/3/head -> origin/gh/exclamaforte/3/head 2025-12-04T08:53:08.4090793Z * [new branch] gh/exclamaforte/4/base -> origin/gh/exclamaforte/4/base 2025-12-04T08:53:08.4090870Z * [new branch] gh/exclamaforte/4/head -> origin/gh/exclamaforte/4/head 2025-12-04T08:53:08.4090943Z * [new branch] gh/ezyang/2374/base -> origin/gh/ezyang/2374/base 2025-12-04T08:53:08.4091017Z * [new branch] gh/ezyang/2374/head -> origin/gh/ezyang/2374/head 2025-12-04T08:53:08.4091087Z * [new branch] gh/ezyang/2374/orig -> origin/gh/ezyang/2374/orig 2025-12-04T08:53:08.4091155Z * [new branch] gh/ezyang/2973/base -> origin/gh/ezyang/2973/base 2025-12-04T08:53:08.4091227Z * [new branch] gh/ezyang/2973/head -> origin/gh/ezyang/2973/head 2025-12-04T08:53:08.4091341Z * [new branch] gh/ezyang/2973/orig -> origin/gh/ezyang/2973/orig 2025-12-04T08:53:08.4091434Z * [new branch] gh/ezyang/2974/base -> origin/gh/ezyang/2974/base 2025-12-04T08:53:08.4091505Z * [new branch] gh/ezyang/2974/head -> origin/gh/ezyang/2974/head 2025-12-04T08:53:08.4091575Z * [new branch] gh/ezyang/2974/orig -> origin/gh/ezyang/2974/orig 2025-12-04T08:53:08.4091645Z * [new branch] gh/ezyang/3131/base -> origin/gh/ezyang/3131/base 2025-12-04T08:53:08.4091716Z * [new branch] gh/ezyang/3131/head -> origin/gh/ezyang/3131/head 2025-12-04T08:53:08.4091784Z * [new branch] gh/ezyang/3131/orig -> origin/gh/ezyang/3131/orig 2025-12-04T08:53:08.4091853Z * [new branch] gh/ezyang/3139/base -> origin/gh/ezyang/3139/base 2025-12-04T08:53:08.4091924Z * [new branch] gh/ezyang/3139/head -> origin/gh/ezyang/3139/head 2025-12-04T08:53:08.4091992Z * [new branch] gh/ezyang/3139/orig -> origin/gh/ezyang/3139/orig 2025-12-04T08:53:08.4092060Z * [new branch] 
gh/ezyang/3140/base -> origin/gh/ezyang/3140/base 2025-12-04T08:53:08.4092131Z * [new branch] gh/ezyang/3140/head -> origin/gh/ezyang/3140/head 2025-12-04T08:53:08.4092200Z * [new branch] gh/ezyang/3140/orig -> origin/gh/ezyang/3140/orig 2025-12-04T08:53:08.4092271Z * [new branch] gh/ezyang/3143/base -> origin/gh/ezyang/3143/base 2025-12-04T08:53:08.4092343Z * [new branch] gh/ezyang/3143/head -> origin/gh/ezyang/3143/head 2025-12-04T08:53:08.4092411Z * [new branch] gh/ezyang/3143/orig -> origin/gh/ezyang/3143/orig 2025-12-04T08:53:08.4092479Z * [new branch] gh/ezyang/3144/base -> origin/gh/ezyang/3144/base 2025-12-04T08:53:08.4092549Z * [new branch] gh/ezyang/3144/head -> origin/gh/ezyang/3144/head 2025-12-04T08:53:08.4092620Z * [new branch] gh/ezyang/3144/orig -> origin/gh/ezyang/3144/orig 2025-12-04T08:53:08.4092691Z * [new branch] gh/ezyang/3167/base -> origin/gh/ezyang/3167/base 2025-12-04T08:53:08.4092759Z * [new branch] gh/ezyang/3167/head -> origin/gh/ezyang/3167/head 2025-12-04T08:53:08.4092828Z * [new branch] gh/ezyang/3167/orig -> origin/gh/ezyang/3167/orig 2025-12-04T08:53:08.4092902Z * [new branch] gh/ezyang/3173/base -> origin/gh/ezyang/3173/base 2025-12-04T08:53:08.4092970Z * [new branch] gh/ezyang/3173/head -> origin/gh/ezyang/3173/head 2025-12-04T08:53:08.4093038Z * [new branch] gh/ezyang/3173/orig -> origin/gh/ezyang/3173/orig 2025-12-04T08:53:08.4093111Z * [new branch] gh/ezyang/3175/base -> origin/gh/ezyang/3175/base 2025-12-04T08:53:08.4093179Z * [new branch] gh/ezyang/3175/head -> origin/gh/ezyang/3175/head 2025-12-04T08:53:08.4093296Z * [new branch] gh/ezyang/3175/orig -> origin/gh/ezyang/3175/orig 2025-12-04T08:53:08.4093372Z * [new branch] gh/ezyang/3182/base -> origin/gh/ezyang/3182/base 2025-12-04T08:53:08.4093441Z * [new branch] gh/ezyang/3182/head -> origin/gh/ezyang/3182/head 2025-12-04T08:53:08.4093509Z * [new branch] gh/ezyang/3182/orig -> origin/gh/ezyang/3182/orig 2025-12-04T08:53:08.4093581Z * [new branch] gh/ezyang/3185/base -> 
origin/gh/ezyang/3185/base 2025-12-04T08:53:08.4093649Z * [new branch] gh/ezyang/3185/head -> origin/gh/ezyang/3185/head 2025-12-04T08:53:08.4093719Z * [new branch] gh/ezyang/3185/orig -> origin/gh/ezyang/3185/orig 2025-12-04T08:53:08.4093789Z * [new branch] gh/ezyang/3189/base -> origin/gh/ezyang/3189/base 2025-12-04T08:53:08.4093857Z * [new branch] gh/ezyang/3189/head -> origin/gh/ezyang/3189/head 2025-12-04T08:53:08.4093974Z * [new branch] gh/ezyang/3189/orig -> origin/gh/ezyang/3189/orig 2025-12-04T08:53:08.4094087Z * [new branch] gh/ezyang/3191/base -> origin/gh/ezyang/3191/base 2025-12-04T08:53:08.4094157Z * [new branch] gh/ezyang/3191/head -> origin/gh/ezyang/3191/head 2025-12-04T08:53:08.4094227Z * [new branch] gh/ezyang/3191/orig -> origin/gh/ezyang/3191/orig 2025-12-04T08:53:08.4094300Z * [new branch] gh/ezyang/3192/base -> origin/gh/ezyang/3192/base 2025-12-04T08:53:08.4094370Z * [new branch] gh/ezyang/3192/head -> origin/gh/ezyang/3192/head 2025-12-04T08:53:08.4094441Z * [new branch] gh/ezyang/3192/orig -> origin/gh/ezyang/3192/orig 2025-12-04T08:53:08.4094510Z * [new branch] gh/ezyang/3193/base -> origin/gh/ezyang/3193/base 2025-12-04T08:53:08.4094579Z * [new branch] gh/ezyang/3193/head -> origin/gh/ezyang/3193/head 2025-12-04T08:53:08.4094654Z * [new branch] gh/ezyang/3193/orig -> origin/gh/ezyang/3193/orig 2025-12-04T08:53:08.4094726Z * [new branch] gh/ezyang/3194/base -> origin/gh/ezyang/3194/base 2025-12-04T08:53:08.4094794Z * [new branch] gh/ezyang/3194/head -> origin/gh/ezyang/3194/head 2025-12-04T08:53:08.4094865Z * [new branch] gh/ezyang/3194/orig -> origin/gh/ezyang/3194/orig 2025-12-04T08:53:08.4094935Z * [new branch] gh/ezyang/3195/base -> origin/gh/ezyang/3195/base 2025-12-04T08:53:08.4095005Z * [new branch] gh/ezyang/3195/head -> origin/gh/ezyang/3195/head 2025-12-04T08:53:08.4095076Z * [new branch] gh/ezyang/3195/orig -> origin/gh/ezyang/3195/orig 2025-12-04T08:53:08.4095144Z * [new branch] gh/ezyang/3196/base -> 
origin/gh/ezyang/3196/base 2025-12-04T08:53:08.4095212Z * [new branch] gh/ezyang/3196/head -> origin/gh/ezyang/3196/head 2025-12-04T08:53:08.4095284Z * [new branch] gh/ezyang/3196/orig -> origin/gh/ezyang/3196/orig 2025-12-04T08:53:08.4095353Z * [new branch] gh/ezyang/3197/base -> origin/gh/ezyang/3197/base 2025-12-04T08:53:08.4095421Z * [new branch] gh/ezyang/3197/head -> origin/gh/ezyang/3197/head 2025-12-04T08:53:08.4095492Z * [new branch] gh/ezyang/3197/orig -> origin/gh/ezyang/3197/orig 2025-12-04T08:53:08.4095563Z * [new branch] gh/ezyang/3198/base -> origin/gh/ezyang/3198/base 2025-12-04T08:53:08.4095633Z * [new branch] gh/ezyang/3198/head -> origin/gh/ezyang/3198/head 2025-12-04T08:53:08.4095703Z * [new branch] gh/ezyang/3198/orig -> origin/gh/ezyang/3198/orig 2025-12-04T08:53:08.4095772Z * [new branch] gh/ezyang/3199/base -> origin/gh/ezyang/3199/base 2025-12-04T08:53:08.4095841Z * [new branch] gh/ezyang/3199/head -> origin/gh/ezyang/3199/head 2025-12-04T08:53:08.4095913Z * [new branch] gh/ezyang/3199/orig -> origin/gh/ezyang/3199/orig 2025-12-04T08:53:08.4095982Z * [new branch] gh/ezyang/3200/base -> origin/gh/ezyang/3200/base 2025-12-04T08:53:08.4096051Z * [new branch] gh/ezyang/3200/head -> origin/gh/ezyang/3200/head 2025-12-04T08:53:08.4096121Z * [new branch] gh/ezyang/3200/orig -> origin/gh/ezyang/3200/orig 2025-12-04T08:53:08.4096190Z * [new branch] gh/ezyang/3201/base -> origin/gh/ezyang/3201/base 2025-12-04T08:53:08.4096264Z * [new branch] gh/ezyang/3201/head -> origin/gh/ezyang/3201/head 2025-12-04T08:53:08.4096332Z * [new branch] gh/ezyang/3201/orig -> origin/gh/ezyang/3201/orig 2025-12-04T08:53:08.4096401Z * [new branch] gh/ezyang/3202/base -> origin/gh/ezyang/3202/base 2025-12-04T08:53:08.4096471Z * [new branch] gh/ezyang/3202/head -> origin/gh/ezyang/3202/head 2025-12-04T08:53:08.4096575Z * [new branch] gh/ezyang/3202/orig -> origin/gh/ezyang/3202/orig 2025-12-04T08:53:08.4096674Z * [new branch] gh/ezyang/3203/base -> 
origin/gh/ezyang/3203/base 2025-12-04T08:53:08.4096747Z * [new branch] gh/ezyang/3203/head -> origin/gh/ezyang/3203/head 2025-12-04T08:53:08.4096816Z * [new branch] gh/ezyang/3203/orig -> origin/gh/ezyang/3203/orig 2025-12-04T08:53:08.4096884Z * [new branch] gh/ezyang/3204/base -> origin/gh/ezyang/3204/base 2025-12-04T08:53:08.4096958Z * [new branch] gh/ezyang/3204/head -> origin/gh/ezyang/3204/head 2025-12-04T08:53:08.4097027Z * [new branch] gh/ezyang/3204/orig -> origin/gh/ezyang/3204/orig 2025-12-04T08:53:08.4097097Z * [new branch] gh/ezyang/3205/base -> origin/gh/ezyang/3205/base 2025-12-04T08:53:08.4097168Z * [new branch] gh/ezyang/3205/head -> origin/gh/ezyang/3205/head 2025-12-04T08:53:08.4097238Z * [new branch] gh/ezyang/3205/orig -> origin/gh/ezyang/3205/orig 2025-12-04T08:53:08.4097308Z * [new branch] gh/ezyang/3206/base -> origin/gh/ezyang/3206/base 2025-12-04T08:53:08.4097381Z * [new branch] gh/ezyang/3206/head -> origin/gh/ezyang/3206/head 2025-12-04T08:53:08.4097450Z * [new branch] gh/ezyang/3206/orig -> origin/gh/ezyang/3206/orig 2025-12-04T08:53:08.4097520Z * [new branch] gh/ezyang/3207/base -> origin/gh/ezyang/3207/base 2025-12-04T08:53:08.4116361Z * [new branch] gh/ezyang/3207/head -> origin/gh/ezyang/3207/head 2025-12-04T08:53:08.4116482Z * [new branch] gh/ezyang/3207/orig -> origin/gh/ezyang/3207/orig 2025-12-04T08:53:08.4116566Z * [new branch] gh/ezyang/3208/base -> origin/gh/ezyang/3208/base 2025-12-04T08:53:08.4116642Z * [new branch] gh/ezyang/3208/head -> origin/gh/ezyang/3208/head 2025-12-04T08:53:08.4116718Z * [new branch] gh/ezyang/3208/orig -> origin/gh/ezyang/3208/orig 2025-12-04T08:53:08.4116793Z * [new branch] gh/ezyang/3209/base -> origin/gh/ezyang/3209/base 2025-12-04T08:53:08.4116870Z * [new branch] gh/ezyang/3209/head -> origin/gh/ezyang/3209/head 2025-12-04T08:53:08.4116944Z * [new branch] gh/ezyang/3209/orig -> origin/gh/ezyang/3209/orig 2025-12-04T08:53:08.4117025Z * [new branch] gh/fadara01/3/base -> origin/gh/fadara01/3/base 
2025-12-04T08:53:08.4117103Z * [new branch] gh/fadara01/3/head -> origin/gh/fadara01/3/head 2025-12-04T08:53:08.4117176Z * [new branch] gh/fadara01/3/orig -> origin/gh/fadara01/3/orig 2025-12-04T08:53:08.4117252Z * [new branch] gh/fadara01/5/base -> origin/gh/fadara01/5/base 2025-12-04T08:53:08.4117322Z * [new branch] gh/fadara01/5/head -> origin/gh/fadara01/5/head 2025-12-04T08:53:08.4117392Z * [new branch] gh/fadara01/5/orig -> origin/gh/fadara01/5/orig 2025-12-04T08:53:08.4117467Z * [new branch] gh/fadara01/6/base -> origin/gh/fadara01/6/base 2025-12-04T08:53:08.4117539Z * [new branch] gh/fadara01/6/head -> origin/gh/fadara01/6/head 2025-12-04T08:53:08.4117608Z * [new branch] gh/fadara01/6/orig -> origin/gh/fadara01/6/orig 2025-12-04T08:53:08.4117686Z * [new branch] gh/fadara01/7/base -> origin/gh/fadara01/7/base 2025-12-04T08:53:08.4117757Z * [new branch] gh/fadara01/7/head -> origin/gh/fadara01/7/head 2025-12-04T08:53:08.4117826Z * [new branch] gh/fadara01/7/orig -> origin/gh/fadara01/7/orig 2025-12-04T08:53:08.4117900Z * [new branch] gh/fadara01/8/base -> origin/gh/fadara01/8/base 2025-12-04T08:53:08.4117975Z * [new branch] gh/fadara01/8/head -> origin/gh/fadara01/8/head 2025-12-04T08:53:08.4118123Z * [new branch] gh/fadara01/8/orig -> origin/gh/fadara01/8/orig 2025-12-04T08:53:08.4118240Z * [new branch] gh/fadara01/9/base -> origin/gh/fadara01/9/base 2025-12-04T08:53:08.4118313Z * [new branch] gh/fadara01/9/head -> origin/gh/fadara01/9/head 2025-12-04T08:53:08.4118383Z * [new branch] gh/fadara01/9/orig -> origin/gh/fadara01/9/orig 2025-12-04T08:53:08.4118464Z * [new branch] gh/fduwjj/182/base -> origin/gh/fduwjj/182/base 2025-12-04T08:53:08.4118534Z * [new branch] gh/fduwjj/182/head -> origin/gh/fduwjj/182/head 2025-12-04T08:53:08.4118605Z * [new branch] gh/fduwjj/182/orig -> origin/gh/fduwjj/182/orig 2025-12-04T08:53:08.4118677Z * [new branch] gh/fduwjj/211/base -> origin/gh/fduwjj/211/base 2025-12-04T08:53:08.4118743Z * [new branch] gh/fduwjj/211/head -> 
origin/gh/fduwjj/211/head 2025-12-04T08:53:08.4118818Z * [new branch] gh/fduwjj/211/orig -> origin/gh/fduwjj/211/orig 2025-12-04T08:53:08.4118891Z * [new branch] gh/fduwjj/212/base -> origin/gh/fduwjj/212/base 2025-12-04T08:53:08.4118960Z * [new branch] gh/fduwjj/212/head -> origin/gh/fduwjj/212/head 2025-12-04T08:53:08.4119033Z * [new branch] gh/fduwjj/212/orig -> origin/gh/fduwjj/212/orig 2025-12-04T08:53:08.4119105Z * [new branch] gh/fduwjj/213/base -> origin/gh/fduwjj/213/base 2025-12-04T08:53:08.4119177Z * [new branch] gh/fduwjj/213/head -> origin/gh/fduwjj/213/head 2025-12-04T08:53:08.4119250Z * [new branch] gh/fduwjj/213/orig -> origin/gh/fduwjj/213/orig 2025-12-04T08:53:08.4119320Z * [new branch] gh/fduwjj/226/base -> origin/gh/fduwjj/226/base 2025-12-04T08:53:08.4119388Z * [new branch] gh/fduwjj/226/head -> origin/gh/fduwjj/226/head 2025-12-04T08:53:08.4119463Z * [new branch] gh/fduwjj/226/orig -> origin/gh/fduwjj/226/orig 2025-12-04T08:53:08.4119534Z * [new branch] gh/fduwjj/229/base -> origin/gh/fduwjj/229/base 2025-12-04T08:53:08.4119607Z * [new branch] gh/fduwjj/229/head -> origin/gh/fduwjj/229/head 2025-12-04T08:53:08.4119681Z * [new branch] gh/fduwjj/229/orig -> origin/gh/fduwjj/229/orig 2025-12-04T08:53:08.4119750Z * [new branch] gh/fduwjj/233/base -> origin/gh/fduwjj/233/base 2025-12-04T08:53:08.4119819Z * [new branch] gh/fduwjj/233/head -> origin/gh/fduwjj/233/head 2025-12-04T08:53:08.4119893Z * [new branch] gh/fduwjj/233/orig -> origin/gh/fduwjj/233/orig 2025-12-04T08:53:08.4119961Z * [new branch] gh/fduwjj/234/base -> origin/gh/fduwjj/234/base 2025-12-04T08:53:08.4120031Z * [new branch] gh/fduwjj/234/head -> origin/gh/fduwjj/234/head 2025-12-04T08:53:08.4120107Z * [new branch] gh/fduwjj/234/orig -> origin/gh/fduwjj/234/orig 2025-12-04T08:53:08.4120181Z * [new branch] gh/fduwjj/235/base -> origin/gh/fduwjj/235/base 2025-12-04T08:53:08.4120251Z * [new branch] gh/fduwjj/235/head -> origin/gh/fduwjj/235/head 2025-12-04T08:53:08.4120323Z * [new 
branch] gh/fduwjj/235/orig -> origin/gh/fduwjj/235/orig 2025-12-04T08:53:08.4120391Z * [new branch] gh/fduwjj/236/base -> origin/gh/fduwjj/236/base 2025-12-04T08:53:08.4120458Z * [new branch] gh/fduwjj/236/head -> origin/gh/fduwjj/236/head 2025-12-04T08:53:08.4120528Z * [new branch] gh/fduwjj/236/orig -> origin/gh/fduwjj/236/orig 2025-12-04T08:53:08.4120600Z * [new branch] gh/fduwjj/237/base -> origin/gh/fduwjj/237/base 2025-12-04T08:53:08.4120674Z * [new branch] gh/fduwjj/237/head -> origin/gh/fduwjj/237/head 2025-12-04T08:53:08.4120771Z * [new branch] gh/fduwjj/237/orig -> origin/gh/fduwjj/237/orig 2025-12-04T08:53:08.4120839Z * [new branch] gh/fduwjj/238/base -> origin/gh/fduwjj/238/base 2025-12-04T08:53:08.4120940Z * [new branch] gh/fduwjj/238/head -> origin/gh/fduwjj/238/head 2025-12-04T08:53:08.4121010Z * [new branch] gh/fduwjj/238/orig -> origin/gh/fduwjj/238/orig 2025-12-04T08:53:08.4121078Z * [new branch] gh/fduwjj/239/base -> origin/gh/fduwjj/239/base 2025-12-04T08:53:08.4121151Z * [new branch] gh/fduwjj/239/head -> origin/gh/fduwjj/239/head 2025-12-04T08:53:08.4121219Z * [new branch] gh/fduwjj/239/orig -> origin/gh/fduwjj/239/orig 2025-12-04T08:53:08.4121290Z * [new branch] gh/fegin/332/base -> origin/gh/fegin/332/base 2025-12-04T08:53:08.4121361Z * [new branch] gh/fegin/332/head -> origin/gh/fegin/332/head 2025-12-04T08:53:08.4121431Z * [new branch] gh/fegin/332/orig -> origin/gh/fegin/332/orig 2025-12-04T08:53:08.4121497Z * [new branch] gh/fegin/333/base -> origin/gh/fegin/333/base 2025-12-04T08:53:08.4121570Z * [new branch] gh/fegin/333/head -> origin/gh/fegin/333/head 2025-12-04T08:53:08.4121638Z * [new branch] gh/fegin/333/orig -> origin/gh/fegin/333/orig 2025-12-04T08:53:08.4121706Z * [new branch] gh/fegin/334/base -> origin/gh/fegin/334/base 2025-12-04T08:53:08.4121776Z * [new branch] gh/fegin/334/head -> origin/gh/fegin/334/head 2025-12-04T08:53:08.4121842Z * [new branch] gh/fegin/334/orig -> origin/gh/fegin/334/orig 2025-12-04T08:53:08.4121910Z 
* [new branch] gh/fegin/335/base -> origin/gh/fegin/335/base 2025-12-04T08:53:08.4121981Z * [new branch] gh/fegin/335/head -> origin/gh/fegin/335/head 2025-12-04T08:53:08.4122051Z * [new branch] gh/fegin/335/orig -> origin/gh/fegin/335/orig 2025-12-04T08:53:08.4122121Z * [new branch] gh/fffrog/160/base -> origin/gh/fffrog/160/base 2025-12-04T08:53:08.4122196Z * [new branch] gh/fffrog/160/head -> origin/gh/fffrog/160/head 2025-12-04T08:53:08.4122264Z * [new branch] gh/fffrog/177/base -> origin/gh/fffrog/177/base 2025-12-04T08:53:08.4122333Z * [new branch] gh/fffrog/177/head -> origin/gh/fffrog/177/head 2025-12-04T08:53:08.4122403Z * [new branch] gh/fffrog/177/orig -> origin/gh/fffrog/177/orig 2025-12-04T08:53:08.4122470Z * [new branch] gh/fffrog/178/base -> origin/gh/fffrog/178/base 2025-12-04T08:53:08.4122540Z * [new branch] gh/fffrog/178/head -> origin/gh/fffrog/178/head 2025-12-04T08:53:08.4122607Z * [new branch] gh/fffrog/178/orig -> origin/gh/fffrog/178/orig 2025-12-04T08:53:08.4122678Z * [new branch] gh/fffrog/181/base -> origin/gh/fffrog/181/base 2025-12-04T08:53:08.4122749Z * [new branch] gh/fffrog/181/head -> origin/gh/fffrog/181/head 2025-12-04T08:53:08.4122817Z * [new branch] gh/fffrog/181/orig -> origin/gh/fffrog/181/orig 2025-12-04T08:53:08.4122887Z * [new branch] gh/fffrog/183/base -> origin/gh/fffrog/183/base 2025-12-04T08:53:08.4122956Z * [new branch] gh/fffrog/183/head -> origin/gh/fffrog/183/head 2025-12-04T08:53:08.4123024Z * [new branch] gh/fffrog/183/orig -> origin/gh/fffrog/183/orig 2025-12-04T08:53:08.4123094Z * [new branch] gh/fxdawnn/10/base -> origin/gh/fxdawnn/10/base 2025-12-04T08:53:08.4123170Z * [new branch] gh/fxdawnn/10/head -> origin/gh/fxdawnn/10/head 2025-12-04T08:53:08.4123239Z * [new branch] gh/fxdawnn/10/orig -> origin/gh/fxdawnn/10/orig 2025-12-04T08:53:08.4123394Z * [new branch] gh/fxdawnn/11/base -> origin/gh/fxdawnn/11/base 2025-12-04T08:53:08.4123464Z * [new branch] gh/fxdawnn/11/head -> origin/gh/fxdawnn/11/head 
2025-12-04T08:53:08.4123577Z * [new branch] gh/fxdawnn/11/orig -> origin/gh/fxdawnn/11/orig 2025-12-04T08:53:08.4123647Z * [new branch] gh/fxdawnn/12/base -> origin/gh/fxdawnn/12/base 2025-12-04T08:53:08.4123714Z * [new branch] gh/fxdawnn/12/head -> origin/gh/fxdawnn/12/head 2025-12-04T08:53:08.4123786Z * [new branch] gh/fxdawnn/12/orig -> origin/gh/fxdawnn/12/orig 2025-12-04T08:53:08.4123853Z * [new branch] gh/fxdawnn/13/base -> origin/gh/fxdawnn/13/base 2025-12-04T08:53:08.4123922Z * [new branch] gh/fxdawnn/13/head -> origin/gh/fxdawnn/13/head 2025-12-04T08:53:08.4123995Z * [new branch] gh/fxdawnn/13/orig -> origin/gh/fxdawnn/13/orig 2025-12-04T08:53:08.4124063Z * [new branch] gh/fxdawnn/14/base -> origin/gh/fxdawnn/14/base 2025-12-04T08:53:08.4124131Z * [new branch] gh/fxdawnn/14/head -> origin/gh/fxdawnn/14/head 2025-12-04T08:53:08.4124206Z * [new branch] gh/fxdawnn/14/orig -> origin/gh/fxdawnn/14/orig 2025-12-04T08:53:08.4124274Z * [new branch] gh/fxdawnn/15/base -> origin/gh/fxdawnn/15/base 2025-12-04T08:53:08.4124341Z * [new branch] gh/fxdawnn/15/head -> origin/gh/fxdawnn/15/head 2025-12-04T08:53:08.4124412Z * [new branch] gh/fxdawnn/15/orig -> origin/gh/fxdawnn/15/orig 2025-12-04T08:53:08.4124480Z * [new branch] gh/fxdawnn/6/base -> origin/gh/fxdawnn/6/base 2025-12-04T08:53:08.4124548Z * [new branch] gh/fxdawnn/6/head -> origin/gh/fxdawnn/6/head 2025-12-04T08:53:08.4124618Z * [new branch] gh/fxdawnn/6/orig -> origin/gh/fxdawnn/6/orig 2025-12-04T08:53:08.4124685Z * [new branch] gh/fxdawnn/7/base -> origin/gh/fxdawnn/7/base 2025-12-04T08:53:08.4124754Z * [new branch] gh/fxdawnn/7/head -> origin/gh/fxdawnn/7/head 2025-12-04T08:53:08.4124821Z * [new branch] gh/fxdawnn/7/orig -> origin/gh/fxdawnn/7/orig 2025-12-04T08:53:08.4124888Z * [new branch] gh/fxdawnn/9/base -> origin/gh/fxdawnn/9/base 2025-12-04T08:53:08.4124957Z * [new branch] gh/fxdawnn/9/head -> origin/gh/fxdawnn/9/head 2025-12-04T08:53:08.4125024Z * [new branch] gh/fxdawnn/9/orig -> 
origin/gh/fxdawnn/9/orig 2025-12-04T08:53:08.4125092Z * [new branch] gh/galv/1/base -> origin/gh/galv/1/base 2025-12-04T08:53:08.4125161Z * [new branch] gh/galv/1/head -> origin/gh/galv/1/head 2025-12-04T08:53:08.4125225Z * [new branch] gh/galv/1/orig -> origin/gh/galv/1/orig 2025-12-04T08:53:08.4125288Z * [new branch] gh/galv/2/base -> origin/gh/galv/2/base 2025-12-04T08:53:08.4125356Z * [new branch] gh/galv/2/head -> origin/gh/galv/2/head 2025-12-04T08:53:08.4125422Z * [new branch] gh/galv/2/orig -> origin/gh/galv/2/orig 2025-12-04T08:53:08.4125485Z * [new branch] gh/galv/3/base -> origin/gh/galv/3/base 2025-12-04T08:53:08.4125551Z * [new branch] gh/galv/3/head -> origin/gh/galv/3/head 2025-12-04T08:53:08.4125613Z * [new branch] gh/galv/3/orig -> origin/gh/galv/3/orig 2025-12-04T08:53:08.4125691Z * [new branch] gh/guangyey/134/base -> origin/gh/guangyey/134/base 2025-12-04T08:53:08.4125772Z * [new branch] gh/guangyey/134/head -> origin/gh/guangyey/134/head 2025-12-04T08:53:08.4125843Z * [new branch] gh/guangyey/134/orig -> origin/gh/guangyey/134/orig 2025-12-04T08:53:08.4125914Z * [new branch] gh/guangyey/163/base -> origin/gh/guangyey/163/base 2025-12-04T08:53:08.4126035Z * [new branch] gh/guangyey/163/head -> origin/gh/guangyey/163/head 2025-12-04T08:53:08.4126131Z * [new branch] gh/guangyey/163/orig -> origin/gh/guangyey/163/orig 2025-12-04T08:53:08.4126200Z * [new branch] gh/guangyey/168/base -> origin/gh/guangyey/168/base 2025-12-04T08:53:08.4126273Z * [new branch] gh/guangyey/168/head -> origin/gh/guangyey/168/head 2025-12-04T08:53:08.4126344Z * [new branch] gh/guangyey/168/orig -> origin/gh/guangyey/168/orig 2025-12-04T08:53:08.4126414Z * [new branch] gh/guangyey/169/base -> origin/gh/guangyey/169/base 2025-12-04T08:53:08.4126486Z * [new branch] gh/guangyey/169/head -> origin/gh/guangyey/169/head 2025-12-04T08:53:08.4126555Z * [new branch] gh/guangyey/169/orig -> origin/gh/guangyey/169/orig 2025-12-04T08:53:08.4126627Z * [new branch] gh/guangyey/170/base 
-> origin/gh/guangyey/170/base 2025-12-04T08:53:08.4126700Z * [new branch] gh/guangyey/170/head -> origin/gh/guangyey/170/head 2025-12-04T08:53:08.4126771Z * [new branch] gh/guangyey/170/orig -> origin/gh/guangyey/170/orig 2025-12-04T08:53:08.4126843Z * [new branch] gh/guangyey/171/base -> origin/gh/guangyey/171/base 2025-12-04T08:53:08.4126912Z * [new branch] gh/guangyey/171/head -> origin/gh/guangyey/171/head 2025-12-04T08:53:08.4126983Z * [new branch] gh/guangyey/171/orig -> origin/gh/guangyey/171/orig 2025-12-04T08:53:08.4127057Z * [new branch] gh/guangyey/178/base -> origin/gh/guangyey/178/base 2025-12-04T08:53:08.4127126Z * [new branch] gh/guangyey/178/head -> origin/gh/guangyey/178/head 2025-12-04T08:53:08.4127194Z * [new branch] gh/guangyey/178/orig -> origin/gh/guangyey/178/orig 2025-12-04T08:53:08.4127266Z * [new branch] gh/guangyey/182/base -> origin/gh/guangyey/182/base 2025-12-04T08:53:08.4127338Z * [new branch] gh/guangyey/182/head -> origin/gh/guangyey/182/head 2025-12-04T08:53:08.4127410Z * [new branch] gh/guangyey/182/orig -> origin/gh/guangyey/182/orig 2025-12-04T08:53:08.4127481Z * [new branch] gh/guangyey/183/base -> origin/gh/guangyey/183/base 2025-12-04T08:53:08.4127551Z * [new branch] gh/guangyey/183/head -> origin/gh/guangyey/183/head 2025-12-04T08:53:08.4127622Z * [new branch] gh/guangyey/183/orig -> origin/gh/guangyey/183/orig 2025-12-04T08:53:08.4127694Z * [new branch] gh/guangyey/185/base -> origin/gh/guangyey/185/base 2025-12-04T08:53:08.4127765Z * [new branch] gh/guangyey/185/head -> origin/gh/guangyey/185/head 2025-12-04T08:53:08.4127836Z * [new branch] gh/guangyey/185/orig -> origin/gh/guangyey/185/orig 2025-12-04T08:53:08.4127909Z * [new branch] gh/guangyey/186/base -> origin/gh/guangyey/186/base 2025-12-04T08:53:08.4127981Z * [new branch] gh/guangyey/186/head -> origin/gh/guangyey/186/head 2025-12-04T08:53:08.4128057Z * [new branch] gh/guangyey/186/orig -> origin/gh/guangyey/186/orig 2025-12-04T08:53:08.4128128Z * [new branch] 
gh/guangyey/187/base -> origin/gh/guangyey/187/base 2025-12-04T08:53:08.4128198Z * [new branch] gh/guangyey/187/head -> origin/gh/guangyey/187/head 2025-12-04T08:53:08.4128270Z * [new branch] gh/guangyey/187/orig -> origin/gh/guangyey/187/orig 2025-12-04T08:53:08.4128340Z * [new branch] gh/guangyey/188/base -> origin/gh/guangyey/188/base 2025-12-04T08:53:08.4128410Z * [new branch] gh/guangyey/188/head -> origin/gh/guangyey/188/head 2025-12-04T08:53:08.4128482Z * [new branch] gh/guangyey/188/orig -> origin/gh/guangyey/188/orig 2025-12-04T08:53:08.4128552Z * [new branch] gh/guangyey/190/base -> origin/gh/guangyey/190/base 2025-12-04T08:53:08.4128704Z * [new branch] gh/guangyey/190/head -> origin/gh/guangyey/190/head 2025-12-04T08:53:08.4128802Z * [new branch] gh/guangyey/190/orig -> origin/gh/guangyey/190/orig 2025-12-04T08:53:08.4128873Z * [new branch] gh/guangyey/208/base -> origin/gh/guangyey/208/base 2025-12-04T08:53:08.4128943Z * [new branch] gh/guangyey/208/head -> origin/gh/guangyey/208/head 2025-12-04T08:53:08.4129015Z * [new branch] gh/guangyey/208/orig -> origin/gh/guangyey/208/orig 2025-12-04T08:53:08.4129085Z * [new branch] gh/guangyey/228/base -> origin/gh/guangyey/228/base 2025-12-04T08:53:08.4129155Z * [new branch] gh/guangyey/228/head -> origin/gh/guangyey/228/head 2025-12-04T08:53:08.4129227Z * [new branch] gh/guangyey/228/orig -> origin/gh/guangyey/228/orig 2025-12-04T08:53:08.4129297Z * [new branch] gh/guangyey/230/base -> origin/gh/guangyey/230/base 2025-12-04T08:53:08.4129369Z * [new branch] gh/guangyey/230/head -> origin/gh/guangyey/230/head 2025-12-04T08:53:08.4129443Z * [new branch] gh/guangyey/230/orig -> origin/gh/guangyey/230/orig 2025-12-04T08:53:08.4129513Z * [new branch] gh/guangyey/231/base -> origin/gh/guangyey/231/base 2025-12-04T08:53:08.4129585Z * [new branch] gh/guangyey/231/head -> origin/gh/guangyey/231/head 2025-12-04T08:53:08.4129655Z * [new branch] gh/guangyey/231/orig -> origin/gh/guangyey/231/orig 
2025-12-04T08:53:08.4129725Z * [new branch] gh/guangyey/232/base -> origin/gh/guangyey/232/base 2025-12-04T08:53:08.4129798Z * [new branch] gh/guangyey/232/head -> origin/gh/guangyey/232/head 2025-12-04T08:53:08.4129868Z * [new branch] gh/guangyey/232/orig -> origin/gh/guangyey/232/orig 2025-12-04T08:53:08.4129940Z * [new branch] gh/guangyey/233/base -> origin/gh/guangyey/233/base 2025-12-04T08:53:08.4130012Z * [new branch] gh/guangyey/233/head -> origin/gh/guangyey/233/head 2025-12-04T08:53:08.4130083Z * [new branch] gh/guangyey/233/orig -> origin/gh/guangyey/233/orig 2025-12-04T08:53:08.4130153Z * [new branch] gh/guangyey/234/base -> origin/gh/guangyey/234/base 2025-12-04T08:53:08.4130224Z * [new branch] gh/guangyey/234/head -> origin/gh/guangyey/234/head 2025-12-04T08:53:08.4130293Z * [new branch] gh/guangyey/234/orig -> origin/gh/guangyey/234/orig 2025-12-04T08:53:08.4130363Z * [new branch] gh/guangyey/235/base -> origin/gh/guangyey/235/base 2025-12-04T08:53:08.4130436Z * [new branch] gh/guangyey/235/head -> origin/gh/guangyey/235/head 2025-12-04T08:53:08.4130505Z * [new branch] gh/guangyey/235/orig -> origin/gh/guangyey/235/orig 2025-12-04T08:53:08.4130576Z * [new branch] gh/guangyey/236/base -> origin/gh/guangyey/236/base 2025-12-04T08:53:08.4130649Z * [new branch] gh/guangyey/236/head -> origin/gh/guangyey/236/head 2025-12-04T08:53:08.4130721Z * [new branch] gh/guangyey/236/orig -> origin/gh/guangyey/236/orig 2025-12-04T08:53:08.4130791Z * [new branch] gh/guangyey/237/base -> origin/gh/guangyey/237/base 2025-12-04T08:53:08.4130863Z * [new branch] gh/guangyey/237/head -> origin/gh/guangyey/237/head 2025-12-04T08:53:08.4130931Z * [new branch] gh/guangyey/237/orig -> origin/gh/guangyey/237/orig 2025-12-04T08:53:08.4131004Z * [new branch] gh/guangyey/238/base -> origin/gh/guangyey/238/base 2025-12-04T08:53:08.4131072Z * [new branch] gh/guangyey/238/head -> origin/gh/guangyey/238/head 2025-12-04T08:53:08.4131141Z * [new branch] gh/guangyey/239/base -> 
origin/gh/guangyey/239/base 2025-12-04T08:53:08.4131240Z * [new branch] gh/guangyey/239/head -> origin/gh/guangyey/239/head 2025-12-04T08:53:08.4131311Z * [new branch] gh/guangyey/239/orig -> origin/gh/guangyey/239/orig 2025-12-04T08:53:08.4131434Z * [new branch] gh/guangyey/240/base -> origin/gh/guangyey/240/base 2025-12-04T08:53:08.4131507Z * [new branch] gh/guangyey/240/head -> origin/gh/guangyey/240/head 2025-12-04T08:53:08.4131577Z * [new branch] gh/guangyey/240/orig -> origin/gh/guangyey/240/orig 2025-12-04T08:53:08.4131648Z * [new branch] gh/guangyey/241/base -> origin/gh/guangyey/241/base 2025-12-04T08:53:08.4131718Z * [new branch] gh/guangyey/241/head -> origin/gh/guangyey/241/head 2025-12-04T08:53:08.4131788Z * [new branch] gh/guangyey/241/orig -> origin/gh/guangyey/241/orig 2025-12-04T08:53:08.4131859Z * [new branch] gh/guangyey/242/base -> origin/gh/guangyey/242/base 2025-12-04T08:53:08.4131933Z * [new branch] gh/guangyey/242/head -> origin/gh/guangyey/242/head 2025-12-04T08:53:08.4132004Z * [new branch] gh/guangyey/242/orig -> origin/gh/guangyey/242/orig 2025-12-04T08:53:08.4132075Z * [new branch] gh/guangyey/243/base -> origin/gh/guangyey/243/base 2025-12-04T08:53:08.4132147Z * [new branch] gh/guangyey/243/head -> origin/gh/guangyey/243/head 2025-12-04T08:53:08.4132215Z * [new branch] gh/guangyey/243/orig -> origin/gh/guangyey/243/orig 2025-12-04T08:53:08.4132284Z * [new branch] gh/guangyey/244/base -> origin/gh/guangyey/244/base 2025-12-04T08:53:08.4132356Z * [new branch] gh/guangyey/244/head -> origin/gh/guangyey/244/head 2025-12-04T08:53:08.4132425Z * [new branch] gh/guangyey/244/orig -> origin/gh/guangyey/244/orig 2025-12-04T08:53:08.4132495Z * [new branch] gh/guangyey/245/base -> origin/gh/guangyey/245/base 2025-12-04T08:53:08.4132567Z * [new branch] gh/guangyey/245/head -> origin/gh/guangyey/245/head 2025-12-04T08:53:08.4132638Z * [new branch] gh/guangyey/245/orig -> origin/gh/guangyey/245/orig 2025-12-04T08:53:08.4132712Z * [new branch] 
gh/guangyey/246/base -> origin/gh/guangyey/246/base 2025-12-04T08:53:08.4132782Z * [new branch] gh/guangyey/246/head -> origin/gh/guangyey/246/head 2025-12-04T08:53:08.4132852Z * [new branch] gh/guangyey/246/orig -> origin/gh/guangyey/246/orig 2025-12-04T08:53:08.4132924Z * [new branch] gh/guangyey/247/base -> origin/gh/guangyey/247/base 2025-12-04T08:53:08.4132994Z * [new branch] gh/guangyey/247/head -> origin/gh/guangyey/247/head 2025-12-04T08:53:08.4133064Z * [new branch] gh/guangyey/247/orig -> origin/gh/guangyey/247/orig 2025-12-04T08:53:08.4133139Z * [new branch] gh/guangyey/248/base -> origin/gh/guangyey/248/base 2025-12-04T08:53:08.4133210Z * [new branch] gh/guangyey/248/head -> origin/gh/guangyey/248/head 2025-12-04T08:53:08.4133330Z * [new branch] gh/guangyey/248/orig -> origin/gh/guangyey/248/orig 2025-12-04T08:53:08.4133403Z * [new branch] gh/guangyey/249/base -> origin/gh/guangyey/249/base 2025-12-04T08:53:08.4133473Z * [new branch] gh/guangyey/249/head -> origin/gh/guangyey/249/head 2025-12-04T08:53:08.4133543Z * [new branch] gh/guangyey/249/orig -> origin/gh/guangyey/249/orig 2025-12-04T08:53:08.4133614Z * [new branch] gh/guangyey/250/base -> origin/gh/guangyey/250/base 2025-12-04T08:53:08.4133682Z * [new branch] gh/guangyey/250/head -> origin/gh/guangyey/250/head 2025-12-04T08:53:08.4133750Z * [new branch] gh/guangyey/250/orig -> origin/gh/guangyey/250/orig 2025-12-04T08:53:08.4133823Z * [new branch] gh/guangyey/251/base -> origin/gh/guangyey/251/base 2025-12-04T08:53:08.4133946Z * [new branch] gh/guangyey/251/head -> origin/gh/guangyey/251/head 2025-12-04T08:53:08.4134060Z * [new branch] gh/guangyey/251/orig -> origin/gh/guangyey/251/orig 2025-12-04T08:53:08.4134133Z * [new branch] gh/guangyey/252/base -> origin/gh/guangyey/252/base 2025-12-04T08:53:08.4134203Z * [new branch] gh/guangyey/252/head -> origin/gh/guangyey/252/head 2025-12-04T08:53:08.4134275Z * [new branch] gh/guangyey/252/orig -> origin/gh/guangyey/252/orig 
2025-12-04T08:53:08.4134345Z * [new branch] gh/guangyey/253/base -> origin/gh/guangyey/253/base 2025-12-04T08:53:08.4134415Z * [new branch] gh/guangyey/253/head -> origin/gh/guangyey/253/head 2025-12-04T08:53:08.4134488Z * [new branch] gh/guangyey/253/orig -> origin/gh/guangyey/253/orig 2025-12-04T08:53:08.4134558Z * [new branch] gh/guangyey/254/base -> origin/gh/guangyey/254/base 2025-12-04T08:53:08.4134629Z * [new branch] gh/guangyey/254/head -> origin/gh/guangyey/254/head 2025-12-04T08:53:08.4134703Z * [new branch] gh/guangyey/254/orig -> origin/gh/guangyey/254/orig 2025-12-04T08:53:08.4134773Z * [new branch] gh/guangyey/255/base -> origin/gh/guangyey/255/base 2025-12-04T08:53:08.4134843Z * [new branch] gh/guangyey/255/head -> origin/gh/guangyey/255/head 2025-12-04T08:53:08.4134914Z * [new branch] gh/guangyey/255/orig -> origin/gh/guangyey/255/orig 2025-12-04T08:53:08.4135012Z * [new branch] gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base 2025-12-04T08:53:08.4135104Z * [new branch] gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head 2025-12-04T08:53:08.4135197Z * [new branch] gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig 2025-12-04T08:53:08.4135288Z * [new branch] gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base 2025-12-04T08:53:08.4135375Z * [new branch] gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head 2025-12-04T08:53:08.4135466Z * [new branch] gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig 2025-12-04T08:53:08.4135552Z * [new branch] gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base 2025-12-04T08:53:08.4135637Z * [new branch] gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head 2025-12-04T08:53:08.4135727Z * [new branch] gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig 2025-12-04T08:53:08.4135814Z * [new branch] gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base 2025-12-04T08:53:08.4135903Z * [new 
branch] gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head 2025-12-04T08:53:08.4135990Z * [new branch] gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig 2025-12-04T08:53:08.4136081Z * [new branch] gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base 2025-12-04T08:53:08.4136171Z * [new branch] gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head 2025-12-04T08:53:08.4136259Z * [new branch] gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig 2025-12-04T08:53:08.4136346Z * [new branch] gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base 2025-12-04T08:53:08.4136435Z * [new branch] gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head 2025-12-04T08:53:08.4136522Z * [new branch] gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig 2025-12-04T08:53:08.4136609Z * [new branch] gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base 2025-12-04T08:53:08.4136697Z * [new branch] gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head 2025-12-04T08:53:08.4136813Z * [new branch] gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig 2025-12-04T08:53:08.4136936Z * [new branch] gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base 2025-12-04T08:53:08.4137026Z * [new branch] gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head 2025-12-04T08:53:08.4137113Z * [new branch] gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig 2025-12-04T08:53:08.4137200Z * [new branch] gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base 2025-12-04T08:53:08.4137288Z * [new branch] gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head 2025-12-04T08:53:08.4137374Z * [new branch] gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig 2025-12-04T08:53:08.4137463Z * [new branch] gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base 2025-12-04T08:53:08.4137552Z * [new branch] 
gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head 2025-12-04T08:53:08.4137640Z * [new branch] gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig 2025-12-04T08:53:08.4137729Z * [new branch] gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base 2025-12-04T08:53:08.4137816Z * [new branch] gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head 2025-12-04T08:53:08.4137903Z * [new branch] gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig 2025-12-04T08:53:08.4137992Z * [new branch] gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base 2025-12-04T08:53:08.4138079Z * [new branch] gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head 2025-12-04T08:53:08.4138166Z * [new branch] gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig 2025-12-04T08:53:08.4138258Z * [new branch] gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base 2025-12-04T08:53:08.4138347Z * [new branch] gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head 2025-12-04T08:53:08.4138435Z * [new branch] gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig 2025-12-04T08:53:08.4138522Z * [new branch] gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base 2025-12-04T08:53:08.4138609Z * [new branch] gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head 2025-12-04T08:53:08.4138696Z * [new branch] gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig 2025-12-04T08:53:08.4138785Z * [new branch] gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base 2025-12-04T08:53:08.4138873Z * [new branch] gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head 2025-12-04T08:53:08.4138962Z * [new branch] gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig 2025-12-04T08:53:08.4139052Z * [new branch] gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base 2025-12-04T08:53:08.4139138Z * [new branch] 
gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head 2025-12-04T08:53:08.4139229Z * [new branch] gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig 2025-12-04T08:53:08.4139316Z * [new branch] gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base 2025-12-04T08:53:08.4139403Z * [new branch] gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head 2025-12-04T08:53:08.4139491Z * [new branch] gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig 2025-12-04T08:53:08.4139579Z * [new branch] gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base 2025-12-04T08:53:08.4139698Z * [new branch] gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head 2025-12-04T08:53:08.4139813Z * [new branch] gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig 2025-12-04T08:53:08.4139902Z * [new branch] gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base 2025-12-04T08:53:08.4139989Z * [new branch] gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head 2025-12-04T08:53:08.4140079Z * [new branch] gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig 2025-12-04T08:53:08.4140166Z * [new branch] gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base 2025-12-04T08:53:08.4140255Z * [new branch] gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head 2025-12-04T08:53:08.4140342Z * [new branch] gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig 2025-12-04T08:53:08.4140432Z * [new branch] gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base 2025-12-04T08:53:08.4140526Z * [new branch] gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head 2025-12-04T08:53:08.4140612Z * [new branch] gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig 2025-12-04T08:53:08.4140697Z * [new branch] gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base 2025-12-04T08:53:08.4140786Z * [new branch] 
gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head 2025-12-04T08:53:08.4140873Z * [new branch] gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig 2025-12-04T08:53:08.4140959Z * [new branch] gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base 2025-12-04T08:53:08.4141048Z * [new branch] gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head 2025-12-04T08:53:08.4141137Z * [new branch] gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig 2025-12-04T08:53:08.4141224Z * [new branch] gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base 2025-12-04T08:53:08.4141315Z * [new branch] gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head 2025-12-04T08:53:08.4141402Z * [new branch] gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig 2025-12-04T08:53:08.4141487Z * [new branch] gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base 2025-12-04T08:53:08.4141575Z * [new branch] gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head 2025-12-04T08:53:08.4141663Z * [new branch] gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig 2025-12-04T08:53:08.4141754Z * [new branch] gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base 2025-12-04T08:53:08.4141841Z * [new branch] gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head 2025-12-04T08:53:08.4141931Z * [new branch] gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig 2025-12-04T08:53:08.4142020Z * [new branch] gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base 2025-12-04T08:53:08.4142107Z * [new branch] gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head 2025-12-04T08:53:08.4142195Z * [new branch] gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig 2025-12-04T08:53:08.4142283Z * [new branch] gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base 2025-12-04T08:53:08.4142372Z * [new branch] 
gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head 2025-12-04T08:53:08.4142458Z * [new branch] gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig 2025-12-04T08:53:08.4142547Z * [new branch] gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base 2025-12-04T08:53:08.4142666Z * [new branch] gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head 2025-12-04T08:53:08.4142778Z * [new branch] gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig 2025-12-04T08:53:08.4142868Z * [new branch] gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base 2025-12-04T08:53:08.4142954Z * [new branch] gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head 2025-12-04T08:53:08.4143041Z * [new branch] gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig 2025-12-04T08:53:08.4143128Z * [new branch] gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base 2025-12-04T08:53:08.4143215Z * [new branch] gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head 2025-12-04T08:53:08.4143346Z * [new branch] gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig 2025-12-04T08:53:08.4143433Z * [new branch] gh/hameerabbasi/1/base -> origin/gh/hameerabbasi/1/base 2025-12-04T08:53:08.4143512Z * [new branch] gh/hameerabbasi/1/head -> origin/gh/hameerabbasi/1/head 2025-12-04T08:53:08.4143591Z * [new branch] gh/hameerabbasi/2/base -> origin/gh/hameerabbasi/2/base 2025-12-04T08:53:08.4143667Z * [new branch] gh/hameerabbasi/2/head -> origin/gh/hameerabbasi/2/head 2025-12-04T08:53:08.4143742Z * [new branch] gh/hameerabbasi/2/orig -> origin/gh/hameerabbasi/2/orig 2025-12-04T08:53:08.4143818Z * [new branch] gh/hameerabbasi/3/base -> origin/gh/hameerabbasi/3/base 2025-12-04T08:53:08.4143893Z * [new branch] gh/hameerabbasi/3/head -> origin/gh/hameerabbasi/3/head 2025-12-04T08:53:08.4143968Z * [new branch] gh/hameerabbasi/3/orig -> origin/gh/hameerabbasi/3/orig 2025-12-04T08:53:08.4144044Z * 
[new branch] gh/hameerabbasi/4/base -> origin/gh/hameerabbasi/4/base 2025-12-04T08:53:08.4144121Z * [new branch] gh/hameerabbasi/4/head -> origin/gh/hameerabbasi/4/head 2025-12-04T08:53:08.4144198Z * [new branch] gh/hameerabbasi/4/orig -> origin/gh/hameerabbasi/4/orig 2025-12-04T08:53:08.4144269Z * [new branch] gh/huydhn/1/next -> origin/gh/huydhn/1/next 2025-12-04T08:53:08.4144338Z * [new branch] gh/huydhn/2/next -> origin/gh/huydhn/2/next 2025-12-04T08:53:08.4144404Z * [new branch] gh/huydhn/3/next -> origin/gh/huydhn/3/next 2025-12-04T08:53:08.4144471Z * [new branch] gh/huydhn/4/next -> origin/gh/huydhn/4/next 2025-12-04T08:53:08.4144537Z * [new branch] gh/huydhn/5/next -> origin/gh/huydhn/5/next 2025-12-04T08:53:08.4144603Z * [new branch] gh/huydhn/6/next -> origin/gh/huydhn/6/next 2025-12-04T08:53:08.4144673Z * [new branch] gh/int3/97/base -> origin/gh/int3/97/base 2025-12-04T08:53:08.4144741Z * [new branch] gh/int3/97/head -> origin/gh/int3/97/head 2025-12-04T08:53:08.4144814Z * [new branch] gh/isuruf/101/base -> origin/gh/isuruf/101/base 2025-12-04T08:53:08.4144885Z * [new branch] gh/isuruf/101/head -> origin/gh/isuruf/101/head 2025-12-04T08:53:08.4144954Z * [new branch] gh/isuruf/146/base -> origin/gh/isuruf/146/base 2025-12-04T08:53:08.4145022Z * [new branch] gh/isuruf/146/head -> origin/gh/isuruf/146/head 2025-12-04T08:53:08.4145090Z * [new branch] gh/isuruf/146/orig -> origin/gh/isuruf/146/orig 2025-12-04T08:53:08.4145155Z * [new branch] gh/isuruf/158/base -> origin/gh/isuruf/158/base 2025-12-04T08:53:08.4145221Z * [new branch] gh/isuruf/158/head -> origin/gh/isuruf/158/head 2025-12-04T08:53:08.4145286Z * [new branch] gh/isuruf/159/base -> origin/gh/isuruf/159/base 2025-12-04T08:53:08.4145395Z * [new branch] gh/isuruf/159/head -> origin/gh/isuruf/159/head 2025-12-04T08:53:08.4145514Z * [new branch] gh/isuruf/160/base -> origin/gh/isuruf/160/base 2025-12-04T08:53:08.4145582Z * [new branch] gh/isuruf/160/head -> origin/gh/isuruf/160/head 
2025-12-04T08:53:08.4145649Z * [new branch] gh/isuruf/160/orig -> origin/gh/isuruf/160/orig 2025-12-04T08:53:08.4145719Z * [new branch] gh/isuruf/81/base -> origin/gh/isuruf/81/base 2025-12-04T08:53:08.4145790Z * [new branch] gh/isuruf/81/head -> origin/gh/isuruf/81/head 2025-12-04T08:53:08.4145861Z * [new branch] gh/isuruf/81/orig -> origin/gh/isuruf/81/orig 2025-12-04T08:53:08.4145935Z * [new branch] gh/jamesjwu/176/base -> origin/gh/jamesjwu/176/base 2025-12-04T08:53:08.4146009Z * [new branch] gh/jamesjwu/176/head -> origin/gh/jamesjwu/176/head 2025-12-04T08:53:08.4146086Z * [new branch] gh/jamesjwu/176/orig -> origin/gh/jamesjwu/176/orig 2025-12-04T08:53:08.4146159Z * [new branch] gh/jamesjwu/187/base -> origin/gh/jamesjwu/187/base 2025-12-04T08:53:08.4146233Z * [new branch] gh/jamesjwu/187/head -> origin/gh/jamesjwu/187/head 2025-12-04T08:53:08.4146303Z * [new branch] gh/jamesjwu/187/orig -> origin/gh/jamesjwu/187/orig 2025-12-04T08:53:08.4146373Z * [new branch] gh/jamesjwu/196/base -> origin/gh/jamesjwu/196/base 2025-12-04T08:53:08.4146448Z * [new branch] gh/jamesjwu/196/head -> origin/gh/jamesjwu/196/head 2025-12-04T08:53:08.4146518Z * [new branch] gh/jamesjwu/196/orig -> origin/gh/jamesjwu/196/orig 2025-12-04T08:53:08.4146587Z * [new branch] gh/jamesjwu/198/base -> origin/gh/jamesjwu/198/base 2025-12-04T08:53:08.4146660Z * [new branch] gh/jamesjwu/198/head -> origin/gh/jamesjwu/198/head 2025-12-04T08:53:08.4146731Z * [new branch] gh/jamesjwu/198/orig -> origin/gh/jamesjwu/198/orig 2025-12-04T08:53:08.4146803Z * [new branch] gh/jamesjwu/207/base -> origin/gh/jamesjwu/207/base 2025-12-04T08:53:08.4146874Z * [new branch] gh/jamesjwu/207/head -> origin/gh/jamesjwu/207/head 2025-12-04T08:53:08.4146945Z * [new branch] gh/jamesjwu/207/orig -> origin/gh/jamesjwu/207/orig 2025-12-04T08:53:08.4147014Z * [new branch] gh/jamesjwu/208/base -> origin/gh/jamesjwu/208/base 2025-12-04T08:53:08.4147086Z * [new branch] gh/jamesjwu/208/head -> origin/gh/jamesjwu/208/head 
2025-12-04T08:53:08.4147156Z * [new branch] gh/jamesjwu/208/orig -> origin/gh/jamesjwu/208/orig 2025-12-04T08:53:08.4147228Z * [new branch] gh/jamesjwu/52/base -> origin/gh/jamesjwu/52/base 2025-12-04T08:53:08.4147302Z * [new branch] gh/jamesjwu/52/head -> origin/gh/jamesjwu/52/head 2025-12-04T08:53:08.4147373Z * [new branch] gh/jamesjwu/53/base -> origin/gh/jamesjwu/53/base 2025-12-04T08:53:08.4147447Z * [new branch] gh/jamesjwu/53/head -> origin/gh/jamesjwu/53/head 2025-12-04T08:53:08.4147515Z * [new branch] gh/jamesjwu/54/base -> origin/gh/jamesjwu/54/base 2025-12-04T08:53:08.4147584Z * [new branch] gh/jamesjwu/54/head -> origin/gh/jamesjwu/54/head 2025-12-04T08:53:08.4147654Z * [new branch] gh/jamesjwu/55/base -> origin/gh/jamesjwu/55/base 2025-12-04T08:53:08.4147723Z * [new branch] gh/jamesjwu/55/head -> origin/gh/jamesjwu/55/head 2025-12-04T08:53:08.4147792Z * [new branch] gh/jamesjwu/56/base -> origin/gh/jamesjwu/56/base 2025-12-04T08:53:08.4147864Z * [new branch] gh/jamesjwu/56/head -> origin/gh/jamesjwu/56/head 2025-12-04T08:53:08.4147932Z * [new branch] gh/jamesjwu/57/base -> origin/gh/jamesjwu/57/base 2025-12-04T08:53:08.4148027Z * [new branch] gh/jamesjwu/57/head -> origin/gh/jamesjwu/57/head 2025-12-04T08:53:08.4148126Z * [new branch] gh/jamesjwu/58/base -> origin/gh/jamesjwu/58/base 2025-12-04T08:53:08.4148195Z * [new branch] gh/jamesjwu/58/head -> origin/gh/jamesjwu/58/head 2025-12-04T08:53:08.4148263Z * [new branch] gh/jamesjwu/59/base -> origin/gh/jamesjwu/59/base 2025-12-04T08:53:08.4148331Z * [new branch] gh/jamesjwu/59/head -> origin/gh/jamesjwu/59/head 2025-12-04T08:53:08.4148401Z * [new branch] gh/jamesjwu/60/base -> origin/gh/jamesjwu/60/base 2025-12-04T08:53:08.4148470Z * [new branch] gh/jamesjwu/60/head -> origin/gh/jamesjwu/60/head 2025-12-04T08:53:08.4148541Z * [new branch] gh/jamesjwu/61/base -> origin/gh/jamesjwu/61/base 2025-12-04T08:53:08.4148611Z * [new branch] gh/jamesjwu/61/head -> origin/gh/jamesjwu/61/head 
2025-12-04T08:53:08.4148682Z * [new branch] gh/jamesjwu/62/base -> origin/gh/jamesjwu/62/base 2025-12-04T08:53:08.4148753Z * [new branch] gh/jamesjwu/62/head -> origin/gh/jamesjwu/62/head 2025-12-04T08:53:08.4148823Z * [new branch] gh/jamesjwu/63/base -> origin/gh/jamesjwu/63/base 2025-12-04T08:53:08.4148891Z * [new branch] gh/jamesjwu/63/head -> origin/gh/jamesjwu/63/head 2025-12-04T08:53:08.4148964Z * [new branch] gh/jamesjwu/64/base -> origin/gh/jamesjwu/64/base 2025-12-04T08:53:08.4149034Z * [new branch] gh/jamesjwu/64/head -> origin/gh/jamesjwu/64/head 2025-12-04T08:53:08.4149101Z * [new branch] gh/jamesjwu/65/base -> origin/gh/jamesjwu/65/base 2025-12-04T08:53:08.4149174Z * [new branch] gh/jamesjwu/65/head -> origin/gh/jamesjwu/65/head 2025-12-04T08:53:08.4149243Z * [new branch] gh/janeyx99/165/base -> origin/gh/janeyx99/165/base 2025-12-04T08:53:08.4149316Z * [new branch] gh/janeyx99/165/head -> origin/gh/janeyx99/165/head 2025-12-04T08:53:08.4149388Z * [new branch] gh/janeyx99/165/orig -> origin/gh/janeyx99/165/orig 2025-12-04T08:53:08.4149456Z * [new branch] gh/janeyx99/201/base -> origin/gh/janeyx99/201/base 2025-12-04T08:53:08.4149526Z * [new branch] gh/janeyx99/201/head -> origin/gh/janeyx99/201/head 2025-12-04T08:53:08.4149596Z * [new branch] gh/janeyx99/201/orig -> origin/gh/janeyx99/201/orig 2025-12-04T08:53:08.4149666Z * [new branch] gh/janeyx99/225/base -> origin/gh/janeyx99/225/base 2025-12-04T08:53:08.4149738Z * [new branch] gh/janeyx99/225/head -> origin/gh/janeyx99/225/head 2025-12-04T08:53:08.4149808Z * [new branch] gh/janeyx99/225/orig -> origin/gh/janeyx99/225/orig 2025-12-04T08:53:08.4149880Z * [new branch] gh/janeyx99/299/base -> origin/gh/janeyx99/299/base 2025-12-04T08:53:08.4149950Z * [new branch] gh/janeyx99/299/head -> origin/gh/janeyx99/299/head 2025-12-04T08:53:08.4150020Z * [new branch] gh/janeyx99/299/orig -> origin/gh/janeyx99/299/orig 2025-12-04T08:53:08.4150088Z * [new branch] gh/janeyx99/302/base -> origin/gh/janeyx99/302/base 
2025-12-04T08:53:08.4150158Z * [new branch] gh/janeyx99/302/head -> origin/gh/janeyx99/302/head 2025-12-04T08:53:08.4150226Z * [new branch] gh/janeyx99/303/base -> origin/gh/janeyx99/303/base 2025-12-04T08:53:08.4150294Z * [new branch] gh/janeyx99/303/head -> origin/gh/janeyx99/303/head 2025-12-04T08:53:08.4150365Z * [new branch] gh/janeyx99/305/base -> origin/gh/janeyx99/305/base 2025-12-04T08:53:08.4150434Z * [new branch] gh/janeyx99/305/head -> origin/gh/janeyx99/305/head 2025-12-04T08:53:08.4150531Z * [new branch] gh/janeyx99/306/base -> origin/gh/janeyx99/306/base 2025-12-04T08:53:08.4150601Z * [new branch] gh/janeyx99/306/head -> origin/gh/janeyx99/306/head 2025-12-04T08:53:08.4150701Z * [new branch] gh/janeyx99/314/base -> origin/gh/janeyx99/314/base 2025-12-04T08:53:08.4150773Z * [new branch] gh/janeyx99/314/head -> origin/gh/janeyx99/314/head 2025-12-04T08:53:08.4150842Z * [new branch] gh/janeyx99/314/orig -> origin/gh/janeyx99/314/orig 2025-12-04T08:53:08.4150911Z * [new branch] gh/janeyx99/315/base -> origin/gh/janeyx99/315/base 2025-12-04T08:53:08.4150981Z * [new branch] gh/janeyx99/315/head -> origin/gh/janeyx99/315/head 2025-12-04T08:53:08.4151050Z * [new branch] gh/janeyx99/315/orig -> origin/gh/janeyx99/315/orig 2025-12-04T08:53:08.4151117Z * [new branch] gh/janeyx99/316/base -> origin/gh/janeyx99/316/base 2025-12-04T08:53:08.4151191Z * [new branch] gh/janeyx99/316/head -> origin/gh/janeyx99/316/head 2025-12-04T08:53:08.4151262Z * [new branch] gh/janeyx99/316/orig -> origin/gh/janeyx99/316/orig 2025-12-04T08:53:08.4151334Z * [new branch] gh/janeyx99/317/base -> origin/gh/janeyx99/317/base 2025-12-04T08:53:08.4151406Z * [new branch] gh/janeyx99/317/head -> origin/gh/janeyx99/317/head 2025-12-04T08:53:08.4151474Z * [new branch] gh/janeyx99/317/orig -> origin/gh/janeyx99/317/orig 2025-12-04T08:53:08.4151543Z * [new branch] gh/janeyx99/325/base -> origin/gh/janeyx99/325/base 2025-12-04T08:53:08.4151613Z * [new branch] gh/janeyx99/325/head -> 
origin/gh/janeyx99/325/head 2025-12-04T08:53:08.4151682Z * [new branch] gh/janeyx99/325/orig -> origin/gh/janeyx99/325/orig 2025-12-04T08:53:08.4151751Z * [new branch] gh/janeyx99/327/base -> origin/gh/janeyx99/327/base 2025-12-04T08:53:08.4151823Z * [new branch] gh/janeyx99/327/head -> origin/gh/janeyx99/327/head 2025-12-04T08:53:08.4151892Z * [new branch] gh/janeyx99/327/orig -> origin/gh/janeyx99/327/orig 2025-12-04T08:53:08.4151963Z * [new branch] gh/janeyx99/328/base -> origin/gh/janeyx99/328/base 2025-12-04T08:53:08.4152035Z * [new branch] gh/janeyx99/328/head -> origin/gh/janeyx99/328/head 2025-12-04T08:53:08.4152104Z * [new branch] gh/janeyx99/328/orig -> origin/gh/janeyx99/328/orig 2025-12-04T08:53:08.4152172Z * [new branch] gh/janeyx99/329/base -> origin/gh/janeyx99/329/base 2025-12-04T08:53:08.4152242Z * [new branch] gh/janeyx99/329/head -> origin/gh/janeyx99/329/head 2025-12-04T08:53:08.4152310Z * [new branch] gh/janeyx99/329/orig -> origin/gh/janeyx99/329/orig 2025-12-04T08:53:08.4152381Z * [new branch] gh/janeyx99/330/base -> origin/gh/janeyx99/330/base 2025-12-04T08:53:08.4152451Z * [new branch] gh/janeyx99/330/head -> origin/gh/janeyx99/330/head 2025-12-04T08:53:08.4152521Z * [new branch] gh/janeyx99/330/orig -> origin/gh/janeyx99/330/orig 2025-12-04T08:53:08.4152593Z * [new branch] gh/janeyx99/331/base -> origin/gh/janeyx99/331/base 2025-12-04T08:53:08.4152662Z * [new branch] gh/janeyx99/331/head -> origin/gh/janeyx99/331/head 2025-12-04T08:53:08.4152731Z * [new branch] gh/janeyx99/331/orig -> origin/gh/janeyx99/331/orig 2025-12-04T08:53:08.4152802Z * [new branch] gh/janeyx99/332/base -> origin/gh/janeyx99/332/base 2025-12-04T08:53:08.4152871Z * [new branch] gh/janeyx99/332/head -> origin/gh/janeyx99/332/head 2025-12-04T08:53:08.4152941Z * [new branch] gh/janeyx99/332/orig -> origin/gh/janeyx99/332/orig 2025-12-04T08:53:08.4153010Z * [new branch] gh/janeyx99/333/base -> origin/gh/janeyx99/333/base 2025-12-04T08:53:08.4153109Z * [new branch] 
gh/janeyx99/333/head -> origin/gh/janeyx99/333/head 2025-12-04T08:53:08.4153200Z * [new branch] gh/janeyx99/333/orig -> origin/gh/janeyx99/333/orig 2025-12-04T08:53:08.4153368Z * [new branch] gh/janeyx99/88/base -> origin/gh/janeyx99/88/base 2025-12-04T08:53:08.4153440Z * [new branch] gh/janeyx99/88/head -> origin/gh/janeyx99/88/head 2025-12-04T08:53:08.4153509Z * [new branch] gh/janeyx99/88/orig -> origin/gh/janeyx99/88/orig 2025-12-04T08:53:08.4153581Z * [new branch] gh/jansel/360/base -> origin/gh/jansel/360/base 2025-12-04T08:53:08.4153650Z * [new branch] gh/jansel/360/head -> origin/gh/jansel/360/head 2025-12-04T08:53:08.4153718Z * [new branch] gh/jansel/451/base -> origin/gh/jansel/451/base 2025-12-04T08:53:08.4153787Z * [new branch] gh/jansel/451/head -> origin/gh/jansel/451/head 2025-12-04T08:53:08.4153856Z * [new branch] gh/jansel/451/orig -> origin/gh/jansel/451/orig 2025-12-04T08:53:08.4153926Z * [new branch] gh/jansel/462/base -> origin/gh/jansel/462/base 2025-12-04T08:53:08.4153992Z * [new branch] gh/jansel/462/head -> origin/gh/jansel/462/head 2025-12-04T08:53:08.4154058Z * [new branch] gh/jansel/462/orig -> origin/gh/jansel/462/orig 2025-12-04T08:53:08.4154125Z * [new branch] gh/jansel/533/base -> origin/gh/jansel/533/base 2025-12-04T08:53:08.4154191Z * [new branch] gh/jansel/533/head -> origin/gh/jansel/533/head 2025-12-04T08:53:08.4154257Z * [new branch] gh/jansel/533/orig -> origin/gh/jansel/533/orig 2025-12-04T08:53:08.4154325Z * [new branch] gh/jansel/552/base -> origin/gh/jansel/552/base 2025-12-04T08:53:08.4154392Z * [new branch] gh/jansel/552/head -> origin/gh/jansel/552/head 2025-12-04T08:53:08.4154461Z * [new branch] gh/jansel/552/orig -> origin/gh/jansel/552/orig 2025-12-04T08:53:08.4154531Z * [new branch] gh/jansel/553/base -> origin/gh/jansel/553/base 2025-12-04T08:53:08.4154598Z * [new branch] gh/jansel/553/head -> origin/gh/jansel/553/head 2025-12-04T08:53:08.4154665Z * [new branch] gh/jansel/553/orig -> origin/gh/jansel/553/orig 
2025-12-04T08:53:08.4154732Z * [new branch] gh/jansel/554/base -> origin/gh/jansel/554/base 2025-12-04T08:53:08.4154799Z * [new branch] gh/jansel/554/head -> origin/gh/jansel/554/head 2025-12-04T08:53:08.4154866Z * [new branch] gh/jansel/554/orig -> origin/gh/jansel/554/orig 2025-12-04T08:53:08.4154936Z * [new branch] gh/jansel/555/base -> origin/gh/jansel/555/base 2025-12-04T08:53:08.4155003Z * [new branch] gh/jansel/555/head -> origin/gh/jansel/555/head 2025-12-04T08:53:08.4155071Z * [new branch] gh/jansel/555/orig -> origin/gh/jansel/555/orig 2025-12-04T08:53:08.4155142Z * [new branch] gh/jansel/556/base -> origin/gh/jansel/556/base 2025-12-04T08:53:08.4155209Z * [new branch] gh/jansel/556/head -> origin/gh/jansel/556/head 2025-12-04T08:53:08.4155274Z * [new branch] gh/jansel/556/orig -> origin/gh/jansel/556/orig 2025-12-04T08:53:08.4155342Z * [new branch] gh/jansel/557/base -> origin/gh/jansel/557/base 2025-12-04T08:53:08.4155409Z * [new branch] gh/jansel/557/head -> origin/gh/jansel/557/head 2025-12-04T08:53:08.4155478Z * [new branch] gh/jansel/557/orig -> origin/gh/jansel/557/orig 2025-12-04T08:53:08.4155545Z * [new branch] gh/jansel/558/base -> origin/gh/jansel/558/base 2025-12-04T08:53:08.4155613Z * [new branch] gh/jansel/558/head -> origin/gh/jansel/558/head 2025-12-04T08:53:08.4155726Z * [new branch] gh/jansel/558/orig -> origin/gh/jansel/558/orig 2025-12-04T08:53:08.4155793Z * [new branch] gh/jansel/559/base -> origin/gh/jansel/559/base 2025-12-04T08:53:08.4155906Z * [new branch] gh/jansel/559/head -> origin/gh/jansel/559/head 2025-12-04T08:53:08.4155976Z * [new branch] gh/jansel/559/orig -> origin/gh/jansel/559/orig 2025-12-04T08:53:08.4156042Z * [new branch] gh/jansel/560/base -> origin/gh/jansel/560/base 2025-12-04T08:53:08.4156110Z * [new branch] gh/jansel/560/head -> origin/gh/jansel/560/head 2025-12-04T08:53:08.4156180Z * [new branch] gh/jansel/560/orig -> origin/gh/jansel/560/orig 2025-12-04T08:53:08.4156247Z * [new branch] gh/jansel/561/base -> 
origin/gh/jansel/561/base 2025-12-04T08:53:08.4156314Z * [new branch] gh/jansel/561/head -> origin/gh/jansel/561/head 2025-12-04T08:53:08.4156387Z * [new branch] gh/jansel/561/orig -> origin/gh/jansel/561/orig 2025-12-04T08:53:08.4156455Z * [new branch] gh/jansel/562/base -> origin/gh/jansel/562/base 2025-12-04T08:53:08.4156522Z * [new branch] gh/jansel/562/head -> origin/gh/jansel/562/head 2025-12-04T08:53:08.4156589Z * [new branch] gh/jansel/562/orig -> origin/gh/jansel/562/orig 2025-12-04T08:53:08.4156654Z * [new branch] gh/jansel/563/base -> origin/gh/jansel/563/base 2025-12-04T08:53:08.4156721Z * [new branch] gh/jansel/563/head -> origin/gh/jansel/563/head 2025-12-04T08:53:08.4156791Z * [new branch] gh/jansel/563/orig -> origin/gh/jansel/563/orig 2025-12-04T08:53:08.4156858Z * [new branch] gh/jansel/564/base -> origin/gh/jansel/564/base 2025-12-04T08:53:08.4156925Z * [new branch] gh/jansel/564/head -> origin/gh/jansel/564/head 2025-12-04T08:53:08.4156996Z * [new branch] gh/jansel/564/orig -> origin/gh/jansel/564/orig 2025-12-04T08:53:08.4157063Z * [new branch] gh/jansel/565/base -> origin/gh/jansel/565/base 2025-12-04T08:53:08.4157133Z * [new branch] gh/jansel/565/head -> origin/gh/jansel/565/head 2025-12-04T08:53:08.4157200Z * [new branch] gh/jansel/565/orig -> origin/gh/jansel/565/orig 2025-12-04T08:53:08.4157266Z * [new branch] gh/jansel/566/base -> origin/gh/jansel/566/base 2025-12-04T08:53:08.4157333Z * [new branch] gh/jansel/566/head -> origin/gh/jansel/566/head 2025-12-04T08:53:08.4157400Z * [new branch] gh/jansel/566/orig -> origin/gh/jansel/566/orig 2025-12-04T08:53:08.4157468Z * [new branch] gh/jansel/567/base -> origin/gh/jansel/567/base 2025-12-04T08:53:08.4157538Z * [new branch] gh/jansel/567/head -> origin/gh/jansel/567/head 2025-12-04T08:53:08.4157608Z * [new branch] gh/jansel/567/orig -> origin/gh/jansel/567/orig 2025-12-04T08:53:08.4157675Z * [new branch] gh/jansel/568/base -> origin/gh/jansel/568/base 2025-12-04T08:53:08.4157746Z * [new 
branch] gh/jansel/568/head -> origin/gh/jansel/568/head 2025-12-04T08:53:08.4157812Z * [new branch] gh/jansel/568/orig -> origin/gh/jansel/568/orig 2025-12-04T08:53:08.4157879Z * [new branch] gh/jansel/569/base -> origin/gh/jansel/569/base 2025-12-04T08:53:08.4157947Z * [new branch] gh/jansel/569/head -> origin/gh/jansel/569/head 2025-12-04T08:53:08.4158014Z * [new branch] gh/jansel/569/orig -> origin/gh/jansel/569/orig 2025-12-04T08:53:08.4158081Z * [new branch] gh/jansel/570/base -> origin/gh/jansel/570/base 2025-12-04T08:53:08.4158150Z * [new branch] gh/jansel/570/head -> origin/gh/jansel/570/head 2025-12-04T08:53:08.4158249Z * [new branch] gh/jansel/570/orig -> origin/gh/jansel/570/orig 2025-12-04T08:53:08.4158315Z * [new branch] gh/jansel/571/base -> origin/gh/jansel/571/base 2025-12-04T08:53:08.4158410Z * [new branch] gh/jansel/571/head -> origin/gh/jansel/571/head 2025-12-04T08:53:08.4158478Z * [new branch] gh/jansel/571/orig -> origin/gh/jansel/571/orig 2025-12-04T08:53:08.4158544Z * [new branch] gh/jansel/572/base -> origin/gh/jansel/572/base 2025-12-04T08:53:08.4158613Z * [new branch] gh/jansel/572/head -> origin/gh/jansel/572/head 2025-12-04T08:53:08.4158680Z * [new branch] gh/jansel/572/orig -> origin/gh/jansel/572/orig 2025-12-04T08:53:08.4158745Z * [new branch] gh/jansel/573/base -> origin/gh/jansel/573/base 2025-12-04T08:53:08.4158813Z * [new branch] gh/jansel/573/head -> origin/gh/jansel/573/head 2025-12-04T08:53:08.4158883Z * [new branch] gh/jansel/573/orig -> origin/gh/jansel/573/orig 2025-12-04T08:53:08.4158951Z * [new branch] gh/jansel/574/base -> origin/gh/jansel/574/base 2025-12-04T08:53:08.4159019Z * [new branch] gh/jansel/574/head -> origin/gh/jansel/574/head 2025-12-04T08:53:08.4159086Z * [new branch] gh/jansel/574/orig -> origin/gh/jansel/574/orig 2025-12-04T08:53:08.4159154Z * [new branch] gh/jansel/575/base -> origin/gh/jansel/575/base 2025-12-04T08:53:08.4159222Z * [new branch] gh/jansel/575/head -> origin/gh/jansel/575/head 
2025-12-04T08:53:08.4159289Z * [new branch] gh/jansel/575/orig -> origin/gh/jansel/575/orig 2025-12-04T08:53:08.4159357Z * [new branch] gh/jansel/576/base -> origin/gh/jansel/576/base 2025-12-04T08:53:08.4159423Z * [new branch] gh/jansel/576/head -> origin/gh/jansel/576/head 2025-12-04T08:53:08.4159489Z * [new branch] gh/jansel/576/orig -> origin/gh/jansel/576/orig 2025-12-04T08:53:08.4159574Z * [new branch] gh/jbschlosser/247/base -> origin/gh/jbschlosser/247/base 2025-12-04T08:53:08.4159655Z * [new branch] gh/jbschlosser/247/head -> origin/gh/jbschlosser/247/head 2025-12-04T08:53:08.4159730Z * [new branch] gh/jbschlosser/247/orig -> origin/gh/jbschlosser/247/orig 2025-12-04T08:53:08.4159807Z * [new branch] gh/jbschlosser/250/base -> origin/gh/jbschlosser/250/base 2025-12-04T08:53:08.4159881Z * [new branch] gh/jbschlosser/250/head -> origin/gh/jbschlosser/250/head 2025-12-04T08:53:08.4159956Z * [new branch] gh/jbschlosser/250/orig -> origin/gh/jbschlosser/250/orig 2025-12-04T08:53:08.4160032Z * [new branch] gh/jerryzh168/1/base -> origin/gh/jerryzh168/1/base 2025-12-04T08:53:08.4160105Z * [new branch] gh/jerryzh168/1/head -> origin/gh/jerryzh168/1/head 2025-12-04T08:53:08.4160178Z * [new branch] gh/jerryzh168/1/orig -> origin/gh/jerryzh168/1/orig 2025-12-04T08:53:08.4160252Z * [new branch] gh/jiayisunx/59/base -> origin/gh/jiayisunx/59/base 2025-12-04T08:53:08.4160323Z * [new branch] gh/jiayisunx/59/head -> origin/gh/jiayisunx/59/head 2025-12-04T08:53:08.4160396Z * [new branch] gh/jiayisunx/59/orig -> origin/gh/jiayisunx/59/orig 2025-12-04T08:53:08.4160466Z * [new branch] gh/jiayisunx/61/base -> origin/gh/jiayisunx/61/base 2025-12-04T08:53:08.4160537Z * [new branch] gh/jiayisunx/61/head -> origin/gh/jiayisunx/61/head 2025-12-04T08:53:08.4160609Z * [new branch] gh/jiayisunx/61/orig -> origin/gh/jiayisunx/61/orig 2025-12-04T08:53:08.4160680Z * [new branch] gh/jiayisunx/68/base -> origin/gh/jiayisunx/68/base 2025-12-04T08:53:08.4160751Z * [new branch] 
gh/jiayisunx/68/head -> origin/gh/jiayisunx/68/head 2025-12-04T08:53:08.4160858Z * [new branch] gh/jiayisunx/68/orig -> origin/gh/jiayisunx/68/orig 2025-12-04T08:53:08.4160929Z * [new branch] gh/jiayisunx/77/base -> origin/gh/jiayisunx/77/base 2025-12-04T08:53:08.4161024Z * [new branch] gh/jiayisunx/77/head -> origin/gh/jiayisunx/77/head 2025-12-04T08:53:08.4161099Z * [new branch] gh/jiayisunx/77/orig -> origin/gh/jiayisunx/77/orig 2025-12-04T08:53:08.4161168Z * [new branch] gh/jiayisunx/78/base -> origin/gh/jiayisunx/78/base 2025-12-04T08:53:08.4161238Z * [new branch] gh/jiayisunx/78/head -> origin/gh/jiayisunx/78/head 2025-12-04T08:53:08.4161310Z * [new branch] gh/jiayisunx/78/orig -> origin/gh/jiayisunx/78/orig 2025-12-04T08:53:08.4161381Z * [new branch] gh/jiayisunx/79/base -> origin/gh/jiayisunx/79/base 2025-12-04T08:53:08.4161452Z * [new branch] gh/jiayisunx/79/head -> origin/gh/jiayisunx/79/head 2025-12-04T08:53:08.4161524Z * [new branch] gh/jiayisunx/79/orig -> origin/gh/jiayisunx/79/orig 2025-12-04T08:53:08.4161595Z * [new branch] gh/jiayisunx/82/base -> origin/gh/jiayisunx/82/base 2025-12-04T08:53:08.4161666Z * [new branch] gh/jiayisunx/82/head -> origin/gh/jiayisunx/82/head 2025-12-04T08:53:08.4161736Z * [new branch] gh/jiayisunx/82/orig -> origin/gh/jiayisunx/82/orig 2025-12-04T08:53:08.4161805Z * [new branch] gh/jiayisunx/83/base -> origin/gh/jiayisunx/83/base 2025-12-04T08:53:08.4161876Z * [new branch] gh/jiayisunx/83/head -> origin/gh/jiayisunx/83/head 2025-12-04T08:53:08.4161948Z * [new branch] gh/jiayisunx/83/orig -> origin/gh/jiayisunx/83/orig 2025-12-04T08:53:08.4162019Z * [new branch] gh/jiayisunx/84/base -> origin/gh/jiayisunx/84/base 2025-12-04T08:53:08.4162092Z * [new branch] gh/jiayisunx/84/head -> origin/gh/jiayisunx/84/head 2025-12-04T08:53:08.4162164Z * [new branch] gh/jiayisunx/84/orig -> origin/gh/jiayisunx/84/orig 2025-12-04T08:53:08.4162234Z * [new branch] gh/jiayisunx/85/base -> origin/gh/jiayisunx/85/base 
2025-12-04T08:53:08.4162305Z * [new branch] gh/jiayisunx/85/head -> origin/gh/jiayisunx/85/head 2025-12-04T08:53:08.4162375Z * [new branch] gh/jiayisunx/85/orig -> origin/gh/jiayisunx/85/orig 2025-12-04T08:53:08.4162444Z * [new branch] gh/jiayisunx/86/base -> origin/gh/jiayisunx/86/base 2025-12-04T08:53:08.4162517Z * [new branch] gh/jiayisunx/86/head -> origin/gh/jiayisunx/86/head 2025-12-04T08:53:08.4162586Z * [new branch] gh/jiayisunx/86/orig -> origin/gh/jiayisunx/86/orig 2025-12-04T08:53:08.4162656Z * [new branch] gh/jiayisunx/87/base -> origin/gh/jiayisunx/87/base 2025-12-04T08:53:08.4162726Z * [new branch] gh/jiayisunx/87/head -> origin/gh/jiayisunx/87/head 2025-12-04T08:53:08.4162797Z * [new branch] gh/jiayisunx/87/orig -> origin/gh/jiayisunx/87/orig 2025-12-04T08:53:08.4162869Z * [new branch] gh/jiayisunx/88/base -> origin/gh/jiayisunx/88/base 2025-12-04T08:53:08.4162942Z * [new branch] gh/jiayisunx/88/head -> origin/gh/jiayisunx/88/head 2025-12-04T08:53:08.4163011Z * [new branch] gh/jiayisunx/88/orig -> origin/gh/jiayisunx/88/orig 2025-12-04T08:53:08.4163080Z * [new branch] gh/jiayisunx/89/base -> origin/gh/jiayisunx/89/base 2025-12-04T08:53:08.4163152Z * [new branch] gh/jiayisunx/89/head -> origin/gh/jiayisunx/89/head 2025-12-04T08:53:08.4163222Z * [new branch] gh/jiayisunx/89/orig -> origin/gh/jiayisunx/89/orig 2025-12-04T08:53:08.4163336Z * [new branch] gh/jiayisunx/90/base -> origin/gh/jiayisunx/90/base 2025-12-04T08:53:08.4163409Z * [new branch] gh/jiayisunx/90/head -> origin/gh/jiayisunx/90/head 2025-12-04T08:53:08.4163532Z * [new branch] gh/jiayisunx/90/orig -> origin/gh/jiayisunx/90/orig 2025-12-04T08:53:08.4163656Z * [new branch] gh/jjwu@meta.com/1/base -> origin/gh/jjwu@meta.com/1/base 2025-12-04T08:53:08.4163731Z * [new branch] gh/jjwu@meta.com/1/head -> origin/gh/jjwu@meta.com/1/head 2025-12-04T08:53:08.4163801Z * [new branch] gh/jturney/1/base -> origin/gh/jturney/1/base 2025-12-04T08:53:08.4163870Z * [new branch] gh/jturney/1/head -> 
origin/gh/jturney/1/head 2025-12-04T08:53:08.4163938Z * [new branch] gh/jturney/1/orig -> origin/gh/jturney/1/orig 2025-12-04T08:53:08.4164004Z * [new branch] gh/jturney/2/base -> origin/gh/jturney/2/base 2025-12-04T08:53:08.4164072Z * [new branch] gh/jturney/2/head -> origin/gh/jturney/2/head 2025-12-04T08:53:08.4164138Z * [new branch] gh/jturney/2/orig -> origin/gh/jturney/2/orig 2025-12-04T08:53:08.4164217Z * [new branch] gh/karthickai/10/base -> origin/gh/karthickai/10/base 2025-12-04T08:53:08.4164295Z * [new branch] gh/karthickai/10/head -> origin/gh/karthickai/10/head 2025-12-04T08:53:08.4164368Z * [new branch] gh/karthickai/10/orig -> origin/gh/karthickai/10/orig 2025-12-04T08:53:08.4164440Z * [new branch] gh/karthickai/11/base -> origin/gh/karthickai/11/base 2025-12-04T08:53:08.4164513Z * [new branch] gh/karthickai/11/head -> origin/gh/karthickai/11/head 2025-12-04T08:53:08.4164585Z * [new branch] gh/karthickai/11/orig -> origin/gh/karthickai/11/orig 2025-12-04T08:53:08.4164659Z * [new branch] gh/karthickai/12/base -> origin/gh/karthickai/12/base 2025-12-04T08:53:08.4164734Z * [new branch] gh/karthickai/12/head -> origin/gh/karthickai/12/head 2025-12-04T08:53:08.4164807Z * [new branch] gh/karthickai/12/orig -> origin/gh/karthickai/12/orig 2025-12-04T08:53:08.4164881Z * [new branch] gh/karthickai/13/base -> origin/gh/karthickai/13/base 2025-12-04T08:53:08.4164957Z * [new branch] gh/karthickai/13/head -> origin/gh/karthickai/13/head 2025-12-04T08:53:08.4165029Z * [new branch] gh/karthickai/13/orig -> origin/gh/karthickai/13/orig 2025-12-04T08:53:08.4165102Z * [new branch] gh/karthickai/14/base -> origin/gh/karthickai/14/base 2025-12-04T08:53:08.4165177Z * [new branch] gh/karthickai/14/head -> origin/gh/karthickai/14/head 2025-12-04T08:53:08.4165249Z * [new branch] gh/karthickai/14/orig -> origin/gh/karthickai/14/orig 2025-12-04T08:53:08.4165324Z * [new branch] gh/karthickai/15/base -> origin/gh/karthickai/15/base 2025-12-04T08:53:08.4165396Z * [new branch] 
gh/karthickai/15/head -> origin/gh/karthickai/15/head 2025-12-04T08:53:08.4165471Z * [new branch] gh/karthickai/15/orig -> origin/gh/karthickai/15/orig 2025-12-04T08:53:08.4165545Z * [new branch] gh/karthickai/16/base -> origin/gh/karthickai/16/base 2025-12-04T08:53:08.4165619Z * [new branch] gh/karthickai/16/head -> origin/gh/karthickai/16/head 2025-12-04T08:53:08.4165691Z * [new branch] gh/karthickai/16/orig -> origin/gh/karthickai/16/orig 2025-12-04T08:53:08.4165766Z * [new branch] gh/karthickai/17/base -> origin/gh/karthickai/17/base 2025-12-04T08:53:08.4165837Z * [new branch] gh/karthickai/17/head -> origin/gh/karthickai/17/head 2025-12-04T08:53:08.4165910Z * [new branch] gh/karthickai/17/orig -> origin/gh/karthickai/17/orig 2025-12-04T08:53:08.4165984Z * [new branch] gh/karthickai/18/base -> origin/gh/karthickai/18/base 2025-12-04T08:53:08.4166057Z * [new branch] gh/karthickai/18/head -> origin/gh/karthickai/18/head 2025-12-04T08:53:08.4166162Z * [new branch] gh/karthickai/18/orig -> origin/gh/karthickai/18/orig 2025-12-04T08:53:08.4166238Z * [new branch] gh/karthickai/19/base -> origin/gh/karthickai/19/base 2025-12-04T08:53:08.4166343Z * [new branch] gh/karthickai/19/head -> origin/gh/karthickai/19/head 2025-12-04T08:53:08.4166416Z * [new branch] gh/karthickai/19/orig -> origin/gh/karthickai/19/orig 2025-12-04T08:53:08.4166490Z * [new branch] gh/karthickai/20/base -> origin/gh/karthickai/20/base 2025-12-04T08:53:08.4166561Z * [new branch] gh/karthickai/20/head -> origin/gh/karthickai/20/head 2025-12-04T08:53:08.4166634Z * [new branch] gh/karthickai/20/orig -> origin/gh/karthickai/20/orig 2025-12-04T08:53:08.4166711Z * [new branch] gh/karthickai/21/base -> origin/gh/karthickai/21/base 2025-12-04T08:53:08.4166784Z * [new branch] gh/karthickai/21/head -> origin/gh/karthickai/21/head 2025-12-04T08:53:08.4166859Z * [new branch] gh/karthickai/21/orig -> origin/gh/karthickai/21/orig 2025-12-04T08:53:08.4166933Z * [new branch] gh/karthickai/22/base -> 
origin/gh/karthickai/22/base 2025-12-04T08:53:08.4167005Z * [new branch] gh/karthickai/22/head -> origin/gh/karthickai/22/head 2025-12-04T08:53:08.4167079Z * [new branch] gh/karthickai/22/orig -> origin/gh/karthickai/22/orig 2025-12-04T08:53:08.4167150Z * [new branch] gh/karthickai/23/base -> origin/gh/karthickai/23/base 2025-12-04T08:53:08.4167222Z * [new branch] gh/karthickai/23/head -> origin/gh/karthickai/23/head 2025-12-04T08:53:08.4167296Z * [new branch] gh/karthickai/23/orig -> origin/gh/karthickai/23/orig 2025-12-04T08:53:08.4167368Z * [new branch] gh/karthickai/24/base -> origin/gh/karthickai/24/base 2025-12-04T08:53:08.4167441Z * [new branch] gh/karthickai/24/head -> origin/gh/karthickai/24/head 2025-12-04T08:53:08.4167518Z * [new branch] gh/karthickai/24/orig -> origin/gh/karthickai/24/orig 2025-12-04T08:53:08.4167591Z * [new branch] gh/karthickai/25/base -> origin/gh/karthickai/25/base 2025-12-04T08:53:08.4167664Z * [new branch] gh/karthickai/25/head -> origin/gh/karthickai/25/head 2025-12-04T08:53:08.4167737Z * [new branch] gh/karthickai/25/orig -> origin/gh/karthickai/25/orig 2025-12-04T08:53:08.4167808Z * [new branch] gh/karthickai/26/base -> origin/gh/karthickai/26/base 2025-12-04T08:53:08.4167879Z * [new branch] gh/karthickai/26/head -> origin/gh/karthickai/26/head 2025-12-04T08:53:08.4167952Z * [new branch] gh/karthickai/26/orig -> origin/gh/karthickai/26/orig 2025-12-04T08:53:08.4168023Z * [new branch] gh/karthickai/6/base -> origin/gh/karthickai/6/base 2025-12-04T08:53:08.4168094Z * [new branch] gh/karthickai/6/head -> origin/gh/karthickai/6/head 2025-12-04T08:53:08.4168170Z * [new branch] gh/karthickai/6/orig -> origin/gh/karthickai/6/orig 2025-12-04T08:53:08.4168240Z * [new branch] gh/krocki/1/base -> origin/gh/krocki/1/base 2025-12-04T08:53:08.4168310Z * [new branch] gh/krocki/1/head -> origin/gh/krocki/1/head 2025-12-04T08:53:08.4168378Z * [new branch] gh/krocki/1/orig -> origin/gh/krocki/1/orig 2025-12-04T08:53:08.4168444Z * [new branch] 
gh/krocki/2/base -> origin/gh/krocki/2/base 2025-12-04T08:53:08.4168513Z * [new branch] gh/krocki/2/head -> origin/gh/krocki/2/head 2025-12-04T08:53:08.4168579Z * [new branch] gh/krocki/2/orig -> origin/gh/krocki/2/orig 2025-12-04T08:53:08.4168659Z * [new branch] gh/kurtamohler/60/base -> origin/gh/kurtamohler/60/base 2025-12-04T08:53:08.4168737Z * [new branch] gh/kurtamohler/60/head -> origin/gh/kurtamohler/60/head 2025-12-04T08:53:08.4168844Z * [new branch] gh/kurtamohler/60/orig -> origin/gh/kurtamohler/60/orig 2025-12-04T08:53:08.4168945Z * [new branch] gh/kurtamohler/61/base -> origin/gh/kurtamohler/61/base 2025-12-04T08:53:08.4169021Z * [new branch] gh/kurtamohler/61/head -> origin/gh/kurtamohler/61/head 2025-12-04T08:53:08.4169095Z * [new branch] gh/kurtamohler/61/orig -> origin/gh/kurtamohler/61/orig 2025-12-04T08:53:08.4169168Z * [new branch] gh/kurtamohler/62/base -> origin/gh/kurtamohler/62/base 2025-12-04T08:53:08.4169244Z * [new branch] gh/kurtamohler/62/head -> origin/gh/kurtamohler/62/head 2025-12-04T08:53:08.4169317Z * [new branch] gh/kurtamohler/62/orig -> origin/gh/kurtamohler/62/orig 2025-12-04T08:53:08.4169391Z * [new branch] gh/kurtamohler/63/base -> origin/gh/kurtamohler/63/base 2025-12-04T08:53:08.4169466Z * [new branch] gh/kurtamohler/63/head -> origin/gh/kurtamohler/63/head 2025-12-04T08:53:08.4169541Z * [new branch] gh/kurtamohler/63/orig -> origin/gh/kurtamohler/63/orig 2025-12-04T08:53:08.4169616Z * [new branch] gh/kurtamohler/64/base -> origin/gh/kurtamohler/64/base 2025-12-04T08:53:08.4169691Z * [new branch] gh/kurtamohler/64/head -> origin/gh/kurtamohler/64/head 2025-12-04T08:53:08.4169765Z * [new branch] gh/kurtamohler/64/orig -> origin/gh/kurtamohler/64/orig 2025-12-04T08:53:08.4169838Z * [new branch] gh/kurtamohler/65/base -> origin/gh/kurtamohler/65/base 2025-12-04T08:53:08.4169914Z * [new branch] gh/kurtamohler/65/head -> origin/gh/kurtamohler/65/head 2025-12-04T08:53:08.4169986Z * [new branch] gh/kurtamohler/65/orig -> 
origin/gh/kurtamohler/65/orig 2025-12-04T08:53:08.4170063Z * [new branch] gh/kurtamohler/66/base -> origin/gh/kurtamohler/66/base 2025-12-04T08:53:08.4170138Z * [new branch] gh/kurtamohler/66/head -> origin/gh/kurtamohler/66/head 2025-12-04T08:53:08.4170211Z * [new branch] gh/kurtamohler/66/orig -> origin/gh/kurtamohler/66/orig 2025-12-04T08:53:08.4170288Z * [new branch] gh/kurtamohler/67/base -> origin/gh/kurtamohler/67/base 2025-12-04T08:53:08.4170362Z * [new branch] gh/kurtamohler/67/head -> origin/gh/kurtamohler/67/head 2025-12-04T08:53:08.4170436Z * [new branch] gh/kurtamohler/67/orig -> origin/gh/kurtamohler/67/orig 2025-12-04T08:53:08.4170508Z * [new branch] gh/kwen2501/130/base -> origin/gh/kwen2501/130/base 2025-12-04T08:53:08.4170578Z * [new branch] gh/kwen2501/130/head -> origin/gh/kwen2501/130/head 2025-12-04T08:53:08.4170647Z * [new branch] gh/kwen2501/130/orig -> origin/gh/kwen2501/130/orig 2025-12-04T08:53:08.4170719Z * [new branch] gh/kwen2501/170/base -> origin/gh/kwen2501/170/base 2025-12-04T08:53:08.4170790Z * [new branch] gh/kwen2501/170/head -> origin/gh/kwen2501/170/head 2025-12-04T08:53:08.4170858Z * [new branch] gh/kwen2501/187/base -> origin/gh/kwen2501/187/base 2025-12-04T08:53:08.4170929Z * [new branch] gh/kwen2501/187/head -> origin/gh/kwen2501/187/head 2025-12-04T08:53:08.4170996Z * [new branch] gh/kwen2501/187/orig -> origin/gh/kwen2501/187/orig 2025-12-04T08:53:08.4171065Z * [new branch] gh/kwen2501/188/base -> origin/gh/kwen2501/188/base 2025-12-04T08:53:08.4171134Z * [new branch] gh/kwen2501/188/head -> origin/gh/kwen2501/188/head 2025-12-04T08:53:08.4171201Z * [new branch] gh/kwen2501/188/orig -> origin/gh/kwen2501/188/orig 2025-12-04T08:53:08.4171270Z * [new branch] gh/kwen2501/211/base -> origin/gh/kwen2501/211/base 2025-12-04T08:53:08.4171339Z * [new branch] gh/kwen2501/211/head -> origin/gh/kwen2501/211/head 2025-12-04T08:53:08.4171442Z * [new branch] gh/kwen2501/224/base -> origin/gh/kwen2501/224/base 
2025-12-04T08:53:08.4171533Z * [new branch] gh/kwen2501/224/head -> origin/gh/kwen2501/224/head 2025-12-04T08:53:08.4171603Z * [new branch] gh/kwen2501/224/orig -> origin/gh/kwen2501/224/orig 2025-12-04T08:53:08.4171671Z * [new branch] gh/kwen2501/228/base -> origin/gh/kwen2501/228/base 2025-12-04T08:53:08.4171740Z * [new branch] gh/kwen2501/228/head -> origin/gh/kwen2501/228/head 2025-12-04T08:53:08.4171807Z * [new branch] gh/kwen2501/228/orig -> origin/gh/kwen2501/228/orig 2025-12-04T08:53:08.4171876Z * [new branch] gh/kwen2501/234/base -> origin/gh/kwen2501/234/base 2025-12-04T08:53:08.4171946Z * [new branch] gh/kwen2501/234/head -> origin/gh/kwen2501/234/head 2025-12-04T08:53:08.4172014Z * [new branch] gh/kwen2501/234/orig -> origin/gh/kwen2501/234/orig 2025-12-04T08:53:08.4172083Z * [new branch] gh/kwen2501/235/base -> origin/gh/kwen2501/235/base 2025-12-04T08:53:08.4172153Z * [new branch] gh/kwen2501/235/head -> origin/gh/kwen2501/235/head 2025-12-04T08:53:08.4172222Z * [new branch] gh/kwen2501/235/orig -> origin/gh/kwen2501/235/orig 2025-12-04T08:53:08.4172290Z * [new branch] gh/kwen2501/236/base -> origin/gh/kwen2501/236/base 2025-12-04T08:53:08.4172359Z * [new branch] gh/kwen2501/236/head -> origin/gh/kwen2501/236/head 2025-12-04T08:53:08.4172428Z * [new branch] gh/kwen2501/236/orig -> origin/gh/kwen2501/236/orig 2025-12-04T08:53:08.4172497Z * [new branch] gh/kwen2501/237/base -> origin/gh/kwen2501/237/base 2025-12-04T08:53:08.4172566Z * [new branch] gh/kwen2501/237/head -> origin/gh/kwen2501/237/head 2025-12-04T08:53:08.4172634Z * [new branch] gh/kwen2501/237/orig -> origin/gh/kwen2501/237/orig 2025-12-04T08:53:08.4172703Z * [new branch] gh/kwen2501/238/base -> origin/gh/kwen2501/238/base 2025-12-04T08:53:08.4172774Z * [new branch] gh/kwen2501/238/head -> origin/gh/kwen2501/238/head 2025-12-04T08:53:08.4172842Z * [new branch] gh/kwen2501/238/orig -> origin/gh/kwen2501/238/orig 2025-12-04T08:53:08.4172912Z * [new branch] gh/kwen2501/240/base -> 
origin/gh/kwen2501/240/base 2025-12-04T08:53:08.4172981Z * [new branch] gh/kwen2501/240/head -> origin/gh/kwen2501/240/head 2025-12-04T08:53:08.4173049Z * [new branch] gh/kwen2501/240/orig -> origin/gh/kwen2501/240/orig 2025-12-04T08:53:08.4173119Z * [new branch] gh/kwen2501/241/base -> origin/gh/kwen2501/241/base 2025-12-04T08:53:08.4173187Z * [new branch] gh/kwen2501/241/head -> origin/gh/kwen2501/241/head 2025-12-04T08:53:08.4173303Z * [new branch] gh/kwen2501/241/orig -> origin/gh/kwen2501/241/orig 2025-12-04T08:53:08.4173378Z * [new branch] gh/kwen2501/247/base -> origin/gh/kwen2501/247/base 2025-12-04T08:53:08.4173446Z * [new branch] gh/kwen2501/247/head -> origin/gh/kwen2501/247/head 2025-12-04T08:53:08.4173513Z * [new branch] gh/kwen2501/247/orig -> origin/gh/kwen2501/247/orig 2025-12-04T08:53:08.4173582Z * [new branch] gh/kwen2501/252/base -> origin/gh/kwen2501/252/base 2025-12-04T08:53:08.4173650Z * [new branch] gh/kwen2501/252/head -> origin/gh/kwen2501/252/head 2025-12-04T08:53:08.4173718Z * [new branch] gh/kwen2501/252/orig -> origin/gh/kwen2501/252/orig 2025-12-04T08:53:08.4173787Z * [new branch] gh/kwen2501/259/base -> origin/gh/kwen2501/259/base 2025-12-04T08:53:08.4173855Z * [new branch] gh/kwen2501/259/head -> origin/gh/kwen2501/259/head 2025-12-04T08:53:08.4173924Z * [new branch] gh/kwen2501/259/orig -> origin/gh/kwen2501/259/orig 2025-12-04T08:53:08.4174038Z * [new branch] gh/kwen2501/260/base -> origin/gh/kwen2501/260/base 2025-12-04T08:53:08.4174158Z * [new branch] gh/kwen2501/260/head -> origin/gh/kwen2501/260/head 2025-12-04T08:53:08.4174226Z * [new branch] gh/kwen2501/260/orig -> origin/gh/kwen2501/260/orig 2025-12-04T08:53:08.4174296Z * [new branch] gh/kwen2501/268/base -> origin/gh/kwen2501/268/base 2025-12-04T08:53:08.4174364Z * [new branch] gh/kwen2501/268/head -> origin/gh/kwen2501/268/head 2025-12-04T08:53:08.4174431Z * [new branch] gh/kwen2501/268/orig -> origin/gh/kwen2501/268/orig 2025-12-04T08:53:08.4174501Z * [new branch] 
gh/kwen2501/269/base -> origin/gh/kwen2501/269/base 2025-12-04T08:53:08.4174569Z * [new branch] gh/kwen2501/269/head -> origin/gh/kwen2501/269/head 2025-12-04T08:53:08.4174639Z * [new branch] gh/kwen2501/269/orig -> origin/gh/kwen2501/269/orig 2025-12-04T08:53:08.4174709Z * [new branch] gh/kwen2501/270/base -> origin/gh/kwen2501/270/base 2025-12-04T08:53:08.4174778Z * [new branch] gh/kwen2501/270/head -> origin/gh/kwen2501/270/head 2025-12-04T08:53:08.4174848Z * [new branch] gh/kwen2501/270/orig -> origin/gh/kwen2501/270/orig 2025-12-04T08:53:08.4174915Z * [new branch] gh/kwen2501/271/base -> origin/gh/kwen2501/271/base 2025-12-04T08:53:08.4174984Z * [new branch] gh/kwen2501/271/head -> origin/gh/kwen2501/271/head 2025-12-04T08:53:08.4175053Z * [new branch] gh/kwen2501/271/orig -> origin/gh/kwen2501/271/orig 2025-12-04T08:53:08.4175122Z * [new branch] gh/kwen2501/274/base -> origin/gh/kwen2501/274/base 2025-12-04T08:53:08.4175191Z * [new branch] gh/kwen2501/274/head -> origin/gh/kwen2501/274/head 2025-12-04T08:53:08.4175260Z * [new branch] gh/kwen2501/274/orig -> origin/gh/kwen2501/274/orig 2025-12-04T08:53:08.4175330Z * [new branch] gh/kwen2501/275/base -> origin/gh/kwen2501/275/base 2025-12-04T08:53:08.4175401Z * [new branch] gh/kwen2501/275/head -> origin/gh/kwen2501/275/head 2025-12-04T08:53:08.4175470Z * [new branch] gh/kwen2501/275/orig -> origin/gh/kwen2501/275/orig 2025-12-04T08:53:08.4175537Z * [new branch] gh/kwen2501/276/base -> origin/gh/kwen2501/276/base 2025-12-04T08:53:08.4175605Z * [new branch] gh/kwen2501/276/head -> origin/gh/kwen2501/276/head 2025-12-04T08:53:08.4175674Z * [new branch] gh/kwen2501/276/orig -> origin/gh/kwen2501/276/orig 2025-12-04T08:53:08.4175743Z * [new branch] gh/kwen2501/277/base -> origin/gh/kwen2501/277/base 2025-12-04T08:53:08.4175812Z * [new branch] gh/kwen2501/277/head -> origin/gh/kwen2501/277/head 2025-12-04T08:53:08.4175885Z * [new branch] gh/kwen2501/277/orig -> origin/gh/kwen2501/277/orig 
2025-12-04T08:53:08.4175954Z * [new branch] gh/kwen2501/278/base -> origin/gh/kwen2501/278/base 2025-12-04T08:53:08.4176024Z * [new branch] gh/kwen2501/278/head -> origin/gh/kwen2501/278/head 2025-12-04T08:53:08.4176093Z * [new branch] gh/kwen2501/278/orig -> origin/gh/kwen2501/278/orig 2025-12-04T08:53:08.4176162Z * [new branch] gh/kwen2501/279/base -> origin/gh/kwen2501/279/base 2025-12-04T08:53:08.4176232Z * [new branch] gh/kwen2501/279/head -> origin/gh/kwen2501/279/head 2025-12-04T08:53:08.4176301Z * [new branch] gh/kwen2501/279/orig -> origin/gh/kwen2501/279/orig 2025-12-04T08:53:08.4176370Z * [new branch] gh/kwen2501/280/base -> origin/gh/kwen2501/280/base 2025-12-04T08:53:08.4176442Z * [new branch] gh/kwen2501/280/head -> origin/gh/kwen2501/280/head 2025-12-04T08:53:08.4176542Z * [new branch] gh/kwen2501/280/orig -> origin/gh/kwen2501/280/orig 2025-12-04T08:53:08.4176609Z * [new branch] gh/kwen2501/281/base -> origin/gh/kwen2501/281/base 2025-12-04T08:53:08.4176708Z * [new branch] gh/kwen2501/281/head -> origin/gh/kwen2501/281/head 2025-12-04T08:53:08.4176776Z * [new branch] gh/kwen2501/281/orig -> origin/gh/kwen2501/281/orig 2025-12-04T08:53:08.4176843Z * [new branch] gh/kwen2501/282/base -> origin/gh/kwen2501/282/base 2025-12-04T08:53:08.4176914Z * [new branch] gh/kwen2501/282/head -> origin/gh/kwen2501/282/head 2025-12-04T08:53:08.4176984Z * [new branch] gh/kwen2501/282/orig -> origin/gh/kwen2501/282/orig 2025-12-04T08:53:08.4177052Z * [new branch] gh/kwen2501/283/base -> origin/gh/kwen2501/283/base 2025-12-04T08:53:08.4177121Z * [new branch] gh/kwen2501/283/head -> origin/gh/kwen2501/283/head 2025-12-04T08:53:08.4177190Z * [new branch] gh/kwen2501/283/orig -> origin/gh/kwen2501/283/orig 2025-12-04T08:53:08.4177259Z * [new branch] gh/kwen2501/284/base -> origin/gh/kwen2501/284/base 2025-12-04T08:53:08.4177332Z * [new branch] gh/kwen2501/284/head -> origin/gh/kwen2501/284/head 2025-12-04T08:53:08.4177401Z * [new branch] gh/kwen2501/284/orig -> 
origin/gh/kwen2501/284/orig 2025-12-04T08:53:08.4177472Z * [new branch] gh/kwen2501/285/base -> origin/gh/kwen2501/285/base 2025-12-04T08:53:08.4177540Z * [new branch] gh/kwen2501/285/head -> origin/gh/kwen2501/285/head 2025-12-04T08:53:08.4177608Z * [new branch] gh/kwen2501/285/orig -> origin/gh/kwen2501/285/orig 2025-12-04T08:53:08.4177679Z * [new branch] gh/kwen2501/286/base -> origin/gh/kwen2501/286/base 2025-12-04T08:53:08.4177748Z * [new branch] gh/kwen2501/286/head -> origin/gh/kwen2501/286/head 2025-12-04T08:53:08.4177818Z * [new branch] gh/kwen2501/286/orig -> origin/gh/kwen2501/286/orig 2025-12-04T08:53:08.4177889Z * [new branch] gh/kwen2501/287/base -> origin/gh/kwen2501/287/base 2025-12-04T08:53:08.4177962Z * [new branch] gh/kwen2501/287/head -> origin/gh/kwen2501/287/head 2025-12-04T08:53:08.4178031Z * [new branch] gh/kwen2501/287/orig -> origin/gh/kwen2501/287/orig 2025-12-04T08:53:08.4178102Z * [new branch] gh/kwen2501/288/base -> origin/gh/kwen2501/288/base 2025-12-04T08:53:08.4178170Z * [new branch] gh/kwen2501/288/head -> origin/gh/kwen2501/288/head 2025-12-04T08:53:08.4178238Z * [new branch] gh/kwen2501/288/orig -> origin/gh/kwen2501/288/orig 2025-12-04T08:53:08.4178316Z * [new branch] gh/laithsakka/251/base -> origin/gh/laithsakka/251/base 2025-12-04T08:53:08.4178390Z * [new branch] gh/laithsakka/251/head -> origin/gh/laithsakka/251/head 2025-12-04T08:53:08.4178465Z * [new branch] gh/laithsakka/251/orig -> origin/gh/laithsakka/251/orig 2025-12-04T08:53:08.4178540Z * [new branch] gh/laithsakka/276/base -> origin/gh/laithsakka/276/base 2025-12-04T08:53:08.4178614Z * [new branch] gh/laithsakka/276/head -> origin/gh/laithsakka/276/head 2025-12-04T08:53:08.4178687Z * [new branch] gh/laithsakka/276/orig -> origin/gh/laithsakka/276/orig 2025-12-04T08:53:08.4178764Z * [new branch] gh/laithsakka/28/base -> origin/gh/laithsakka/28/base 2025-12-04T08:53:08.4178837Z * [new branch] gh/laithsakka/29/base -> origin/gh/laithsakka/29/base 
2025-12-04T08:53:08.4178911Z * [new branch] gh/laithsakka/30/base -> origin/gh/laithsakka/30/base 2025-12-04T08:53:08.4178984Z * [new branch] gh/laithsakka/30/head -> origin/gh/laithsakka/30/head 2025-12-04T08:53:08.4179055Z * [new branch] gh/laithsakka/31/base -> origin/gh/laithsakka/31/base 2025-12-04T08:53:08.4179158Z * [new branch] gh/laithsakka/31/head -> origin/gh/laithsakka/31/head 2025-12-04T08:53:08.4179260Z * [new branch] gh/laithsakka/313/base -> origin/gh/laithsakka/313/base 2025-12-04T08:53:08.4179335Z * [new branch] gh/laithsakka/313/head -> origin/gh/laithsakka/313/head 2025-12-04T08:53:08.4179408Z * [new branch] gh/laithsakka/313/orig -> origin/gh/laithsakka/313/orig 2025-12-04T08:53:08.4179482Z * [new branch] gh/laithsakka/316/base -> origin/gh/laithsakka/316/base 2025-12-04T08:53:08.4179555Z * [new branch] gh/laithsakka/316/head -> origin/gh/laithsakka/316/head 2025-12-04T08:53:08.4179628Z * [new branch] gh/laithsakka/316/orig -> origin/gh/laithsakka/316/orig 2025-12-04T08:53:08.4179701Z * [new branch] gh/laithsakka/317/base -> origin/gh/laithsakka/317/base 2025-12-04T08:53:08.4179773Z * [new branch] gh/laithsakka/317/head -> origin/gh/laithsakka/317/head 2025-12-04T08:53:08.4179847Z * [new branch] gh/laithsakka/317/orig -> origin/gh/laithsakka/317/orig 2025-12-04T08:53:08.4179919Z * [new branch] gh/laithsakka/319/base -> origin/gh/laithsakka/319/base 2025-12-04T08:53:08.4179991Z * [new branch] gh/laithsakka/319/head -> origin/gh/laithsakka/319/head 2025-12-04T08:53:08.4180065Z * [new branch] gh/laithsakka/319/orig -> origin/gh/laithsakka/319/orig 2025-12-04T08:53:08.4180138Z * [new branch] gh/laithsakka/32/base -> origin/gh/laithsakka/32/base 2025-12-04T08:53:08.4180211Z * [new branch] gh/laithsakka/32/head -> origin/gh/laithsakka/32/head 2025-12-04T08:53:08.4180286Z * [new branch] gh/laithsakka/320/base -> origin/gh/laithsakka/320/base 2025-12-04T08:53:08.4180357Z * [new branch] gh/laithsakka/320/head -> origin/gh/laithsakka/320/head 
2025-12-04T08:53:08.4180431Z * [new branch] gh/laithsakka/320/orig -> origin/gh/laithsakka/320/orig 2025-12-04T08:53:08.4180506Z * [new branch] gh/laithsakka/321/base -> origin/gh/laithsakka/321/base 2025-12-04T08:53:08.4180580Z * [new branch] gh/laithsakka/321/head -> origin/gh/laithsakka/321/head 2025-12-04T08:53:08.4180653Z * [new branch] gh/laithsakka/321/orig -> origin/gh/laithsakka/321/orig 2025-12-04T08:53:08.4180726Z * [new branch] gh/laithsakka/322/base -> origin/gh/laithsakka/322/base 2025-12-04T08:53:08.4180799Z * [new branch] gh/laithsakka/322/head -> origin/gh/laithsakka/322/head 2025-12-04T08:53:08.4180874Z * [new branch] gh/laithsakka/322/orig -> origin/gh/laithsakka/322/orig 2025-12-04T08:53:08.4180946Z * [new branch] gh/laithsakka/323/base -> origin/gh/laithsakka/323/base 2025-12-04T08:53:08.4181019Z * [new branch] gh/laithsakka/323/head -> origin/gh/laithsakka/323/head 2025-12-04T08:53:08.4181094Z * [new branch] gh/laithsakka/323/orig -> origin/gh/laithsakka/323/orig 2025-12-04T08:53:08.4181168Z * [new branch] gh/laithsakka/324/base -> origin/gh/laithsakka/324/base 2025-12-04T08:53:08.4181239Z * [new branch] gh/laithsakka/324/head -> origin/gh/laithsakka/324/head 2025-12-04T08:53:08.4181314Z * [new branch] gh/laithsakka/324/orig -> origin/gh/laithsakka/324/orig 2025-12-04T08:53:08.4181387Z * [new branch] gh/laithsakka/325/base -> origin/gh/laithsakka/325/base 2025-12-04T08:53:08.4181458Z * [new branch] gh/laithsakka/325/head -> origin/gh/laithsakka/325/head 2025-12-04T08:53:08.4181532Z * [new branch] gh/laithsakka/325/orig -> origin/gh/laithsakka/325/orig 2025-12-04T08:53:08.4181605Z * [new branch] gh/laithsakka/326/base -> origin/gh/laithsakka/326/base 2025-12-04T08:53:08.4181677Z * [new branch] gh/laithsakka/326/head -> origin/gh/laithsakka/326/head 2025-12-04T08:53:08.4181777Z * [new branch] gh/laithsakka/326/orig -> origin/gh/laithsakka/326/orig 2025-12-04T08:53:08.4181877Z * [new branch] gh/laithsakka/327/base -> origin/gh/laithsakka/327/base 
2025-12-04T08:53:08.4181950Z * [new branch] gh/laithsakka/327/head -> origin/gh/laithsakka/327/head 2025-12-04T08:53:08.4182023Z * [new branch] gh/laithsakka/327/orig -> origin/gh/laithsakka/327/orig 2025-12-04T08:53:08.4182094Z * [new branch] gh/laithsakka/328/base -> origin/gh/laithsakka/328/base 2025-12-04T08:53:08.4182168Z * [new branch] gh/laithsakka/328/head -> origin/gh/laithsakka/328/head 2025-12-04T08:53:08.4182240Z * [new branch] gh/laithsakka/328/orig -> origin/gh/laithsakka/328/orig 2025-12-04T08:53:08.4182309Z * [new branch] gh/liangel/4/base -> origin/gh/liangel/4/base 2025-12-04T08:53:08.4182379Z * [new branch] gh/liangel/4/head -> origin/gh/liangel/4/head 2025-12-04T08:53:08.4182449Z * [new branch] gh/liangel/4/orig -> origin/gh/liangel/4/orig 2025-12-04T08:53:08.4182526Z * [new branch] gh/lucaskabela/1/base -> origin/gh/lucaskabela/1/base 2025-12-04T08:53:08.4182602Z * [new branch] gh/lucaskabela/1/head -> origin/gh/lucaskabela/1/head 2025-12-04T08:53:08.4182666Z * [new branch] gh/lw/4/base -> origin/gh/lw/4/base 2025-12-04T08:53:08.4182728Z * [new branch] gh/lw/4/head -> origin/gh/lw/4/head 2025-12-04T08:53:08.4182792Z * [new branch] gh/lw/4/orig -> origin/gh/lw/4/orig 2025-12-04T08:53:08.4182853Z * [new branch] gh/lw/5/base -> origin/gh/lw/5/base 2025-12-04T08:53:08.4182914Z * [new branch] gh/lw/5/head -> origin/gh/lw/5/head 2025-12-04T08:53:08.4182977Z * [new branch] gh/lw/5/orig -> origin/gh/lw/5/orig 2025-12-04T08:53:08.4183039Z * [new branch] gh/lw/6/base -> origin/gh/lw/6/base 2025-12-04T08:53:08.4183101Z * [new branch] gh/lw/6/head -> origin/gh/lw/6/head 2025-12-04T08:53:08.4183163Z * [new branch] gh/lw/6/orig -> origin/gh/lw/6/orig 2025-12-04T08:53:08.4183230Z * [new branch] gh/malfet/14/base -> origin/gh/malfet/14/base 2025-12-04T08:53:08.4183344Z * [new branch] gh/malfet/417/base -> origin/gh/malfet/417/base 2025-12-04T08:53:08.4183416Z * [new branch] gh/malfet/417/head -> origin/gh/malfet/417/head 2025-12-04T08:53:08.4183484Z * [new 
branch] gh/malfet/417/orig -> origin/gh/malfet/417/orig 2025-12-04T08:53:08.4183551Z * [new branch] gh/malfet/506/base -> origin/gh/malfet/506/base 2025-12-04T08:53:08.4183619Z * [new branch] gh/malfet/506/head -> origin/gh/malfet/506/head 2025-12-04T08:53:08.4183687Z * [new branch] gh/malfet/506/orig -> origin/gh/malfet/506/orig 2025-12-04T08:53:08.4183754Z * [new branch] gh/malfet/517/base -> origin/gh/malfet/517/base 2025-12-04T08:53:08.4183824Z * [new branch] gh/malfet/517/head -> origin/gh/malfet/517/head 2025-12-04T08:53:08.4183891Z * [new branch] gh/malfet/528/base -> origin/gh/malfet/528/base 2025-12-04T08:53:08.4183961Z * [new branch] gh/malfet/528/head -> origin/gh/malfet/528/head 2025-12-04T08:53:08.4184028Z * [new branch] gh/malfet/528/orig -> origin/gh/malfet/528/orig 2025-12-04T08:53:08.4184093Z * [new branch] gh/malfet/537/base -> origin/gh/malfet/537/base 2025-12-04T08:53:08.4184163Z * [new branch] gh/malfet/537/head -> origin/gh/malfet/537/head 2025-12-04T08:53:08.4184230Z * [new branch] gh/malfet/537/orig -> origin/gh/malfet/537/orig 2025-12-04T08:53:08.4184348Z * [new branch] gh/malfet/546/base -> origin/gh/malfet/546/base 2025-12-04T08:53:08.4184418Z * [new branch] gh/malfet/546/head -> origin/gh/malfet/546/head 2025-12-04T08:53:08.4184533Z * [new branch] gh/malfet/546/orig -> origin/gh/malfet/546/orig 2025-12-04T08:53:08.4184600Z * [new branch] gh/malfet/565/base -> origin/gh/malfet/565/base 2025-12-04T08:53:08.4184667Z * [new branch] gh/malfet/565/head -> origin/gh/malfet/565/head 2025-12-04T08:53:08.4184734Z * [new branch] gh/malfet/565/orig -> origin/gh/malfet/565/orig 2025-12-04T08:53:08.4184799Z * [new branch] gh/malfet/575/base -> origin/gh/malfet/575/base 2025-12-04T08:53:08.4184866Z * [new branch] gh/malfet/575/head -> origin/gh/malfet/575/head 2025-12-04T08:53:08.4184933Z * [new branch] gh/malfet/575/orig -> origin/gh/malfet/575/orig 2025-12-04T08:53:08.4185001Z * [new branch] gh/malfet/580/base -> origin/gh/malfet/580/base 
2025-12-04T08:53:08.4185071Z * [new branch] gh/malfet/580/head -> origin/gh/malfet/580/head 2025-12-04T08:53:08.4185138Z * [new branch] gh/malfet/580/orig -> origin/gh/malfet/580/orig 2025-12-04T08:53:08.4185206Z * [new branch] gh/malfet/581/base -> origin/gh/malfet/581/base 2025-12-04T08:53:08.4185274Z * [new branch] gh/malfet/581/head -> origin/gh/malfet/581/head 2025-12-04T08:53:08.4185342Z * [new branch] gh/malfet/581/orig -> origin/gh/malfet/581/orig 2025-12-04T08:53:08.4185409Z * [new branch] gh/malfet/583/base -> origin/gh/malfet/583/base 2025-12-04T08:53:08.4185478Z * [new branch] gh/malfet/583/head -> origin/gh/malfet/583/head 2025-12-04T08:53:08.4185543Z * [new branch] gh/malfet/583/orig -> origin/gh/malfet/583/orig 2025-12-04T08:53:08.4185613Z * [new branch] gh/malfet/586/base -> origin/gh/malfet/586/base 2025-12-04T08:53:08.4185683Z * [new branch] gh/malfet/586/head -> origin/gh/malfet/586/head 2025-12-04T08:53:08.4185750Z * [new branch] gh/malfet/586/orig -> origin/gh/malfet/586/orig 2025-12-04T08:53:08.4185819Z * [new branch] gh/malfet/587/base -> origin/gh/malfet/587/base 2025-12-04T08:53:08.4185885Z * [new branch] gh/malfet/587/head -> origin/gh/malfet/587/head 2025-12-04T08:53:08.4185950Z * [new branch] gh/malfet/587/orig -> origin/gh/malfet/587/orig 2025-12-04T08:53:08.4186020Z * [new branch] gh/malfet/588/base -> origin/gh/malfet/588/base 2025-12-04T08:53:08.4186087Z * [new branch] gh/malfet/588/head -> origin/gh/malfet/588/head 2025-12-04T08:53:08.4186154Z * [new branch] gh/malfet/588/orig -> origin/gh/malfet/588/orig 2025-12-04T08:53:08.4186223Z * [new branch] gh/malfet/589/base -> origin/gh/malfet/589/base 2025-12-04T08:53:08.4186290Z * [new branch] gh/malfet/589/head -> origin/gh/malfet/589/head 2025-12-04T08:53:08.4186357Z * [new branch] gh/malfet/589/orig -> origin/gh/malfet/589/orig 2025-12-04T08:53:08.4186426Z * [new branch] gh/malfet/590/base -> origin/gh/malfet/590/base 2025-12-04T08:53:08.4186492Z * [new branch] gh/malfet/590/head -> 
origin/gh/malfet/590/head 2025-12-04T08:53:08.4186558Z * [new branch] gh/malfet/590/orig -> origin/gh/malfet/590/orig 2025-12-04T08:53:08.4186627Z * [new branch] gh/malfet/591/base -> origin/gh/malfet/591/base 2025-12-04T08:53:08.4186694Z * [new branch] gh/malfet/591/head -> origin/gh/malfet/591/head 2025-12-04T08:53:08.4186760Z * [new branch] gh/malfet/591/orig -> origin/gh/malfet/591/orig 2025-12-04T08:53:08.4186827Z * [new branch] gh/malfet/592/base -> origin/gh/malfet/592/base 2025-12-04T08:53:08.4186925Z * [new branch] gh/malfet/592/head -> origin/gh/malfet/592/head 2025-12-04T08:53:08.4187022Z * [new branch] gh/malfet/592/orig -> origin/gh/malfet/592/orig 2025-12-04T08:53:08.4187090Z * [new branch] gh/malfet/593/base -> origin/gh/malfet/593/base 2025-12-04T08:53:08.4187156Z * [new branch] gh/malfet/593/head -> origin/gh/malfet/593/head 2025-12-04T08:53:08.4187223Z * [new branch] gh/malfet/593/orig -> origin/gh/malfet/593/orig 2025-12-04T08:53:08.4187289Z * [new branch] gh/malfet/594/base -> origin/gh/malfet/594/base 2025-12-04T08:53:08.4187355Z * [new branch] gh/malfet/594/head -> origin/gh/malfet/594/head 2025-12-04T08:53:08.4187422Z * [new branch] gh/malfet/594/orig -> origin/gh/malfet/594/orig 2025-12-04T08:53:08.4187489Z * [new branch] gh/malfet/595/base -> origin/gh/malfet/595/base 2025-12-04T08:53:08.4187557Z * [new branch] gh/malfet/595/head -> origin/gh/malfet/595/head 2025-12-04T08:53:08.4187626Z * [new branch] gh/malfet/595/orig -> origin/gh/malfet/595/orig 2025-12-04T08:53:08.4187693Z * [new branch] gh/malfet/596/base -> origin/gh/malfet/596/base 2025-12-04T08:53:08.4187759Z * [new branch] gh/malfet/596/head -> origin/gh/malfet/596/head 2025-12-04T08:53:08.4187827Z * [new branch] gh/malfet/596/orig -> origin/gh/malfet/596/orig 2025-12-04T08:53:08.4187893Z * [new branch] gh/malfet/597/base -> origin/gh/malfet/597/base 2025-12-04T08:53:08.4187959Z * [new branch] gh/malfet/597/head -> origin/gh/malfet/597/head 2025-12-04T08:53:08.4188027Z * [new 
branch] gh/malfet/597/orig -> origin/gh/malfet/597/orig 2025-12-04T08:53:08.4188094Z * [new branch] gh/malfet/598/base -> origin/gh/malfet/598/base 2025-12-04T08:53:08.4188162Z * [new branch] gh/malfet/598/head -> origin/gh/malfet/598/head 2025-12-04T08:53:08.4188230Z * [new branch] gh/malfet/598/orig -> origin/gh/malfet/598/orig 2025-12-04T08:53:08.4188297Z * [new branch] gh/malfet/599/base -> origin/gh/malfet/599/base 2025-12-04T08:53:08.4188363Z * [new branch] gh/malfet/599/head -> origin/gh/malfet/599/head 2025-12-04T08:53:08.4188432Z * [new branch] gh/malfet/599/orig -> origin/gh/malfet/599/orig 2025-12-04T08:53:08.4188498Z * [new branch] gh/malfet/600/base -> origin/gh/malfet/600/base 2025-12-04T08:53:08.4188564Z * [new branch] gh/malfet/600/head -> origin/gh/malfet/600/head 2025-12-04T08:53:08.4188632Z * [new branch] gh/malfet/600/orig -> origin/gh/malfet/600/orig 2025-12-04T08:53:08.4188697Z * [new branch] gh/malfet/601/base -> origin/gh/malfet/601/base 2025-12-04T08:53:08.4188766Z * [new branch] gh/malfet/601/head -> origin/gh/malfet/601/head 2025-12-04T08:53:08.4188833Z * [new branch] gh/malfet/601/orig -> origin/gh/malfet/601/orig 2025-12-04T08:53:08.4188899Z * [new branch] gh/malfet/602/base -> origin/gh/malfet/602/base 2025-12-04T08:53:08.4188966Z * [new branch] gh/malfet/602/head -> origin/gh/malfet/602/head 2025-12-04T08:53:08.4189033Z * [new branch] gh/malfet/602/orig -> origin/gh/malfet/602/orig 2025-12-04T08:53:08.4189099Z * [new branch] gh/malfet/603/base -> origin/gh/malfet/603/base 2025-12-04T08:53:08.4189166Z * [new branch] gh/malfet/603/head -> origin/gh/malfet/603/head 2025-12-04T08:53:08.4189233Z * [new branch] gh/malfet/603/orig -> origin/gh/malfet/603/orig 2025-12-04T08:53:08.4189299Z * [new branch] gh/malfet/604/base -> origin/gh/malfet/604/base 2025-12-04T08:53:08.4189393Z * [new branch] gh/malfet/604/head -> origin/gh/malfet/604/head 2025-12-04T08:53:08.4189487Z * [new branch] gh/malfet/604/orig -> origin/gh/malfet/604/orig 
2025-12-04T08:53:08.4189554Z * [new branch] gh/malfet/605/base -> origin/gh/malfet/605/base 2025-12-04T08:53:08.4189622Z * [new branch] gh/malfet/605/head -> origin/gh/malfet/605/head 2025-12-04T08:53:08.4189689Z * [new branch] gh/malfet/605/orig -> origin/gh/malfet/605/orig 2025-12-04T08:53:08.4189755Z * [new branch] gh/malfet/606/base -> origin/gh/malfet/606/base 2025-12-04T08:53:08.4189822Z * [new branch] gh/malfet/606/head -> origin/gh/malfet/606/head 2025-12-04T08:53:08.4189889Z * [new branch] gh/malfet/606/orig -> origin/gh/malfet/606/orig 2025-12-04T08:53:08.4189956Z * [new branch] gh/malfet/607/base -> origin/gh/malfet/607/base 2025-12-04T08:53:08.4190025Z * [new branch] gh/malfet/607/head -> origin/gh/malfet/607/head 2025-12-04T08:53:08.4190093Z * [new branch] gh/malfet/607/orig -> origin/gh/malfet/607/orig 2025-12-04T08:53:08.4190160Z * [new branch] gh/malfet/608/base -> origin/gh/malfet/608/base 2025-12-04T08:53:08.4190227Z * [new branch] gh/malfet/608/head -> origin/gh/malfet/608/head 2025-12-04T08:53:08.4190293Z * [new branch] gh/malfet/608/orig -> origin/gh/malfet/608/orig 2025-12-04T08:53:08.4190359Z * [new branch] gh/malfet/609/base -> origin/gh/malfet/609/base 2025-12-04T08:53:08.4190426Z * [new branch] gh/malfet/609/head -> origin/gh/malfet/609/head 2025-12-04T08:53:08.4190491Z * [new branch] gh/malfet/609/orig -> origin/gh/malfet/609/orig 2025-12-04T08:53:08.4190559Z * [new branch] gh/malfet/610/base -> origin/gh/malfet/610/base 2025-12-04T08:53:08.4190626Z * [new branch] gh/malfet/610/head -> origin/gh/malfet/610/head 2025-12-04T08:53:08.4190694Z * [new branch] gh/malfet/610/orig -> origin/gh/malfet/610/orig 2025-12-04T08:53:08.4190762Z * [new branch] gh/malfet/611/base -> origin/gh/malfet/611/base 2025-12-04T08:53:08.4190829Z * [new branch] gh/malfet/611/head -> origin/gh/malfet/611/head 2025-12-04T08:53:08.4190895Z * [new branch] gh/malfet/611/orig -> origin/gh/malfet/611/orig 2025-12-04T08:53:08.4190963Z * [new branch] gh/malfet/612/base -> 
origin/gh/malfet/612/base 2025-12-04T08:53:08.4191029Z * [new branch] gh/malfet/612/head -> origin/gh/malfet/612/head 2025-12-04T08:53:08.4191096Z * [new branch] gh/malfet/612/orig -> origin/gh/malfet/612/orig 2025-12-04T08:53:08.4191162Z * [new branch] gh/malfet/64/base -> origin/gh/malfet/64/base 2025-12-04T08:53:08.4191229Z * [new branch] gh/malfet/64/head -> origin/gh/malfet/64/head 2025-12-04T08:53:08.4191319Z * [new branch] gh/manuelcandales/11/base -> origin/gh/manuelcandales/11/base 2025-12-04T08:53:08.4191405Z * [new branch] gh/manuelcandales/11/head -> origin/gh/manuelcandales/11/head 2025-12-04T08:53:08.4191487Z * [new branch] gh/manuelcandales/11/orig -> origin/gh/manuelcandales/11/orig 2025-12-04T08:53:08.4191555Z * [new branch] gh/markkm/1/base -> origin/gh/markkm/1/base 2025-12-04T08:53:08.4191628Z * [new branch] gh/masnesral/1/base -> origin/gh/masnesral/1/base 2025-12-04T08:53:08.4191699Z * [new branch] gh/masnesral/1/head -> origin/gh/masnesral/1/head 2025-12-04T08:53:08.4191768Z * [new branch] gh/masnesral/1/orig -> origin/gh/masnesral/1/orig 2025-12-04T08:53:08.4191839Z * [new branch] gh/mhorowitz/0/base -> origin/gh/mhorowitz/0/base 2025-12-04T08:53:08.4191935Z * [new branch] gh/mhorowitz/0/head -> origin/gh/mhorowitz/0/head 2025-12-04T08:53:08.4192034Z * [new branch] gh/mhorowitz/1/base -> origin/gh/mhorowitz/1/base 2025-12-04T08:53:08.4192105Z * [new branch] gh/mhorowitz/1/head -> origin/gh/mhorowitz/1/head 2025-12-04T08:53:08.4192173Z * [new branch] gh/mhorowitz/2/base -> origin/gh/mhorowitz/2/base 2025-12-04T08:53:08.4192244Z * [new branch] gh/mhorowitz/2/head -> origin/gh/mhorowitz/2/head 2025-12-04T08:53:08.4192312Z * [new branch] gh/mhorowitz/3/base -> origin/gh/mhorowitz/3/base 2025-12-04T08:53:08.4192381Z * [new branch] gh/mhorowitz/3/head -> origin/gh/mhorowitz/3/head 2025-12-04T08:53:08.4192450Z * [new branch] gh/mhorowitz/4/base -> origin/gh/mhorowitz/4/base 2025-12-04T08:53:08.4192519Z * [new branch] gh/mhorowitz/4/head -> 
origin/gh/mhorowitz/4/head 2025-12-04T08:53:08.4192589Z * [new branch] gh/mhorowitz/5/base -> origin/gh/mhorowitz/5/base 2025-12-04T08:53:08.4192659Z * [new branch] gh/mhorowitz/5/head -> origin/gh/mhorowitz/5/head 2025-12-04T08:53:08.4192729Z * [new branch] gh/mhorowitz/6/base -> origin/gh/mhorowitz/6/base 2025-12-04T08:53:08.4192797Z * [new branch] gh/mhorowitz/6/head -> origin/gh/mhorowitz/6/head 2025-12-04T08:53:08.4192897Z * [new branch] gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base 2025-12-04T08:53:08.4192993Z * [new branch] gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head 2025-12-04T08:53:08.4193085Z * [new branch] gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base 2025-12-04T08:53:08.4193178Z * [new branch] gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head 2025-12-04T08:53:08.4193326Z * [new branch] gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base 2025-12-04T08:53:08.4193421Z * [new branch] gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head 2025-12-04T08:53:08.4193514Z * [new branch] gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base 2025-12-04T08:53:08.4193604Z * [new branch] gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head 2025-12-04T08:53:08.4193696Z * [new branch] gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base 2025-12-04T08:53:08.4193787Z * [new branch] gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head 2025-12-04T08:53:08.4193877Z * [new branch] gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base 2025-12-04T08:53:08.4193969Z * [new branch] gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head 2025-12-04T08:53:08.4194061Z * [new branch] gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig 2025-12-04T08:53:08.4194153Z * [new branch] gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base 2025-12-04T08:53:08.4194244Z 
* [new branch] gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head 2025-12-04T08:53:08.4194335Z * [new branch] gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig 2025-12-04T08:53:08.4194425Z * [new branch] gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base 2025-12-04T08:53:08.4194517Z * [new branch] gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head 2025-12-04T08:53:08.4194608Z * [new branch] gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig 2025-12-04T08:53:08.4194698Z * [new branch] gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base 2025-12-04T08:53:08.4194845Z * [new branch] gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head 2025-12-04T08:53:08.4194979Z * [new branch] gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig 2025-12-04T08:53:08.4195069Z * [new branch] gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base 2025-12-04T08:53:08.4195162Z * [new branch] gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head 2025-12-04T08:53:08.4195252Z * [new branch] gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig 2025-12-04T08:53:08.4195344Z * [new branch] gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base 2025-12-04T08:53:08.4195433Z * [new branch] gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head 2025-12-04T08:53:08.4195524Z * [new branch] gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig 2025-12-04T08:53:08.4195617Z * [new branch] gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base 2025-12-04T08:53:08.4195710Z * [new branch] gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head 2025-12-04T08:53:08.4195801Z * [new branch] gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig 2025-12-04T08:53:08.4195893Z * [new branch] gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base 
2025-12-04T08:53:08.4195985Z * [new branch] gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head 2025-12-04T08:53:08.4196075Z * [new branch] gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig 2025-12-04T08:53:08.4196167Z * [new branch] gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base 2025-12-04T08:53:08.4196257Z * [new branch] gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head 2025-12-04T08:53:08.4196348Z * [new branch] gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig 2025-12-04T08:53:08.4196440Z * [new branch] gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base 2025-12-04T08:53:08.4196530Z * [new branch] gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head 2025-12-04T08:53:08.4196622Z * [new branch] gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig 2025-12-04T08:53:08.4196712Z * [new branch] gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base 2025-12-04T08:53:08.4196802Z * [new branch] gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head 2025-12-04T08:53:08.4196893Z * [new branch] gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig 2025-12-04T08:53:08.4196983Z * [new branch] gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base 2025-12-04T08:53:08.4197075Z * [new branch] gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head 2025-12-04T08:53:08.4197167Z * [new branch] gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig 2025-12-04T08:53:08.4197258Z * [new branch] gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base 2025-12-04T08:53:08.4197348Z * [new branch] gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head 2025-12-04T08:53:08.4197440Z * [new branch] gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig 2025-12-04T08:53:08.4197530Z * [new branch] gh/mikaylagawarecki/359/base -> 
origin/gh/mikaylagawarecki/359/base 2025-12-04T08:53:08.4197619Z * [new branch] gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head 2025-12-04T08:53:08.4197711Z * [new branch] gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig 2025-12-04T08:53:08.4197832Z * [new branch] gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base 2025-12-04T08:53:08.4197949Z * [new branch] gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head 2025-12-04T08:53:08.4198041Z * [new branch] gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig 2025-12-04T08:53:08.4198131Z * [new branch] gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base 2025-12-04T08:53:08.4198222Z * [new branch] gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head 2025-12-04T08:53:08.4198312Z * [new branch] gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig 2025-12-04T08:53:08.4198402Z * [new branch] gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base 2025-12-04T08:53:08.4198493Z * [new branch] gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head 2025-12-04T08:53:08.4198586Z * [new branch] gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig 2025-12-04T08:53:08.4198677Z * [new branch] gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base 2025-12-04T08:53:08.4198768Z * [new branch] gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head 2025-12-04T08:53:08.4198859Z * [new branch] gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig 2025-12-04T08:53:08.4198949Z * [new branch] gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base 2025-12-04T08:53:08.4199041Z * [new branch] gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head 2025-12-04T08:53:08.4199131Z * [new branch] gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig 2025-12-04T08:53:08.4199222Z * [new branch] 
gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base 2025-12-04T08:53:08.4199314Z * [new branch] gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head 2025-12-04T08:53:08.4199405Z * [new branch] gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig 2025-12-04T08:53:08.4199496Z * [new branch] gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base 2025-12-04T08:53:08.4199587Z * [new branch] gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head 2025-12-04T08:53:08.4199677Z * [new branch] gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig 2025-12-04T08:53:08.4199768Z * [new branch] gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base 2025-12-04T08:53:08.4199858Z * [new branch] gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head 2025-12-04T08:53:08.4199948Z * [new branch] gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig 2025-12-04T08:53:08.4200041Z * [new branch] gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base 2025-12-04T08:53:08.4200132Z * [new branch] gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head 2025-12-04T08:53:08.4200222Z * [new branch] gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig 2025-12-04T08:53:08.4200314Z * [new branch] gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base 2025-12-04T08:53:08.4200404Z * [new branch] gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head 2025-12-04T08:53:08.4200495Z * [new branch] gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig 2025-12-04T08:53:08.4200587Z * [new branch] gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base 2025-12-04T08:53:08.4200679Z * [new branch] gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head 2025-12-04T08:53:08.4200804Z * [new branch] gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig 
2025-12-04T08:53:08.4200927Z * [new branch] gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base 2025-12-04T08:53:08.4201018Z * [new branch] gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head 2025-12-04T08:53:08.4201109Z * [new branch] gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig 2025-12-04T08:53:08.4201199Z * [new branch] gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base 2025-12-04T08:53:08.4201288Z * [new branch] gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head 2025-12-04T08:53:08.4201380Z * [new branch] gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig 2025-12-04T08:53:08.4201470Z * [new branch] gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base 2025-12-04T08:53:08.4201561Z * [new branch] gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head 2025-12-04T08:53:08.4201653Z * [new branch] gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig 2025-12-04T08:53:08.4201743Z * [new branch] gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base 2025-12-04T08:53:08.4201833Z * [new branch] gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head 2025-12-04T08:53:08.4201924Z * [new branch] gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig 2025-12-04T08:53:08.4202013Z * [new branch] gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base 2025-12-04T08:53:08.4202103Z * [new branch] gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head 2025-12-04T08:53:08.4202194Z * [new branch] gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig 2025-12-04T08:53:08.4202285Z * [new branch] gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base 2025-12-04T08:53:08.4202378Z * [new branch] gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head 2025-12-04T08:53:08.4202468Z * [new branch] gh/mikaylagawarecki/376/orig -> 
origin/gh/mikaylagawarecki/376/orig 2025-12-04T08:53:08.4202557Z * [new branch] gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base 2025-12-04T08:53:08.4202650Z * [new branch] gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head 2025-12-04T08:53:08.4202742Z * [new branch] gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig 2025-12-04T08:53:08.4202833Z * [new branch] gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base 2025-12-04T08:53:08.4202926Z * [new branch] gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head 2025-12-04T08:53:08.4203017Z * [new branch] gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig 2025-12-04T08:53:08.4203108Z * [new branch] gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base 2025-12-04T08:53:08.4203201Z * [new branch] gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head 2025-12-04T08:53:08.4203328Z * [new branch] gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig 2025-12-04T08:53:08.4203421Z * [new branch] gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base 2025-12-04T08:53:08.4203516Z * [new branch] gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head 2025-12-04T08:53:08.4203609Z * [new branch] gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig 2025-12-04T08:53:08.4203702Z * [new branch] gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base 2025-12-04T08:53:08.4203845Z * [new branch] gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head 2025-12-04T08:53:08.4203980Z * [new branch] gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig 2025-12-04T08:53:08.4204074Z * [new branch] gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base 2025-12-04T08:53:08.4204164Z * [new branch] gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head 2025-12-04T08:53:08.4204255Z * [new branch] 
gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig 2025-12-04T08:53:08.4204348Z * [new branch] gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base 2025-12-04T08:53:08.4204440Z * [new branch] gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head 2025-12-04T08:53:08.4204533Z * [new branch] gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig 2025-12-04T08:53:08.4204626Z * [new branch] gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base 2025-12-04T08:53:08.4204718Z * [new branch] gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head 2025-12-04T08:53:08.4204808Z * [new branch] gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig 2025-12-04T08:53:08.4204900Z * [new branch] gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base 2025-12-04T08:53:08.4204990Z * [new branch] gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head 2025-12-04T08:53:08.4205081Z * [new branch] gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig 2025-12-04T08:53:08.4205176Z * [new branch] gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base 2025-12-04T08:53:08.4205268Z * [new branch] gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head 2025-12-04T08:53:08.4205364Z * [new branch] gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig 2025-12-04T08:53:08.4205455Z * [new branch] gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base 2025-12-04T08:53:08.4205546Z * [new branch] gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head 2025-12-04T08:53:08.4205639Z * [new branch] gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig 2025-12-04T08:53:08.4205730Z * [new branch] gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base 2025-12-04T08:53:08.4205820Z * [new branch] gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head 
2025-12-04T08:53:08.4205915Z * [new branch] gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig 2025-12-04T08:53:08.4206006Z * [new branch] gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base 2025-12-04T08:53:08.4206098Z * [new branch] gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head 2025-12-04T08:53:08.4206191Z * [new branch] gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig 2025-12-04T08:53:08.4206282Z * [new branch] gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base 2025-12-04T08:53:08.4206372Z * [new branch] gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head 2025-12-04T08:53:08.4206464Z * [new branch] gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig 2025-12-04T08:53:08.4206554Z * [new branch] gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base 2025-12-04T08:53:08.4206647Z * [new branch] gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head 2025-12-04T08:53:08.4206737Z * [new branch] gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig 2025-12-04T08:53:08.4206868Z * [new branch] gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base 2025-12-04T08:53:08.4206985Z * [new branch] gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head 2025-12-04T08:53:08.4207076Z * [new branch] gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig 2025-12-04T08:53:08.4207146Z * [new branch] gh/mlazos/41/base -> origin/gh/mlazos/41/base 2025-12-04T08:53:08.4207216Z * [new branch] gh/mlazos/41/head -> origin/gh/mlazos/41/head 2025-12-04T08:53:08.4207283Z * [new branch] gh/mlazos/41/orig -> origin/gh/mlazos/41/orig 2025-12-04T08:53:08.4207351Z * [new branch] gh/mlazos/42/base -> origin/gh/mlazos/42/base 2025-12-04T08:53:08.4207418Z * [new branch] gh/mlazos/42/head -> origin/gh/mlazos/42/head 2025-12-04T08:53:08.4207484Z * [new branch] gh/mlazos/42/orig -> origin/gh/mlazos/42/orig 
2025-12-04T08:53:08.4207551Z * [new branch] gh/mlazos/43/base -> origin/gh/mlazos/43/base 2025-12-04T08:53:08.4207620Z * [new branch] gh/mlazos/43/head -> origin/gh/mlazos/43/head 2025-12-04T08:53:08.4207686Z * [new branch] gh/mlazos/43/orig -> origin/gh/mlazos/43/orig 2025-12-04T08:53:08.4207751Z * [new branch] gh/mlazos/44/base -> origin/gh/mlazos/44/base 2025-12-04T08:53:08.4207819Z * [new branch] gh/mlazos/44/head -> origin/gh/mlazos/44/head 2025-12-04T08:53:08.4207884Z * [new branch] gh/mlazos/44/orig -> origin/gh/mlazos/44/orig 2025-12-04T08:53:08.4207950Z * [new branch] gh/mlazos/47/base -> origin/gh/mlazos/47/base 2025-12-04T08:53:08.4208016Z * [new branch] gh/mlazos/47/head -> origin/gh/mlazos/47/head 2025-12-04T08:53:08.4208081Z * [new branch] gh/mlazos/47/orig -> origin/gh/mlazos/47/orig 2025-12-04T08:53:08.4208148Z * [new branch] gh/mlazos/48/base -> origin/gh/mlazos/48/base 2025-12-04T08:53:08.4208216Z * [new branch] gh/mlazos/48/head -> origin/gh/mlazos/48/head 2025-12-04T08:53:08.4208282Z * [new branch] gh/mlazos/48/orig -> origin/gh/mlazos/48/orig 2025-12-04T08:53:08.4208349Z * [new branch] gh/mlazos/49/base -> origin/gh/mlazos/49/base 2025-12-04T08:53:08.4208415Z * [new branch] gh/mlazos/49/head -> origin/gh/mlazos/49/head 2025-12-04T08:53:08.4208481Z * [new branch] gh/mlazos/49/orig -> origin/gh/mlazos/49/orig 2025-12-04T08:53:08.4208547Z * [new branch] gh/mlazos/50/base -> origin/gh/mlazos/50/base 2025-12-04T08:53:08.4208613Z * [new branch] gh/mlazos/50/head -> origin/gh/mlazos/50/head 2025-12-04T08:53:08.4208678Z * [new branch] gh/mlazos/50/orig -> origin/gh/mlazos/50/orig 2025-12-04T08:53:08.4208747Z * [new branch] gh/mlazos/51/base -> origin/gh/mlazos/51/base 2025-12-04T08:53:08.4208812Z * [new branch] gh/mlazos/51/head -> origin/gh/mlazos/51/head 2025-12-04T08:53:08.4208878Z * [new branch] gh/mlazos/51/orig -> origin/gh/mlazos/51/orig 2025-12-04T08:53:08.4208946Z * [new branch] gh/mlazos/52/base -> origin/gh/mlazos/52/base 
2025-12-04T08:53:08.4209011Z * [new branch] gh/mlazos/52/head -> origin/gh/mlazos/52/head 2025-12-04T08:53:08.4209076Z * [new branch] gh/mlazos/52/orig -> origin/gh/mlazos/52/orig 2025-12-04T08:53:08.4209222Z * [new branch] gh/mlazos/53/base -> origin/gh/mlazos/53/base 2025-12-04T08:53:08.4209304Z * [new branch] gh/mlazos/53/head -> origin/gh/mlazos/53/head 2025-12-04T08:53:08.4209388Z * [new branch] gh/mlazos/53/orig -> origin/gh/mlazos/53/orig 2025-12-04T08:53:08.4209522Z * [new branch] gh/mlazos/54/base -> origin/gh/mlazos/54/base 2025-12-04T08:53:08.4209601Z * [new branch] gh/mlazos/54/head -> origin/gh/mlazos/54/head 2025-12-04T08:53:08.4209703Z * [new branch] gh/mlazos/54/orig -> origin/gh/mlazos/54/orig 2025-12-04T08:53:08.4209816Z * [new branch] gh/mlazos/55/base -> origin/gh/mlazos/55/base 2025-12-04T08:53:08.4209901Z * [new branch] gh/mlazos/55/head -> origin/gh/mlazos/55/head 2025-12-04T08:53:08.4209979Z * [new branch] gh/mlazos/55/orig -> origin/gh/mlazos/55/orig 2025-12-04T08:53:08.4210073Z * [new branch] gh/mlazos/56/base -> origin/gh/mlazos/56/base 2025-12-04T08:53:08.4210153Z * [new branch] gh/mlazos/56/head -> origin/gh/mlazos/56/head 2025-12-04T08:53:08.4210251Z * [new branch] gh/mlazos/56/orig -> origin/gh/mlazos/56/orig 2025-12-04T08:53:08.4210340Z * [new branch] gh/mlazos/57/base -> origin/gh/mlazos/57/base 2025-12-04T08:53:08.4210430Z * [new branch] gh/mlazos/57/head -> origin/gh/mlazos/57/head 2025-12-04T08:53:08.4210525Z * [new branch] gh/mlazos/57/orig -> origin/gh/mlazos/57/orig 2025-12-04T08:53:08.4210604Z * [new branch] gh/mlazos/58/base -> origin/gh/mlazos/58/base 2025-12-04T08:53:08.4210685Z * [new branch] gh/mlazos/58/head -> origin/gh/mlazos/58/head 2025-12-04T08:53:08.4210786Z * [new branch] gh/mlazos/58/orig -> origin/gh/mlazos/58/orig 2025-12-04T08:53:08.4210878Z * [new branch] gh/mlazos/59/base -> origin/gh/mlazos/59/base 2025-12-04T08:53:08.4210956Z * [new branch] gh/mlazos/59/head -> origin/gh/mlazos/59/head 
2025-12-04T08:53:08.4211052Z * [new branch] gh/mlazos/59/orig -> origin/gh/mlazos/59/orig 2025-12-04T08:53:08.4211130Z * [new branch] gh/mlazos/60/base -> origin/gh/mlazos/60/base 2025-12-04T08:53:08.4211214Z * [new branch] gh/mlazos/60/head -> origin/gh/mlazos/60/head 2025-12-04T08:53:08.4211317Z * [new branch] gh/mlazos/60/orig -> origin/gh/mlazos/60/orig 2025-12-04T08:53:08.4211408Z * [new branch] gh/mlazos/61/base -> origin/gh/mlazos/61/base 2025-12-04T08:53:08.4211488Z * [new branch] gh/mlazos/61/head -> origin/gh/mlazos/61/head 2025-12-04T08:53:08.4211583Z * [new branch] gh/mlazos/61/orig -> origin/gh/mlazos/61/orig 2025-12-04T08:53:08.4211661Z * [new branch] gh/mlazos/62/base -> origin/gh/mlazos/62/base 2025-12-04T08:53:08.4211740Z * [new branch] gh/mlazos/62/head -> origin/gh/mlazos/62/head 2025-12-04T08:53:08.4211844Z * [new branch] gh/mlazos/62/orig -> origin/gh/mlazos/62/orig 2025-12-04T08:53:08.4211938Z * [new branch] gh/mlazos/63/base -> origin/gh/mlazos/63/base 2025-12-04T08:53:08.4212033Z * [new branch] gh/mlazos/63/head -> origin/gh/mlazos/63/head 2025-12-04T08:53:08.4212118Z * [new branch] gh/mlazos/63/orig -> origin/gh/mlazos/63/orig 2025-12-04T08:53:08.4212200Z * [new branch] gh/mlazos/64/base -> origin/gh/mlazos/64/base 2025-12-04T08:53:08.4212287Z * [new branch] gh/mlazos/64/head -> origin/gh/mlazos/64/head 2025-12-04T08:53:08.4212380Z * [new branch] gh/mlazos/64/orig -> origin/gh/mlazos/64/orig 2025-12-04T08:53:08.4212461Z * [new branch] gh/mlazos/65/base -> origin/gh/mlazos/65/base 2025-12-04T08:53:08.4212554Z * [new branch] gh/mlazos/65/head -> origin/gh/mlazos/65/head 2025-12-04T08:53:08.4212631Z * [new branch] gh/mlazos/65/orig -> origin/gh/mlazos/65/orig 2025-12-04T08:53:08.4212708Z * [new branch] gh/mlazos/66/base -> origin/gh/mlazos/66/base 2025-12-04T08:53:08.4212834Z * [new branch] gh/mlazos/66/head -> origin/gh/mlazos/66/head 2025-12-04T08:53:08.4212951Z * [new branch] gh/mlazos/66/orig -> origin/gh/mlazos/66/orig 
2025-12-04T08:53:08.4213035Z * [new branch] gh/mlazos/67/base -> origin/gh/mlazos/67/base 2025-12-04T08:53:08.4213127Z * [new branch] gh/mlazos/67/head -> origin/gh/mlazos/67/head 2025-12-04T08:53:08.4213206Z * [new branch] gh/mlazos/67/orig -> origin/gh/mlazos/67/orig 2025-12-04T08:53:08.4213349Z * [new branch] gh/mlazos/68/base -> origin/gh/mlazos/68/base 2025-12-04T08:53:08.4213437Z * [new branch] gh/mlazos/68/head -> origin/gh/mlazos/68/head 2025-12-04T08:53:08.4213523Z * [new branch] gh/mlazos/68/orig -> origin/gh/mlazos/68/orig 2025-12-04T08:53:08.4213624Z * [new branch] gh/mlazos/69/base -> origin/gh/mlazos/69/base 2025-12-04T08:53:08.4213706Z * [new branch] gh/mlazos/69/head -> origin/gh/mlazos/69/head 2025-12-04T08:53:08.4213788Z * [new branch] gh/mlazos/69/orig -> origin/gh/mlazos/69/orig 2025-12-04T08:53:08.4213884Z * [new branch] gh/mlazos/70/base -> origin/gh/mlazos/70/base 2025-12-04T08:53:08.4213955Z * [new branch] gh/mlazos/70/head -> origin/gh/mlazos/70/head 2025-12-04T08:53:08.4214046Z * [new branch] gh/mlazos/70/orig -> origin/gh/mlazos/70/orig 2025-12-04T08:53:08.4214145Z * [new branch] gh/mlazos/71/base -> origin/gh/mlazos/71/base 2025-12-04T08:53:08.4214221Z * [new branch] gh/mlazos/71/head -> origin/gh/mlazos/71/head 2025-12-04T08:53:08.4214304Z * [new branch] gh/mlazos/71/orig -> origin/gh/mlazos/71/orig 2025-12-04T08:53:08.4214396Z * [new branch] gh/mlazos/72/base -> origin/gh/mlazos/72/base 2025-12-04T08:53:08.4214469Z * [new branch] gh/mlazos/72/head -> origin/gh/mlazos/72/head 2025-12-04T08:53:08.4214560Z * [new branch] gh/mlazos/72/orig -> origin/gh/mlazos/72/orig 2025-12-04T08:53:08.4214658Z * [new branch] gh/mlazos/73/base -> origin/gh/mlazos/73/base 2025-12-04T08:53:08.4214740Z * [new branch] gh/mlazos/73/head -> origin/gh/mlazos/73/head 2025-12-04T08:53:08.4214819Z * [new branch] gh/mlazos/73/orig -> origin/gh/mlazos/73/orig 2025-12-04T08:53:08.4214915Z * [new branch] gh/mrmiywj/1/base -> origin/gh/mrmiywj/1/base 
2025-12-04T08:53:08.4214991Z * [new branch] gh/mrmiywj/1/head -> origin/gh/mrmiywj/1/head 2025-12-04T08:53:08.4215112Z * [new branch] gh/muchulee8/73/base -> origin/gh/muchulee8/73/base 2025-12-04T08:53:08.4215199Z * [new branch] gh/muchulee8/73/head -> origin/gh/muchulee8/73/head 2025-12-04T08:53:08.4215289Z * [new branch] gh/muchulee8/73/orig -> origin/gh/muchulee8/73/orig 2025-12-04T08:53:08.4215400Z * [new branch] gh/naveenthangudu/1/base -> origin/gh/naveenthangudu/1/base 2025-12-04T08:53:08.4215496Z * [new branch] gh/naveenthangudu/1/head -> origin/gh/naveenthangudu/1/head 2025-12-04T08:53:08.4215585Z * [new branch] gh/naveenthangudu/1/orig -> origin/gh/naveenthangudu/1/orig 2025-12-04T08:53:08.4215712Z * [new branch] gh/naveenthangudu/2/base -> origin/gh/naveenthangudu/2/base 2025-12-04T08:53:08.4215808Z * [new branch] gh/naveenthangudu/2/head -> origin/gh/naveenthangudu/2/head 2025-12-04T08:53:08.4215899Z * [new branch] gh/naveenthangudu/2/orig -> origin/gh/naveenthangudu/2/orig 2025-12-04T08:53:08.4216002Z * [new branch] gh/naveenthangudu/3/base -> origin/gh/naveenthangudu/3/base 2025-12-04T08:53:08.4216094Z * [new branch] gh/naveenthangudu/3/head -> origin/gh/naveenthangudu/3/head 2025-12-04T08:53:08.4216233Z * [new branch] gh/naveenthangudu/3/orig -> origin/gh/naveenthangudu/3/orig 2025-12-04T08:53:08.4216402Z * [new branch] gh/naveenthangudu/4/base -> origin/gh/naveenthangudu/4/base 2025-12-04T08:53:08.4216493Z * [new branch] gh/naveenthangudu/4/head -> origin/gh/naveenthangudu/4/head 2025-12-04T08:53:08.4216600Z * [new branch] gh/naveenthangudu/4/orig -> origin/gh/naveenthangudu/4/orig 2025-12-04T08:53:08.4216691Z * [new branch] gh/naveenthangudu/5/base -> origin/gh/naveenthangudu/5/base 2025-12-04T08:53:08.4216781Z * [new branch] gh/naveenthangudu/5/head -> origin/gh/naveenthangudu/5/head 2025-12-04T08:53:08.4216897Z * [new branch] gh/naveenthangudu/5/orig -> origin/gh/naveenthangudu/5/orig 2025-12-04T08:53:08.4216992Z * [new branch] 
gh/naveenthangudu/6/base -> origin/gh/naveenthangudu/6/base 2025-12-04T08:53:08.4217089Z * [new branch] gh/naveenthangudu/6/head -> origin/gh/naveenthangudu/6/head 2025-12-04T08:53:08.4217198Z * [new branch] gh/naveenthangudu/6/orig -> origin/gh/naveenthangudu/6/orig 2025-12-04T08:53:08.4217290Z * [new branch] gh/naveenthangudu/7/base -> origin/gh/naveenthangudu/7/base 2025-12-04T08:53:08.4217381Z * [new branch] gh/naveenthangudu/7/head -> origin/gh/naveenthangudu/7/head 2025-12-04T08:53:08.4217501Z * [new branch] gh/naveenthangudu/7/orig -> origin/gh/naveenthangudu/7/orig 2025-12-04T08:53:08.4217598Z * [new branch] gh/naveenthangudu/8/base -> origin/gh/naveenthangudu/8/base 2025-12-04T08:53:08.4217689Z * [new branch] gh/naveenthangudu/8/head -> origin/gh/naveenthangudu/8/head 2025-12-04T08:53:08.4217796Z * [new branch] gh/naveenthangudu/8/orig -> origin/gh/naveenthangudu/8/orig 2025-12-04T08:53:08.4217888Z * [new branch] gh/naveenthangudu/9/base -> origin/gh/naveenthangudu/9/base 2025-12-04T08:53:08.4217988Z * [new branch] gh/naveenthangudu/9/head -> origin/gh/naveenthangudu/9/head 2025-12-04T08:53:08.4218101Z * [new branch] gh/naveenthangudu/9/orig -> origin/gh/naveenthangudu/9/orig 2025-12-04T08:53:08.4218194Z * [new branch] gh/nikitaved/1/base -> origin/gh/nikitaved/1/base 2025-12-04T08:53:08.4218296Z * [new branch] gh/nikitaved/1/head -> origin/gh/nikitaved/1/head 2025-12-04T08:53:08.4218379Z * [new branch] gh/nikitaved/1/orig -> origin/gh/nikitaved/1/orig 2025-12-04T08:53:08.4218465Z * [new branch] gh/nikitaved/10/base -> origin/gh/nikitaved/10/base 2025-12-04T08:53:08.4218566Z * [new branch] gh/nikitaved/10/head -> origin/gh/nikitaved/10/head 2025-12-04T08:53:08.4218661Z * [new branch] gh/nikitaved/10/orig -> origin/gh/nikitaved/10/orig 2025-12-04T08:53:08.4218753Z * [new branch] gh/nikitaved/11/base -> origin/gh/nikitaved/11/base 2025-12-04T08:53:08.4218857Z * [new branch] gh/nikitaved/11/head -> origin/gh/nikitaved/11/head 2025-12-04T08:53:08.4218942Z * 
[new branch] gh/nikitaved/11/orig -> origin/gh/nikitaved/11/orig 2025-12-04T08:53:08.4219032Z * [new branch] gh/nikitaved/12/base -> origin/gh/nikitaved/12/base 2025-12-04T08:53:08.4219126Z * [new branch] gh/nikitaved/12/head -> origin/gh/nikitaved/12/head 2025-12-04T08:53:08.4219219Z * [new branch] gh/nikitaved/12/orig -> origin/gh/nikitaved/12/orig 2025-12-04T08:53:08.4219306Z * [new branch] gh/nikitaved/13/base -> origin/gh/nikitaved/13/base 2025-12-04T08:53:08.4219404Z * [new branch] gh/nikitaved/13/head -> origin/gh/nikitaved/13/head 2025-12-04T08:53:08.4219488Z * [new branch] gh/nikitaved/13/orig -> origin/gh/nikitaved/13/orig 2025-12-04T08:53:08.4219589Z * [new branch] gh/nikitaved/14/base -> origin/gh/nikitaved/14/base 2025-12-04T08:53:08.4219729Z * [new branch] gh/nikitaved/14/head -> origin/gh/nikitaved/14/head 2025-12-04T08:53:08.4219860Z * [new branch] gh/nikitaved/14/orig -> origin/gh/nikitaved/14/orig 2025-12-04T08:53:08.4219969Z * [new branch] gh/nikitaved/15/base -> origin/gh/nikitaved/15/base 2025-12-04T08:53:08.4220053Z * [new branch] gh/nikitaved/15/head -> origin/gh/nikitaved/15/head 2025-12-04T08:53:08.4220135Z * [new branch] gh/nikitaved/15/orig -> origin/gh/nikitaved/15/orig 2025-12-04T08:53:08.4220232Z * [new branch] gh/nikitaved/16/base -> origin/gh/nikitaved/16/base 2025-12-04T08:53:08.4220309Z * [new branch] gh/nikitaved/16/head -> origin/gh/nikitaved/16/head 2025-12-04T08:53:08.4220401Z * [new branch] gh/nikitaved/16/orig -> origin/gh/nikitaved/16/orig 2025-12-04T08:53:08.4220509Z * [new branch] gh/nikitaved/2/base -> origin/gh/nikitaved/2/base 2025-12-04T08:53:08.4220593Z * [new branch] gh/nikitaved/2/head -> origin/gh/nikitaved/2/head 2025-12-04T08:53:08.4220678Z * [new branch] gh/nikitaved/2/orig -> origin/gh/nikitaved/2/orig 2025-12-04T08:53:08.4220775Z * [new branch] gh/nikitaved/4/base -> origin/gh/nikitaved/4/base 2025-12-04T08:53:08.4220855Z * [new branch] gh/nikitaved/4/head -> origin/gh/nikitaved/4/head 
2025-12-04T08:53:08.4220949Z * [new branch] gh/nikitaved/4/orig -> origin/gh/nikitaved/4/orig 2025-12-04T08:53:08.4221054Z * [new branch] gh/nikitaved/5/base -> origin/gh/nikitaved/5/base 2025-12-04T08:53:08.4221136Z * [new branch] gh/nikitaved/5/head -> origin/gh/nikitaved/5/head 2025-12-04T08:53:08.4221234Z * [new branch] gh/nikitaved/5/orig -> origin/gh/nikitaved/5/orig 2025-12-04T08:53:08.4221316Z * [new branch] gh/nikitaved/6/base -> origin/gh/nikitaved/6/base 2025-12-04T08:53:08.4221395Z * [new branch] gh/nikitaved/6/head -> origin/gh/nikitaved/6/head 2025-12-04T08:53:08.4221516Z * [new branch] gh/nikitaved/6/orig -> origin/gh/nikitaved/6/orig 2025-12-04T08:53:08.4221597Z * [new branch] gh/nikitaved/8/base -> origin/gh/nikitaved/8/base 2025-12-04T08:53:08.4221678Z * [new branch] gh/nikitaved/8/head -> origin/gh/nikitaved/8/head 2025-12-04T08:53:08.4221772Z * [new branch] gh/nikitaved/8/orig -> origin/gh/nikitaved/8/orig 2025-12-04T08:53:08.4221853Z * [new branch] gh/nikitaved/9/base -> origin/gh/nikitaved/9/base 2025-12-04T08:53:08.4221931Z * [new branch] gh/nikitaved/9/head -> origin/gh/nikitaved/9/head 2025-12-04T08:53:08.4222054Z * [new branch] gh/nikitaved/9/orig -> origin/gh/nikitaved/9/orig 2025-12-04T08:53:08.4222136Z * [new branch] gh/oulgen/10/base -> origin/gh/oulgen/10/base 2025-12-04T08:53:08.4222218Z * [new branch] gh/oulgen/10/head -> origin/gh/oulgen/10/head 2025-12-04T08:53:08.4222312Z * [new branch] gh/oulgen/10/orig -> origin/gh/oulgen/10/orig 2025-12-04T08:53:08.4222392Z * [new branch] gh/oulgen/11/base -> origin/gh/oulgen/11/base 2025-12-04T08:53:08.4222465Z * [new branch] gh/oulgen/11/head -> origin/gh/oulgen/11/head 2025-12-04T08:53:08.4222583Z * [new branch] gh/oulgen/11/orig -> origin/gh/oulgen/11/orig 2025-12-04T08:53:08.4222660Z * [new branch] gh/oulgen/12/base -> origin/gh/oulgen/12/base 2025-12-04T08:53:08.4222753Z * [new branch] gh/oulgen/12/head -> origin/gh/oulgen/12/head 2025-12-04T08:53:08.4222832Z * [new branch] 
gh/oulgen/12/orig -> origin/gh/oulgen/12/orig 2025-12-04T08:53:08.4222909Z * [new branch] gh/oulgen/13/base -> origin/gh/oulgen/13/base 2025-12-04T08:53:08.4223041Z * [new branch] gh/oulgen/13/head -> origin/gh/oulgen/13/head 2025-12-04T08:53:08.4223152Z * [new branch] gh/oulgen/13/orig -> origin/gh/oulgen/13/orig 2025-12-04T08:53:08.4223230Z * [new branch] gh/oulgen/14/base -> origin/gh/oulgen/14/base 2025-12-04T08:53:08.4223359Z * [new branch] gh/oulgen/14/head -> origin/gh/oulgen/14/head 2025-12-04T08:53:08.4223445Z * [new branch] gh/oulgen/14/orig -> origin/gh/oulgen/14/orig 2025-12-04T08:53:08.4223525Z * [new branch] gh/oulgen/15/base -> origin/gh/oulgen/15/base 2025-12-04T08:53:08.4223624Z * [new branch] gh/oulgen/15/head -> origin/gh/oulgen/15/head 2025-12-04T08:53:08.4223707Z * [new branch] gh/oulgen/15/orig -> origin/gh/oulgen/15/orig 2025-12-04T08:53:08.4223786Z * [new branch] gh/oulgen/16/base -> origin/gh/oulgen/16/base 2025-12-04T08:53:08.4223888Z * [new branch] gh/oulgen/16/head -> origin/gh/oulgen/16/head 2025-12-04T08:53:08.4223970Z * [new branch] gh/oulgen/16/orig -> origin/gh/oulgen/16/orig 2025-12-04T08:53:08.4224049Z * [new branch] gh/oulgen/17/base -> origin/gh/oulgen/17/base 2025-12-04T08:53:08.4224150Z * [new branch] gh/oulgen/17/head -> origin/gh/oulgen/17/head 2025-12-04T08:53:08.4224234Z * [new branch] gh/oulgen/17/orig -> origin/gh/oulgen/17/orig 2025-12-04T08:53:08.4224323Z * [new branch] gh/oulgen/18/base -> origin/gh/oulgen/18/base 2025-12-04T08:53:08.4224409Z * [new branch] gh/oulgen/18/head -> origin/gh/oulgen/18/head 2025-12-04T08:53:08.4224488Z * [new branch] gh/oulgen/18/orig -> origin/gh/oulgen/18/orig 2025-12-04T08:53:08.4224573Z * [new branch] gh/oulgen/19/base -> origin/gh/oulgen/19/base 2025-12-04T08:53:08.4224664Z * [new branch] gh/oulgen/19/head -> origin/gh/oulgen/19/head 2025-12-04T08:53:08.4224748Z * [new branch] gh/oulgen/19/orig -> origin/gh/oulgen/19/orig 2025-12-04T08:53:08.4224845Z * [new branch] gh/oulgen/20/base 
-> origin/gh/oulgen/20/base 2025-12-04T08:53:08.4224924Z * [new branch] gh/oulgen/20/head -> origin/gh/oulgen/20/head 2025-12-04T08:53:08.4225003Z * [new branch] gh/oulgen/20/orig -> origin/gh/oulgen/20/orig 2025-12-04T08:53:08.4225090Z * [new branch] gh/oulgen/21/base -> origin/gh/oulgen/21/base 2025-12-04T08:53:08.4225179Z * [new branch] gh/oulgen/21/head -> origin/gh/oulgen/21/head 2025-12-04T08:53:08.4225265Z * [new branch] gh/oulgen/21/orig -> origin/gh/oulgen/21/orig 2025-12-04T08:53:08.4225362Z * [new branch] gh/oulgen/22/base -> origin/gh/oulgen/22/base 2025-12-04T08:53:08.4225445Z * [new branch] gh/oulgen/22/head -> origin/gh/oulgen/22/head 2025-12-04T08:53:08.4225523Z * [new branch] gh/oulgen/22/orig -> origin/gh/oulgen/22/orig 2025-12-04T08:53:08.4225613Z * [new branch] gh/oulgen/23/base -> origin/gh/oulgen/23/base 2025-12-04T08:53:08.4225700Z * [new branch] gh/oulgen/23/head -> origin/gh/oulgen/23/head 2025-12-04T08:53:08.4225807Z * [new branch] gh/oulgen/23/orig -> origin/gh/oulgen/23/orig 2025-12-04T08:53:08.4225885Z * [new branch] gh/oulgen/24/base -> origin/gh/oulgen/24/base 2025-12-04T08:53:08.4225963Z * [new branch] gh/oulgen/24/head -> origin/gh/oulgen/24/head 2025-12-04T08:53:08.4226055Z * [new branch] gh/oulgen/24/orig -> origin/gh/oulgen/24/orig 2025-12-04T08:53:08.4226129Z * [new branch] gh/oulgen/25/base -> origin/gh/oulgen/25/base 2025-12-04T08:53:08.4226273Z * [new branch] gh/oulgen/25/head -> origin/gh/oulgen/25/head 2025-12-04T08:53:08.4226376Z * [new branch] gh/oulgen/25/orig -> origin/gh/oulgen/25/orig 2025-12-04T08:53:08.4226498Z * [new branch] gh/oulgen/26/base -> origin/gh/oulgen/26/base 2025-12-04T08:53:08.4226576Z * [new branch] gh/oulgen/26/head -> origin/gh/oulgen/26/head 2025-12-04T08:53:08.4226669Z * [new branch] gh/oulgen/26/orig -> origin/gh/oulgen/26/orig 2025-12-04T08:53:08.4226743Z * [new branch] gh/oulgen/4/base -> origin/gh/oulgen/4/base 2025-12-04T08:53:08.4226839Z * [new branch] gh/oulgen/4/head -> 
origin/gh/oulgen/4/head 2025-12-04T08:53:08.4226936Z * [new branch] gh/oulgen/4/orig -> origin/gh/oulgen/4/orig 2025-12-04T08:53:08.4227013Z * [new branch] gh/oulgen/7/base -> origin/gh/oulgen/7/base 2025-12-04T08:53:08.4227089Z * [new branch] gh/oulgen/7/head -> origin/gh/oulgen/7/head 2025-12-04T08:53:08.4227183Z * [new branch] gh/oulgen/7/orig -> origin/gh/oulgen/7/orig 2025-12-04T08:53:08.4227264Z * [new branch] gh/oulgen/8/base -> origin/gh/oulgen/8/base 2025-12-04T08:53:08.4227372Z * [new branch] gh/oulgen/8/head -> origin/gh/oulgen/8/head 2025-12-04T08:53:08.4227454Z * [new branch] gh/oulgen/8/orig -> origin/gh/oulgen/8/orig 2025-12-04T08:53:08.4227532Z * [new branch] gh/oulgen/9/base -> origin/gh/oulgen/9/base 2025-12-04T08:53:08.4227625Z * [new branch] gh/oulgen/9/head -> origin/gh/oulgen/9/head 2025-12-04T08:53:08.4227709Z * [new branch] gh/oulgen/9/orig -> origin/gh/oulgen/9/orig 2025-12-04T08:53:08.4227819Z * [new branch] gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization 2025-12-04T08:53:08.4227930Z * [new branch] gh/pearu/108/base -> origin/gh/pearu/108/base 2025-12-04T08:53:08.4228011Z * [new branch] gh/pearu/108/head -> origin/gh/pearu/108/head 2025-12-04T08:53:08.4228091Z * [new branch] gh/pearu/108/orig -> origin/gh/pearu/108/orig 2025-12-04T08:53:08.4228184Z * [new branch] gh/pearu/109/base -> origin/gh/pearu/109/base 2025-12-04T08:53:08.4228268Z * [new branch] gh/pearu/109/head -> origin/gh/pearu/109/head 2025-12-04T08:53:08.4228341Z * [new branch] gh/pearu/109/orig -> origin/gh/pearu/109/orig 2025-12-04T08:53:08.4228447Z * [new branch] gh/pearu/110/base -> origin/gh/pearu/110/base 2025-12-04T08:53:08.4228524Z * [new branch] gh/pearu/110/head -> origin/gh/pearu/110/head 2025-12-04T08:53:08.4228603Z * [new branch] gh/pearu/110/orig -> origin/gh/pearu/110/orig 2025-12-04T08:53:08.4228701Z * [new branch] gh/pearu/111/base -> origin/gh/pearu/111/base 2025-12-04T08:53:08.4228781Z * [new branch] gh/pearu/111/head -> 
origin/gh/pearu/111/head 2025-12-04T08:53:08.4228880Z * [new branch] gh/pearu/111/orig -> origin/gh/pearu/111/orig 2025-12-04T08:53:08.4228966Z * [new branch] gh/pearu/112/base -> origin/gh/pearu/112/base 2025-12-04T08:53:08.4229043Z * [new branch] gh/pearu/112/head -> origin/gh/pearu/112/head 2025-12-04T08:53:08.4229135Z * [new branch] gh/pearu/112/orig -> origin/gh/pearu/112/orig 2025-12-04T08:53:08.4229220Z * [new branch] gh/pearu/115/base -> origin/gh/pearu/115/base 2025-12-04T08:53:08.4229301Z * [new branch] gh/pearu/115/head -> origin/gh/pearu/115/head 2025-12-04T08:53:08.4229399Z * [new branch] gh/pearu/115/orig -> origin/gh/pearu/115/orig 2025-12-04T08:53:08.4229485Z * [new branch] gh/pearu/116/base -> origin/gh/pearu/116/base 2025-12-04T08:53:08.4229601Z * [new branch] gh/pearu/116/head -> origin/gh/pearu/116/head 2025-12-04T08:53:08.4229726Z * [new branch] gh/pearu/116/orig -> origin/gh/pearu/116/orig 2025-12-04T08:53:08.4229804Z * [new branch] gh/pearu/117/base -> origin/gh/pearu/117/base 2025-12-04T08:53:08.4229881Z * [new branch] gh/pearu/117/head -> origin/gh/pearu/117/head 2025-12-04T08:53:08.4229981Z * [new branch] gh/pearu/117/orig -> origin/gh/pearu/117/orig 2025-12-04T08:53:08.4230064Z * [new branch] gh/pearu/118/base -> origin/gh/pearu/118/base 2025-12-04T08:53:08.4230151Z * [new branch] gh/pearu/118/head -> origin/gh/pearu/118/head 2025-12-04T08:53:08.4230242Z * [new branch] gh/pearu/118/orig -> origin/gh/pearu/118/orig 2025-12-04T08:53:08.4230319Z * [new branch] gh/pearu/119/base -> origin/gh/pearu/119/base 2025-12-04T08:53:08.4230408Z * [new branch] gh/pearu/119/head -> origin/gh/pearu/119/head 2025-12-04T08:53:08.4230498Z * [new branch] gh/pearu/119/orig -> origin/gh/pearu/119/orig 2025-12-04T08:53:08.4230588Z * [new branch] gh/pearu/139/base -> origin/gh/pearu/139/base 2025-12-04T08:53:08.4230677Z * [new branch] gh/pearu/139/head -> origin/gh/pearu/139/head 2025-12-04T08:53:08.4230757Z * [new branch] gh/pearu/139/orig -> 
origin/gh/pearu/139/orig 2025-12-04T08:53:08.4230836Z * [new branch] gh/pearu/140/base -> origin/gh/pearu/140/base 2025-12-04T08:53:08.4230922Z * [new branch] gh/pearu/140/head -> origin/gh/pearu/140/head 2025-12-04T08:53:08.4231010Z * [new branch] gh/pearu/140/orig -> origin/gh/pearu/140/orig 2025-12-04T08:53:08.4231097Z * [new branch] gh/pearu/142/base -> origin/gh/pearu/142/base 2025-12-04T08:53:08.4231190Z * [new branch] gh/pearu/142/head -> origin/gh/pearu/142/head 2025-12-04T08:53:08.4231269Z * [new branch] gh/pearu/142/orig -> origin/gh/pearu/142/orig 2025-12-04T08:53:08.4231348Z * [new branch] gh/pearu/143/base -> origin/gh/pearu/143/base 2025-12-04T08:53:08.4231436Z * [new branch] gh/pearu/143/head -> origin/gh/pearu/143/head 2025-12-04T08:53:08.4231529Z * [new branch] gh/pearu/143/orig -> origin/gh/pearu/143/orig 2025-12-04T08:53:08.4231620Z * [new branch] gh/pearu/147/base -> origin/gh/pearu/147/base 2025-12-04T08:53:08.4231713Z * [new branch] gh/pearu/147/head -> origin/gh/pearu/147/head 2025-12-04T08:53:08.4231792Z * [new branch] gh/pearu/147/orig -> origin/gh/pearu/147/orig 2025-12-04T08:53:08.4231883Z * [new branch] gh/pearu/149/base -> origin/gh/pearu/149/base 2025-12-04T08:53:08.4231957Z * [new branch] gh/pearu/149/head -> origin/gh/pearu/149/head 2025-12-04T08:53:08.4232051Z * [new branch] gh/pearu/149/orig -> origin/gh/pearu/149/orig 2025-12-04T08:53:08.4232149Z * [new branch] gh/pearu/150/base -> origin/gh/pearu/150/base 2025-12-04T08:53:08.4232227Z * [new branch] gh/pearu/150/head -> origin/gh/pearu/150/head 2025-12-04T08:53:08.4232305Z * [new branch] gh/pearu/150/orig -> origin/gh/pearu/150/orig 2025-12-04T08:53:08.4232395Z * [new branch] gh/pearu/151/base -> origin/gh/pearu/151/base 2025-12-04T08:53:08.4232467Z * [new branch] gh/pearu/151/head -> origin/gh/pearu/151/head 2025-12-04T08:53:08.4232570Z * [new branch] gh/pearu/151/orig -> origin/gh/pearu/151/orig 2025-12-04T08:53:08.4232666Z * [new branch] gh/pearu/152/base -> 
origin/gh/pearu/152/base 2025-12-04T08:53:08.4232772Z * [new branch] gh/pearu/152/head -> origin/gh/pearu/152/head 2025-12-04T08:53:08.4232852Z * [new branch] gh/pearu/152/orig -> origin/gh/pearu/152/orig 2025-12-04T08:53:08.4232978Z * [new branch] gh/pearu/153/base -> origin/gh/pearu/153/base 2025-12-04T08:53:08.4233050Z * [new branch] gh/pearu/153/head -> origin/gh/pearu/153/head 2025-12-04T08:53:08.4233142Z * [new branch] gh/pearu/153/orig -> origin/gh/pearu/153/orig 2025-12-04T08:53:08.4233239Z * [new branch] gh/pearu/154/base -> origin/gh/pearu/154/base 2025-12-04T08:53:08.4233375Z * [new branch] gh/pearu/154/head -> origin/gh/pearu/154/head 2025-12-04T08:53:08.4233460Z * [new branch] gh/pearu/154/orig -> origin/gh/pearu/154/orig 2025-12-04T08:53:08.4233553Z * [new branch] gh/pearu/155/base -> origin/gh/pearu/155/base 2025-12-04T08:53:08.4233625Z * [new branch] gh/pearu/155/head -> origin/gh/pearu/155/head 2025-12-04T08:53:08.4233738Z * [new branch] gh/pearu/155/orig -> origin/gh/pearu/155/orig 2025-12-04T08:53:08.4233819Z * [new branch] gh/pearu/156/base -> origin/gh/pearu/156/base 2025-12-04T08:53:08.4233898Z * [new branch] gh/pearu/156/head -> origin/gh/pearu/156/head 2025-12-04T08:53:08.4233995Z * [new branch] gh/pearu/156/orig -> origin/gh/pearu/156/orig 2025-12-04T08:53:08.4234073Z * [new branch] gh/pearu/56/base -> origin/gh/pearu/56/base 2025-12-04T08:53:08.4234144Z * [new branch] gh/pearu/56/head -> origin/gh/pearu/56/head 2025-12-04T08:53:08.4234255Z * [new branch] gh/pearu/56/orig -> origin/gh/pearu/56/orig 2025-12-04T08:53:08.4234331Z * [new branch] gh/pearu/97/base -> origin/gh/pearu/97/base 2025-12-04T08:53:08.4234413Z * [new branch] gh/pearu/97/head -> origin/gh/pearu/97/head 2025-12-04T08:53:08.4234505Z * [new branch] gh/pearu/97/orig -> origin/gh/pearu/97/orig 2025-12-04T08:53:08.4234595Z * [new branch] gh/pianpwk/21/base -> origin/gh/pianpwk/21/base 2025-12-04T08:53:08.4234674Z * [new branch] gh/pianpwk/21/head -> origin/gh/pianpwk/21/head 
2025-12-04T08:53:08.4234785Z * [new branch] gh/pianpwk/28/base -> origin/gh/pianpwk/28/base 2025-12-04T08:53:08.4234868Z * [new branch] gh/pianpwk/28/head -> origin/gh/pianpwk/28/head 2025-12-04T08:53:08.4234956Z * [new branch] gh/pianpwk/28/orig -> origin/gh/pianpwk/28/orig 2025-12-04T08:53:08.4235049Z * [new branch] gh/pianpwk/29/base -> origin/gh/pianpwk/29/base 2025-12-04T08:53:08.4235134Z * [new branch] gh/pianpwk/29/head -> origin/gh/pianpwk/29/head 2025-12-04T08:53:08.4235235Z * [new branch] gh/pianpwk/29/orig -> origin/gh/pianpwk/29/orig 2025-12-04T08:53:08.4235322Z * [new branch] gh/pianpwk/30/base -> origin/gh/pianpwk/30/base 2025-12-04T08:53:08.4235409Z * [new branch] gh/pianpwk/30/head -> origin/gh/pianpwk/30/head 2025-12-04T08:53:08.4235504Z * [new branch] gh/pianpwk/30/orig -> origin/gh/pianpwk/30/orig 2025-12-04T08:53:08.4235585Z * [new branch] gh/pianpwk/31/base -> origin/gh/pianpwk/31/base 2025-12-04T08:53:08.4235671Z * [new branch] gh/pianpwk/31/head -> origin/gh/pianpwk/31/head 2025-12-04T08:53:08.4235772Z * [new branch] gh/pianpwk/31/orig -> origin/gh/pianpwk/31/orig 2025-12-04T08:53:08.4235859Z * [new branch] gh/pianpwk/32/base -> origin/gh/pianpwk/32/base 2025-12-04T08:53:08.4235945Z * [new branch] gh/pianpwk/32/head -> origin/gh/pianpwk/32/head 2025-12-04T08:53:08.4236040Z * [new branch] gh/pianpwk/32/orig -> origin/gh/pianpwk/32/orig 2025-12-04T08:53:08.4236155Z * [new branch] gh/pianpwk/33/base -> origin/gh/pianpwk/33/base 2025-12-04T08:53:08.4236282Z * [new branch] gh/pianpwk/33/head -> origin/gh/pianpwk/33/head 2025-12-04T08:53:08.4236386Z * [new branch] gh/pianpwk/33/orig -> origin/gh/pianpwk/33/orig 2025-12-04T08:53:08.4236474Z * [new branch] gh/pianpwk/34/base -> origin/gh/pianpwk/34/base 2025-12-04T08:53:08.4236558Z * [new branch] gh/pianpwk/34/head -> origin/gh/pianpwk/34/head 2025-12-04T08:53:08.4236651Z * [new branch] gh/pianpwk/34/orig -> origin/gh/pianpwk/34/orig 2025-12-04T08:53:08.4236731Z * [new branch] gh/pianpwk/35/base -> 
origin/gh/pianpwk/35/base 2025-12-04T08:53:08.4236813Z * [new branch] gh/pianpwk/35/head -> origin/gh/pianpwk/35/head 2025-12-04T08:53:08.4236931Z * [new branch] gh/pianpwk/35/orig -> origin/gh/pianpwk/35/orig 2025-12-04T08:53:08.4240964Z * [new branch] gh/rec/141/base -> origin/gh/rec/141/base 2025-12-04T08:53:08.4241049Z * [new branch] gh/rec/141/head -> origin/gh/rec/141/head 2025-12-04T08:53:08.4241117Z * [new branch] gh/rec/153/base -> origin/gh/rec/153/base 2025-12-04T08:53:08.4241184Z * [new branch] gh/rec/153/head -> origin/gh/rec/153/head 2025-12-04T08:53:08.4241249Z * [new branch] gh/rec/153/orig -> origin/gh/rec/153/orig 2025-12-04T08:53:08.4241315Z * [new branch] gh/rec/154/base -> origin/gh/rec/154/base 2025-12-04T08:53:08.4241379Z * [new branch] gh/rec/154/head -> origin/gh/rec/154/head 2025-12-04T08:53:08.4241442Z * [new branch] gh/rec/154/orig -> origin/gh/rec/154/orig 2025-12-04T08:53:08.4241506Z * [new branch] gh/rec/164/base -> origin/gh/rec/164/base 2025-12-04T08:53:08.4241574Z * [new branch] gh/rec/164/head -> origin/gh/rec/164/head 2025-12-04T08:53:08.4241638Z * [new branch] gh/rec/164/orig -> origin/gh/rec/164/orig 2025-12-04T08:53:08.4241704Z * [new branch] gh/rec/166/base -> origin/gh/rec/166/base 2025-12-04T08:53:08.4241767Z * [new branch] gh/rec/166/head -> origin/gh/rec/166/head 2025-12-04T08:53:08.4241830Z * [new branch] gh/rec/166/orig -> origin/gh/rec/166/orig 2025-12-04T08:53:08.4241896Z * [new branch] gh/rec/167/base -> origin/gh/rec/167/base 2025-12-04T08:53:08.4241961Z * [new branch] gh/rec/167/head -> origin/gh/rec/167/head 2025-12-04T08:53:08.4242026Z * [new branch] gh/rec/167/orig -> origin/gh/rec/167/orig 2025-12-04T08:53:08.4242093Z * [new branch] gh/rec/168/base -> origin/gh/rec/168/base 2025-12-04T08:53:08.4242155Z * [new branch] gh/rec/168/head -> origin/gh/rec/168/head 2025-12-04T08:53:08.4242220Z * [new branch] gh/rec/168/orig -> origin/gh/rec/168/orig 2025-12-04T08:53:08.4242285Z * [new branch] gh/rec/169/base -> 
origin/gh/rec/169/base 2025-12-04T08:53:08.4242348Z * [new branch] gh/rec/169/head -> origin/gh/rec/169/head 2025-12-04T08:53:08.4242410Z * [new branch] gh/rec/169/orig -> origin/gh/rec/169/orig 2025-12-04T08:53:08.4242478Z * [new branch] gh/rec/170/base -> origin/gh/rec/170/base 2025-12-04T08:53:08.4242542Z * [new branch] gh/rec/170/head -> origin/gh/rec/170/head 2025-12-04T08:53:08.4242605Z * [new branch] gh/rec/170/orig -> origin/gh/rec/170/orig 2025-12-04T08:53:08.4242671Z * [new branch] gh/rec/171/base -> origin/gh/rec/171/base 2025-12-04T08:53:08.4242732Z * [new branch] gh/rec/171/head -> origin/gh/rec/171/head 2025-12-04T08:53:08.4242845Z * [new branch] gh/rec/171/orig -> origin/gh/rec/171/orig 2025-12-04T08:53:08.4242940Z * [new branch] gh/rec/172/base -> origin/gh/rec/172/base 2025-12-04T08:53:08.4243003Z * [new branch] gh/rec/172/head -> origin/gh/rec/172/head 2025-12-04T08:53:08.4243065Z * [new branch] gh/rec/172/orig -> origin/gh/rec/172/orig 2025-12-04T08:53:08.4243128Z * [new branch] gh/rec/173/base -> origin/gh/rec/173/base 2025-12-04T08:53:08.4243191Z * [new branch] gh/rec/173/head -> origin/gh/rec/173/head 2025-12-04T08:53:08.4243295Z * [new branch] gh/rec/173/orig -> origin/gh/rec/173/orig 2025-12-04T08:53:08.4243364Z * [new branch] gh/rec/174/base -> origin/gh/rec/174/base 2025-12-04T08:53:08.4243429Z * [new branch] gh/rec/174/head -> origin/gh/rec/174/head 2025-12-04T08:53:08.4243497Z * [new branch] gh/rec/174/orig -> origin/gh/rec/174/orig 2025-12-04T08:53:08.4243561Z * [new branch] gh/rec/175/base -> origin/gh/rec/175/base 2025-12-04T08:53:08.4243625Z * [new branch] gh/rec/175/head -> origin/gh/rec/175/head 2025-12-04T08:53:08.4243690Z * [new branch] gh/rec/175/orig -> origin/gh/rec/175/orig 2025-12-04T08:53:08.4243755Z * [new branch] gh/rec/176/base -> origin/gh/rec/176/base 2025-12-04T08:53:08.4243819Z * [new branch] gh/rec/176/head -> origin/gh/rec/176/head 2025-12-04T08:53:08.4243884Z * [new branch] gh/rec/176/orig -> 
origin/gh/rec/176/orig 2025-12-04T08:53:08.4243946Z * [new branch] gh/rec/177/base -> origin/gh/rec/177/base 2025-12-04T08:53:08.4244007Z * [new branch] gh/rec/177/head -> origin/gh/rec/177/head 2025-12-04T08:53:08.4244072Z * [new branch] gh/rec/177/orig -> origin/gh/rec/177/orig 2025-12-04T08:53:08.4244165Z * [new branch] gh/robert-hardwick/3/base -> origin/gh/robert-hardwick/3/base 2025-12-04T08:53:08.4244252Z * [new branch] gh/robert-hardwick/3/head -> origin/gh/robert-hardwick/3/head 2025-12-04T08:53:08.4244336Z * [new branch] gh/robert-hardwick/3/orig -> origin/gh/robert-hardwick/3/orig 2025-12-04T08:53:08.4244418Z * [new branch] gh/robert-hardwick/4/base -> origin/gh/robert-hardwick/4/base 2025-12-04T08:53:08.4244499Z * [new branch] gh/robert-hardwick/4/head -> origin/gh/robert-hardwick/4/head 2025-12-04T08:53:08.4244583Z * [new branch] gh/robert-hardwick/4/orig -> origin/gh/robert-hardwick/4/orig 2025-12-04T08:53:08.4244664Z * [new branch] gh/robert-hardwick/5/base -> origin/gh/robert-hardwick/5/base 2025-12-04T08:53:08.4244746Z * [new branch] gh/robert-hardwick/5/head -> origin/gh/robert-hardwick/5/head 2025-12-04T08:53:08.4244830Z * [new branch] gh/robert-hardwick/5/orig -> origin/gh/robert-hardwick/5/orig 2025-12-04T08:53:08.4244914Z * [new branch] gh/robert-hardwick/6/base -> origin/gh/robert-hardwick/6/base 2025-12-04T08:53:08.4244995Z * [new branch] gh/robert-hardwick/6/head -> origin/gh/robert-hardwick/6/head 2025-12-04T08:53:08.4245079Z * [new branch] gh/robert-hardwick/6/orig -> origin/gh/robert-hardwick/6/orig 2025-12-04T08:53:08.4245160Z * [new branch] gh/robert-hardwick/7/base -> origin/gh/robert-hardwick/7/base 2025-12-04T08:53:08.4245242Z * [new branch] gh/robert-hardwick/7/head -> origin/gh/robert-hardwick/7/head 2025-12-04T08:53:08.4245323Z * [new branch] gh/robert-hardwick/7/orig -> origin/gh/robert-hardwick/7/orig 2025-12-04T08:53:08.4245404Z * [new branch] gh/robert-hardwick/8/base -> origin/gh/robert-hardwick/8/base 
2025-12-04T08:53:08.4245538Z * [new branch] gh/robert-hardwick/8/head -> origin/gh/robert-hardwick/8/head 2025-12-04T08:53:08.4245619Z * [new branch] gh/robert-hardwick/8/orig -> origin/gh/robert-hardwick/8/orig 2025-12-04T08:53:08.4245741Z * [new branch] gh/robert-hardwick/9/base -> origin/gh/robert-hardwick/9/base 2025-12-04T08:53:08.4245824Z * [new branch] gh/robert-hardwick/9/head -> origin/gh/robert-hardwick/9/head 2025-12-04T08:53:08.4245904Z * [new branch] gh/robert-hardwick/9/orig -> origin/gh/robert-hardwick/9/orig 2025-12-04T08:53:08.4245974Z * [new branch] gh/rtimpe/1/base -> origin/gh/rtimpe/1/base 2025-12-04T08:53:08.4246047Z * [new branch] gh/rtimpe/1/head -> origin/gh/rtimpe/1/head 2025-12-04T08:53:08.4246115Z * [new branch] gh/rtimpe/2/base -> origin/gh/rtimpe/2/base 2025-12-04T08:53:08.4246181Z * [new branch] gh/rtimpe/2/head -> origin/gh/rtimpe/2/head 2025-12-04T08:53:08.4246256Z * [new branch] gh/rtimpe/22/base -> origin/gh/rtimpe/22/base 2025-12-04T08:53:08.4246324Z * [new branch] gh/rtimpe/22/head -> origin/gh/rtimpe/22/head 2025-12-04T08:53:08.4246392Z * [new branch] gh/rtimpe/22/orig -> origin/gh/rtimpe/22/orig 2025-12-04T08:53:08.4246464Z * [new branch] gh/rtimpe/23/base -> origin/gh/rtimpe/23/base 2025-12-04T08:53:08.4246532Z * [new branch] gh/rtimpe/23/head -> origin/gh/rtimpe/23/head 2025-12-04T08:53:08.4246599Z * [new branch] gh/rtimpe/23/orig -> origin/gh/rtimpe/23/orig 2025-12-04T08:53:08.4246668Z * [new branch] gh/rtimpe/24/base -> origin/gh/rtimpe/24/base 2025-12-04T08:53:08.4246735Z * [new branch] gh/rtimpe/24/head -> origin/gh/rtimpe/24/head 2025-12-04T08:53:08.4246803Z * [new branch] gh/rtimpe/24/orig -> origin/gh/rtimpe/24/orig 2025-12-04T08:53:08.4246871Z * [new branch] gh/rtimpe/25/base -> origin/gh/rtimpe/25/base 2025-12-04T08:53:08.4246938Z * [new branch] gh/rtimpe/25/head -> origin/gh/rtimpe/25/head 2025-12-04T08:53:08.4247008Z * [new branch] gh/rtimpe/25/orig -> origin/gh/rtimpe/25/orig 2025-12-04T08:53:08.4247075Z * [new 
branch] gh/rtimpe/26/base -> origin/gh/rtimpe/26/base 2025-12-04T08:53:08.4247144Z * [new branch] gh/rtimpe/26/head -> origin/gh/rtimpe/26/head 2025-12-04T08:53:08.4247214Z * [new branch] gh/rtimpe/26/orig -> origin/gh/rtimpe/26/orig 2025-12-04T08:53:08.4247279Z * [new branch] gh/rtimpe/27/base -> origin/gh/rtimpe/27/base 2025-12-04T08:53:08.4247349Z * [new branch] gh/rtimpe/27/head -> origin/gh/rtimpe/27/head 2025-12-04T08:53:08.4247416Z * [new branch] gh/rtimpe/27/orig -> origin/gh/rtimpe/27/orig 2025-12-04T08:53:08.4247484Z * [new branch] gh/rtimpe/28/base -> origin/gh/rtimpe/28/base 2025-12-04T08:53:08.4247550Z * [new branch] gh/rtimpe/28/head -> origin/gh/rtimpe/28/head 2025-12-04T08:53:08.4247623Z * [new branch] gh/rtimpe/28/orig -> origin/gh/rtimpe/28/orig 2025-12-04T08:53:08.4247693Z * [new branch] gh/rtimpe/29/base -> origin/gh/rtimpe/29/base 2025-12-04T08:53:08.4247759Z * [new branch] gh/rtimpe/29/head -> origin/gh/rtimpe/29/head 2025-12-04T08:53:08.4247826Z * [new branch] gh/rtimpe/29/orig -> origin/gh/rtimpe/29/orig 2025-12-04T08:53:08.4247893Z * [new branch] gh/rtimpe/3/base -> origin/gh/rtimpe/3/base 2025-12-04T08:53:08.4247962Z * [new branch] gh/rtimpe/3/head -> origin/gh/rtimpe/3/head 2025-12-04T08:53:08.4248027Z * [new branch] gh/rtimpe/30/base -> origin/gh/rtimpe/30/base 2025-12-04T08:53:08.4248097Z * [new branch] gh/rtimpe/30/head -> origin/gh/rtimpe/30/head 2025-12-04T08:53:08.4248194Z * [new branch] gh/rtimpe/30/orig -> origin/gh/rtimpe/30/orig 2025-12-04T08:53:08.4248287Z * [new branch] gh/rtimpe/31/base -> origin/gh/rtimpe/31/base 2025-12-04T08:53:08.4248354Z * [new branch] gh/rtimpe/31/head -> origin/gh/rtimpe/31/head 2025-12-04T08:53:08.4248420Z * [new branch] gh/rtimpe/31/orig -> origin/gh/rtimpe/31/orig 2025-12-04T08:53:08.4248486Z * [new branch] gh/rtimpe/32/base -> origin/gh/rtimpe/32/base 2025-12-04T08:53:08.4248554Z * [new branch] gh/rtimpe/32/head -> origin/gh/rtimpe/32/head 2025-12-04T08:53:08.4248621Z * [new branch] 
gh/rtimpe/32/orig -> origin/gh/rtimpe/32/orig 2025-12-04T08:53:08.4248687Z * [new branch] gh/rtimpe/33/base -> origin/gh/rtimpe/33/base 2025-12-04T08:53:08.4248758Z * [new branch] gh/rtimpe/33/head -> origin/gh/rtimpe/33/head 2025-12-04T08:53:08.4248826Z * [new branch] gh/rtimpe/33/orig -> origin/gh/rtimpe/33/orig 2025-12-04T08:53:08.4248893Z * [new branch] gh/rtimpe/34/base -> origin/gh/rtimpe/34/base 2025-12-04T08:53:08.4248962Z * [new branch] gh/rtimpe/34/head -> origin/gh/rtimpe/34/head 2025-12-04T08:53:08.4249027Z * [new branch] gh/rtimpe/34/orig -> origin/gh/rtimpe/34/orig 2025-12-04T08:53:08.4249092Z * [new branch] gh/rtimpe/35/base -> origin/gh/rtimpe/35/base 2025-12-04T08:53:08.4249160Z * [new branch] gh/rtimpe/35/head -> origin/gh/rtimpe/35/head 2025-12-04T08:53:08.4249227Z * [new branch] gh/rtimpe/35/orig -> origin/gh/rtimpe/35/orig 2025-12-04T08:53:08.4249295Z * [new branch] gh/rtimpe/4/base -> origin/gh/rtimpe/4/base 2025-12-04T08:53:08.4249362Z * [new branch] gh/rtimpe/4/head -> origin/gh/rtimpe/4/head 2025-12-04T08:53:08.4249446Z * [new branch] gh/ruisizhang123/1/base -> origin/gh/ruisizhang123/1/base 2025-12-04T08:53:08.4249528Z * [new branch] gh/ruisizhang123/1/head -> origin/gh/ruisizhang123/1/head 2025-12-04T08:53:08.4249605Z * [new branch] gh/ruisizhang123/1/orig -> origin/gh/ruisizhang123/1/orig 2025-12-04T08:53:08.4249681Z * [new branch] gh/ruisizhang123/4/base -> origin/gh/ruisizhang123/4/base 2025-12-04T08:53:08.4249758Z * [new branch] gh/ruisizhang123/4/head -> origin/gh/ruisizhang123/4/head 2025-12-04T08:53:08.4249833Z * [new branch] gh/ruisizhang123/4/orig -> origin/gh/ruisizhang123/4/orig 2025-12-04T08:53:08.4249909Z * [new branch] gh/ruisizhang123/5/base -> origin/gh/ruisizhang123/5/base 2025-12-04T08:53:08.4249986Z * [new branch] gh/ruisizhang123/5/head -> origin/gh/ruisizhang123/5/head 2025-12-04T08:53:08.4250060Z * [new branch] gh/ruisizhang123/5/orig -> origin/gh/ruisizhang123/5/orig 2025-12-04T08:53:08.4250137Z * [new branch] 
gh/ruisizhang123/6/base -> origin/gh/ruisizhang123/6/base 2025-12-04T08:53:08.4250218Z * [new branch] gh/ruisizhang123/6/head -> origin/gh/ruisizhang123/6/head 2025-12-04T08:53:08.4250292Z * [new branch] gh/ruisizhang123/6/orig -> origin/gh/ruisizhang123/6/orig 2025-12-04T08:53:08.4250366Z * [new branch] gh/ruisizhang123/7/base -> origin/gh/ruisizhang123/7/base 2025-12-04T08:53:08.4250443Z * [new branch] gh/ruisizhang123/7/head -> origin/gh/ruisizhang123/7/head 2025-12-04T08:53:08.4250521Z * [new branch] gh/ruisizhang123/7/orig -> origin/gh/ruisizhang123/7/orig 2025-12-04T08:53:08.4250594Z * [new branch] gh/ruisizhang123/8/base -> origin/gh/ruisizhang123/8/base 2025-12-04T08:53:08.4250671Z * [new branch] gh/ruisizhang123/8/head -> origin/gh/ruisizhang123/8/head 2025-12-04T08:53:08.4250784Z * [new branch] gh/ruisizhang123/8/orig -> origin/gh/ruisizhang123/8/orig 2025-12-04T08:53:08.4250861Z * [new branch] gh/ruisizhang123/9/base -> origin/gh/ruisizhang123/9/base 2025-12-04T08:53:08.4250964Z * [new branch] gh/ruisizhang123/9/head -> origin/gh/ruisizhang123/9/head 2025-12-04T08:53:08.4251039Z * [new branch] gh/ruisizhang123/9/orig -> origin/gh/ruisizhang123/9/orig 2025-12-04T08:53:08.4251117Z * [new branch] gh/seemethere/52/base -> origin/gh/seemethere/52/base 2025-12-04T08:53:08.4251192Z * [new branch] gh/seemethere/52/head -> origin/gh/seemethere/52/head 2025-12-04T08:53:08.4251265Z * [new branch] gh/seemethere/52/orig -> origin/gh/seemethere/52/orig 2025-12-04T08:53:08.4251338Z * [new branch] gh/seemethere/53/base -> origin/gh/seemethere/53/base 2025-12-04T08:53:08.4251409Z * [new branch] gh/seemethere/53/head -> origin/gh/seemethere/53/head 2025-12-04T08:53:08.4251483Z * [new branch] gh/seemethere/53/orig -> origin/gh/seemethere/53/orig 2025-12-04T08:53:08.4251560Z * [new branch] gh/seemethere/54/base -> origin/gh/seemethere/54/base 2025-12-04T08:53:08.4251632Z * [new branch] gh/seemethere/54/head -> origin/gh/seemethere/54/head 2025-12-04T08:53:08.4251703Z * [new 
branch] gh/seemethere/54/orig -> origin/gh/seemethere/54/orig 2025-12-04T08:53:08.4251777Z * [new branch] gh/seemethere/55/base -> origin/gh/seemethere/55/base 2025-12-04T08:53:08.4251849Z * [new branch] gh/seemethere/55/head -> origin/gh/seemethere/55/head 2025-12-04T08:53:08.4251922Z * [new branch] gh/seemethere/55/orig -> origin/gh/seemethere/55/orig 2025-12-04T08:53:08.4251996Z * [new branch] gh/seemethere/59/base -> origin/gh/seemethere/59/base 2025-12-04T08:53:08.4252067Z * [new branch] gh/seemethere/59/head -> origin/gh/seemethere/59/head 2025-12-04T08:53:08.4252144Z * [new branch] gh/seemethere/59/orig -> origin/gh/seemethere/59/orig 2025-12-04T08:53:08.4252220Z * [new branch] gh/seemethere/62/base -> origin/gh/seemethere/62/base 2025-12-04T08:53:08.4252293Z * [new branch] gh/seemethere/62/head -> origin/gh/seemethere/62/head 2025-12-04T08:53:08.4252366Z * [new branch] gh/seemethere/62/orig -> origin/gh/seemethere/62/orig 2025-12-04T08:53:08.4252438Z * [new branch] gh/seemethere/63/base -> origin/gh/seemethere/63/base 2025-12-04T08:53:08.4252510Z * [new branch] gh/seemethere/63/head -> origin/gh/seemethere/63/head 2025-12-04T08:53:08.4252586Z * [new branch] gh/seemethere/63/orig -> origin/gh/seemethere/63/orig 2025-12-04T08:53:08.4252657Z * [new branch] gh/seemethere/71/base -> origin/gh/seemethere/71/base 2025-12-04T08:53:08.4252729Z * [new branch] gh/seemethere/71/head -> origin/gh/seemethere/71/head 2025-12-04T08:53:08.4252804Z * [new branch] gh/seemethere/71/orig -> origin/gh/seemethere/71/orig 2025-12-04T08:53:08.4252878Z * [new branch] gh/seemethere/72/base -> origin/gh/seemethere/72/base 2025-12-04T08:53:08.4252949Z * [new branch] gh/seemethere/72/head -> origin/gh/seemethere/72/head 2025-12-04T08:53:08.4253022Z * [new branch] gh/seemethere/72/orig -> origin/gh/seemethere/72/orig 2025-12-04T08:53:08.4253095Z * [new branch] gh/seemethere/73/base -> origin/gh/seemethere/73/base 2025-12-04T08:53:08.4253167Z * [new branch] gh/seemethere/73/head -> 
origin/gh/seemethere/73/head 2025-12-04T08:53:08.4253242Z * [new branch] gh/seemethere/73/orig -> origin/gh/seemethere/73/orig 2025-12-04T08:53:08.4253356Z * [new branch] gh/seemethere/74/base -> origin/gh/seemethere/74/base 2025-12-04T08:53:08.4253471Z * [new branch] gh/seemethere/74/head -> origin/gh/seemethere/74/head 2025-12-04T08:53:08.4253547Z * [new branch] gh/seemethere/74/orig -> origin/gh/seemethere/74/orig 2025-12-04T08:53:08.4253661Z * [new branch] gh/seemethere/75/base -> origin/gh/seemethere/75/base 2025-12-04T08:53:08.4253735Z * [new branch] gh/seemethere/75/head -> origin/gh/seemethere/75/head 2025-12-04T08:53:08.4253809Z * [new branch] gh/seemethere/75/orig -> origin/gh/seemethere/75/orig 2025-12-04T08:53:08.4253881Z * [new branch] gh/seemethere/76/base -> origin/gh/seemethere/76/base 2025-12-04T08:53:08.4253956Z * [new branch] gh/seemethere/76/head -> origin/gh/seemethere/76/head 2025-12-04T08:53:08.4254034Z * [new branch] gh/seemethere/76/orig -> origin/gh/seemethere/76/orig 2025-12-04T08:53:08.4254114Z * [new branch] gh/shunting314/145/base -> origin/gh/shunting314/145/base 2025-12-04T08:53:08.4254193Z * [new branch] gh/shunting314/145/head -> origin/gh/shunting314/145/head 2025-12-04T08:53:08.4254269Z * [new branch] gh/shunting314/145/orig -> origin/gh/shunting314/145/orig 2025-12-04T08:53:08.4254346Z * [new branch] gh/shunting314/176/base -> origin/gh/shunting314/176/base 2025-12-04T08:53:08.4254423Z * [new branch] gh/shunting314/176/head -> origin/gh/shunting314/176/head 2025-12-04T08:53:08.4254496Z * [new branch] gh/shunting314/176/orig -> origin/gh/shunting314/176/orig 2025-12-04T08:53:08.4254570Z * [new branch] gh/shunting314/249/base -> origin/gh/shunting314/249/base 2025-12-04T08:53:08.4254645Z * [new branch] gh/shunting314/249/head -> origin/gh/shunting314/249/head 2025-12-04T08:53:08.4254718Z * [new branch] gh/shunting314/249/orig -> origin/gh/shunting314/249/orig 2025-12-04T08:53:08.4254790Z * [new branch] gh/shunting314/253/base -> 
origin/gh/shunting314/253/base 2025-12-04T08:53:08.4254865Z * [new branch] gh/shunting314/253/head -> origin/gh/shunting314/253/head 2025-12-04T08:53:08.4254940Z * [new branch] gh/shunting314/253/orig -> origin/gh/shunting314/253/orig 2025-12-04T08:53:08.4255013Z * [new branch] gh/shunting314/256/base -> origin/gh/shunting314/256/base 2025-12-04T08:53:08.4255090Z * [new branch] gh/shunting314/256/head -> origin/gh/shunting314/256/head 2025-12-04T08:53:08.4255163Z * [new branch] gh/shunting314/256/orig -> origin/gh/shunting314/256/orig 2025-12-04T08:53:08.4255237Z * [new branch] gh/shunting314/257/base -> origin/gh/shunting314/257/base 2025-12-04T08:53:08.4255312Z * [new branch] gh/shunting314/257/head -> origin/gh/shunting314/257/head 2025-12-04T08:53:08.4255387Z * [new branch] gh/shunting314/257/orig -> origin/gh/shunting314/257/orig 2025-12-04T08:53:08.4255462Z * [new branch] gh/shunting314/258/base -> origin/gh/shunting314/258/base 2025-12-04T08:53:08.4255539Z * [new branch] gh/shunting314/258/head -> origin/gh/shunting314/258/head 2025-12-04T08:53:08.4255613Z * [new branch] gh/shunting314/258/orig -> origin/gh/shunting314/258/orig 2025-12-04T08:53:08.4255690Z * [new branch] gh/shunting314/259/base -> origin/gh/shunting314/259/base 2025-12-04T08:53:08.4255763Z * [new branch] gh/shunting314/259/head -> origin/gh/shunting314/259/head 2025-12-04T08:53:08.4255837Z * [new branch] gh/shunting314/259/orig -> origin/gh/shunting314/259/orig 2025-12-04T08:53:08.4255911Z * [new branch] gh/shunting314/260/base -> origin/gh/shunting314/260/base 2025-12-04T08:53:08.4255986Z * [new branch] gh/shunting314/260/head -> origin/gh/shunting314/260/head 2025-12-04T08:53:08.4256061Z * [new branch] gh/shunting314/260/orig -> origin/gh/shunting314/260/orig 2025-12-04T08:53:08.4256164Z * [new branch] gh/shunting314/261/base -> origin/gh/shunting314/261/base 2025-12-04T08:53:08.4256238Z * [new branch] gh/shunting314/261/head -> origin/gh/shunting314/261/head 2025-12-04T08:53:08.4256340Z * 
[new branch] gh/shunting314/261/orig -> origin/gh/shunting314/261/orig 2025-12-04T08:53:08.4256414Z * [new branch] gh/shunting314/262/base -> origin/gh/shunting314/262/base 2025-12-04T08:53:08.4256487Z * [new branch] gh/shunting314/262/head -> origin/gh/shunting314/262/head 2025-12-04T08:53:08.4256561Z * [new branch] gh/shunting314/262/orig -> origin/gh/shunting314/262/orig 2025-12-04T08:53:08.4256638Z * [new branch] gh/shunting314/263/base -> origin/gh/shunting314/263/base 2025-12-04T08:53:08.4256711Z * [new branch] gh/shunting314/263/head -> origin/gh/shunting314/263/head 2025-12-04T08:53:08.4256785Z * [new branch] gh/shunting314/263/orig -> origin/gh/shunting314/263/orig 2025-12-04T08:53:08.4256861Z * [new branch] gh/shunting314/264/base -> origin/gh/shunting314/264/base 2025-12-04T08:53:08.4256935Z * [new branch] gh/shunting314/264/head -> origin/gh/shunting314/264/head 2025-12-04T08:53:08.4257008Z * [new branch] gh/shunting314/264/orig -> origin/gh/shunting314/264/orig 2025-12-04T08:53:08.4257085Z * [new branch] gh/shunting314/265/base -> origin/gh/shunting314/265/base 2025-12-04T08:53:08.4257159Z * [new branch] gh/shunting314/265/head -> origin/gh/shunting314/265/head 2025-12-04T08:53:08.4257232Z * [new branch] gh/shunting314/265/orig -> origin/gh/shunting314/265/orig 2025-12-04T08:53:08.4257305Z * [new branch] gh/shunting314/266/base -> origin/gh/shunting314/266/base 2025-12-04T08:53:08.4257378Z * [new branch] gh/shunting314/266/head -> origin/gh/shunting314/266/head 2025-12-04T08:53:08.4257454Z * [new branch] gh/shunting314/266/orig -> origin/gh/shunting314/266/orig 2025-12-04T08:53:08.4257530Z * [new branch] gh/shunting314/267/base -> origin/gh/shunting314/267/base 2025-12-04T08:53:08.4257605Z * [new branch] gh/shunting314/267/head -> origin/gh/shunting314/267/head 2025-12-04T08:53:08.4257680Z * [new branch] gh/shunting314/267/orig -> origin/gh/shunting314/267/orig 2025-12-04T08:53:08.4257755Z * [new branch] gh/shunting314/268/base -> 
origin/gh/shunting314/268/base 2025-12-04T08:53:08.4257829Z * [new branch] gh/shunting314/268/head -> origin/gh/shunting314/268/head 2025-12-04T08:53:08.4257906Z * [new branch] gh/shunting314/268/orig -> origin/gh/shunting314/268/orig 2025-12-04T08:53:08.4257978Z * [new branch] gh/shunting314/269/base -> origin/gh/shunting314/269/base 2025-12-04T08:53:08.4258052Z * [new branch] gh/shunting314/269/head -> origin/gh/shunting314/269/head 2025-12-04T08:53:08.4258128Z * [new branch] gh/shunting314/269/orig -> origin/gh/shunting314/269/orig 2025-12-04T08:53:08.4258203Z * [new branch] gh/silverguo/1/base -> origin/gh/silverguo/1/base 2025-12-04T08:53:08.4258277Z * [new branch] gh/silverguo/1/head -> origin/gh/silverguo/1/head 2025-12-04T08:53:08.4258350Z * [new branch] gh/silverguo/2/base -> origin/gh/silverguo/2/base 2025-12-04T08:53:08.4258419Z * [new branch] gh/silverguo/2/head -> origin/gh/silverguo/2/head 2025-12-04T08:53:08.4258489Z * [new branch] gh/silverguo/3/base -> origin/gh/silverguo/3/base 2025-12-04T08:53:08.4258561Z * [new branch] gh/silverguo/3/head -> origin/gh/silverguo/3/head 2025-12-04T08:53:08.4258631Z * [new branch] gh/silverguo/4/base -> origin/gh/silverguo/4/base 2025-12-04T08:53:08.4258700Z * [new branch] gh/silverguo/4/head -> origin/gh/silverguo/4/head 2025-12-04T08:53:08.4258806Z * [new branch] gh/slayton58/39/base -> origin/gh/slayton58/39/base 2025-12-04T08:53:08.4258879Z * [new branch] gh/slayton58/39/head -> origin/gh/slayton58/39/head 2025-12-04T08:53:08.4258982Z * [new branch] gh/slayton58/39/orig -> origin/gh/slayton58/39/orig 2025-12-04T08:53:08.4259054Z * [new branch] gh/slayton58/42/base -> origin/gh/slayton58/42/base 2025-12-04T08:53:08.4259123Z * [new branch] gh/slayton58/42/head -> origin/gh/slayton58/42/head 2025-12-04T08:53:08.4259195Z * [new branch] gh/slayton58/42/orig -> origin/gh/slayton58/42/orig 2025-12-04T08:53:08.4259265Z * [new branch] gh/slayton58/43/base -> origin/gh/slayton58/43/base 2025-12-04T08:53:08.4259335Z * 
[new branch] gh/slayton58/43/head -> origin/gh/slayton58/43/head 2025-12-04T08:53:08.4259407Z * [new branch] gh/slayton58/43/orig -> origin/gh/slayton58/43/orig 2025-12-04T08:53:08.4259478Z * [new branch] gh/slayton58/44/base -> origin/gh/slayton58/44/base 2025-12-04T08:53:08.4259547Z * [new branch] gh/slayton58/44/head -> origin/gh/slayton58/44/head 2025-12-04T08:53:08.4259618Z * [new branch] gh/slayton58/44/orig -> origin/gh/slayton58/44/orig 2025-12-04T08:53:08.4259689Z * [new branch] gh/slayton58/45/base -> origin/gh/slayton58/45/base 2025-12-04T08:53:08.4259759Z * [new branch] gh/slayton58/45/head -> origin/gh/slayton58/45/head 2025-12-04T08:53:08.4259830Z * [new branch] gh/slayton58/45/orig -> origin/gh/slayton58/45/orig 2025-12-04T08:53:08.4259901Z * [new branch] gh/slayton58/46/base -> origin/gh/slayton58/46/base 2025-12-04T08:53:08.4259970Z * [new branch] gh/slayton58/46/head -> origin/gh/slayton58/46/head 2025-12-04T08:53:08.4260042Z * [new branch] gh/slayton58/46/orig -> origin/gh/slayton58/46/orig 2025-12-04T08:53:08.4260113Z * [new branch] gh/slayton58/6/base -> origin/gh/slayton58/6/base 2025-12-04T08:53:08.4260182Z * [new branch] gh/slayton58/6/head -> origin/gh/slayton58/6/head 2025-12-04T08:53:08.4260256Z * [new branch] gh/slayton58/7/base -> origin/gh/slayton58/7/base 2025-12-04T08:53:08.4260325Z * [new branch] gh/slayton58/7/head -> origin/gh/slayton58/7/head 2025-12-04T08:53:08.4260403Z * [new branch] gh/soulitzer/269/base -> origin/gh/soulitzer/269/base 2025-12-04T08:53:08.4260478Z * [new branch] gh/soulitzer/269/head -> origin/gh/soulitzer/269/head 2025-12-04T08:53:08.4260551Z * [new branch] gh/soulitzer/269/orig -> origin/gh/soulitzer/269/orig 2025-12-04T08:53:08.4260626Z * [new branch] gh/soulitzer/276/base -> origin/gh/soulitzer/276/base 2025-12-04T08:53:08.4260697Z * [new branch] gh/soulitzer/276/head -> origin/gh/soulitzer/276/head 2025-12-04T08:53:08.4260770Z * [new branch] gh/soulitzer/276/orig -> origin/gh/soulitzer/276/orig 
2025-12-04T08:53:08.4260846Z * [new branch] gh/soulitzer/287/base -> origin/gh/soulitzer/287/base 2025-12-04T08:53:08.4260919Z * [new branch] gh/soulitzer/287/head -> origin/gh/soulitzer/287/head 2025-12-04T08:53:08.4260991Z * [new branch] gh/soulitzer/287/orig -> origin/gh/soulitzer/287/orig 2025-12-04T08:53:08.4261064Z * [new branch] gh/soulitzer/296/base -> origin/gh/soulitzer/296/base 2025-12-04T08:53:08.4261135Z * [new branch] gh/soulitzer/296/head -> origin/gh/soulitzer/296/head 2025-12-04T08:53:08.4261206Z * [new branch] gh/soulitzer/296/orig -> origin/gh/soulitzer/296/orig 2025-12-04T08:53:08.4261280Z * [new branch] gh/soulitzer/299/base -> origin/gh/soulitzer/299/base 2025-12-04T08:53:08.4261352Z * [new branch] gh/soulitzer/299/head -> origin/gh/soulitzer/299/head 2025-12-04T08:53:08.4261458Z * [new branch] gh/soulitzer/299/orig -> origin/gh/soulitzer/299/orig 2025-12-04T08:53:08.4261558Z * [new branch] gh/soulitzer/300/base -> origin/gh/soulitzer/300/base 2025-12-04T08:53:08.4261629Z * [new branch] gh/soulitzer/300/head -> origin/gh/soulitzer/300/head 2025-12-04T08:53:08.4261700Z * [new branch] gh/soulitzer/300/orig -> origin/gh/soulitzer/300/orig 2025-12-04T08:53:08.4261774Z * [new branch] gh/soulitzer/301/base -> origin/gh/soulitzer/301/base 2025-12-04T08:53:08.4261845Z * [new branch] gh/soulitzer/301/head -> origin/gh/soulitzer/301/head 2025-12-04T08:53:08.4261918Z * [new branch] gh/soulitzer/301/orig -> origin/gh/soulitzer/301/orig 2025-12-04T08:53:08.4261992Z * [new branch] gh/soulitzer/313/base -> origin/gh/soulitzer/313/base 2025-12-04T08:53:08.4262063Z * [new branch] gh/soulitzer/313/head -> origin/gh/soulitzer/313/head 2025-12-04T08:53:08.4262139Z * [new branch] gh/soulitzer/313/orig -> origin/gh/soulitzer/313/orig 2025-12-04T08:53:08.4262211Z * [new branch] gh/soulitzer/319/base -> origin/gh/soulitzer/319/base 2025-12-04T08:53:08.4262282Z * [new branch] gh/soulitzer/319/head -> origin/gh/soulitzer/319/head 2025-12-04T08:53:08.4262356Z * [new 
branch] gh/soulitzer/319/orig -> origin/gh/soulitzer/319/orig 2025-12-04T08:53:08.4262428Z * [new branch] gh/soulitzer/320/base -> origin/gh/soulitzer/320/base 2025-12-04T08:53:08.4262500Z * [new branch] gh/soulitzer/320/head -> origin/gh/soulitzer/320/head 2025-12-04T08:53:08.4262571Z * [new branch] gh/soulitzer/320/orig -> origin/gh/soulitzer/320/orig 2025-12-04T08:53:08.4262643Z * [new branch] gh/soulitzer/336/base -> origin/gh/soulitzer/336/base 2025-12-04T08:53:08.4262715Z * [new branch] gh/soulitzer/336/head -> origin/gh/soulitzer/336/head 2025-12-04T08:53:08.4262788Z * [new branch] gh/soulitzer/336/orig -> origin/gh/soulitzer/336/orig 2025-12-04T08:53:08.4262861Z * [new branch] gh/soulitzer/347/base -> origin/gh/soulitzer/347/base 2025-12-04T08:53:08.4262931Z * [new branch] gh/soulitzer/347/head -> origin/gh/soulitzer/347/head 2025-12-04T08:53:08.4263005Z * [new branch] gh/soulitzer/347/orig -> origin/gh/soulitzer/347/orig 2025-12-04T08:53:08.4263075Z * [new branch] gh/soulitzer/349/base -> origin/gh/soulitzer/349/base 2025-12-04T08:53:08.4263148Z * [new branch] gh/soulitzer/349/head -> origin/gh/soulitzer/349/head 2025-12-04T08:53:08.4263220Z * [new branch] gh/soulitzer/349/orig -> origin/gh/soulitzer/349/orig 2025-12-04T08:53:08.4263340Z * [new branch] gh/soulitzer/350/base -> origin/gh/soulitzer/350/base 2025-12-04T08:53:08.4263416Z * [new branch] gh/soulitzer/350/head -> origin/gh/soulitzer/350/head 2025-12-04T08:53:08.4263491Z * [new branch] gh/soulitzer/350/orig -> origin/gh/soulitzer/350/orig 2025-12-04T08:53:08.4263565Z * [new branch] gh/soulitzer/351/base -> origin/gh/soulitzer/351/base 2025-12-04T08:53:08.4263640Z * [new branch] gh/soulitzer/351/head -> origin/gh/soulitzer/351/head 2025-12-04T08:53:08.4263713Z * [new branch] gh/soulitzer/351/orig -> origin/gh/soulitzer/351/orig 2025-12-04T08:53:08.4263784Z * [new branch] gh/soulitzer/353/base -> origin/gh/soulitzer/353/base 2025-12-04T08:53:08.4263858Z * [new branch] gh/soulitzer/353/head -> 
origin/gh/soulitzer/353/head 2025-12-04T08:53:08.4263930Z * [new branch] gh/soulitzer/353/orig -> origin/gh/soulitzer/353/orig 2025-12-04T08:53:08.4264001Z * [new branch] gh/soulitzer/358/base -> origin/gh/soulitzer/358/base 2025-12-04T08:53:08.4264116Z * [new branch] gh/soulitzer/358/head -> origin/gh/soulitzer/358/head 2025-12-04T08:53:08.4264188Z * [new branch] gh/soulitzer/358/orig -> origin/gh/soulitzer/358/orig 2025-12-04T08:53:08.4264304Z * [new branch] gh/soulitzer/359/base -> origin/gh/soulitzer/359/base 2025-12-04T08:53:08.4264378Z * [new branch] gh/soulitzer/359/head -> origin/gh/soulitzer/359/head 2025-12-04T08:53:08.4264450Z * [new branch] gh/soulitzer/359/orig -> origin/gh/soulitzer/359/orig 2025-12-04T08:53:08.4264523Z * [new branch] gh/soulitzer/374/base -> origin/gh/soulitzer/374/base 2025-12-04T08:53:08.4264596Z * [new branch] gh/soulitzer/374/head -> origin/gh/soulitzer/374/head 2025-12-04T08:53:08.4264669Z * [new branch] gh/soulitzer/374/orig -> origin/gh/soulitzer/374/orig 2025-12-04T08:53:08.4264740Z * [new branch] gh/soulitzer/375/base -> origin/gh/soulitzer/375/base 2025-12-04T08:53:08.4264815Z * [new branch] gh/soulitzer/375/head -> origin/gh/soulitzer/375/head 2025-12-04T08:53:08.4264887Z * [new branch] gh/soulitzer/375/orig -> origin/gh/soulitzer/375/orig 2025-12-04T08:53:08.4264959Z * [new branch] gh/soulitzer/380/base -> origin/gh/soulitzer/380/base 2025-12-04T08:53:08.4265032Z * [new branch] gh/soulitzer/380/head -> origin/gh/soulitzer/380/head 2025-12-04T08:53:08.4265103Z * [new branch] gh/soulitzer/380/orig -> origin/gh/soulitzer/380/orig 2025-12-04T08:53:08.4265176Z * [new branch] gh/soulitzer/385/base -> origin/gh/soulitzer/385/base 2025-12-04T08:53:08.4265248Z * [new branch] gh/soulitzer/385/head -> origin/gh/soulitzer/385/head 2025-12-04T08:53:08.4265320Z * [new branch] gh/soulitzer/385/orig -> origin/gh/soulitzer/385/orig 2025-12-04T08:53:08.4265396Z * [new branch] gh/soulitzer/386/base -> origin/gh/soulitzer/386/base 
2025-12-04T08:53:08.4265470Z * [new branch] gh/soulitzer/386/head -> origin/gh/soulitzer/386/head 2025-12-04T08:53:08.4265542Z * [new branch] gh/soulitzer/386/orig -> origin/gh/soulitzer/386/orig 2025-12-04T08:53:08.4265615Z * [new branch] gh/soulitzer/387/base -> origin/gh/soulitzer/387/base 2025-12-04T08:53:08.4265688Z * [new branch] gh/soulitzer/387/head -> origin/gh/soulitzer/387/head 2025-12-04T08:53:08.4265758Z * [new branch] gh/soulitzer/387/orig -> origin/gh/soulitzer/387/orig 2025-12-04T08:53:08.4265831Z * [new branch] gh/soulitzer/388/base -> origin/gh/soulitzer/388/base 2025-12-04T08:53:08.4265904Z * [new branch] gh/soulitzer/388/head -> origin/gh/soulitzer/388/head 2025-12-04T08:53:08.4265976Z * [new branch] gh/soulitzer/388/orig -> origin/gh/soulitzer/388/orig 2025-12-04T08:53:08.4266051Z * [new branch] gh/soulitzer/389/base -> origin/gh/soulitzer/389/base 2025-12-04T08:53:08.4266125Z * [new branch] gh/soulitzer/389/head -> origin/gh/soulitzer/389/head 2025-12-04T08:53:08.4266199Z * [new branch] gh/soulitzer/389/orig -> origin/gh/soulitzer/389/orig 2025-12-04T08:53:08.4266271Z * [new branch] gh/soulitzer/390/base -> origin/gh/soulitzer/390/base 2025-12-04T08:53:08.4266344Z * [new branch] gh/soulitzer/390/head -> origin/gh/soulitzer/390/head 2025-12-04T08:53:08.4266414Z * [new branch] gh/soulitzer/390/orig -> origin/gh/soulitzer/390/orig 2025-12-04T08:53:08.4266489Z * [new branch] gh/soulitzer/391/base -> origin/gh/soulitzer/391/base 2025-12-04T08:53:08.4266561Z * [new branch] gh/soulitzer/391/head -> origin/gh/soulitzer/391/head 2025-12-04T08:53:08.4266634Z * [new branch] gh/soulitzer/391/orig -> origin/gh/soulitzer/391/orig 2025-12-04T08:53:08.4266744Z * [new branch] gh/soulitzer/392/base -> origin/gh/soulitzer/392/base 2025-12-04T08:53:08.4266815Z * [new branch] gh/soulitzer/392/head -> origin/gh/soulitzer/392/head 2025-12-04T08:53:08.4266915Z * [new branch] gh/soulitzer/392/orig -> origin/gh/soulitzer/392/orig 2025-12-04T08:53:08.4266988Z * [new 
branch] gh/swolchok/728/next -> origin/gh/swolchok/728/next 2025-12-04T08:53:08.4267060Z * [new branch] gh/swolchok/819/base -> origin/gh/swolchok/819/base 2025-12-04T08:53:08.4267133Z * [new branch] gh/swolchok/819/head -> origin/gh/swolchok/819/head 2025-12-04T08:53:08.4267205Z * [new branch] gh/swolchok/819/orig -> origin/gh/swolchok/819/orig 2025-12-04T08:53:08.4267275Z * [new branch] gh/swolchok/824/base -> origin/gh/swolchok/824/base 2025-12-04T08:53:08.4267352Z * [new branch] gh/swolchok/824/head -> origin/gh/swolchok/824/head 2025-12-04T08:53:08.4267424Z * [new branch] gh/swolchok/824/orig -> origin/gh/swolchok/824/orig 2025-12-04T08:53:08.4267495Z * [new branch] gh/swolchok/829/base -> origin/gh/swolchok/829/base 2025-12-04T08:53:08.4267568Z * [new branch] gh/swolchok/829/head -> origin/gh/swolchok/829/head 2025-12-04T08:53:08.4267638Z * [new branch] gh/swolchok/829/orig -> origin/gh/swolchok/829/orig 2025-12-04T08:53:08.4267709Z * [new branch] gh/swolchok/839/base -> origin/gh/swolchok/839/base 2025-12-04T08:53:08.4267779Z * [new branch] gh/swolchok/839/head -> origin/gh/swolchok/839/head 2025-12-04T08:53:08.4267848Z * [new branch] gh/swolchok/839/orig -> origin/gh/swolchok/839/orig 2025-12-04T08:53:08.4267917Z * [new branch] gh/swolchok/841/base -> origin/gh/swolchok/841/base 2025-12-04T08:53:08.4267990Z * [new branch] gh/swolchok/841/head -> origin/gh/swolchok/841/head 2025-12-04T08:53:08.4268062Z * [new branch] gh/swolchok/841/orig -> origin/gh/swolchok/841/orig 2025-12-04T08:53:08.4268132Z * [new branch] gh/swolchok/842/base -> origin/gh/swolchok/842/base 2025-12-04T08:53:08.4268205Z * [new branch] gh/swolchok/842/head -> origin/gh/swolchok/842/head 2025-12-04T08:53:08.4268275Z * [new branch] gh/swolchok/842/orig -> origin/gh/swolchok/842/orig 2025-12-04T08:53:08.4268346Z * [new branch] gh/swolchok/845/base -> origin/gh/swolchok/845/base 2025-12-04T08:53:08.4268415Z * [new branch] gh/swolchok/845/head -> origin/gh/swolchok/845/head 
2025-12-04T08:53:08.4268484Z * [new branch] gh/swolchok/845/orig -> origin/gh/swolchok/845/orig 2025-12-04T08:53:08.4268554Z * [new branch] gh/swolchok/848/base -> origin/gh/swolchok/848/base 2025-12-04T08:53:08.4268624Z * [new branch] gh/swolchok/848/head -> origin/gh/swolchok/848/head 2025-12-04T08:53:08.4268696Z * [new branch] gh/swolchok/848/orig -> origin/gh/swolchok/848/orig 2025-12-04T08:53:08.4268770Z * [new branch] gh/swolchok/856/base -> origin/gh/swolchok/856/base 2025-12-04T08:53:08.4268841Z * [new branch] gh/swolchok/856/head -> origin/gh/swolchok/856/head 2025-12-04T08:53:08.4268911Z * [new branch] gh/swolchok/856/orig -> origin/gh/swolchok/856/orig 2025-12-04T08:53:08.4268982Z * [new branch] gh/swolchok/860/base -> origin/gh/swolchok/860/base 2025-12-04T08:53:08.4269053Z * [new branch] gh/swolchok/860/head -> origin/gh/swolchok/860/head 2025-12-04T08:53:08.4269122Z * [new branch] gh/swolchok/860/orig -> origin/gh/swolchok/860/orig 2025-12-04T08:53:08.4269191Z * [new branch] gh/swolchok/861/base -> origin/gh/swolchok/861/base 2025-12-04T08:53:08.4269263Z * [new branch] gh/swolchok/861/head -> origin/gh/swolchok/861/head 2025-12-04T08:53:08.4269360Z * [new branch] gh/swolchok/861/orig -> origin/gh/swolchok/861/orig 2025-12-04T08:53:08.4269469Z * [new branch] gh/swolchok/862/base -> origin/gh/swolchok/862/base 2025-12-04T08:53:08.4269540Z * [new branch] gh/swolchok/862/head -> origin/gh/swolchok/862/head 2025-12-04T08:53:08.4269611Z * [new branch] gh/swolchok/862/orig -> origin/gh/swolchok/862/orig 2025-12-04T08:53:08.4269680Z * [new branch] gh/swolchok/863/base -> origin/gh/swolchok/863/base 2025-12-04T08:53:08.4269751Z * [new branch] gh/swolchok/863/head -> origin/gh/swolchok/863/head 2025-12-04T08:53:08.4269821Z * [new branch] gh/swolchok/863/orig -> origin/gh/swolchok/863/orig 2025-12-04T08:53:08.4269890Z * [new branch] gh/swolchok/864/base -> origin/gh/swolchok/864/base 2025-12-04T08:53:08.4269960Z * [new branch] gh/swolchok/864/head -> 
origin/gh/swolchok/864/head 2025-12-04T08:53:08.4270032Z * [new branch] gh/swolchok/864/orig -> origin/gh/swolchok/864/orig 2025-12-04T08:53:08.4270103Z * [new branch] gh/swolchok/865/base -> origin/gh/swolchok/865/base 2025-12-04T08:53:08.4270173Z * [new branch] gh/swolchok/865/head -> origin/gh/swolchok/865/head 2025-12-04T08:53:08.4270242Z * [new branch] gh/swolchok/865/orig -> origin/gh/swolchok/865/orig 2025-12-04T08:53:08.4270311Z * [new branch] gh/swolchok/866/base -> origin/gh/swolchok/866/base 2025-12-04T08:53:08.4270382Z * [new branch] gh/swolchok/866/head -> origin/gh/swolchok/866/head 2025-12-04T08:53:08.4270451Z * [new branch] gh/swolchok/866/orig -> origin/gh/swolchok/866/orig 2025-12-04T08:53:08.4270521Z * [new branch] gh/swolchok/867/base -> origin/gh/swolchok/867/base 2025-12-04T08:53:08.4270591Z * [new branch] gh/swolchok/867/head -> origin/gh/swolchok/867/head 2025-12-04T08:53:08.4270662Z * [new branch] gh/swolchok/867/orig -> origin/gh/swolchok/867/orig 2025-12-04T08:53:08.4270733Z * [new branch] gh/swolchok/868/base -> origin/gh/swolchok/868/base 2025-12-04T08:53:08.4270802Z * [new branch] gh/swolchok/868/head -> origin/gh/swolchok/868/head 2025-12-04T08:53:08.4270871Z * [new branch] gh/swolchok/868/orig -> origin/gh/swolchok/868/orig 2025-12-04T08:53:08.4270942Z * [new branch] gh/swolchok/869/base -> origin/gh/swolchok/869/base 2025-12-04T08:53:08.4271012Z * [new branch] gh/swolchok/869/head -> origin/gh/swolchok/869/head 2025-12-04T08:53:08.4271081Z * [new branch] gh/swolchok/869/orig -> origin/gh/swolchok/869/orig 2025-12-04T08:53:08.4271151Z * [new branch] gh/swolchok/870/base -> origin/gh/swolchok/870/base 2025-12-04T08:53:08.4271220Z * [new branch] gh/swolchok/870/head -> origin/gh/swolchok/870/head 2025-12-04T08:53:08.4271291Z * [new branch] gh/swolchok/870/orig -> origin/gh/swolchok/870/orig 2025-12-04T08:53:08.4271363Z * [new branch] gh/swolchok/871/base -> origin/gh/swolchok/871/base 2025-12-04T08:53:08.4271433Z * [new branch] 
gh/swolchok/871/head -> origin/gh/swolchok/871/head 2025-12-04T08:53:08.4271503Z * [new branch] gh/swolchok/871/orig -> origin/gh/swolchok/871/orig 2025-12-04T08:53:08.4271578Z * [new branch] gh/teja-rao/4/base -> origin/gh/teja-rao/4/base 2025-12-04T08:53:08.4271648Z * [new branch] gh/teja-rao/4/head -> origin/gh/teja-rao/4/head 2025-12-04T08:53:08.4271716Z * [new branch] gh/teja-rao/4/orig -> origin/gh/teja-rao/4/orig 2025-12-04T08:53:08.4271787Z * [new branch] gh/tianyu-l/2/base -> origin/gh/tianyu-l/2/base 2025-12-04T08:53:08.4271856Z * [new branch] gh/tianyu-l/2/head -> origin/gh/tianyu-l/2/head 2025-12-04T08:53:08.4271952Z * [new branch] gh/tianyu-l/2/orig -> origin/gh/tianyu-l/2/orig 2025-12-04T08:53:08.4272048Z * [new branch] gh/tianyu-l/3/base -> origin/gh/tianyu-l/3/base 2025-12-04T08:53:08.4272116Z * [new branch] gh/tianyu-l/3/orig -> origin/gh/tianyu-l/3/orig 2025-12-04T08:53:08.4272184Z * [new branch] gh/tianyu-l/4/base -> origin/gh/tianyu-l/4/base 2025-12-04T08:53:08.4272252Z * [new branch] gh/tianyu-l/4/head -> origin/gh/tianyu-l/4/head 2025-12-04T08:53:08.4272319Z * [new branch] gh/tianyu-l/4/orig -> origin/gh/tianyu-l/4/orig 2025-12-04T08:53:08.4272411Z * [new branch] gh/tugsbayasgalan/10/base -> origin/gh/tugsbayasgalan/10/base 2025-12-04T08:53:08.4272497Z * [new branch] gh/tugsbayasgalan/10/head -> origin/gh/tugsbayasgalan/10/head 2025-12-04T08:53:08.4272579Z * [new branch] gh/tugsbayasgalan/10/orig -> origin/gh/tugsbayasgalan/10/orig 2025-12-04T08:53:08.4272662Z * [new branch] gh/tugsbayasgalan/13/base -> origin/gh/tugsbayasgalan/13/base 2025-12-04T08:53:08.4272745Z * [new branch] gh/tugsbayasgalan/13/head -> origin/gh/tugsbayasgalan/13/head 2025-12-04T08:53:08.4272826Z * [new branch] gh/tugsbayasgalan/13/orig -> origin/gh/tugsbayasgalan/13/orig 2025-12-04T08:53:08.4272909Z * [new branch] gh/tugsbayasgalan/17/base -> origin/gh/tugsbayasgalan/17/base 2025-12-04T08:53:08.4272991Z * [new branch] gh/tugsbayasgalan/17/head -> 
origin/gh/tugsbayasgalan/17/head 2025-12-04T08:53:08.4273072Z * [new branch] gh/tugsbayasgalan/17/orig -> origin/gh/tugsbayasgalan/17/orig 2025-12-04T08:53:08.4273155Z * [new branch] gh/tugsbayasgalan/2/base -> origin/gh/tugsbayasgalan/2/base 2025-12-04T08:53:08.4273236Z * [new branch] gh/tugsbayasgalan/2/head -> origin/gh/tugsbayasgalan/2/head 2025-12-04T08:53:08.4273370Z * [new branch] gh/tugsbayasgalan/2/orig -> origin/gh/tugsbayasgalan/2/orig 2025-12-04T08:53:08.4273458Z * [new branch] gh/tugsbayasgalan/28/base -> origin/gh/tugsbayasgalan/28/base 2025-12-04T08:53:08.4273540Z * [new branch] gh/tugsbayasgalan/28/head -> origin/gh/tugsbayasgalan/28/head 2025-12-04T08:53:08.4273620Z * [new branch] gh/tugsbayasgalan/28/orig -> origin/gh/tugsbayasgalan/28/orig 2025-12-04T08:53:08.4273703Z * [new branch] gh/tugsbayasgalan/32/base -> origin/gh/tugsbayasgalan/32/base 2025-12-04T08:53:08.4273784Z * [new branch] gh/tugsbayasgalan/32/head -> origin/gh/tugsbayasgalan/32/head 2025-12-04T08:53:08.4273866Z * [new branch] gh/tugsbayasgalan/32/orig -> origin/gh/tugsbayasgalan/32/orig 2025-12-04T08:53:08.4273946Z * [new branch] gh/tugsbayasgalan/35/base -> origin/gh/tugsbayasgalan/35/base 2025-12-04T08:53:08.4274029Z * [new branch] gh/tugsbayasgalan/35/head -> origin/gh/tugsbayasgalan/35/head 2025-12-04T08:53:08.4274112Z * [new branch] gh/tugsbayasgalan/35/orig -> origin/gh/tugsbayasgalan/35/orig 2025-12-04T08:53:08.4274194Z * [new branch] gh/tugsbayasgalan/36/base -> origin/gh/tugsbayasgalan/36/base 2025-12-04T08:53:08.4274275Z * [new branch] gh/tugsbayasgalan/36/head -> origin/gh/tugsbayasgalan/36/head 2025-12-04T08:53:08.4274358Z * [new branch] gh/tugsbayasgalan/36/orig -> origin/gh/tugsbayasgalan/36/orig 2025-12-04T08:53:08.4274439Z * [new branch] gh/tugsbayasgalan/37/base -> origin/gh/tugsbayasgalan/37/base 2025-12-04T08:53:08.4274519Z * [new branch] gh/tugsbayasgalan/37/head -> origin/gh/tugsbayasgalan/37/head 2025-12-04T08:53:08.4274602Z * [new branch] 
gh/tugsbayasgalan/37/orig -> origin/gh/tugsbayasgalan/37/orig 2025-12-04T08:53:08.4274684Z * [new branch] gh/tugsbayasgalan/43/base -> origin/gh/tugsbayasgalan/43/base 2025-12-04T08:53:08.4274809Z * [new branch] gh/tugsbayasgalan/43/head -> origin/gh/tugsbayasgalan/43/head 2025-12-04T08:53:08.4274930Z * [new branch] gh/tugsbayasgalan/43/orig -> origin/gh/tugsbayasgalan/43/orig 2025-12-04T08:53:08.4275012Z * [new branch] gh/tugsbayasgalan/48/base -> origin/gh/tugsbayasgalan/48/base 2025-12-04T08:53:08.4275093Z * [new branch] gh/tugsbayasgalan/48/head -> origin/gh/tugsbayasgalan/48/head 2025-12-04T08:53:08.4275176Z * [new branch] gh/tugsbayasgalan/48/orig -> origin/gh/tugsbayasgalan/48/orig 2025-12-04T08:53:08.4275258Z * [new branch] gh/tugsbayasgalan/51/base -> origin/gh/tugsbayasgalan/51/base 2025-12-04T08:53:08.4275340Z * [new branch] gh/tugsbayasgalan/51/head -> origin/gh/tugsbayasgalan/51/head 2025-12-04T08:53:08.4275423Z * [new branch] gh/tugsbayasgalan/51/orig -> origin/gh/tugsbayasgalan/51/orig 2025-12-04T08:53:08.4275505Z * [new branch] gh/tugsbayasgalan/52/base -> origin/gh/tugsbayasgalan/52/base 2025-12-04T08:53:08.4275590Z * [new branch] gh/tugsbayasgalan/52/head -> origin/gh/tugsbayasgalan/52/head 2025-12-04T08:53:08.4275671Z * [new branch] gh/tugsbayasgalan/52/orig -> origin/gh/tugsbayasgalan/52/orig 2025-12-04T08:53:08.4275752Z * [new branch] gh/tugsbayasgalan/53/base -> origin/gh/tugsbayasgalan/53/base 2025-12-04T08:53:08.4275834Z * [new branch] gh/tugsbayasgalan/53/head -> origin/gh/tugsbayasgalan/53/head 2025-12-04T08:53:08.4275915Z * [new branch] gh/tugsbayasgalan/53/orig -> origin/gh/tugsbayasgalan/53/orig 2025-12-04T08:53:08.4275995Z * [new branch] gh/tugsbayasgalan/55/base -> origin/gh/tugsbayasgalan/55/base 2025-12-04T08:53:08.4276077Z * [new branch] gh/tugsbayasgalan/55/head -> origin/gh/tugsbayasgalan/55/head 2025-12-04T08:53:08.4276158Z * [new branch] gh/tugsbayasgalan/55/orig -> origin/gh/tugsbayasgalan/55/orig 2025-12-04T08:53:08.4276240Z 
* [new branch] gh/tugsbayasgalan/59/base -> origin/gh/tugsbayasgalan/59/base 2025-12-04T08:53:08.4276324Z * [new branch] gh/tugsbayasgalan/59/head -> origin/gh/tugsbayasgalan/59/head 2025-12-04T08:53:08.4276405Z * [new branch] gh/tugsbayasgalan/59/orig -> origin/gh/tugsbayasgalan/59/orig 2025-12-04T08:53:08.4276486Z * [new branch] gh/tugsbayasgalan/6/base -> origin/gh/tugsbayasgalan/6/base 2025-12-04T08:53:08.4276568Z * [new branch] gh/tugsbayasgalan/6/head -> origin/gh/tugsbayasgalan/6/head 2025-12-04T08:53:08.4276648Z * [new branch] gh/tugsbayasgalan/6/orig -> origin/gh/tugsbayasgalan/6/orig 2025-12-04T08:53:08.4276729Z * [new branch] gh/tugsbayasgalan/60/base -> origin/gh/tugsbayasgalan/60/base 2025-12-04T08:53:08.4276812Z * [new branch] gh/tugsbayasgalan/60/head -> origin/gh/tugsbayasgalan/60/head 2025-12-04T08:53:08.4276897Z * [new branch] gh/tugsbayasgalan/60/orig -> origin/gh/tugsbayasgalan/60/orig 2025-12-04T08:53:08.4276981Z * [new branch] gh/tugsbayasgalan/61/base -> origin/gh/tugsbayasgalan/61/base 2025-12-04T08:53:08.4277063Z * [new branch] gh/tugsbayasgalan/61/head -> origin/gh/tugsbayasgalan/61/head 2025-12-04T08:53:08.4277144Z * [new branch] gh/tugsbayasgalan/61/orig -> origin/gh/tugsbayasgalan/61/orig 2025-12-04T08:53:08.4277227Z * [new branch] gh/tugsbayasgalan/63/base -> origin/gh/tugsbayasgalan/63/base 2025-12-04T08:53:08.4277308Z * [new branch] gh/tugsbayasgalan/63/head -> origin/gh/tugsbayasgalan/63/head 2025-12-04T08:53:08.4277389Z * [new branch] gh/tugsbayasgalan/63/orig -> origin/gh/tugsbayasgalan/63/orig 2025-12-04T08:53:08.4277471Z * [new branch] gh/tugsbayasgalan/67/base -> origin/gh/tugsbayasgalan/67/base 2025-12-04T08:53:08.4277553Z * [new branch] gh/tugsbayasgalan/67/head -> origin/gh/tugsbayasgalan/67/head 2025-12-04T08:53:08.4277662Z * [new branch] gh/tugsbayasgalan/67/orig -> origin/gh/tugsbayasgalan/67/orig 2025-12-04T08:53:08.4277777Z * [new branch] gh/tugsbayasgalan/68/base -> origin/gh/tugsbayasgalan/68/base 
2025-12-04T08:53:08.4277859Z * [new branch] gh/tugsbayasgalan/68/head -> origin/gh/tugsbayasgalan/68/head 2025-12-04T08:53:08.4277941Z * [new branch] gh/tugsbayasgalan/68/orig -> origin/gh/tugsbayasgalan/68/orig 2025-12-04T08:53:08.4278023Z * [new branch] gh/tugsbayasgalan/7/base -> origin/gh/tugsbayasgalan/7/base 2025-12-04T08:53:08.4278103Z * [new branch] gh/tugsbayasgalan/7/head -> origin/gh/tugsbayasgalan/7/head 2025-12-04T08:53:08.4278182Z * [new branch] gh/tugsbayasgalan/7/orig -> origin/gh/tugsbayasgalan/7/orig 2025-12-04T08:53:08.4278266Z * [new branch] gh/tugsbayasgalan/70/base -> origin/gh/tugsbayasgalan/70/base 2025-12-04T08:53:08.4278349Z * [new branch] gh/tugsbayasgalan/70/head -> origin/gh/tugsbayasgalan/70/head 2025-12-04T08:53:08.4278432Z * [new branch] gh/tugsbayasgalan/70/orig -> origin/gh/tugsbayasgalan/70/orig 2025-12-04T08:53:08.4278514Z * [new branch] gh/tugsbayasgalan/71/base -> origin/gh/tugsbayasgalan/71/base 2025-12-04T08:53:08.4278595Z * [new branch] gh/tugsbayasgalan/71/head -> origin/gh/tugsbayasgalan/71/head 2025-12-04T08:53:08.4278677Z * [new branch] gh/tugsbayasgalan/71/orig -> origin/gh/tugsbayasgalan/71/orig 2025-12-04T08:53:08.4278758Z * [new branch] gh/tugsbayasgalan/72/base -> origin/gh/tugsbayasgalan/72/base 2025-12-04T08:53:08.4278838Z * [new branch] gh/tugsbayasgalan/72/head -> origin/gh/tugsbayasgalan/72/head 2025-12-04T08:53:08.4278920Z * [new branch] gh/tugsbayasgalan/72/orig -> origin/gh/tugsbayasgalan/72/orig 2025-12-04T08:53:08.4279002Z * [new branch] gh/tugsbayasgalan/73/base -> origin/gh/tugsbayasgalan/73/base 2025-12-04T08:53:08.4279083Z * [new branch] gh/tugsbayasgalan/73/head -> origin/gh/tugsbayasgalan/73/head 2025-12-04T08:53:08.4279167Z * [new branch] gh/tugsbayasgalan/73/orig -> origin/gh/tugsbayasgalan/73/orig 2025-12-04T08:53:08.4279248Z * [new branch] gh/tugsbayasgalan/74/base -> origin/gh/tugsbayasgalan/74/base 2025-12-04T08:53:08.4279329Z * [new branch] gh/tugsbayasgalan/74/head -> 
origin/gh/tugsbayasgalan/74/head 2025-12-04T08:53:08.4279412Z * [new branch] gh/tugsbayasgalan/74/orig -> origin/gh/tugsbayasgalan/74/orig 2025-12-04T08:53:08.4279493Z * [new branch] gh/tugsbayasgalan/75/base -> origin/gh/tugsbayasgalan/75/base 2025-12-04T08:53:08.4279574Z * [new branch] gh/tugsbayasgalan/75/head -> origin/gh/tugsbayasgalan/75/head 2025-12-04T08:53:08.4279657Z * [new branch] gh/tugsbayasgalan/75/orig -> origin/gh/tugsbayasgalan/75/orig 2025-12-04T08:53:08.4279739Z * [new branch] gh/tugsbayasgalan/76/base -> origin/gh/tugsbayasgalan/76/base 2025-12-04T08:53:08.4279822Z * [new branch] gh/tugsbayasgalan/76/head -> origin/gh/tugsbayasgalan/76/head 2025-12-04T08:53:08.4279904Z * [new branch] gh/tugsbayasgalan/76/orig -> origin/gh/tugsbayasgalan/76/orig 2025-12-04T08:53:08.4279985Z * [new branch] gh/tugsbayasgalan/77/base -> origin/gh/tugsbayasgalan/77/base 2025-12-04T08:53:08.4280068Z * [new branch] gh/tugsbayasgalan/77/head -> origin/gh/tugsbayasgalan/77/head 2025-12-04T08:53:08.4280149Z * [new branch] gh/tugsbayasgalan/77/orig -> origin/gh/tugsbayasgalan/77/orig 2025-12-04T08:53:08.4280230Z * [new branch] gh/tugsbayasgalan/78/base -> origin/gh/tugsbayasgalan/78/base 2025-12-04T08:53:08.4280312Z * [new branch] gh/tugsbayasgalan/78/head -> origin/gh/tugsbayasgalan/78/head 2025-12-04T08:53:08.4280427Z * [new branch] gh/tugsbayasgalan/78/orig -> origin/gh/tugsbayasgalan/78/orig 2025-12-04T08:53:08.4280529Z * [new branch] gh/tugsbayasgalan/79/base -> origin/gh/tugsbayasgalan/79/base 2025-12-04T08:53:08.4280612Z * [new branch] gh/tugsbayasgalan/79/head -> origin/gh/tugsbayasgalan/79/head 2025-12-04T08:53:08.4280693Z * [new branch] gh/tugsbayasgalan/79/orig -> origin/gh/tugsbayasgalan/79/orig 2025-12-04T08:53:08.4280773Z * [new branch] gh/tugsbayasgalan/8/base -> origin/gh/tugsbayasgalan/8/base 2025-12-04T08:53:08.4280854Z * [new branch] gh/tugsbayasgalan/8/head -> origin/gh/tugsbayasgalan/8/head 2025-12-04T08:53:08.4280932Z * [new branch] 
gh/tugsbayasgalan/8/orig -> origin/gh/tugsbayasgalan/8/orig 2025-12-04T08:53:08.4281014Z * [new branch] gh/tugsbayasgalan/80/base -> origin/gh/tugsbayasgalan/80/base 2025-12-04T08:53:08.4281096Z * [new branch] gh/tugsbayasgalan/80/head -> origin/gh/tugsbayasgalan/80/head 2025-12-04T08:53:08.4281179Z * [new branch] gh/tugsbayasgalan/80/orig -> origin/gh/tugsbayasgalan/80/orig 2025-12-04T08:53:08.4281262Z * [new branch] gh/tugsbayasgalan/81/base -> origin/gh/tugsbayasgalan/81/base 2025-12-04T08:53:08.4281344Z * [new branch] gh/tugsbayasgalan/81/head -> origin/gh/tugsbayasgalan/81/head 2025-12-04T08:53:08.4281426Z * [new branch] gh/tugsbayasgalan/81/orig -> origin/gh/tugsbayasgalan/81/orig 2025-12-04T08:53:08.4281507Z * [new branch] gh/tugsbayasgalan/82/base -> origin/gh/tugsbayasgalan/82/base 2025-12-04T08:53:08.4281590Z * [new branch] gh/tugsbayasgalan/82/head -> origin/gh/tugsbayasgalan/82/head 2025-12-04T08:53:08.4281671Z * [new branch] gh/tugsbayasgalan/82/orig -> origin/gh/tugsbayasgalan/82/orig 2025-12-04T08:53:08.4281754Z * [new branch] gh/tugsbayasgalan/83/base -> origin/gh/tugsbayasgalan/83/base 2025-12-04T08:53:08.4281836Z * [new branch] gh/tugsbayasgalan/83/head -> origin/gh/tugsbayasgalan/83/head 2025-12-04T08:53:08.4281919Z * [new branch] gh/tugsbayasgalan/83/orig -> origin/gh/tugsbayasgalan/83/orig 2025-12-04T08:53:08.4282001Z * [new branch] gh/tugsbayasgalan/84/base -> origin/gh/tugsbayasgalan/84/base 2025-12-04T08:53:08.4282082Z * [new branch] gh/tugsbayasgalan/84/head -> origin/gh/tugsbayasgalan/84/head 2025-12-04T08:53:08.4282163Z * [new branch] gh/tugsbayasgalan/84/orig -> origin/gh/tugsbayasgalan/84/orig 2025-12-04T08:53:08.4282246Z * [new branch] gh/tugsbayasgalan/85/base -> origin/gh/tugsbayasgalan/85/base 2025-12-04T08:53:08.4282327Z * [new branch] gh/tugsbayasgalan/85/head -> origin/gh/tugsbayasgalan/85/head 2025-12-04T08:53:08.4282408Z * [new branch] gh/tugsbayasgalan/85/orig -> origin/gh/tugsbayasgalan/85/orig 2025-12-04T08:53:08.4282491Z * 
[new branch] gh/tugsbayasgalan/86/base -> origin/gh/tugsbayasgalan/86/base 2025-12-04T08:53:08.4282572Z * [new branch] gh/tugsbayasgalan/86/head -> origin/gh/tugsbayasgalan/86/head 2025-12-04T08:53:08.4282653Z * [new branch] gh/tugsbayasgalan/86/orig -> origin/gh/tugsbayasgalan/86/orig 2025-12-04T08:53:08.4282736Z * [new branch] gh/tugsbayasgalan/87/base -> origin/gh/tugsbayasgalan/87/base 2025-12-04T08:53:08.4282818Z * [new branch] gh/tugsbayasgalan/87/head -> origin/gh/tugsbayasgalan/87/head 2025-12-04T08:53:08.4282900Z * [new branch] gh/tugsbayasgalan/87/orig -> origin/gh/tugsbayasgalan/87/orig 2025-12-04T08:53:08.4282983Z * [new branch] gh/tugsbayasgalan/88/base -> origin/gh/tugsbayasgalan/88/base 2025-12-04T08:53:08.4283065Z * [new branch] gh/tugsbayasgalan/88/head -> origin/gh/tugsbayasgalan/88/head 2025-12-04T08:53:08.4283147Z * [new branch] gh/tugsbayasgalan/88/orig -> origin/gh/tugsbayasgalan/88/orig 2025-12-04T08:53:08.4283304Z * [new branch] gh/tugsbayasgalan/89/base -> origin/gh/tugsbayasgalan/89/base 2025-12-04T08:53:08.4283429Z * [new branch] gh/tugsbayasgalan/89/head -> origin/gh/tugsbayasgalan/89/head 2025-12-04T08:53:08.4283511Z * [new branch] gh/tugsbayasgalan/89/orig -> origin/gh/tugsbayasgalan/89/orig 2025-12-04T08:53:08.4283590Z * [new branch] gh/tugsbayasgalan/9/base -> origin/gh/tugsbayasgalan/9/base 2025-12-04T08:53:08.4283669Z * [new branch] gh/tugsbayasgalan/9/head -> origin/gh/tugsbayasgalan/9/head 2025-12-04T08:53:08.4283749Z * [new branch] gh/tugsbayasgalan/9/orig -> origin/gh/tugsbayasgalan/9/orig 2025-12-04T08:53:08.4283831Z * [new branch] gh/tugsbayasgalan/90/base -> origin/gh/tugsbayasgalan/90/base 2025-12-04T08:53:08.4283913Z * [new branch] gh/tugsbayasgalan/90/head -> origin/gh/tugsbayasgalan/90/head 2025-12-04T08:53:08.4283997Z * [new branch] gh/tugsbayasgalan/90/orig -> origin/gh/tugsbayasgalan/90/orig 2025-12-04T08:53:08.4284079Z * [new branch] gh/tugsbayasgalan/91/base -> origin/gh/tugsbayasgalan/91/base 
2025-12-04T08:53:08.4284160Z * [new branch] gh/tugsbayasgalan/91/head -> origin/gh/tugsbayasgalan/91/head 2025-12-04T08:53:08.4284241Z * [new branch] gh/tugsbayasgalan/91/orig -> origin/gh/tugsbayasgalan/91/orig 2025-12-04T08:53:08.4284322Z * [new branch] gh/tugsbayasgalan/92/base -> origin/gh/tugsbayasgalan/92/base 2025-12-04T08:53:08.4284403Z * [new branch] gh/tugsbayasgalan/92/head -> origin/gh/tugsbayasgalan/92/head 2025-12-04T08:53:08.4284487Z * [new branch] gh/tugsbayasgalan/92/orig -> origin/gh/tugsbayasgalan/92/orig 2025-12-04T08:53:08.4284568Z * [new branch] gh/tugsbayasgalan/93/base -> origin/gh/tugsbayasgalan/93/base 2025-12-04T08:53:08.4284649Z * [new branch] gh/tugsbayasgalan/93/head -> origin/gh/tugsbayasgalan/93/head 2025-12-04T08:53:08.4284732Z * [new branch] gh/tugsbayasgalan/93/orig -> origin/gh/tugsbayasgalan/93/orig 2025-12-04T08:53:08.4284803Z * [new branch] gh/v0i0/14/base -> origin/gh/v0i0/14/base 2025-12-04T08:53:08.4284869Z * [new branch] gh/v0i0/14/head -> origin/gh/v0i0/14/head 2025-12-04T08:53:08.4284933Z * [new branch] gh/v0i0/14/orig -> origin/gh/v0i0/14/orig 2025-12-04T08:53:08.4284997Z * [new branch] gh/v0i0/15/base -> origin/gh/v0i0/15/base 2025-12-04T08:53:08.4285060Z * [new branch] gh/v0i0/15/head -> origin/gh/v0i0/15/head 2025-12-04T08:53:08.4285122Z * [new branch] gh/v0i0/15/orig -> origin/gh/v0i0/15/orig 2025-12-04T08:53:08.4285184Z * [new branch] gh/v0i0/16/base -> origin/gh/v0i0/16/base 2025-12-04T08:53:08.4285246Z * [new branch] gh/v0i0/16/head -> origin/gh/v0i0/16/head 2025-12-04T08:53:08.4285310Z * [new branch] gh/v0i0/16/orig -> origin/gh/v0i0/16/orig 2025-12-04T08:53:08.4285373Z * [new branch] gh/v0i0/17/base -> origin/gh/v0i0/17/base 2025-12-04T08:53:08.4285436Z * [new branch] gh/v0i0/17/head -> origin/gh/v0i0/17/head 2025-12-04T08:53:08.4285497Z * [new branch] gh/v0i0/17/orig -> origin/gh/v0i0/17/orig 2025-12-04T08:53:08.4285559Z * [new branch] gh/v0i0/18/base -> origin/gh/v0i0/18/base 2025-12-04T08:53:08.4285621Z * 
[new branch] gh/v0i0/18/head -> origin/gh/v0i0/18/head 2025-12-04T08:53:08.4285683Z * [new branch] gh/v0i0/18/orig -> origin/gh/v0i0/18/orig 2025-12-04T08:53:08.4285745Z * [new branch] gh/v0i0/19/base -> origin/gh/v0i0/19/base 2025-12-04T08:53:08.4285807Z * [new branch] gh/v0i0/19/head -> origin/gh/v0i0/19/head 2025-12-04T08:53:08.4285919Z * [new branch] gh/v0i0/19/orig -> origin/gh/v0i0/19/orig 2025-12-04T08:53:08.4286029Z * [new branch] gh/vishal9-team/1/base -> origin/gh/vishal9-team/1/base 2025-12-04T08:53:08.4286109Z * [new branch] gh/vishal9-team/1/head -> origin/gh/vishal9-team/1/head 2025-12-04T08:53:08.4286183Z * [new branch] gh/vishal9-team/2/base -> origin/gh/vishal9-team/2/base 2025-12-04T08:53:08.4286258Z * [new branch] gh/vishal9-team/2/head -> origin/gh/vishal9-team/2/head 2025-12-04T08:53:08.4286332Z * [new branch] gh/vishal9-team/2/orig -> origin/gh/vishal9-team/2/orig 2025-12-04T08:53:08.4286405Z * [new branch] gh/vishal9-team/3/base -> origin/gh/vishal9-team/3/base 2025-12-04T08:53:08.4286478Z * [new branch] gh/vishal9-team/3/head -> origin/gh/vishal9-team/3/head 2025-12-04T08:53:08.4286551Z * [new branch] gh/vishal9-team/3/orig -> origin/gh/vishal9-team/3/orig 2025-12-04T08:53:08.4286626Z * [new branch] gh/vishal9-team/4/base -> origin/gh/vishal9-team/4/base 2025-12-04T08:53:08.4286701Z * [new branch] gh/vishal9-team/4/head -> origin/gh/vishal9-team/4/head 2025-12-04T08:53:08.4286774Z * [new branch] gh/vishal9-team/4/orig -> origin/gh/vishal9-team/4/orig 2025-12-04T08:53:08.4286839Z * [new branch] gh/vkuzo/1/next -> origin/gh/vkuzo/1/next 2025-12-04T08:53:08.4286906Z * [new branch] gh/vkuzo/2/next -> origin/gh/vkuzo/2/next 2025-12-04T08:53:08.4286970Z * [new branch] gh/vkuzo/3/next -> origin/gh/vkuzo/3/next 2025-12-04T08:53:08.4287044Z * [new branch] gh/wconstab/424/base -> origin/gh/wconstab/424/base 2025-12-04T08:53:08.4287118Z * [new branch] gh/wconstab/424/head -> origin/gh/wconstab/424/head 2025-12-04T08:53:08.4287189Z * [new branch] 
gh/wconstab/424/orig -> origin/gh/wconstab/424/orig 2025-12-04T08:53:08.4287262Z * [new branch] gh/wconstab/435/base -> origin/gh/wconstab/435/base 2025-12-04T08:53:08.4287335Z * [new branch] gh/wconstab/435/head -> origin/gh/wconstab/435/head 2025-12-04T08:53:08.4287405Z * [new branch] gh/wconstab/435/orig -> origin/gh/wconstab/435/orig 2025-12-04T08:53:08.4287475Z * [new branch] gh/wconstab/444/base -> origin/gh/wconstab/444/base 2025-12-04T08:53:08.4287546Z * [new branch] gh/wconstab/444/head -> origin/gh/wconstab/444/head 2025-12-04T08:53:08.4287615Z * [new branch] gh/wconstab/444/orig -> origin/gh/wconstab/444/orig 2025-12-04T08:53:08.4287684Z * [new branch] gh/wconstab/447/base -> origin/gh/wconstab/447/base 2025-12-04T08:53:08.4287756Z * [new branch] gh/wconstab/447/head -> origin/gh/wconstab/447/head 2025-12-04T08:53:08.4287825Z * [new branch] gh/wconstab/447/orig -> origin/gh/wconstab/447/orig 2025-12-04T08:53:08.4287896Z * [new branch] gh/wconstab/448/base -> origin/gh/wconstab/448/base 2025-12-04T08:53:08.4287969Z * [new branch] gh/wconstab/448/head -> origin/gh/wconstab/448/head 2025-12-04T08:53:08.4288039Z * [new branch] gh/wconstab/448/orig -> origin/gh/wconstab/448/orig 2025-12-04T08:53:08.4288110Z * [new branch] gh/wconstab/449/base -> origin/gh/wconstab/449/base 2025-12-04T08:53:08.4288179Z * [new branch] gh/wconstab/449/head -> origin/gh/wconstab/449/head 2025-12-04T08:53:08.4288249Z * [new branch] gh/wconstab/449/orig -> origin/gh/wconstab/449/orig 2025-12-04T08:53:08.4288320Z * [new branch] gh/wconstab/450/base -> origin/gh/wconstab/450/base 2025-12-04T08:53:08.4288389Z * [new branch] gh/wconstab/450/head -> origin/gh/wconstab/450/head 2025-12-04T08:53:08.4288459Z * [new branch] gh/wconstab/450/orig -> origin/gh/wconstab/450/orig 2025-12-04T08:53:08.4288555Z * [new branch] gh/wconstab/451/base -> origin/gh/wconstab/451/base 2025-12-04T08:53:08.4288655Z * [new branch] gh/wconstab/451/head -> origin/gh/wconstab/451/head 
2025-12-04T08:53:08.4288726Z * [new branch] gh/wconstab/451/orig -> origin/gh/wconstab/451/orig 2025-12-04T08:53:08.4288797Z * [new branch] gh/wconstab/452/base -> origin/gh/wconstab/452/base 2025-12-04T08:53:08.4288867Z * [new branch] gh/wconstab/452/head -> origin/gh/wconstab/452/head 2025-12-04T08:53:08.4288937Z * [new branch] gh/wconstab/452/orig -> origin/gh/wconstab/452/orig 2025-12-04T08:53:08.4289007Z * [new branch] gh/wconstab/453/base -> origin/gh/wconstab/453/base 2025-12-04T08:53:08.4289077Z * [new branch] gh/wconstab/453/head -> origin/gh/wconstab/453/head 2025-12-04T08:53:08.4289147Z * [new branch] gh/wconstab/453/orig -> origin/gh/wconstab/453/orig 2025-12-04T08:53:08.4289220Z * [new branch] gh/wconstab/454/base -> origin/gh/wconstab/454/base 2025-12-04T08:53:08.4289292Z * [new branch] gh/wconstab/454/head -> origin/gh/wconstab/454/head 2025-12-04T08:53:08.4289363Z * [new branch] gh/wconstab/454/orig -> origin/gh/wconstab/454/orig 2025-12-04T08:53:08.4289434Z * [new branch] gh/wconstab/455/base -> origin/gh/wconstab/455/base 2025-12-04T08:53:08.4289504Z * [new branch] gh/wconstab/455/head -> origin/gh/wconstab/455/head 2025-12-04T08:53:08.4289574Z * [new branch] gh/wconstab/455/orig -> origin/gh/wconstab/455/orig 2025-12-04T08:53:08.4289644Z * [new branch] gh/wconstab/456/base -> origin/gh/wconstab/456/base 2025-12-04T08:53:08.4289713Z * [new branch] gh/wconstab/456/head -> origin/gh/wconstab/456/head 2025-12-04T08:53:08.4289785Z * [new branch] gh/wconstab/456/orig -> origin/gh/wconstab/456/orig 2025-12-04T08:53:08.4289855Z * [new branch] gh/wconstab/457/base -> origin/gh/wconstab/457/base 2025-12-04T08:53:08.4289926Z * [new branch] gh/wconstab/457/head -> origin/gh/wconstab/457/head 2025-12-04T08:53:08.4289997Z * [new branch] gh/wconstab/457/orig -> origin/gh/wconstab/457/orig 2025-12-04T08:53:08.4290067Z * [new branch] gh/wconstab/458/base -> origin/gh/wconstab/458/base 2025-12-04T08:53:08.4290136Z * [new branch] gh/wconstab/458/head -> 
origin/gh/wconstab/458/head 2025-12-04T08:53:08.4290207Z * [new branch] gh/wconstab/458/orig -> origin/gh/wconstab/458/orig 2025-12-04T08:53:08.4290276Z * [new branch] gh/wconstab/459/base -> origin/gh/wconstab/459/base 2025-12-04T08:53:08.4290346Z * [new branch] gh/wconstab/459/head -> origin/gh/wconstab/459/head 2025-12-04T08:53:08.4290418Z * [new branch] gh/wconstab/459/orig -> origin/gh/wconstab/459/orig 2025-12-04T08:53:08.4290487Z * [new branch] gh/wconstab/460/base -> origin/gh/wconstab/460/base 2025-12-04T08:53:08.4290558Z * [new branch] gh/wconstab/460/head -> origin/gh/wconstab/460/head 2025-12-04T08:53:08.4290630Z * [new branch] gh/wconstab/460/orig -> origin/gh/wconstab/460/orig 2025-12-04T08:53:08.4290700Z * [new branch] gh/wconstab/461/base -> origin/gh/wconstab/461/base 2025-12-04T08:53:08.4290770Z * [new branch] gh/wconstab/461/head -> origin/gh/wconstab/461/head 2025-12-04T08:53:08.4290841Z * [new branch] gh/wconstab/461/orig -> origin/gh/wconstab/461/orig 2025-12-04T08:53:08.4290911Z * [new branch] gh/wconstab/462/base -> origin/gh/wconstab/462/base 2025-12-04T08:53:08.4290980Z * [new branch] gh/wconstab/462/head -> origin/gh/wconstab/462/head 2025-12-04T08:53:08.4291085Z * [new branch] gh/wconstab/462/orig -> origin/gh/wconstab/462/orig 2025-12-04T08:53:08.4291155Z * [new branch] gh/wconstab/463/base -> origin/gh/wconstab/463/base 2025-12-04T08:53:08.4291259Z * [new branch] gh/wconstab/463/head -> origin/gh/wconstab/463/head 2025-12-04T08:53:08.4291330Z * [new branch] gh/wconstab/463/orig -> origin/gh/wconstab/463/orig 2025-12-04T08:53:08.4291399Z * [new branch] gh/wconstab/464/base -> origin/gh/wconstab/464/base 2025-12-04T08:53:08.4291471Z * [new branch] gh/wconstab/464/head -> origin/gh/wconstab/464/head 2025-12-04T08:53:08.4291540Z * [new branch] gh/wconstab/464/orig -> origin/gh/wconstab/464/orig 2025-12-04T08:53:08.4291610Z * [new branch] gh/wconstab/465/base -> origin/gh/wconstab/465/base 2025-12-04T08:53:08.4291682Z * [new branch] 
gh/wconstab/465/head -> origin/gh/wconstab/465/head 2025-12-04T08:53:08.4291753Z * [new branch] gh/wconstab/465/orig -> origin/gh/wconstab/465/orig 2025-12-04T08:53:08.4291824Z * [new branch] gh/wconstab/466/base -> origin/gh/wconstab/466/base 2025-12-04T08:53:08.4291895Z * [new branch] gh/wconstab/466/head -> origin/gh/wconstab/466/head 2025-12-04T08:53:08.4291965Z * [new branch] gh/wconstab/466/orig -> origin/gh/wconstab/466/orig 2025-12-04T08:53:08.4292035Z * [new branch] gh/wconstab/467/base -> origin/gh/wconstab/467/base 2025-12-04T08:53:08.4292106Z * [new branch] gh/wconstab/467/head -> origin/gh/wconstab/467/head 2025-12-04T08:53:08.4292176Z * [new branch] gh/wconstab/467/orig -> origin/gh/wconstab/467/orig 2025-12-04T08:53:08.4292245Z * [new branch] gh/wconstab/468/base -> origin/gh/wconstab/468/base 2025-12-04T08:53:08.4292316Z * [new branch] gh/wconstab/468/head -> origin/gh/wconstab/468/head 2025-12-04T08:53:08.4292388Z * [new branch] gh/wconstab/468/orig -> origin/gh/wconstab/468/orig 2025-12-04T08:53:08.4292461Z * [new branch] gh/weifengpy/39/base -> origin/gh/weifengpy/39/base 2025-12-04T08:53:08.4292535Z * [new branch] gh/weifengpy/39/head -> origin/gh/weifengpy/39/head 2025-12-04T08:53:08.4292606Z * [new branch] gh/weifengpy/39/orig -> origin/gh/weifengpy/39/orig 2025-12-04T08:53:08.4292678Z * [new branch] gh/weifengpy/40/base -> origin/gh/weifengpy/40/base 2025-12-04T08:53:08.4292749Z * [new branch] gh/weifengpy/40/head -> origin/gh/weifengpy/40/head 2025-12-04T08:53:08.4292819Z * [new branch] gh/weifengpy/40/orig -> origin/gh/weifengpy/40/orig 2025-12-04T08:53:08.4292891Z * [new branch] gh/weifengpy/41/base -> origin/gh/weifengpy/41/base 2025-12-04T08:53:08.4292962Z * [new branch] gh/weifengpy/41/head -> origin/gh/weifengpy/41/head 2025-12-04T08:53:08.4293034Z * [new branch] gh/weifengpy/41/orig -> origin/gh/weifengpy/41/orig 2025-12-04T08:53:08.4293118Z * [new branch] gh/williamwen42/250/base -> origin/gh/williamwen42/250/base 
2025-12-04T08:53:08.4293199Z * [new branch] gh/williamwen42/250/head -> origin/gh/williamwen42/250/head 2025-12-04T08:53:08.4293316Z * [new branch] gh/williamwen42/250/orig -> origin/gh/williamwen42/250/orig 2025-12-04T08:53:08.4293398Z * [new branch] gh/williamwen42/279/base -> origin/gh/williamwen42/279/base 2025-12-04T08:53:08.4293475Z * [new branch] gh/williamwen42/279/head -> origin/gh/williamwen42/279/head 2025-12-04T08:53:08.4293551Z * [new branch] gh/williamwen42/279/orig -> origin/gh/williamwen42/279/orig 2025-12-04T08:53:08.4293629Z * [new branch] gh/williamwen42/282/base -> origin/gh/williamwen42/282/base 2025-12-04T08:53:08.4293705Z * [new branch] gh/williamwen42/282/head -> origin/gh/williamwen42/282/head 2025-12-04T08:53:08.4293820Z * [new branch] gh/williamwen42/282/orig -> origin/gh/williamwen42/282/orig 2025-12-04T08:53:08.4293940Z * [new branch] gh/williamwen42/287/base -> origin/gh/williamwen42/287/base 2025-12-04T08:53:08.4294018Z * [new branch] gh/williamwen42/287/head -> origin/gh/williamwen42/287/head 2025-12-04T08:53:08.4294094Z * [new branch] gh/williamwen42/287/orig -> origin/gh/williamwen42/287/orig 2025-12-04T08:53:08.4294172Z * [new branch] gh/williamwen42/288/base -> origin/gh/williamwen42/288/base 2025-12-04T08:53:08.4294249Z * [new branch] gh/williamwen42/288/head -> origin/gh/williamwen42/288/head 2025-12-04T08:53:08.4294328Z * [new branch] gh/williamwen42/288/orig -> origin/gh/williamwen42/288/orig 2025-12-04T08:53:08.4294404Z * [new branch] gh/williamwen42/296/base -> origin/gh/williamwen42/296/base 2025-12-04T08:53:08.4294482Z * [new branch] gh/williamwen42/296/head -> origin/gh/williamwen42/296/head 2025-12-04T08:53:08.4294561Z * [new branch] gh/williamwen42/296/orig -> origin/gh/williamwen42/296/orig 2025-12-04T08:53:08.4294637Z * [new branch] gh/williamwen42/297/base -> origin/gh/williamwen42/297/base 2025-12-04T08:53:08.4294713Z * [new branch] gh/williamwen42/297/head -> origin/gh/williamwen42/297/head 
2025-12-04T08:53:08.4294791Z * [new branch] gh/williamwen42/297/orig -> origin/gh/williamwen42/297/orig 2025-12-04T08:53:08.4294866Z * [new branch] gh/williamwen42/306/base -> origin/gh/williamwen42/306/base 2025-12-04T08:53:08.4294942Z * [new branch] gh/williamwen42/306/head -> origin/gh/williamwen42/306/head 2025-12-04T08:53:08.4295020Z * [new branch] gh/williamwen42/306/orig -> origin/gh/williamwen42/306/orig 2025-12-04T08:53:08.4295097Z * [new branch] gh/williamwen42/309/base -> origin/gh/williamwen42/309/base 2025-12-04T08:53:08.4295175Z * [new branch] gh/williamwen42/309/head -> origin/gh/williamwen42/309/head 2025-12-04T08:53:08.4295254Z * [new branch] gh/williamwen42/309/orig -> origin/gh/williamwen42/309/orig 2025-12-04T08:53:08.4295331Z * [new branch] gh/williamwen42/310/base -> origin/gh/williamwen42/310/base 2025-12-04T08:53:08.4295408Z * [new branch] gh/williamwen42/310/head -> origin/gh/williamwen42/310/head 2025-12-04T08:53:08.4295485Z * [new branch] gh/williamwen42/310/orig -> origin/gh/williamwen42/310/orig 2025-12-04T08:53:08.4295562Z * [new branch] gh/williamwen42/311/base -> origin/gh/williamwen42/311/base 2025-12-04T08:53:08.4295640Z * [new branch] gh/williamwen42/311/head -> origin/gh/williamwen42/311/head 2025-12-04T08:53:08.4295716Z * [new branch] gh/williamwen42/311/orig -> origin/gh/williamwen42/311/orig 2025-12-04T08:53:08.4295795Z * [new branch] gh/williamwen42/319/base -> origin/gh/williamwen42/319/base 2025-12-04T08:53:08.4295875Z * [new branch] gh/williamwen42/319/head -> origin/gh/williamwen42/319/head 2025-12-04T08:53:08.4295953Z * [new branch] gh/williamwen42/319/orig -> origin/gh/williamwen42/319/orig 2025-12-04T08:53:08.4296029Z * [new branch] gh/williamwen42/325/base -> origin/gh/williamwen42/325/base 2025-12-04T08:53:08.4296107Z * [new branch] gh/williamwen42/325/head -> origin/gh/williamwen42/325/head 2025-12-04T08:53:08.4296184Z * [new branch] gh/williamwen42/325/orig -> origin/gh/williamwen42/325/orig 
2025-12-04T08:53:08.4296260Z * [new branch] gh/williamwen42/326/base -> origin/gh/williamwen42/326/base 2025-12-04T08:53:08.4296339Z * [new branch] gh/williamwen42/326/head -> origin/gh/williamwen42/326/head 2025-12-04T08:53:08.4296415Z * [new branch] gh/williamwen42/326/orig -> origin/gh/williamwen42/326/orig 2025-12-04T08:53:08.4296519Z * [new branch] gh/williamwen42/327/base -> origin/gh/williamwen42/327/base 2025-12-04T08:53:08.4296624Z * [new branch] gh/williamwen42/327/head -> origin/gh/williamwen42/327/head 2025-12-04T08:53:08.4296701Z * [new branch] gh/williamwen42/327/orig -> origin/gh/williamwen42/327/orig 2025-12-04T08:53:08.4296776Z * [new branch] gh/williamwen42/328/base -> origin/gh/williamwen42/328/base 2025-12-04T08:53:08.4296854Z * [new branch] gh/williamwen42/328/head -> origin/gh/williamwen42/328/head 2025-12-04T08:53:08.4296929Z * [new branch] gh/williamwen42/328/orig -> origin/gh/williamwen42/328/orig 2025-12-04T08:53:08.4297006Z * [new branch] gh/williamwen42/329/base -> origin/gh/williamwen42/329/base 2025-12-04T08:53:08.4297084Z * [new branch] gh/williamwen42/329/head -> origin/gh/williamwen42/329/head 2025-12-04T08:53:08.4297162Z * [new branch] gh/williamwen42/329/orig -> origin/gh/williamwen42/329/orig 2025-12-04T08:53:08.4297241Z * [new branch] gh/williamwen42/330/base -> origin/gh/williamwen42/330/base 2025-12-04T08:53:08.4297317Z * [new branch] gh/williamwen42/330/head -> origin/gh/williamwen42/330/head 2025-12-04T08:53:08.4297394Z * [new branch] gh/williamwen42/330/orig -> origin/gh/williamwen42/330/orig 2025-12-04T08:53:08.4297471Z * [new branch] gh/williamwen42/331/base -> origin/gh/williamwen42/331/base 2025-12-04T08:53:08.4297548Z * [new branch] gh/williamwen42/331/head -> origin/gh/williamwen42/331/head 2025-12-04T08:53:08.4297624Z * [new branch] gh/williamwen42/331/orig -> origin/gh/williamwen42/331/orig 2025-12-04T08:53:08.4297702Z * [new branch] gh/williamwen42/332/base -> origin/gh/williamwen42/332/base 
2025-12-04T08:53:08.4297778Z * [new branch] gh/williamwen42/332/head -> origin/gh/williamwen42/332/head 2025-12-04T08:53:08.4297855Z * [new branch] gh/williamwen42/332/orig -> origin/gh/williamwen42/332/orig 2025-12-04T08:53:08.4297938Z * [new branch] gh/williamwen42/333/base -> origin/gh/williamwen42/333/base 2025-12-04T08:53:08.4298014Z * [new branch] gh/williamwen42/333/head -> origin/gh/williamwen42/333/head 2025-12-04T08:53:08.4298091Z * [new branch] gh/williamwen42/333/orig -> origin/gh/williamwen42/333/orig 2025-12-04T08:53:08.4298167Z * [new branch] gh/williamwen42/334/base -> origin/gh/williamwen42/334/base 2025-12-04T08:53:08.4298243Z * [new branch] gh/williamwen42/334/head -> origin/gh/williamwen42/334/head 2025-12-04T08:53:08.4298320Z * [new branch] gh/williamwen42/334/orig -> origin/gh/williamwen42/334/orig 2025-12-04T08:53:08.4298397Z * [new branch] gh/williamwen42/335/base -> origin/gh/williamwen42/335/base 2025-12-04T08:53:08.4298475Z * [new branch] gh/williamwen42/335/head -> origin/gh/williamwen42/335/head 2025-12-04T08:53:08.4298554Z * [new branch] gh/williamwen42/335/orig -> origin/gh/williamwen42/335/orig 2025-12-04T08:53:08.4298634Z * [new branch] gh/williamwen42/336/base -> origin/gh/williamwen42/336/base 2025-12-04T08:53:08.4298710Z * [new branch] gh/williamwen42/336/head -> origin/gh/williamwen42/336/head 2025-12-04T08:53:08.4298789Z * [new branch] gh/williamwen42/336/orig -> origin/gh/williamwen42/336/orig 2025-12-04T08:53:08.4298865Z * [new branch] gh/williamwen42/337/base -> origin/gh/williamwen42/337/base 2025-12-04T08:53:08.4298942Z * [new branch] gh/williamwen42/337/head -> origin/gh/williamwen42/337/head 2025-12-04T08:53:08.4299019Z * [new branch] gh/williamwen42/337/orig -> origin/gh/williamwen42/337/orig 2025-12-04T08:53:08.4299095Z * [new branch] gh/williamwen42/338/base -> origin/gh/williamwen42/338/base 2025-12-04T08:53:08.4299197Z * [new branch] gh/williamwen42/338/head -> origin/gh/williamwen42/338/head 
2025-12-04T08:53:08.4299309Z * [new branch] gh/williamwen42/338/orig -> origin/gh/williamwen42/338/orig 2025-12-04T08:53:08.4299387Z * [new branch] gh/williamwen42/339/base -> origin/gh/williamwen42/339/base 2025-12-04T08:53:08.4299464Z * [new branch] gh/williamwen42/339/head -> origin/gh/williamwen42/339/head 2025-12-04T08:53:08.4299541Z * [new branch] gh/williamwen42/339/orig -> origin/gh/williamwen42/339/orig 2025-12-04T08:53:08.4299618Z * [new branch] gh/williamwen42/340/base -> origin/gh/williamwen42/340/base 2025-12-04T08:53:08.4299696Z * [new branch] gh/williamwen42/340/head -> origin/gh/williamwen42/340/head 2025-12-04T08:53:08.4299774Z * [new branch] gh/williamwen42/340/orig -> origin/gh/williamwen42/340/orig 2025-12-04T08:53:08.4299851Z * [new branch] gh/williamwen42/341/base -> origin/gh/williamwen42/341/base 2025-12-04T08:53:08.4299927Z * [new branch] gh/williamwen42/341/head -> origin/gh/williamwen42/341/head 2025-12-04T08:53:08.4300006Z * [new branch] gh/williamwen42/341/orig -> origin/gh/williamwen42/341/orig 2025-12-04T08:53:08.4300083Z * [new branch] gh/williamwen42/342/base -> origin/gh/williamwen42/342/base 2025-12-04T08:53:08.4300161Z * [new branch] gh/williamwen42/342/head -> origin/gh/williamwen42/342/head 2025-12-04T08:53:08.4300237Z * [new branch] gh/williamwen42/342/orig -> origin/gh/williamwen42/342/orig 2025-12-04T08:53:08.4300313Z * [new branch] gh/williamwen42/343/base -> origin/gh/williamwen42/343/base 2025-12-04T08:53:08.4300391Z * [new branch] gh/williamwen42/343/head -> origin/gh/williamwen42/343/head 2025-12-04T08:53:08.4300467Z * [new branch] gh/williamwen42/343/orig -> origin/gh/williamwen42/343/orig 2025-12-04T08:53:08.4300546Z * [new branch] gh/williamwen42/344/base -> origin/gh/williamwen42/344/base 2025-12-04T08:53:08.4300627Z * [new branch] gh/williamwen42/344/head -> origin/gh/williamwen42/344/head 2025-12-04T08:53:08.4300704Z * [new branch] gh/williamwen42/344/orig -> origin/gh/williamwen42/344/orig 
2025-12-04T08:53:08.4300782Z * [new branch] gh/williamwen42/345/base -> origin/gh/williamwen42/345/base 2025-12-04T08:53:08.4300860Z * [new branch] gh/williamwen42/345/head -> origin/gh/williamwen42/345/head 2025-12-04T08:53:08.4300936Z * [new branch] gh/williamwen42/345/orig -> origin/gh/williamwen42/345/orig 2025-12-04T08:53:08.4301012Z * [new branch] gh/williamwen42/346/base -> origin/gh/williamwen42/346/base 2025-12-04T08:53:08.4301091Z * [new branch] gh/williamwen42/346/head -> origin/gh/williamwen42/346/head 2025-12-04T08:53:08.4301169Z * [new branch] gh/williamwen42/346/orig -> origin/gh/williamwen42/346/orig 2025-12-04T08:53:08.4301247Z * [new branch] gh/williamwen42/347/base -> origin/gh/williamwen42/347/base 2025-12-04T08:53:08.4301328Z * [new branch] gh/williamwen42/347/head -> origin/gh/williamwen42/347/head 2025-12-04T08:53:08.4301405Z * [new branch] gh/williamwen42/347/orig -> origin/gh/williamwen42/347/orig 2025-12-04T08:53:08.4301485Z * [new branch] gh/williamwen42/348/base -> origin/gh/williamwen42/348/base 2025-12-04T08:53:08.4301561Z * [new branch] gh/williamwen42/348/head -> origin/gh/williamwen42/348/head 2025-12-04T08:53:08.4301638Z * [new branch] gh/williamwen42/348/orig -> origin/gh/williamwen42/348/orig 2025-12-04T08:53:08.4301717Z * [new branch] gh/williamwen42/349/base -> origin/gh/williamwen42/349/base 2025-12-04T08:53:08.4301794Z * [new branch] gh/williamwen42/349/head -> origin/gh/williamwen42/349/head 2025-12-04T08:53:08.4301901Z * [new branch] gh/williamwen42/349/orig -> origin/gh/williamwen42/349/orig 2025-12-04T08:53:08.4302004Z * [new branch] gh/williamwen42/350/base -> origin/gh/williamwen42/350/base 2025-12-04T08:53:08.4302082Z * [new branch] gh/williamwen42/350/head -> origin/gh/williamwen42/350/head 2025-12-04T08:53:08.4302158Z * [new branch] gh/williamwen42/350/orig -> origin/gh/williamwen42/350/orig 2025-12-04T08:53:08.4302235Z * [new branch] gh/williamwen42/351/base -> origin/gh/williamwen42/351/base 
2025-12-04T08:53:08.4302310Z * [new branch] gh/williamwen42/351/head -> origin/gh/williamwen42/351/head 2025-12-04T08:53:08.4302386Z * [new branch] gh/williamwen42/351/orig -> origin/gh/williamwen42/351/orig 2025-12-04T08:53:08.4302464Z * [new branch] gh/williamwen42/352/base -> origin/gh/williamwen42/352/base 2025-12-04T08:53:08.4302543Z * [new branch] gh/williamwen42/352/head -> origin/gh/williamwen42/352/head 2025-12-04T08:53:08.4302622Z * [new branch] gh/williamwen42/352/orig -> origin/gh/williamwen42/352/orig 2025-12-04T08:53:08.4302702Z * [new branch] gh/williamwen42/353/base -> origin/gh/williamwen42/353/base 2025-12-04T08:53:08.4302778Z * [new branch] gh/williamwen42/353/head -> origin/gh/williamwen42/353/head 2025-12-04T08:53:08.4302856Z * [new branch] gh/williamwen42/353/orig -> origin/gh/williamwen42/353/orig 2025-12-04T08:53:08.4302934Z * [new branch] gh/williamwen42/354/base -> origin/gh/williamwen42/354/base 2025-12-04T08:53:08.4303012Z * [new branch] gh/williamwen42/354/head -> origin/gh/williamwen42/354/head 2025-12-04T08:53:08.4303092Z * [new branch] gh/williamwen42/354/orig -> origin/gh/williamwen42/354/orig 2025-12-04T08:53:08.4303168Z * [new branch] gh/williamwen42/355/base -> origin/gh/williamwen42/355/base 2025-12-04T08:53:08.4303246Z * [new branch] gh/williamwen42/355/head -> origin/gh/williamwen42/355/head 2025-12-04T08:53:08.4303360Z * [new branch] gh/williamwen42/355/orig -> origin/gh/williamwen42/355/orig 2025-12-04T08:53:08.4303437Z * [new branch] gh/williamwen42/356/base -> origin/gh/williamwen42/356/base 2025-12-04T08:53:08.4303513Z * [new branch] gh/williamwen42/356/head -> origin/gh/williamwen42/356/head 2025-12-04T08:53:08.4303591Z * [new branch] gh/williamwen42/356/orig -> origin/gh/williamwen42/356/orig 2025-12-04T08:53:08.4303666Z * [new branch] gh/williamwen42/357/base -> origin/gh/williamwen42/357/base 2025-12-04T08:53:08.4303744Z * [new branch] gh/williamwen42/357/head -> origin/gh/williamwen42/357/head 
2025-12-04T08:53:08.4303824Z * [new branch] gh/williamwen42/357/orig -> origin/gh/williamwen42/357/orig 2025-12-04T08:53:08.4303901Z * [new branch] gh/williamwen42/358/base -> origin/gh/williamwen42/358/base 2025-12-04T08:53:08.4303979Z * [new branch] gh/williamwen42/358/head -> origin/gh/williamwen42/358/head 2025-12-04T08:53:08.4304059Z * [new branch] gh/williamwen42/358/orig -> origin/gh/williamwen42/358/orig 2025-12-04T08:53:08.4304129Z * [new branch] gh/xmfan/169/base -> origin/gh/xmfan/169/base 2025-12-04T08:53:08.4304196Z * [new branch] gh/xmfan/169/head -> origin/gh/xmfan/169/head 2025-12-04T08:53:08.4304265Z * [new branch] gh/xmfan/170/base -> origin/gh/xmfan/170/base 2025-12-04T08:53:08.4304331Z * [new branch] gh/xmfan/170/head -> origin/gh/xmfan/170/head 2025-12-04T08:53:08.4304396Z * [new branch] gh/xmfan/274/base -> origin/gh/xmfan/274/base 2025-12-04T08:53:08.4304463Z * [new branch] gh/xmfan/274/head -> origin/gh/xmfan/274/head 2025-12-04T08:53:08.4304581Z * [new branch] gh/xmfan/274/orig -> origin/gh/xmfan/274/orig 2025-12-04T08:53:08.4304649Z * [new branch] gh/xmfan/277/base -> origin/gh/xmfan/277/base 2025-12-04T08:53:08.4304751Z * [new branch] gh/xmfan/277/head -> origin/gh/xmfan/277/head 2025-12-04T08:53:08.4304818Z * [new branch] gh/xmfan/277/orig -> origin/gh/xmfan/277/orig 2025-12-04T08:53:08.4304883Z * [new branch] gh/xmfan/301/base -> origin/gh/xmfan/301/base 2025-12-04T08:53:08.4304949Z * [new branch] gh/xmfan/301/head -> origin/gh/xmfan/301/head 2025-12-04T08:53:08.4305015Z * [new branch] gh/xmfan/301/orig -> origin/gh/xmfan/301/orig 2025-12-04T08:53:08.4305081Z * [new branch] gh/xmfan/304/base -> origin/gh/xmfan/304/base 2025-12-04T08:53:08.4305146Z * [new branch] gh/xmfan/304/head -> origin/gh/xmfan/304/head 2025-12-04T08:53:08.4305210Z * [new branch] gh/xmfan/304/orig -> origin/gh/xmfan/304/orig 2025-12-04T08:53:08.4305281Z * [new branch] gh/xmfan/309/base -> origin/gh/xmfan/309/base 2025-12-04T08:53:08.4305347Z * [new branch] 
gh/xmfan/309/head -> origin/gh/xmfan/309/head 2025-12-04T08:53:08.4305413Z * [new branch] gh/xmfan/309/orig -> origin/gh/xmfan/309/orig 2025-12-04T08:53:08.4305478Z * [new branch] gh/xmfan/310/base -> origin/gh/xmfan/310/base 2025-12-04T08:53:08.4305543Z * [new branch] gh/xmfan/310/head -> origin/gh/xmfan/310/head 2025-12-04T08:53:08.4305608Z * [new branch] gh/xmfan/310/orig -> origin/gh/xmfan/310/orig 2025-12-04T08:53:08.4305675Z * [new branch] gh/xmfan/311/base -> origin/gh/xmfan/311/base 2025-12-04T08:53:08.4305739Z * [new branch] gh/xmfan/311/head -> origin/gh/xmfan/311/head 2025-12-04T08:53:08.4305806Z * [new branch] gh/xmfan/311/orig -> origin/gh/xmfan/311/orig 2025-12-04T08:53:08.4305873Z * [new branch] gh/xmfan/312/base -> origin/gh/xmfan/312/base 2025-12-04T08:53:08.4305939Z * [new branch] gh/xmfan/312/head -> origin/gh/xmfan/312/head 2025-12-04T08:53:08.4306004Z * [new branch] gh/xmfan/312/orig -> origin/gh/xmfan/312/orig 2025-12-04T08:53:08.4306069Z * [new branch] gh/xmfan/313/base -> origin/gh/xmfan/313/base 2025-12-04T08:53:08.4306134Z * [new branch] gh/xmfan/313/head -> origin/gh/xmfan/313/head 2025-12-04T08:53:08.4306201Z * [new branch] gh/xmfan/313/orig -> origin/gh/xmfan/313/orig 2025-12-04T08:53:08.4306279Z * [new branch] gh/xuanzhang816/27/base -> origin/gh/xuanzhang816/27/base 2025-12-04T08:53:08.4306356Z * [new branch] gh/xuanzhang816/27/head -> origin/gh/xuanzhang816/27/head 2025-12-04T08:53:08.4306435Z * [new branch] gh/xuanzhang816/27/orig -> origin/gh/xuanzhang816/27/orig 2025-12-04T08:53:08.4306511Z * [new branch] gh/xuanzhang816/32/base -> origin/gh/xuanzhang816/32/base 2025-12-04T08:53:08.4306588Z * [new branch] gh/xuanzhang816/32/head -> origin/gh/xuanzhang816/32/head 2025-12-04T08:53:08.4306662Z * [new branch] gh/xuanzhang816/32/orig -> origin/gh/xuanzhang816/32/orig 2025-12-04T08:53:08.4306736Z * [new branch] gh/xuanzhang816/33/base -> origin/gh/xuanzhang816/33/base 2025-12-04T08:53:08.4306811Z * [new branch] gh/xuanzhang816/33/head 
-> origin/gh/xuanzhang816/33/head 2025-12-04T08:53:08.4306885Z * [new branch] gh/xuanzhang816/33/orig -> origin/gh/xuanzhang816/33/orig 2025-12-04T08:53:08.4306959Z * [new branch] gh/xuanzhang816/34/base -> origin/gh/xuanzhang816/34/base 2025-12-04T08:53:08.4307032Z * [new branch] gh/xuanzhang816/34/head -> origin/gh/xuanzhang816/34/head 2025-12-04T08:53:08.4307137Z * [new branch] gh/xuanzhang816/34/orig -> origin/gh/xuanzhang816/34/orig 2025-12-04T08:53:08.4307212Z * [new branch] gh/xuanzhang816/35/base -> origin/gh/xuanzhang816/35/base 2025-12-04T08:53:08.4307315Z * [new branch] gh/xuanzhang816/35/head -> origin/gh/xuanzhang816/35/head 2025-12-04T08:53:08.4307393Z * [new branch] gh/xuanzhang816/35/orig -> origin/gh/xuanzhang816/35/orig 2025-12-04T08:53:08.4307466Z * [new branch] gh/yanbing-j/11/base -> origin/gh/yanbing-j/11/base 2025-12-04T08:53:08.4307536Z * [new branch] gh/yanbing-j/11/head -> origin/gh/yanbing-j/11/head 2025-12-04T08:53:08.4307607Z * [new branch] gh/yanbing-j/11/orig -> origin/gh/yanbing-j/11/orig 2025-12-04T08:53:08.4307677Z * [new branch] gh/yanbing-j/12/base -> origin/gh/yanbing-j/12/base 2025-12-04T08:53:08.4307746Z * [new branch] gh/yanbing-j/12/head -> origin/gh/yanbing-j/12/head 2025-12-04T08:53:08.4307818Z * [new branch] gh/yanbing-j/12/orig -> origin/gh/yanbing-j/12/orig 2025-12-04T08:53:08.4307886Z * [new branch] gh/yanbing-j/13/base -> origin/gh/yanbing-j/13/base 2025-12-04T08:53:08.4307958Z * [new branch] gh/yanbing-j/13/head -> origin/gh/yanbing-j/13/head 2025-12-04T08:53:08.4308026Z * [new branch] gh/yanbing-j/13/orig -> origin/gh/yanbing-j/13/orig 2025-12-04T08:53:08.4308095Z * [new branch] gh/yanbing-j/14/base -> origin/gh/yanbing-j/14/base 2025-12-04T08:53:08.4308164Z * [new branch] gh/yanbing-j/14/head -> origin/gh/yanbing-j/14/head 2025-12-04T08:53:08.4308233Z * [new branch] gh/yanbing-j/14/orig -> origin/gh/yanbing-j/14/orig 2025-12-04T08:53:08.4308303Z * [new branch] gh/yanbing-j/15/base -> origin/gh/yanbing-j/15/base 
2025-12-04T08:53:08.4308374Z * [new branch] gh/yanbing-j/15/head -> origin/gh/yanbing-j/15/head 2025-12-04T08:53:08.4308445Z * [new branch] gh/yanbing-j/15/orig -> origin/gh/yanbing-j/15/orig 2025-12-04T08:53:08.4308514Z * [new branch] gh/yanbing-j/18/base -> origin/gh/yanbing-j/18/base 2025-12-04T08:53:08.4308585Z * [new branch] gh/yanbing-j/18/head -> origin/gh/yanbing-j/18/head 2025-12-04T08:53:08.4308653Z * [new branch] gh/yanbing-j/18/orig -> origin/gh/yanbing-j/18/orig 2025-12-04T08:53:08.4308723Z * [new branch] gh/yanbing-j/19/base -> origin/gh/yanbing-j/19/base 2025-12-04T08:53:08.4308793Z * [new branch] gh/yanbing-j/19/head -> origin/gh/yanbing-j/19/head 2025-12-04T08:53:08.4308861Z * [new branch] gh/yanbing-j/19/orig -> origin/gh/yanbing-j/19/orig 2025-12-04T08:53:08.4308930Z * [new branch] gh/yanbing-j/20/base -> origin/gh/yanbing-j/20/base 2025-12-04T08:53:08.4309001Z * [new branch] gh/yanbing-j/20/head -> origin/gh/yanbing-j/20/head 2025-12-04T08:53:08.4309071Z * [new branch] gh/yanbing-j/20/orig -> origin/gh/yanbing-j/20/orig 2025-12-04T08:53:08.4309141Z * [new branch] gh/yanbing-j/21/base -> origin/gh/yanbing-j/21/base 2025-12-04T08:53:08.4309211Z * [new branch] gh/yanbing-j/21/head -> origin/gh/yanbing-j/21/head 2025-12-04T08:53:08.4309280Z * [new branch] gh/yanbing-j/22/base -> origin/gh/yanbing-j/22/base 2025-12-04T08:53:08.4309351Z * [new branch] gh/yanbing-j/22/head -> origin/gh/yanbing-j/22/head 2025-12-04T08:53:08.4309419Z * [new branch] gh/yanbing-j/22/orig -> origin/gh/yanbing-j/22/orig 2025-12-04T08:53:08.4309487Z * [new branch] gh/yanbing-j/23/base -> origin/gh/yanbing-j/23/base 2025-12-04T08:53:08.4309559Z * [new branch] gh/yanbing-j/23/head -> origin/gh/yanbing-j/23/head 2025-12-04T08:53:08.4309629Z * [new branch] gh/yanbing-j/23/orig -> origin/gh/yanbing-j/23/orig 2025-12-04T08:53:08.4309759Z * [new branch] gh/yanbing-j/24/base -> origin/gh/yanbing-j/24/base 2025-12-04T08:53:08.4309860Z * [new branch] gh/yanbing-j/24/head -> 
origin/gh/yanbing-j/24/head 2025-12-04T08:53:08.4309931Z * [new branch] gh/yanbing-j/24/orig -> origin/gh/yanbing-j/24/orig 2025-12-04T08:53:08.4310001Z * [new branch] gh/yanbing-j/25/base -> origin/gh/yanbing-j/25/base 2025-12-04T08:53:08.4310072Z * [new branch] gh/yanbing-j/25/head -> origin/gh/yanbing-j/25/head 2025-12-04T08:53:08.4310140Z * [new branch] gh/yanbing-j/25/orig -> origin/gh/yanbing-j/25/orig 2025-12-04T08:53:08.4310210Z * [new branch] gh/yanbing-j/26/base -> origin/gh/yanbing-j/26/base 2025-12-04T08:53:08.4310279Z * [new branch] gh/yanbing-j/26/head -> origin/gh/yanbing-j/26/head 2025-12-04T08:53:08.4310347Z * [new branch] gh/yanbing-j/26/orig -> origin/gh/yanbing-j/26/orig 2025-12-04T08:53:08.4310426Z * [new branch] gh/yang-yu-hang/1/base -> origin/gh/yang-yu-hang/1/base 2025-12-04T08:53:08.4310502Z * [new branch] gh/yang-yu-hang/1/head -> origin/gh/yang-yu-hang/1/head 2025-12-04T08:53:08.4310575Z * [new branch] gh/yang-yu-hang/1/orig -> origin/gh/yang-yu-hang/1/orig 2025-12-04T08:53:08.4310647Z * [new branch] gh/yang-yu-hang/2/base -> origin/gh/yang-yu-hang/2/base 2025-12-04T08:53:08.4310722Z * [new branch] gh/yang-yu-hang/2/head -> origin/gh/yang-yu-hang/2/head 2025-12-04T08:53:08.4310795Z * [new branch] gh/yang-yu-hang/2/orig -> origin/gh/yang-yu-hang/2/orig 2025-12-04T08:53:08.4310868Z * [new branch] gh/yang-yu-hang/3/base -> origin/gh/yang-yu-hang/3/base 2025-12-04T08:53:08.4310941Z * [new branch] gh/yang-yu-hang/3/head -> origin/gh/yang-yu-hang/3/head 2025-12-04T08:53:08.4311012Z * [new branch] gh/yang-yu-hang/3/orig -> origin/gh/yang-yu-hang/3/orig 2025-12-04T08:53:08.4311089Z * [new branch] gh/yangw-dev/12/base -> origin/gh/yangw-dev/12/base 2025-12-04T08:53:08.4311163Z * [new branch] gh/yangw-dev/12/head -> origin/gh/yangw-dev/12/head 2025-12-04T08:53:08.4311234Z * [new branch] gh/yangw-dev/12/orig -> origin/gh/yangw-dev/12/orig 2025-12-04T08:53:08.4311305Z * [new branch] gh/yangw-dev/13/base -> origin/gh/yangw-dev/13/base 
2025-12-04T08:53:08.4311375Z * [new branch] gh/yangw-dev/13/head -> origin/gh/yangw-dev/13/head 2025-12-04T08:53:08.4311444Z * [new branch] gh/yangw-dev/13/orig -> origin/gh/yangw-dev/13/orig 2025-12-04T08:53:08.4311516Z * [new branch] gh/yangw-dev/14/base -> origin/gh/yangw-dev/14/base 2025-12-04T08:53:08.4311585Z * [new branch] gh/yangw-dev/14/head -> origin/gh/yangw-dev/14/head 2025-12-04T08:53:08.4311655Z * [new branch] gh/yangw-dev/14/orig -> origin/gh/yangw-dev/14/orig 2025-12-04T08:53:08.4311728Z * [new branch] gh/yangw-dev/15/base -> origin/gh/yangw-dev/15/base 2025-12-04T08:53:08.4311800Z * [new branch] gh/yangw-dev/15/head -> origin/gh/yangw-dev/15/head 2025-12-04T08:53:08.4311870Z * [new branch] gh/yangw-dev/15/orig -> origin/gh/yangw-dev/15/orig 2025-12-04T08:53:08.4311942Z * [new branch] gh/yangw-dev/19/base -> origin/gh/yangw-dev/19/base 2025-12-04T08:53:08.4312012Z * [new branch] gh/yangw-dev/19/head -> origin/gh/yangw-dev/19/head 2025-12-04T08:53:08.4312082Z * [new branch] gh/yangw-dev/19/orig -> origin/gh/yangw-dev/19/orig 2025-12-04T08:53:08.4312152Z * [new branch] gh/yangw-dev/26/base -> origin/gh/yangw-dev/26/base 2025-12-04T08:53:08.4312221Z * [new branch] gh/yangw-dev/26/head -> origin/gh/yangw-dev/26/head 2025-12-04T08:53:08.4312315Z * [new branch] gh/yangw-dev/26/orig -> origin/gh/yangw-dev/26/orig 2025-12-04T08:53:08.4312386Z * [new branch] gh/yangw-dev/27/base -> origin/gh/yangw-dev/27/base 2025-12-04T08:53:08.4312493Z * [new branch] gh/yangw-dev/27/head -> origin/gh/yangw-dev/27/head 2025-12-04T08:53:08.4312563Z * [new branch] gh/yangw-dev/27/orig -> origin/gh/yangw-dev/27/orig 2025-12-04T08:53:08.4312632Z * [new branch] gh/ydwu4/292/base -> origin/gh/ydwu4/292/base 2025-12-04T08:53:08.4312699Z * [new branch] gh/ydwu4/292/head -> origin/gh/ydwu4/292/head 2025-12-04T08:53:08.4312766Z * [new branch] gh/ydwu4/292/orig -> origin/gh/ydwu4/292/orig 2025-12-04T08:53:08.4312832Z * [new branch] gh/ydwu4/294/base -> origin/gh/ydwu4/294/base 
2025-12-04T08:53:08.4312896Z * [new branch] gh/ydwu4/294/head -> origin/gh/ydwu4/294/head 2025-12-04T08:53:08.4312963Z * [new branch] gh/ydwu4/294/orig -> origin/gh/ydwu4/294/orig 2025-12-04T08:53:08.4313027Z * [new branch] gh/ydwu4/295/base -> origin/gh/ydwu4/295/base 2025-12-04T08:53:08.4313093Z * [new branch] gh/ydwu4/295/head -> origin/gh/ydwu4/295/head 2025-12-04T08:53:08.4313160Z * [new branch] gh/ydwu4/295/orig -> origin/gh/ydwu4/295/orig 2025-12-04T08:53:08.4313224Z * [new branch] gh/ydwu4/296/base -> origin/gh/ydwu4/296/base 2025-12-04T08:53:08.4313447Z * [new branch] gh/ydwu4/296/head -> origin/gh/ydwu4/296/head 2025-12-04T08:53:08.4313515Z * [new branch] gh/ydwu4/296/orig -> origin/gh/ydwu4/296/orig 2025-12-04T08:53:08.4313580Z * [new branch] gh/ydwu4/306/base -> origin/gh/ydwu4/306/base 2025-12-04T08:53:08.4313644Z * [new branch] gh/ydwu4/306/head -> origin/gh/ydwu4/306/head 2025-12-04T08:53:08.4313709Z * [new branch] gh/ydwu4/306/orig -> origin/gh/ydwu4/306/orig 2025-12-04T08:53:08.4313776Z * [new branch] gh/ydwu4/312/base -> origin/gh/ydwu4/312/base 2025-12-04T08:53:08.4313843Z * [new branch] gh/ydwu4/312/head -> origin/gh/ydwu4/312/head 2025-12-04T08:53:08.4313909Z * [new branch] gh/ydwu4/312/orig -> origin/gh/ydwu4/312/orig 2025-12-04T08:53:08.4313973Z * [new branch] gh/ydwu4/322/base -> origin/gh/ydwu4/322/base 2025-12-04T08:53:08.4314038Z * [new branch] gh/ydwu4/322/head -> origin/gh/ydwu4/322/head 2025-12-04T08:53:08.4314104Z * [new branch] gh/ydwu4/322/orig -> origin/gh/ydwu4/322/orig 2025-12-04T08:53:08.4314169Z * [new branch] gh/ydwu4/327/base -> origin/gh/ydwu4/327/base 2025-12-04T08:53:08.4314234Z * [new branch] gh/ydwu4/327/head -> origin/gh/ydwu4/327/head 2025-12-04T08:53:08.4314299Z * [new branch] gh/ydwu4/327/orig -> origin/gh/ydwu4/327/orig 2025-12-04T08:53:08.4314366Z * [new branch] gh/ydwu4/328/base -> origin/gh/ydwu4/328/base 2025-12-04T08:53:08.4314434Z * [new branch] gh/ydwu4/328/head -> origin/gh/ydwu4/328/head 
2025-12-04T08:53:08.4314498Z * [new branch] gh/ydwu4/328/orig -> origin/gh/ydwu4/328/orig 2025-12-04T08:53:08.4314563Z * [new branch] gh/ydwu4/329/base -> origin/gh/ydwu4/329/base 2025-12-04T08:53:08.4314628Z * [new branch] gh/ydwu4/329/head -> origin/gh/ydwu4/329/head 2025-12-04T08:53:08.4314692Z * [new branch] gh/ydwu4/329/orig -> origin/gh/ydwu4/329/orig 2025-12-04T08:53:08.4314757Z * [new branch] gh/ydwu4/330/base -> origin/gh/ydwu4/330/base 2025-12-04T08:53:08.4314823Z * [new branch] gh/ydwu4/330/head -> origin/gh/ydwu4/330/head 2025-12-04T08:53:08.4314887Z * [new branch] gh/ydwu4/330/orig -> origin/gh/ydwu4/330/orig 2025-12-04T08:53:08.4314999Z * [new branch] gh/ydwu4/331/base -> origin/gh/ydwu4/331/base 2025-12-04T08:53:08.4315108Z * [new branch] gh/ydwu4/331/head -> origin/gh/ydwu4/331/head 2025-12-04T08:53:08.4315173Z * [new branch] gh/ydwu4/331/orig -> origin/gh/ydwu4/331/orig 2025-12-04T08:53:08.4315237Z * [new branch] gh/ydwu4/332/base -> origin/gh/ydwu4/332/base 2025-12-04T08:53:08.4315304Z * [new branch] gh/ydwu4/332/head -> origin/gh/ydwu4/332/head 2025-12-04T08:53:08.4315368Z * [new branch] gh/ydwu4/332/orig -> origin/gh/ydwu4/332/orig 2025-12-04T08:53:08.4315433Z * [new branch] gh/ydwu4/333/base -> origin/gh/ydwu4/333/base 2025-12-04T08:53:08.4315499Z * [new branch] gh/ydwu4/333/head -> origin/gh/ydwu4/333/head 2025-12-04T08:53:08.4315564Z * [new branch] gh/ydwu4/333/orig -> origin/gh/ydwu4/333/orig 2025-12-04T08:53:08.4315631Z * [new branch] gh/ydwu4/334/base -> origin/gh/ydwu4/334/base 2025-12-04T08:53:08.4315699Z * [new branch] gh/ydwu4/334/head -> origin/gh/ydwu4/334/head 2025-12-04T08:53:08.4315764Z * [new branch] gh/ydwu4/334/orig -> origin/gh/ydwu4/334/orig 2025-12-04T08:53:08.4315830Z * [new branch] gh/ydwu4/335/base -> origin/gh/ydwu4/335/base 2025-12-04T08:53:08.4315894Z * [new branch] gh/ydwu4/335/head -> origin/gh/ydwu4/335/head 2025-12-04T08:53:08.4315960Z * [new branch] gh/ydwu4/335/orig -> origin/gh/ydwu4/335/orig 
2025-12-04T08:53:08.4316026Z * [new branch] gh/ydwu4/337/base -> origin/gh/ydwu4/337/base 2025-12-04T08:53:08.4316090Z * [new branch] gh/ydwu4/337/head -> origin/gh/ydwu4/337/head 2025-12-04T08:53:08.4316154Z * [new branch] gh/ydwu4/337/orig -> origin/gh/ydwu4/337/orig 2025-12-04T08:53:08.4316221Z * [new branch] gh/ydwu4/339/base -> origin/gh/ydwu4/339/base 2025-12-04T08:53:08.4316286Z * [new branch] gh/ydwu4/339/head -> origin/gh/ydwu4/339/head 2025-12-04T08:53:08.4316352Z * [new branch] gh/ydwu4/339/orig -> origin/gh/ydwu4/339/orig 2025-12-04T08:53:08.4316417Z * [new branch] gh/yf225/133/base -> origin/gh/yf225/133/base 2025-12-04T08:53:08.4316480Z * [new branch] gh/yf225/133/head -> origin/gh/yf225/133/head 2025-12-04T08:53:08.4316545Z * [new branch] gh/yf225/93/base -> origin/gh/yf225/93/base 2025-12-04T08:53:08.4316612Z * [new branch] gh/yf225/93/head -> origin/gh/yf225/93/head 2025-12-04T08:53:08.4316684Z * [new branch] gh/yifuwang/152/base -> origin/gh/yifuwang/152/base 2025-12-04T08:53:08.4316756Z * [new branch] gh/yifuwang/152/head -> origin/gh/yifuwang/152/head 2025-12-04T08:53:08.4316829Z * [new branch] gh/yifuwang/152/orig -> origin/gh/yifuwang/152/orig 2025-12-04T08:53:08.4316900Z * [new branch] gh/yifuwang/195/base -> origin/gh/yifuwang/195/base 2025-12-04T08:53:08.4316972Z * [new branch] gh/yifuwang/195/head -> origin/gh/yifuwang/195/head 2025-12-04T08:53:08.4317043Z * [new branch] gh/yifuwang/195/orig -> origin/gh/yifuwang/195/orig 2025-12-04T08:53:08.4317114Z * [new branch] gh/yiming0416/1/base -> origin/gh/yiming0416/1/base 2025-12-04T08:53:08.4317184Z * [new branch] gh/yiming0416/1/head -> origin/gh/yiming0416/1/head 2025-12-04T08:53:08.4317254Z * [new branch] gh/yiming0416/2/base -> origin/gh/yiming0416/2/base 2025-12-04T08:53:08.4317322Z * [new branch] gh/yiming0416/2/head -> origin/gh/yiming0416/2/head 2025-12-04T08:53:08.4317395Z * [new branch] gh/yushangdi/1/base -> origin/gh/yushangdi/1/base 2025-12-04T08:53:08.4317496Z * [new branch] 
gh/yushangdi/1/head -> origin/gh/yushangdi/1/head 2025-12-04T08:53:08.4317567Z * [new branch] gh/yushangdi/10/base -> origin/gh/yushangdi/10/base 2025-12-04T08:53:08.4317665Z * [new branch] gh/yushangdi/10/head -> origin/gh/yushangdi/10/head 2025-12-04T08:53:08.4317736Z * [new branch] gh/yushangdi/10/orig -> origin/gh/yushangdi/10/orig 2025-12-04T08:53:08.4317806Z * [new branch] gh/yushangdi/11/base -> origin/gh/yushangdi/11/base 2025-12-04T08:53:08.4317876Z * [new branch] gh/yushangdi/11/head -> origin/gh/yushangdi/11/head 2025-12-04T08:53:08.4317946Z * [new branch] gh/yushangdi/11/orig -> origin/gh/yushangdi/11/orig 2025-12-04T08:53:08.4318015Z * [new branch] gh/yushangdi/2/base -> origin/gh/yushangdi/2/base 2025-12-04T08:53:08.4318086Z * [new branch] gh/yushangdi/2/head -> origin/gh/yushangdi/2/head 2025-12-04T08:53:08.4318158Z * [new branch] gh/yushangdi/7/base -> origin/gh/yushangdi/7/base 2025-12-04T08:53:08.4318227Z * [new branch] gh/yushangdi/7/head -> origin/gh/yushangdi/7/head 2025-12-04T08:53:08.4318298Z * [new branch] gh/yushangdi/7/orig -> origin/gh/yushangdi/7/orig 2025-12-04T08:53:08.4318367Z * [new branch] gh/yushangdi/8/base -> origin/gh/yushangdi/8/base 2025-12-04T08:53:08.4318437Z * [new branch] gh/yushangdi/8/head -> origin/gh/yushangdi/8/head 2025-12-04T08:53:08.4318507Z * [new branch] gh/yushangdi/8/orig -> origin/gh/yushangdi/8/orig 2025-12-04T08:53:08.4318576Z * [new branch] gh/yushangdi/9/base -> origin/gh/yushangdi/9/base 2025-12-04T08:53:08.4318644Z * [new branch] gh/yushangdi/9/head -> origin/gh/yushangdi/9/head 2025-12-04T08:53:08.4318715Z * [new branch] gh/yushangdi/9/orig -> origin/gh/yushangdi/9/orig 2025-12-04T08:53:08.4318784Z * [new branch] gh/zklaus/19/base -> origin/gh/zklaus/19/base 2025-12-04T08:53:08.4318851Z * [new branch] gh/zklaus/19/head -> origin/gh/zklaus/19/head 2025-12-04T08:53:08.4318921Z * [new branch] gh/zklaus/19/orig -> origin/gh/zklaus/19/orig 2025-12-04T08:53:08.4318987Z * [new branch] gh/zklaus/20/base -> 
origin/gh/zklaus/20/base 2025-12-04T08:53:08.4319054Z * [new branch] gh/zklaus/20/head -> origin/gh/zklaus/20/head 2025-12-04T08:53:08.4319120Z * [new branch] gh/zklaus/20/orig -> origin/gh/zklaus/20/orig 2025-12-04T08:53:08.4319185Z * [new branch] gh/zklaus/21/base -> origin/gh/zklaus/21/base 2025-12-04T08:53:08.4319252Z * [new branch] gh/zklaus/21/head -> origin/gh/zklaus/21/head 2025-12-04T08:53:08.4319317Z * [new branch] gh/zklaus/21/orig -> origin/gh/zklaus/21/orig 2025-12-04T08:53:08.4319384Z * [new branch] gh/zklaus/22/base -> origin/gh/zklaus/22/base 2025-12-04T08:53:08.4319451Z * [new branch] gh/zklaus/22/head -> origin/gh/zklaus/22/head 2025-12-04T08:53:08.4319516Z * [new branch] gh/zklaus/22/orig -> origin/gh/zklaus/22/orig 2025-12-04T08:53:08.4319582Z * [new branch] gh/zklaus/23/base -> origin/gh/zklaus/23/base 2025-12-04T08:53:08.4319649Z * [new branch] gh/zklaus/23/head -> origin/gh/zklaus/23/head 2025-12-04T08:53:08.4319714Z * [new branch] gh/zklaus/23/orig -> origin/gh/zklaus/23/orig 2025-12-04T08:53:08.4319780Z * [new branch] gh/zklaus/24/base -> origin/gh/zklaus/24/base 2025-12-04T08:53:08.4319845Z * [new branch] gh/zklaus/24/head -> origin/gh/zklaus/24/head 2025-12-04T08:53:08.4319911Z * [new branch] gh/zklaus/24/orig -> origin/gh/zklaus/24/orig 2025-12-04T08:53:08.4320022Z * [new branch] gh/zou3519/1197/base -> origin/gh/zou3519/1197/base 2025-12-04T08:53:08.4320093Z * [new branch] gh/zou3519/1197/head -> origin/gh/zou3519/1197/head 2025-12-04T08:53:08.4320186Z * [new branch] gh/zou3519/1197/orig -> origin/gh/zou3519/1197/orig 2025-12-04T08:53:08.4320254Z * [new branch] gh/zou3519/1199/base -> origin/gh/zou3519/1199/base 2025-12-04T08:53:08.4320323Z * [new branch] gh/zou3519/1199/head -> origin/gh/zou3519/1199/head 2025-12-04T08:53:08.4320391Z * [new branch] gh/zou3519/1199/orig -> origin/gh/zou3519/1199/orig 2025-12-04T08:53:08.4320458Z * [new branch] gh/zou3519/1200/base -> origin/gh/zou3519/1200/base 2025-12-04T08:53:08.4320526Z * [new 
branch] gh/zou3519/1200/head -> origin/gh/zou3519/1200/head 2025-12-04T08:53:08.4320593Z * [new branch] gh/zou3519/1200/orig -> origin/gh/zou3519/1200/orig 2025-12-04T08:53:08.4320663Z * [new branch] gh/zou3519/1201/base -> origin/gh/zou3519/1201/base 2025-12-04T08:53:08.4320730Z * [new branch] gh/zou3519/1201/head -> origin/gh/zou3519/1201/head 2025-12-04T08:53:08.4320799Z * [new branch] gh/zou3519/1201/orig -> origin/gh/zou3519/1201/orig 2025-12-04T08:53:08.4320868Z * [new branch] gh/zou3519/1202/base -> origin/gh/zou3519/1202/base 2025-12-04T08:53:08.4320935Z * [new branch] gh/zou3519/1202/head -> origin/gh/zou3519/1202/head 2025-12-04T08:53:08.4321002Z * [new branch] gh/zou3519/1202/orig -> origin/gh/zou3519/1202/orig 2025-12-04T08:53:08.4321071Z * [new branch] gh/zpcore/1/base -> origin/gh/zpcore/1/base 2025-12-04T08:53:08.4321139Z * [new branch] gh/zpcore/1/head -> origin/gh/zpcore/1/head 2025-12-04T08:53:08.4321205Z * [new branch] gh/zpcore/11/base -> origin/gh/zpcore/11/base 2025-12-04T08:53:08.4321274Z * [new branch] gh/zpcore/11/head -> origin/gh/zpcore/11/head 2025-12-04T08:53:08.4321341Z * [new branch] gh/zpcore/11/orig -> origin/gh/zpcore/11/orig 2025-12-04T08:53:08.4321409Z * [new branch] gh/zpcore/12/base -> origin/gh/zpcore/12/base 2025-12-04T08:53:08.4321477Z * [new branch] gh/zpcore/12/head -> origin/gh/zpcore/12/head 2025-12-04T08:53:08.4321542Z * [new branch] gh/zpcore/12/orig -> origin/gh/zpcore/12/orig 2025-12-04T08:53:08.4321607Z * [new branch] gh/zpcore/13/base -> origin/gh/zpcore/13/base 2025-12-04T08:53:08.4321674Z * [new branch] gh/zpcore/13/head -> origin/gh/zpcore/13/head 2025-12-04T08:53:08.4321739Z * [new branch] gh/zpcore/13/orig -> origin/gh/zpcore/13/orig 2025-12-04T08:53:08.4321805Z * [new branch] gh/zpcore/14/base -> origin/gh/zpcore/14/base 2025-12-04T08:53:08.4321873Z * [new branch] gh/zpcore/14/head -> origin/gh/zpcore/14/head 2025-12-04T08:53:08.4321938Z * [new branch] gh/zpcore/14/orig -> origin/gh/zpcore/14/orig 
2025-12-04T08:53:08.4322005Z * [new branch] gh/zpcore/15/base -> origin/gh/zpcore/15/base 2025-12-04T08:53:08.4322072Z * [new branch] gh/zpcore/15/head -> origin/gh/zpcore/15/head 2025-12-04T08:53:08.4322139Z * [new branch] gh/zpcore/15/orig -> origin/gh/zpcore/15/orig 2025-12-04T08:53:08.4322206Z * [new branch] gh/zpcore/2/base -> origin/gh/zpcore/2/base 2025-12-04T08:53:08.4322273Z * [new branch] gh/zpcore/2/head -> origin/gh/zpcore/2/head 2025-12-04T08:53:08.4322339Z * [new branch] gh/zpcore/21/base -> origin/gh/zpcore/21/base 2025-12-04T08:53:08.4322407Z * [new branch] gh/zpcore/21/head -> origin/gh/zpcore/21/head 2025-12-04T08:53:08.4322472Z * [new branch] gh/zpcore/21/orig -> origin/gh/zpcore/21/orig 2025-12-04T08:53:08.4322568Z * [new branch] gh/zpcore/22/base -> origin/gh/zpcore/22/base 2025-12-04T08:53:08.4322660Z * [new branch] gh/zpcore/22/head -> origin/gh/zpcore/22/head 2025-12-04T08:53:08.4322726Z * [new branch] gh/zpcore/22/orig -> origin/gh/zpcore/22/orig 2025-12-04T08:53:08.4322792Z * [new branch] gh/zpcore/23/base -> origin/gh/zpcore/23/base 2025-12-04T08:53:08.4322858Z * [new branch] gh/zpcore/23/head -> origin/gh/zpcore/23/head 2025-12-04T08:53:08.4322924Z * [new branch] gh/zpcore/23/orig -> origin/gh/zpcore/23/orig 2025-12-04T08:53:08.4322989Z * [new branch] gh/zpcore/24/base -> origin/gh/zpcore/24/base 2025-12-04T08:53:08.4323055Z * [new branch] gh/zpcore/24/head -> origin/gh/zpcore/24/head 2025-12-04T08:53:08.4323121Z * [new branch] gh/zpcore/24/orig -> origin/gh/zpcore/24/orig 2025-12-04T08:53:08.4323188Z * [new branch] gh/zpcore/25/base -> origin/gh/zpcore/25/base 2025-12-04T08:53:08.4323293Z * [new branch] gh/zpcore/25/head -> origin/gh/zpcore/25/head 2025-12-04T08:53:08.4323360Z * [new branch] gh/zpcore/25/orig -> origin/gh/zpcore/25/orig 2025-12-04T08:53:08.4323426Z * [new branch] gh/zpcore/26/base -> origin/gh/zpcore/26/base 2025-12-04T08:53:08.4323493Z * [new branch] gh/zpcore/26/head -> origin/gh/zpcore/26/head 
2025-12-04T08:53:08.4323559Z * [new branch] gh/zpcore/26/orig -> origin/gh/zpcore/26/orig 2025-12-04T08:53:08.4323624Z * [new branch] gh/zpcore/27/base -> origin/gh/zpcore/27/base 2025-12-04T08:53:08.4323691Z * [new branch] gh/zpcore/27/head -> origin/gh/zpcore/27/head 2025-12-04T08:53:08.4323756Z * [new branch] gh/zpcore/27/orig -> origin/gh/zpcore/27/orig 2025-12-04T08:53:08.4323825Z * [new branch] gh/zpcore/28/base -> origin/gh/zpcore/28/base 2025-12-04T08:53:08.4323892Z * [new branch] gh/zpcore/28/head -> origin/gh/zpcore/28/head 2025-12-04T08:53:08.4323957Z * [new branch] gh/zpcore/28/orig -> origin/gh/zpcore/28/orig 2025-12-04T08:53:08.4324025Z * [new branch] gh/zpcore/3/base -> origin/gh/zpcore/3/base 2025-12-04T08:53:08.4324091Z * [new branch] gh/zpcore/3/head -> origin/gh/zpcore/3/head 2025-12-04T08:53:08.4324156Z * [new branch] gh/zpcore/4/base -> origin/gh/zpcore/4/base 2025-12-04T08:53:08.4324222Z * [new branch] gh/zpcore/4/head -> origin/gh/zpcore/4/head 2025-12-04T08:53:08.4324287Z * [new branch] gh/zpcore/5/base -> origin/gh/zpcore/5/base 2025-12-04T08:53:08.4324352Z * [new branch] gh/zpcore/5/head -> origin/gh/zpcore/5/head 2025-12-04T08:53:08.4324419Z * [new branch] gh/zpcore/6/base -> origin/gh/zpcore/6/base 2025-12-04T08:53:08.4324484Z * [new branch] gh/zpcore/6/head -> origin/gh/zpcore/6/head 2025-12-04T08:53:08.4324550Z * [new branch] gh/zpcore/7/base -> origin/gh/zpcore/7/base 2025-12-04T08:53:08.4324616Z * [new branch] gh/zpcore/7/head -> origin/gh/zpcore/7/head 2025-12-04T08:53:08.4324681Z * [new branch] gh/zpcore/8/base -> origin/gh/zpcore/8/base 2025-12-04T08:53:08.4324746Z * [new branch] gh/zpcore/8/head -> origin/gh/zpcore/8/head 2025-12-04T08:53:08.4324814Z * [new branch] google-main -> origin/google-main 2025-12-04T08:53:08.4324899Z * [new branch] guangyey/external_stream -> origin/guangyey/external_stream 2025-12-04T08:53:08.4324971Z * [new branch] guangyey/test_2025 -> origin/guangyey/test_2025 2025-12-04T08:53:08.4325165Z * [new 
branch] guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9 2025-12-04T08:53:08.4325328Z * [new branch] hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass 2025-12-04T08:53:08.4325468Z * [new branch] hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests 2025-12-04T08:53:08.4325575Z * [new branch] hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose 2025-12-04T08:53:08.4325640Z * [new branch] hc_baseline -> origin/hc_baseline 2025-12-04T08:53:08.4325702Z * [new branch] hhh_rand -> origin/hhh_rand 2025-12-04T08:53:08.4325763Z * [new branch] huba/f1 -> origin/huba/f1 2025-12-04T08:53:08.4325951Z * [new branch] increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test 2025-12-04T08:53:08.4326015Z * [new branch] inlining -> origin/inlining 2025-12-04T08:53:08.4326085Z * [new branch] inlining-ezyang -> origin/inlining-ezyang 2025-12-04T08:53:08.4326167Z * [new branch] install-torchao-0.13.0 -> origin/install-torchao-0.13.0 2025-12-04T08:53:08.4326344Z * [new branch] instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters 2025-12-04T08:53:08.4326413Z * [new branch] invoke-subgraph -> origin/invoke-subgraph 2025-12-04T08:53:08.4326479Z * [new branch] issue#58739 -> origin/issue#58739 2025-12-04T08:53:08.4326562Z * [new branch] jainapurva-patch-1 -> origin/jainapurva-patch-1 2025-12-04T08:53:08.4326621Z * [new branch] jathu/o3 -> origin/jathu/o3 2025-12-04T08:53:08.4326684Z * [new branch] jathu/sve -> origin/jathu/sve 2025-12-04T08:53:08.4326810Z * [new branch] jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2 2025-12-04T08:53:08.4326916Z * [new branch] jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2 2025-12-04T08:53:08.4327027Z * [new branch] jiannanWang/memorysnapshot_filter 
-> origin/jiannanWang/memorysnapshot_filter 2025-12-04T08:53:08.4327137Z * [new branch] jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning 2025-12-04T08:53:08.4327222Z * [new branch] jithunnair-amd-patch-1 -> origin/jithunnair-amd-patch-1 2025-12-04T08:53:08.4327309Z * [new branch] jithunnair-amd-patch-10 -> origin/jithunnair-amd-patch-10 2025-12-04T08:53:08.4327390Z * [new branch] jithunnair-amd-patch-2 -> origin/jithunnair-amd-patch-2 2025-12-04T08:53:08.4327470Z * [new branch] jithunnair-amd-patch-3 -> origin/jithunnair-amd-patch-3 2025-12-04T08:53:08.4327553Z * [new branch] jithunnair-amd-patch-4 -> origin/jithunnair-amd-patch-4 2025-12-04T08:53:08.4327634Z * [new branch] jithunnair-amd-patch-5 -> origin/jithunnair-amd-patch-5 2025-12-04T08:53:08.4327712Z * [new branch] jithunnair-amd-patch-6 -> origin/jithunnair-amd-patch-6 2025-12-04T08:53:08.4327791Z * [new branch] jithunnair-amd-patch-7 -> origin/jithunnair-amd-patch-7 2025-12-04T08:53:08.4327870Z * [new branch] jithunnair-amd-patch-8 -> origin/jithunnair-amd-patch-8 2025-12-04T08:53:08.4327946Z * [new branch] jithunnair-amd-patch-9 -> origin/jithunnair-amd-patch-9 2025-12-04T08:53:08.4328024Z * [new branch] justinchu/native-qdq -> origin/justinchu/native-qdq 2025-12-04T08:53:08.4328096Z * [new branch] kainan666/xlf_debug -> origin/kainan666/xlf_debug 2025-12-04T08:53:08.4328160Z * [new branch] kainan_test -> origin/kainan_test 2025-12-04T08:53:08.4328270Z * [new branch] larryliu0820-patch-1 -> origin/larryliu0820-patch-1 2025-12-04T08:53:08.4328403Z * [new branch] leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues 2025-12-04T08:53:08.4328507Z * [new branch] lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error 2025-12-04T08:53:08.4328587Z * [new branch] liaoxuan/shm_all_reduce -> origin/liaoxuan/shm_all_reduce 2025-12-04T08:53:08.4328687Z * [new branch] liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax 
2025-12-04T08:53:08.4328766Z * [new branch] liaoxuan/test_int8_sdpa -> origin/liaoxuan/test_int8_sdpa 2025-12-04T08:53:08.4328834Z * [new branch] llama4-stable -> origin/llama4-stable 2025-12-04T08:53:08.4328901Z * [new branch] lts/release/1.8 -> origin/lts/release/1.8 2025-12-04T08:53:08.4328978Z * [new branch] lucaskabela/#94773 -> origin/lucaskabela/#94773 2025-12-04T08:53:08.4329053Z * [new branch] lucaskabela/fix_164876 -> origin/lucaskabela/fix_164876 2025-12-04T08:53:08.4329136Z * [new branch] lucaskabela/flop_counter -> origin/lucaskabela/flop_counter 2025-12-04T08:53:08.4329233Z * [new branch] lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp 2025-12-04T08:53:08.4329337Z * [new branch] lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo 2025-12-04T08:53:08.4329460Z * [new branch] lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr 2025-12-04T08:53:08.4329575Z * [new branch] lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr 2025-12-04T08:53:08.4329706Z * [new branch] lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata 2025-12-04T08:53:08.4329787Z * [new branch] lucaskabela/rnn_decomp -> origin/lucaskabela/rnn_decomp 2025-12-04T08:53:08.4329882Z * [new branch] lucaskabela/typing_backends -> origin/lucaskabela/typing_backends 2025-12-04T08:53:08.4329980Z * [new branch] lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager 2025-12-04T08:53:08.4330073Z * [new branch] lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module 2025-12-04T08:53:08.4330175Z * [new branch] lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined 2025-12-04T08:53:08.4330269Z * [new branch] lucaskabela/typing_variables -> origin/lucaskabela/typing_variables 2025-12-04T08:53:08.4330380Z * [new branch] lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts 
2025-12-04T08:53:08.4330501Z * [new branch] lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions 2025-12-04T08:53:08.4330611Z * [new branch] lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists 2025-12-04T08:53:08.4330686Z * [new branch] lw/torch_box_by_ref -> origin/lw/torch_box_by_ref 2025-12-04T08:53:08.4330747Z * [new branch] main -> origin/main 2025-12-04T08:53:08.4330818Z * [new branch] malfet-patch-1 -> origin/malfet-patch-1 2025-12-04T08:53:08.4330887Z * [new branch] malfet-patch-2 -> origin/malfet-patch-2 2025-12-04T08:53:08.4330953Z * [new branch] malfet-patch-3 -> origin/malfet-patch-3 2025-12-04T08:53:08.4331018Z * [new branch] malfet-patch-4 -> origin/malfet-patch-4 2025-12-04T08:53:08.4331084Z * [new branch] malfet-patch-5 -> origin/malfet-patch-5 2025-12-04T08:53:08.4331149Z * [new branch] malfet-patch-6 -> origin/malfet-patch-6 2025-12-04T08:53:08.4331251Z * [new branch] malfet-patch-7 -> origin/malfet-patch-7 2025-12-04T08:53:08.4331346Z * [new branch] malfet-patch-8 -> origin/malfet-patch-8 2025-12-04T08:53:08.4331421Z * [new branch] malfet/add-3.14-ci -> origin/malfet/add-3.14-ci 2025-12-04T08:53:08.4331581Z * [new branch] malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts 2025-12-04T08:53:08.4331746Z * [new branch] malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch 2025-12-04T08:53:08.4331871Z * [new branch] malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers 2025-12-04T08:53:08.4331968Z * [new branch] malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im 2025-12-04T08:53:08.4332084Z * [new branch] manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe 2025-12-04T08:53:08.4332175Z * [new branch] manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp 2025-12-04T08:53:08.4332250Z * [new 
branch] masnesral/metaconda -> origin/masnesral/metaconda 2025-12-04T08:53:08.4332325Z * [new branch] mem_profiler_flaky_fix -> origin/mem_profiler_flaky_fix 2025-12-04T08:53:08.4332403Z * [new branch] mem_profiler_stack_trace -> origin/mem_profiler_stack_trace 2025-12-04T08:53:08.4332481Z * [new branch] memory_profiler_stack -> origin/memory_profiler_stack 2025-12-04T08:53:08.4332553Z * [new branch] metascroy-patch-1 -> origin/metascroy-patch-1 2025-12-04T08:53:08.4332616Z * [new branch] mingw_posix -> origin/mingw_posix 2025-12-04T08:53:08.4332692Z * [new branch] mlazos/S429861-debug -> origin/mlazos/S429861-debug 2025-12-04T08:53:08.4332756Z * [new branch] mlazos/aa -> origin/mlazos/aa 2025-12-04T08:53:08.4332818Z * [new branch] mlazos/acts -> origin/mlazos/acts 2025-12-04T08:53:08.4332891Z * [new branch] mlazos/arg-renames -> origin/mlazos/arg-renames 2025-12-04T08:53:08.4332968Z * [new branch] mlazos/bad-cudagraphs -> origin/mlazos/bad-cudagraphs 2025-12-04T08:53:08.4333066Z * [new branch] mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks 2025-12-04T08:53:08.4333141Z * [new branch] mlazos/beta-tensor -> origin/mlazos/beta-tensor 2025-12-04T08:53:08.4333208Z * [new branch] mlazos/buffers -> origin/mlazos/buffers 2025-12-04T08:53:08.4333315Z * [new branch] mlazos/buffers2 -> origin/mlazos/buffers2 2025-12-04T08:53:08.4333385Z * [new branch] mlazos/buffers3 -> origin/mlazos/buffers3 2025-12-04T08:53:08.4333448Z * [new branch] mlazos/bwd -> origin/mlazos/bwd 2025-12-04T08:53:08.4333519Z * [new branch] mlazos/combo-test -> origin/mlazos/combo-test 2025-12-04T08:53:08.4333593Z * [new branch] mlazos/ctx-cleanup -> origin/mlazos/ctx-cleanup 2025-12-04T08:53:08.4333668Z * [new branch] mlazos/cuda-cmd-log -> origin/mlazos/cuda-cmd-log 2025-12-04T08:53:08.4333749Z * [new branch] mlazos/cudagraph-tests -> origin/mlazos/cudagraph-tests 2025-12-04T08:53:08.4333850Z * [new branch] mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement 
2025-12-04T08:53:08.4333924Z * [new branch] mlazos/cutlass-test -> origin/mlazos/cutlass-test 2025-12-04T08:53:08.4334005Z * [new branch] mlazos/cutlass-topo-bug -> origin/mlazos/cutlass-topo-bug 2025-12-04T08:53:08.4334084Z * [new branch] mlazos/dataclass-proxy -> origin/mlazos/dataclass-proxy 2025-12-04T08:53:08.4334190Z * [new branch] mlazos/dc-attrs -> origin/mlazos/dc-attrs 2025-12-04T08:53:08.4334298Z * [new branch] mlazos/dc-helion -> origin/mlazos/dc-helion 2025-12-04T08:53:08.4334365Z * [new branch] mlazos/dict-fix -> origin/mlazos/dict-fix 2025-12-04T08:53:08.4334434Z * [new branch] mlazos/disable-tf -> origin/mlazos/disable-tf 2025-12-04T08:53:08.4334501Z * [new branch] mlazos/dupe-fix -> origin/mlazos/dupe-fix 2025-12-04T08:53:08.4334568Z * [new branch] mlazos/dyn-batch -> origin/mlazos/dyn-batch 2025-12-04T08:53:08.4334631Z * [new branch] mlazos/evt -> origin/mlazos/evt 2025-12-04T08:53:08.4334712Z * [new branch] mlazos/extract-examples -> origin/mlazos/extract-examples 2025-12-04T08:53:08.4334781Z * [new branch] mlazos/foreach-op -> origin/mlazos/foreach-op 2025-12-04T08:53:08.4334845Z * [new branch] mlazos/fp8 -> origin/mlazos/fp8 2025-12-04T08:53:08.4334913Z * [new branch] mlazos/fp8-bias -> origin/mlazos/fp8-bias 2025-12-04T08:53:08.4334992Z * [new branch] mlazos/fp8-bias-fusion -> origin/mlazos/fp8-bias-fusion 2025-12-04T08:53:08.4335061Z * [new branch] mlazos/fp8-fixes -> origin/mlazos/fp8-fixes 2025-12-04T08:53:08.4335126Z * [new branch] mlazos/freezing -> origin/mlazos/freezing 2025-12-04T08:53:08.4335190Z * [new branch] mlazos/h-comp -> origin/mlazos/h-comp 2025-12-04T08:53:08.4335257Z * [new branch] mlazos/h-comp2 -> origin/mlazos/h-comp2 2025-12-04T08:53:08.4335325Z * [new branch] mlazos/hash-hop -> origin/mlazos/hash-hop 2025-12-04T08:53:08.4335387Z * [new branch] mlazos/hc -> origin/mlazos/hc 2025-12-04T08:53:08.4335457Z * [new branch] mlazos/hc-cycles -> origin/mlazos/hc-cycles 2025-12-04T08:53:08.4335525Z * [new branch] mlazos/hc-fixes 
-> origin/mlazos/hc-fixes 2025-12-04T08:53:08.4335593Z * [new branch] mlazos/hc-fixes3 -> origin/mlazos/hc-fixes3 2025-12-04T08:53:08.4335662Z * [new branch] mlazos/hc-fixes4 -> origin/mlazos/hc-fixes4 2025-12-04T08:53:08.4335727Z * [new branch] mlazos/hc-hf -> origin/mlazos/hc-hf 2025-12-04T08:53:08.4335791Z * [new branch] mlazos/hc-mut -> origin/mlazos/hc-mut 2025-12-04T08:53:08.4335854Z * [new branch] mlazos/hc10 -> origin/mlazos/hc10 2025-12-04T08:53:08.4335915Z * [new branch] mlazos/hc11 -> origin/mlazos/hc11 2025-12-04T08:53:08.4335974Z * [new branch] mlazos/hc12 -> origin/mlazos/hc12 2025-12-04T08:53:08.4336036Z * [new branch] mlazos/hc13 -> origin/mlazos/hc13 2025-12-04T08:53:08.4336097Z * [new branch] mlazos/hc14 -> origin/mlazos/hc14 2025-12-04T08:53:08.4336156Z * [new branch] mlazos/hc15 -> origin/mlazos/hc15 2025-12-04T08:53:08.4336220Z * [new branch] mlazos/hc2 -> origin/mlazos/hc2 2025-12-04T08:53:08.4336281Z * [new branch] mlazos/hc4 -> origin/mlazos/hc4 2025-12-04T08:53:08.4336341Z * [new branch] mlazos/hc5 -> origin/mlazos/hc5 2025-12-04T08:53:08.4336403Z * [new branch] mlazos/hc6 -> origin/mlazos/hc6 2025-12-04T08:53:08.4336462Z * [new branch] mlazos/hc7 -> origin/mlazos/hc7 2025-12-04T08:53:08.4336521Z * [new branch] mlazos/hc8 -> origin/mlazos/hc8 2025-12-04T08:53:08.4336582Z * [new branch] mlazos/hc9 -> origin/mlazos/hc9 2025-12-04T08:53:08.4336653Z * [new branch] mlazos/hc_baseline2 -> origin/mlazos/hc_baseline2 2025-12-04T08:53:08.4336765Z * [new branch] mlazos/inductor-streams -> origin/mlazos/inductor-streams 2025-12-04T08:53:08.4336859Z * [new branch] mlazos/main -> origin/mlazos/main 2025-12-04T08:53:08.4336921Z * [new branch] mlazos/mcg2 -> origin/mlazos/mcg2 2025-12-04T08:53:08.4336994Z * [new branch] mlazos/meta-guards -> origin/mlazos/meta-guards 2025-12-04T08:53:08.4337098Z * [new branch] mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam 2025-12-04T08:53:08.4337193Z * [new branch] mlazos/mlazos/tf-mode-backup -> 
origin/mlazos/mlazos/tf-mode-backup 2025-12-04T08:53:08.4337260Z * [new branch] mlazos/mod-fix -> origin/mlazos/mod-fix 2025-12-04T08:53:08.4337327Z * [new branch] mlazos/mode-fix -> origin/mlazos/mode-fix 2025-12-04T08:53:08.4337391Z * [new branch] mlazos/offsets -> origin/mlazos/offsets 2025-12-04T08:53:08.4337469Z * [new branch] mlazos/overguarding -> origin/mlazos/overguarding 2025-12-04T08:53:08.4337544Z * [new branch] mlazos/proxy-ctors -> origin/mlazos/proxy-ctors 2025-12-04T08:53:08.4337613Z * [new branch] mlazos/quant-fix -> origin/mlazos/quant-fix 2025-12-04T08:53:08.4337683Z * [new branch] mlazos/resnet-fix -> origin/mlazos/resnet-fix 2025-12-04T08:53:08.4337756Z * [new branch] mlazos/rm-buf-names -> origin/mlazos/rm-buf-names 2025-12-04T08:53:08.4337820Z * [new branch] mlazos/rm-code -> origin/mlazos/rm-code 2025-12-04T08:53:08.4337887Z * [new branch] mlazos/rm-spam -> origin/mlazos/rm-spam 2025-12-04T08:53:08.4337949Z * [new branch] mlazos/rtp -> origin/mlazos/rtp 2025-12-04T08:53:08.4338025Z * [new branch] mlazos/static-idx-dbg -> origin/mlazos/static-idx-dbg 2025-12-04T08:53:08.4338112Z * [new branch] mlazos/static-inputs-log -> origin/mlazos/static-inputs-log 2025-12-04T08:53:08.4338178Z * [new branch] mlazos/stests -> origin/mlazos/stests 2025-12-04T08:53:08.4338248Z * [new branch] mlazos/stream-ops -> origin/mlazos/stream-ops 2025-12-04T08:53:08.4338317Z * [new branch] mlazos/td-fix2 -> origin/mlazos/td-fix2 2025-12-04T08:53:08.4338394Z * [new branch] mlazos/tensor-hasattr2 -> origin/mlazos/tensor-hasattr2 2025-12-04T08:53:08.4338454Z * [new branch] mlazos/test -> origin/mlazos/test 2025-12-04T08:53:08.4338520Z * [new branch] mlazos/tf-mode -> origin/mlazos/tf-mode 2025-12-04T08:53:08.4338596Z * [new branch] mlazos/tf-mode-backup2 -> origin/mlazos/tf-mode-backup2 2025-12-04T08:53:08.4338673Z * [new branch] mlazos/tf-mode-reland -> origin/mlazos/tf-mode-reland 2025-12-04T08:53:08.4338752Z * [new branch] mlazos/tf-mode-reland2 -> 
origin/mlazos/tf-mode-reland2 2025-12-04T08:53:08.4338830Z * [new branch] mlazos/tf-mode-reland3 -> origin/mlazos/tf-mode-reland3 2025-12-04T08:53:08.4338907Z * [new branch] mlazos/triton-no-epi -> origin/mlazos/triton-no-epi 2025-12-04T08:53:08.4338980Z * [new branch] mlazos/tune-proto -> origin/mlazos/tune-proto 2025-12-04T08:53:08.4339051Z * [new branch] mlazos/tuple-fixes -> origin/mlazos/tuple-fixes 2025-12-04T08:53:08.4339125Z * [new branch] mlazos/tuple-fixes2 -> origin/mlazos/tuple-fixes2 2025-12-04T08:53:08.4339202Z * [new branch] mlazos/tuple-handling -> origin/mlazos/tuple-handling 2025-12-04T08:53:08.4339280Z * [new branch] mlazos/user-stream-base -> origin/mlazos/user-stream-base 2025-12-04T08:53:08.4339352Z * [new branch] mlazos/user-streams -> origin/mlazos/user-streams 2025-12-04T08:53:08.4339478Z * [new branch] mlazos/user-streams-backup -> origin/mlazos/user-streams-backup 2025-12-04T08:53:08.4339602Z * [new branch] mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2 2025-12-04T08:53:08.4339673Z * [new branch] mlazos/vary-beta -> origin/mlazos/vary-beta 2025-12-04T08:53:08.4339742Z * [new branch] mlazos/vary-beta2 -> origin/mlazos/vary-beta2 2025-12-04T08:53:08.4339815Z * [new branch] mlazos/weird-perf1 -> origin/mlazos/weird-perf1 2025-12-04T08:53:08.4339889Z * [new branch] mm_out_dtype_compile -> origin/mm_out_dtype_compile 2025-12-04T08:53:08.4339952Z * [new branch] module-shim -> origin/module-shim 2025-12-04T08:53:08.4340014Z * [new branch] move_config -> origin/move_config 2025-12-04T08:53:08.4340084Z * [new branch] msaroufim/reduce -> origin/msaroufim/reduce 2025-12-04T08:53:08.4340153Z * [new branch] mtia/basic-cmake -> origin/mtia/basic-cmake 2025-12-04T08:53:08.4340256Z * [new branch] mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape 2025-12-04T08:53:08.4340326Z * [new branch] my_varlen_backup -> origin/my_varlen_backup 2025-12-04T08:53:08.4340400Z * [new branch] nativert_num_outputs -> 
origin/nativert_num_outputs 2025-12-04T08:53:08.4340463Z * [new branch] new-codegen -> origin/new-codegen 2025-12-04T08:53:08.4340529Z * [new branch] newtest-base -> origin/newtest-base 2025-12-04T08:53:08.4340603Z * [new branch] ngimel/addmm_dtype -> origin/ngimel/addmm_dtype 2025-12-04T08:53:08.4340667Z * [new branch] ngimel/div_inv -> origin/ngimel/div_inv 2025-12-04T08:53:08.4340743Z * [new branch] ngimel/error_index_list -> origin/ngimel/error_index_list 2025-12-04T08:53:08.4340815Z * [new branch] ngimel/gather_grid -> origin/ngimel/gather_grid 2025-12-04T08:53:08.4340903Z * [new branch] ngimel/gather_grid_release -> origin/ngimel/gather_grid_release 2025-12-04T08:53:08.4340967Z * [new branch] ngimel/gg_new -> origin/ngimel/gg_new 2025-12-04T08:53:08.4341035Z * [new branch] ngimel/hostalloc -> origin/ngimel/hostalloc 2025-12-04T08:53:08.4341104Z * [new branch] ngimel/storage_id -> origin/ngimel/storage_id 2025-12-04T08:53:08.4341166Z * [new branch] nightly -> origin/nightly 2025-12-04T08:53:08.4341281Z * [new branch] nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check 2025-12-04T08:53:08.4341407Z * [new branch] nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias 2025-12-04T08:53:08.4341536Z * [new branch] nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor 2025-12-04T08:53:08.4341660Z * [new branch] nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch 2025-12-04T08:53:08.4341778Z * [new branch] nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions 2025-12-04T08:53:08.4341888Z * [new branch] nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index 2025-12-04T08:53:08.4341955Z * [new branch] nikitaved/test -> origin/nikitaved/test 2025-12-04T08:53:08.4342081Z * [new branch] nmacchioni-perf-test-async-autotune -> 
origin/nmacchioni-perf-test-async-autotune 2025-12-04T08:53:08.4342158Z * [new branch] no_distributed_log_spew -> origin/no_distributed_log_spew 2025-12-04T08:53:08.4342222Z * [new branch] nofun-hack -> origin/nofun-hack 2025-12-04T08:53:08.4342311Z * [new branch] norm_bench -> origin/norm_bench 2025-12-04T08:53:08.4342419Z * [new branch] nullplay/fuse_matmul -> origin/nullplay/fuse_matmul 2025-12-04T08:53:08.4342495Z * [new branch] nullplay_fuse_matmul -> origin/nullplay_fuse_matmul 2025-12-04T08:53:08.4342561Z * [new branch] optimizer_test -> origin/optimizer_test 2025-12-04T08:53:08.4342630Z * [new branch] orig/release/1.10 -> origin/orig/release/1.10 2025-12-04T08:53:08.4342699Z * [new branch] orig/release/1.11 -> origin/orig/release/1.11 2025-12-04T08:53:08.4342766Z * [new branch] orig/release/1.12 -> origin/orig/release/1.12 2025-12-04T08:53:08.4342832Z * [new branch] orig/release/1.13 -> origin/orig/release/1.13 2025-12-04T08:53:08.4342901Z * [new branch] orig/release/1.6 -> origin/orig/release/1.6 2025-12-04T08:53:08.4342967Z * [new branch] orig/release/1.7 -> origin/orig/release/1.7 2025-12-04T08:53:08.4343033Z * [new branch] orig/release/1.8 -> origin/orig/release/1.8 2025-12-04T08:53:08.4343102Z * [new branch] orig/release/1.9 -> origin/orig/release/1.9 2025-12-04T08:53:08.4343167Z * [new branch] orig/release/2.0 -> origin/orig/release/2.0 2025-12-04T08:53:08.4343233Z * [new branch] orig/release/2.1 -> origin/orig/release/2.1 2025-12-04T08:53:08.4343339Z * [new branch] orig/release/2.2 -> origin/orig/release/2.2 2025-12-04T08:53:08.4343405Z * [new branch] orig/release/2.3 -> origin/orig/release/2.3 2025-12-04T08:53:08.4343469Z * [new branch] orig/release/2.4 -> origin/orig/release/2.4 2025-12-04T08:53:08.4343536Z * [new branch] orig/release/2.5 -> origin/orig/release/2.5 2025-12-04T08:53:08.4343603Z * [new branch] orig/release/2.6 -> origin/orig/release/2.6 2025-12-04T08:53:08.4343667Z * [new branch] orig/release/2.7 -> origin/orig/release/2.7 
2025-12-04T08:53:08.4343734Z * [new branch] orig/release/2.8 -> origin/orig/release/2.8 2025-12-04T08:53:08.4343800Z * [new branch] orig/release/2.9 -> origin/orig/release/2.9 2025-12-04T08:53:08.4343888Z * [new branch] origin/gh/fxdawnn/1/base -> origin/origin/gh/fxdawnn/1/base 2025-12-04T08:53:08.4343970Z * [new branch] origin/gh/fxdawnn/1/orig -> origin/origin/gh/fxdawnn/1/orig 2025-12-04T08:53:08.4344053Z * [new branch] origin/gh/zpcore/14/orig -> origin/origin/gh/zpcore/14/orig 2025-12-04T08:53:08.4344123Z * [new branch] oulgen-patch-1 -> origin/oulgen-patch-1 2025-12-04T08:53:08.4344190Z * [new branch] oulgen-patch-2 -> origin/oulgen-patch-2 2025-12-04T08:53:08.4344257Z * [new branch] oulgen-patch-3 -> origin/oulgen-patch-3 2025-12-04T08:53:08.4344322Z * [new branch] oulgen-patch-4 -> origin/oulgen-patch-4 2025-12-04T08:53:08.4344392Z * [new branch] padded-tensor -> origin/padded-tensor 2025-12-04T08:53:08.4344456Z * [new branch] pca2 -> origin/pca2 2025-12-04T08:53:08.4344528Z * [new branch] per_channel_backup -> origin/per_channel_backup 2025-12-04T08:53:08.4344590Z * [new branch] perf_ops -> origin/perf_ops 2025-12-04T08:53:08.4344654Z * [new branch] perf_ops_2_9 -> origin/perf_ops_2_9 2025-12-04T08:53:08.4344727Z * [new branch] pianpwk-patch-1 -> origin/pianpwk-patch-1 2025-12-04T08:53:08.4344813Z * [new branch] pianpwk/__draft_debug_mode -> origin/pianpwk/__draft_debug_mode 2025-12-04T08:53:08.4344922Z * [new branch] pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft 2025-12-04T08:53:08.4345070Z * [new branch] pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile 2025-12-04T08:53:08.4345199Z * [new branch] pianpwk/_draft_triton_11_3 -> origin/pianpwk/_draft_triton_11_3 2025-12-04T08:53:08.4345294Z * [new branch] pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft 2025-12-04T08:53:08.4345397Z * [new branch] pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys 
2025-12-04T08:53:08.4345493Z * [new branch] pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode 2025-12-04T08:53:08.4345598Z * [new branch] pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size 2025-12-04T08:53:08.4345673Z * [new branch] pianpwk/anomaly_tb -> origin/pianpwk/anomaly_tb 2025-12-04T08:53:08.4345755Z * [new branch] pianpwk/auto_fx_annotate -> origin/pianpwk/auto_fx_annotate 2025-12-04T08:53:08.4345869Z * [new branch] pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export 2025-12-04T08:53:08.4345955Z * [new branch] pianpwk/bert_dynamic_perf -> origin/pianpwk/bert_dynamic_perf 2025-12-04T08:53:08.4346051Z * [new branch] pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces 2025-12-04T08:53:08.4346138Z * [new branch] pianpwk/debug_hash_tensor -> origin/pianpwk/debug_hash_tensor 2025-12-04T08:53:08.4346228Z * [new branch] pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate 2025-12-04T08:53:08.4346318Z * [new branch] pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults 2025-12-04T08:53:08.4346401Z * [new branch] pianpwk/debug_mode_hacks -> origin/pianpwk/debug_mode_hacks 2025-12-04T08:53:08.4346507Z * [new branch] pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor 2025-12-04T08:53:08.4346595Z * [new branch] pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids 2025-12-04T08:53:08.4346698Z * [new branch] pianpwk/debug_mode_triton -> origin/pianpwk/debug_mode_triton 2025-12-04T08:53:08.4346794Z * [new branch] pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace 2025-12-04T08:53:08.4346893Z * [new branch] pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective 2025-12-04T08:53:08.4346992Z * [new branch] pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf 2025-12-04T08:53:08.4347117Z * [new branch] pianpwk/dispatch_key_debugging_for_debug -> 
origin/pianpwk/dispatch_key_debugging_for_debug 2025-12-04T08:53:08.4347226Z * [new branch] pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile 2025-12-04T08:53:08.4347320Z * [new branch] pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn 2025-12-04T08:53:08.4347434Z * [new branch] pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5 2025-12-04T08:53:08.4347527Z * [new branch] pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk 2025-12-04T08:53:08.4347629Z * [new branch] pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath 2025-12-04T08:53:08.4347707Z * [new branch] pianpwk/event_list_tree -> origin/pianpwk/event_list_tree 2025-12-04T08:53:08.4347790Z * [new branch] pianpwk/false_numel_refs -> origin/pianpwk/false_numel_refs 2025-12-04T08:53:08.4347868Z * [new branch] pianpwk/maybe_guard_rel -> origin/pianpwk/maybe_guard_rel 2025-12-04T08:53:08.4347971Z * [new branch] pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft 2025-12-04T08:53:08.4348109Z * [new branch] pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat 2025-12-04T08:53:08.4348253Z * [new branch] pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better 2025-12-04T08:53:08.4348336Z * [new branch] pianpwk/pre_forward_hook -> origin/pianpwk/pre_forward_hook 2025-12-04T08:53:08.4348447Z * [new branch] pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate 2025-12-04T08:53:08.4348552Z * [new branch] pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards 2025-12-04T08:53:08.4348634Z * [new branch] pianpwk/sym_tokens_draft -> origin/pianpwk/sym_tokens_draft 2025-12-04T08:53:08.4348712Z * [new branch] pianpwk/symint_one_hot -> origin/pianpwk/symint_one_hot 2025-12-04T08:53:08.4348824Z * [new branch] pianpwk/test_pointwise_guard_or_false -> 
origin/pianpwk/test_pointwise_guard_or_false 2025-12-04T08:53:08.4348923Z * [new branch] pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap 2025-12-04T08:53:08.4349009Z * [new branch] pianpwk/try_dumb_stuff -> origin/pianpwk/try_dumb_stuff 2025-12-04T08:53:08.4349087Z * [new branch] pianpwk/try_dumb_stuff_2 -> origin/pianpwk/try_dumb_stuff_2 2025-12-04T08:53:08.4349181Z * [new branch] pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm 2025-12-04T08:53:08.4349276Z * [new branch] pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2 2025-12-04T08:53:08.4349353Z * [new branch] pianpwk/user_symints -> origin/pianpwk/user_symints 2025-12-04T08:53:08.4349432Z * [new branch] pianpwk/wan21_reshape -> origin/pianpwk/wan21_reshape 2025-12-04T08:53:08.4349525Z * [new branch] piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112 2025-12-04T08:53:08.4349603Z * [new branch] piz/prop_cache_clean -> origin/piz/prop_cache_clean 2025-12-04T08:53:08.4349676Z * [new branch] pool-separate -> origin/pool-separate 2025-12-04T08:53:08.4349737Z * [new branch] pr-156087 -> origin/pr-156087 2025-12-04T08:53:08.4349797Z * [new branch] pr/131860 -> origin/pr/131860 2025-12-04T08:53:08.4349867Z * [new branch] predispatch_to -> origin/predispatch_to 2025-12-04T08:53:08.4349931Z * [new branch] protect-c17 -> origin/protect-c17 2025-12-04T08:53:08.4350001Z * [new branch] pt-opt-cuda3 -> origin/pt-opt-cuda3 2025-12-04T08:53:08.4350081Z * [new branch] python_compiled_autograd -> origin/python_compiled_autograd 2025-12-04T08:53:08.4350211Z * [new branch] q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown 2025-12-04T08:53:08.4350353Z * [new branch] q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args 2025-12-04T08:53:08.4350434Z * [new branch] qchip/export-D54134695 -> origin/qchip/export-D54134695 2025-12-04T08:53:08.4350507Z * [new 
branch] quote-pytest_cache -> origin/quote-pytest_cache 2025-12-04T08:53:08.4350607Z * [new branch] reland-accgrad-stream-warn -> origin/reland-accgrad-stream-warn 2025-12-04T08:53:08.4350672Z * [new branch] release/1.10 -> origin/release/1.10 2025-12-04T08:53:08.4350734Z * [new branch] release/1.11 -> origin/release/1.11 2025-12-04T08:53:08.4350798Z * [new branch] release/1.12 -> origin/release/1.12 2025-12-04T08:53:08.4350859Z * [new branch] release/1.13 -> origin/release/1.13 2025-12-04T08:53:08.4350920Z * [new branch] release/1.4 -> origin/release/1.4 2025-12-04T08:53:08.4351018Z * [new branch] release/1.4.1 -> origin/release/1.4.1 2025-12-04T08:53:08.4351109Z * [new branch] release/1.5 -> origin/release/1.5 2025-12-04T08:53:08.4351169Z * [new branch] release/1.6 -> origin/release/1.6 2025-12-04T08:53:08.4351231Z * [new branch] release/1.7 -> origin/release/1.7 2025-12-04T08:53:08.4351290Z * [new branch] release/1.8 -> origin/release/1.8 2025-12-04T08:53:08.4351349Z * [new branch] release/1.9 -> origin/release/1.9 2025-12-04T08:53:08.4351408Z * [new branch] release/2.0 -> origin/release/2.0 2025-12-04T08:53:08.4351467Z * [new branch] release/2.1 -> origin/release/2.1 2025-12-04T08:53:08.4351526Z * [new branch] release/2.2 -> origin/release/2.2 2025-12-04T08:53:08.4351591Z * [new branch] release/2.3 -> origin/release/2.3 2025-12-04T08:53:08.4351651Z * [new branch] release/2.4 -> origin/release/2.4 2025-12-04T08:53:08.4351712Z * [new branch] release/2.5 -> origin/release/2.5 2025-12-04T08:53:08.4351773Z * [new branch] release/2.6 -> origin/release/2.6 2025-12-04T08:53:08.4351833Z * [new branch] release/2.7 -> origin/release/2.7 2025-12-04T08:53:08.4351895Z * [new branch] release/2.8 -> origin/release/2.8 2025-12-04T08:53:08.4351955Z * [new branch] release/2.9 -> origin/release/2.9 2025-12-04T08:53:08.4352020Z * [new branch] release_notes -> origin/release_notes 2025-12-04T08:53:08.4352097Z * [new branch] remove_pyinterpreter -> origin/remove_pyinterpreter 
2025-12-04T08:53:08.4352219Z * [new branch] replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836 2025-12-04T08:53:08.4352340Z * [new branch] replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248 2025-12-04T08:53:08.4352459Z * [new branch] replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324 2025-12-04T08:53:08.4352577Z * [new branch] replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020 2025-12-04T08:53:08.4352707Z * [new branch] revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head 2025-12-04T08:53:08.4352819Z * [new branch] revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head 2025-12-04T08:53:08.4352921Z * [new branch] revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head 2025-12-04T08:53:08.4353021Z * [new branch] revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head 2025-12-04T08:53:08.4353191Z * [new branch] revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ 2025-12-04T08:53:08.4353337Z * [new branch] revert-hoo-invoke-subgraph -> origin/revert-hoo-invoke-subgraph 2025-12-04T08:53:08.4353439Z * [new branch] revert_always_build_distributed -> origin/revert_always_build_distributed 2025-12-04T08:53:08.4353507Z * [new branch] rms_norm_patch -> origin/rms_norm_patch 2025-12-04T08:53:08.4353603Z * [new branch] ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation 2025-12-04T08:53:08.4353687Z * [new branch] ruisi/fix_comm_estimation -> origin/ruisi/fix_comm_estimation 2025-12-04T08:53:08.4353792Z * [new branch] ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation 2025-12-04T08:53:08.4353929Z * [new branch] ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing 2025-12-04T08:53:08.4354076Z * [new branch] 
ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass 2025-12-04T08:53:08.4354161Z * [new branch] ruisi/manual_bucket_pass -> origin/ruisi/manual_bucket_pass 2025-12-04T08:53:08.4354307Z * [new branch] ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures 2025-12-04T08:53:08.4354394Z * [new branch] ryanguo99/fix-closure-var -> origin/ryanguo99/fix-closure-var 2025-12-04T08:53:08.4354472Z * [new branch] rzou/faketensor_bench -> origin/rzou/faketensor_bench 2025-12-04T08:53:08.4354535Z * [new branch] rzou/njt -> origin/rzou/njt 2025-12-04T08:53:08.4354598Z * [new branch] rzou/pca -> origin/rzou/pca 2025-12-04T08:53:08.4354665Z * [new branch] rzou/realprop -> origin/rzou/realprop 2025-12-04T08:53:08.4354731Z * [new branch] samplevllm -> origin/samplevllm 2025-12-04T08:53:08.4354899Z * [new branch] sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm 2025-12-04T08:53:08.4354992Z * [new branch] sapling-pr-archive-SS-JIA -> origin/sapling-pr-archive-SS-JIA 2025-12-04T08:53:08.4355103Z * [new branch] sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain 2025-12-04T08:53:08.4355163Z * [new branch] save -> origin/save 2025-12-04T08:53:08.4355223Z * [new branch] scaled_mm -> origin/scaled_mm 2025-12-04T08:53:08.4355288Z * [new branch] scan_attempt -> origin/scan_attempt 2025-12-04T08:53:08.4355350Z * [new branch] sdym/2.5.1 -> origin/sdym/2.5.1 2025-12-04T08:53:08.4355455Z * [new branch] sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix 2025-12-04T08:53:08.4355532Z * [new branch] shengf/fx-xform-perf -> origin/shengf/fx-xform-perf 2025-12-04T08:53:08.4355607Z * [new branch] shoumikhin-patch-1 -> origin/shoumikhin-patch-1 2025-12-04T08:53:08.4355681Z * [new branch] solve-accuracy-fix -> origin/solve-accuracy-fix 2025-12-04T08:53:08.4355761Z * [new branch] some_rocm_inductor_skips -> 
origin/some_rocm_inductor_skips 2025-12-04T08:53:08.4355840Z * [new branch] soulitzer/stash-tls-ac -> origin/soulitzer/stash-tls-ac 2025-12-04T08:53:08.4355921Z * [new branch] sparse-mm-bf16-support -> origin/sparse-mm-bf16-support 2025-12-04T08:53:08.4355994Z * [new branch] starterTaskUpdate -> origin/starterTaskUpdate 2025-12-04T08:53:08.4356052Z * [new branch] suo -> origin/suo 2025-12-04T08:53:08.4356116Z * [new branch] sve-poc -> origin/sve-poc 2025-12-04T08:53:08.4356180Z * [new branch] switch-bn -> origin/switch-bn 2025-12-04T08:53:08.4356275Z * [new branch] sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop 2025-12-04T08:53:08.4356344Z * [new branch] sy_aot_eager_record -> origin/sy_aot_eager_record 2025-12-04T08:53:08.4356413Z * [new branch] sy_custom_bucketing -> origin/sy_custom_bucketing 2025-12-04T08:53:08.4356479Z * [new branch] sy_debug_mode_test -> origin/sy_debug_mode_test 2025-12-04T08:53:08.4356547Z * [new branch] sy_deserialize -> origin/sy_deserialize 2025-12-04T08:53:08.4356612Z * [new branch] sy_dump_gm_code -> origin/sy_dump_gm_code 2025-12-04T08:53:08.4356673Z * [new branch] sy_exp -> origin/sy_exp 2025-12-04T08:53:08.4356773Z * [new branch] sy_export_annotation -> origin/sy_export_annotation 2025-12-04T08:53:08.4356840Z * [new branch] sy_invoke_subgraph -> origin/sy_invoke_subgraph 2025-12-04T08:53:08.4356930Z * [new branch] sy_kernel_bw_name -> origin/sy_kernel_bw_name 2025-12-04T08:53:08.4356994Z * [new branch] sy_multi_arch -> origin/sy_multi_arch 2025-12-04T08:53:08.4357061Z * [new branch] sy_nn_module_stack -> origin/sy_nn_module_stack 2025-12-04T08:53:08.4357132Z * [new branch] sy_original_dtensor -> origin/sy_original_dtensor 2025-12-04T08:53:08.4357199Z * [new branch] sy_profiler_cia -> origin/sy_profiler_cia 2025-12-04T08:53:08.4357261Z * [new branch] symm_mem_sync -> origin/symm_mem_sync 2025-12-04T08:53:08.4357345Z * [new branch] sympy-bottleneck-repro -> origin/sympy-bottleneck-repro 
2025-12-04T08:53:08.4357424Z * [new branch] tensordict_integration -> origin/tensordict_integration 2025-12-04T08:53:08.4357505Z * [new branch] test-move-conda-builds -> origin/test-move-conda-builds 2025-12-04T08:53:08.4357570Z * [new branch] test-old -> origin/test-old 2025-12-04T08:53:08.4359430Z * [new branch] test/bmm_heur -> origin/test/bmm_heur 2025-12-04T08:53:08.4359542Z * [new branch] tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix 2025-12-04T08:53:08.4359656Z * [new branch] tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune 2025-12-04T08:53:08.4359738Z * [new branch] tianren/customOp_fusion -> origin/tianren/customOp_fusion 2025-12-04T08:53:08.4359864Z * [new branch] tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark 2025-12-04T08:53:08.4359998Z * [new branch] tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix 2025-12-04T08:53:08.4360110Z * [new branch] tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config 2025-12-04T08:53:08.4360204Z * [new branch] tianren/dynamic_range_input -> origin/tianren/dynamic_range_input 2025-12-04T08:53:08.4360303Z * [new branch] tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix 2025-12-04T08:53:08.4360405Z * [new branch] tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge 2025-12-04T08:53:08.4360503Z * [new branch] tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp 2025-12-04T08:53:08.4360584Z * [new branch] tianren/fx_codegen_dump -> origin/tianren/fx_codegen_dump 2025-12-04T08:53:08.4360666Z * [new branch] tianren/symmetric_memory -> origin/tianren/symmetric_memory 2025-12-04T08:53:08.4360731Z * [new branch] tianren/test -> origin/tianren/test 2025-12-04T08:53:08.4360807Z * [new branch] tidy_performance_cyy -> origin/tidy_performance_cyy 2025-12-04T08:53:08.4360868Z * [new branch] tmp -> 
origin/tmp 2025-12-04T08:53:08.4360935Z * [new branch] torchtitan_ep -> origin/torchtitan_ep 2025-12-04T08:53:08.4361014Z * [new branch] torchtitan_integration -> origin/torchtitan_integration 2025-12-04T08:53:08.4361096Z * [new branch] trace_fsdp_torchtune_lora -> origin/trace_fsdp_torchtune_lora 2025-12-04T08:53:08.4361177Z * [new branch] traceable_fsdp_unit_tests -> origin/traceable_fsdp_unit_tests 2025-12-04T08:53:08.4361249Z * [new branch] tree_loop_vec_base -> origin/tree_loop_vec_base 2025-12-04T08:53:08.4361314Z * [new branch] triton_kernel -> origin/triton_kernel 2025-12-04T08:53:08.4361374Z * [new branch] tt_pkg_1908 -> origin/tt_pkg_1908 2025-12-04T08:53:08.4361474Z * [new branch] type_dec -> origin/type_dec 2025-12-04T08:53:08.4361600Z * [new branch] udate-sphinx-dependancies -> origin/udate-sphinx-dependancies 2025-12-04T08:53:08.4361736Z * [new branch] update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1 2025-12-04T08:53:08.4361868Z * [new branch] update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1 2025-12-04T08:53:08.4361999Z * [new branch] update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1 2025-12-04T08:53:08.4362128Z * [new branch] update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1 2025-12-04T08:53:08.4362257Z * [new branch] update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1 2025-12-04T08:53:08.4362386Z * [new branch] update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1 2025-12-04T08:53:08.4362523Z * [new branch] update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2 2025-12-04T08:53:08.4362655Z * [new branch] update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1 
2025-12-04T08:53:08.4362785Z * [new branch] update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1 2025-12-04T08:53:08.4362916Z * [new branch] update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1 2025-12-04T08:53:08.4363047Z * [new branch] update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1 2025-12-04T08:53:08.4363182Z * [new branch] update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1 2025-12-04T08:53:08.4363357Z * [new branch] update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1 2025-12-04T08:53:08.4363442Z * [new branch] update-vllm-dockerfile -> origin/update-vllm-dockerfile 2025-12-04T08:53:08.4363567Z * [new branch] update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1 2025-12-04T08:53:08.4363686Z * [new branch] update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1 2025-12-04T08:53:08.4363805Z * [new branch] update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1 2025-12-04T08:53:08.4363932Z * [new branch] update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388 2025-12-04T08:53:08.4364011Z * [new branch] update_operator_readme -> origin/update_operator_readme 2025-12-04T08:53:08.4364101Z * [new branch] update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736 2025-12-04T08:53:08.4364188Z * [new branch] update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173 2025-12-04T08:53:08.4364273Z * [new branch] update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677 2025-12-04T08:53:08.4364357Z * [new branch] update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283 2025-12-04T08:53:08.4364446Z * [new branch] update_submodule_FBGEMM -> 
origin/update_submodule_FBGEMM 2025-12-04T08:53:08.4364524Z * [new branch] update_submodule_kineto -> origin/update_submodule_kineto 2025-12-04T08:53:08.4364616Z * [new branch] update_submodule_tensorpipe -> origin/update_submodule_tensorpipe 2025-12-04T08:53:08.4364714Z * [new branch] upload-tests-for-autorevert -> origin/upload-tests-for-autorevert 2025-12-04T08:53:08.4364838Z * [new branch] v0.1.2 -> origin/v0.1.2 2025-12-04T08:53:08.4364942Z * [new branch] v1.0.1 -> origin/v1.0.1 2025-12-04T08:53:08.4365001Z * [new branch] v1.0.3 -> origin/v1.0.3 2025-12-04T08:53:08.4365060Z * [new branch] v1.1.0 -> origin/v1.1.0 2025-12-04T08:53:08.4365119Z * [new branch] v1.2.0 -> origin/v1.2.0 2025-12-04T08:53:08.4365174Z * [new branch] v1.3.0 -> origin/v1.3.0 2025-12-04T08:53:08.4365230Z * [new branch] v1.3.1 -> origin/v1.3.1 2025-12-04T08:53:08.4365294Z * [new branch] validate_fn -> origin/validate_fn 2025-12-04T08:53:08.4365361Z * [new branch] validations_2.6 -> origin/validations_2.6 2025-12-04T08:53:08.4365429Z * [new branch] validations_2.8 -> origin/validations_2.8 2025-12-04T08:53:08.4365495Z * [new branch] varlen-api -> origin/varlen-api 2025-12-04T08:53:08.4365570Z * [new branch] varlen-api-backup -> origin/varlen-api-backup 2025-12-04T08:53:08.4365646Z * [new branch] varlen_batch_invariance -> origin/varlen_batch_invariance 2025-12-04T08:53:08.4365713Z * [new branch] viable/strict -> origin/viable/strict 2025-12-04T08:53:08.4365828Z * [new branch] vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy 2025-12-04T08:53:08.4365891Z * [new branch] vllmbuildci -> origin/vllmbuildci 2025-12-04T08:53:08.4365953Z * [new branch] vllmpin -> origin/vllmpin 2025-12-04T08:53:08.4366042Z * [new branch] vscode-recommend-pyrefly -> origin/vscode-recommend-pyrefly 2025-12-04T08:53:08.4366110Z * [new branch] wdvr-patch-1 -> origin/wdvr-patch-1 2025-12-04T08:53:08.4366176Z * [new branch] wdvr/iss_145259 -> origin/wdvr/iss_145259 
2025-12-04T08:53:08.4366239Z * [new branch] whc/pei -> origin/whc/pei 2025-12-04T08:53:08.4366303Z * [new branch] whc/pp_fix -> origin/whc/pp_fix 2025-12-04T08:53:08.4366368Z * [new branch] whc/sharding -> origin/whc/sharding 2025-12-04T08:53:08.4366433Z * [new branch] whc/sharding2 -> origin/whc/sharding2 2025-12-04T08:53:08.4366499Z * [new branch] whc/uneven -> origin/whc/uneven 2025-12-04T08:53:08.4366570Z * [new branch] whc/uneven-merge -> origin/whc/uneven-merge 2025-12-04T08:53:08.4366632Z * [new branch] win_warnings -> origin/win_warnings 2025-12-04T08:53:08.4366709Z * [new branch] windows_libtorch_free -> origin/windows_libtorch_free 2025-12-04T08:53:08.4366773Z * [new branch] xmfan-war -> origin/xmfan-war 2025-12-04T08:53:08.4366837Z * [new branch] xmfan/ca_0516 -> origin/xmfan/ca_0516 2025-12-04T08:53:08.4366905Z * [new branch] xmfan/ca_1051b93192 -> origin/xmfan/ca_1051b93192 2025-12-04T08:53:08.4367058Z * [new branch] xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 2025-12-04T08:53:08.4367126Z * [new branch] xmfan/ca_5a2be192d1 -> origin/xmfan/ca_5a2be192d1 2025-12-04T08:53:08.4367198Z * [new branch] xmfan/ca_9d59b516e9 -> origin/xmfan/ca_9d59b516e9 2025-12-04T08:53:08.4367262Z * [new branch] xmfan/ca_apr8 -> origin/xmfan/ca_apr8 2025-12-04T08:53:08.4367325Z * [new branch] xmfan/ca_base -> origin/xmfan/ca_base 2025-12-04T08:53:08.4367393Z * [new branch] xmfan/ca_dynamic -> origin/xmfan/ca_dynamic 2025-12-04T08:53:08.4367486Z * [new branch] xmfan/ca_fix_dyn -> origin/xmfan/ca_fix_dyn 2025-12-04T08:53:08.4367587Z * [new branch] xmfan/ca_fix_lowering -> origin/xmfan/ca_fix_lowering 2025-12-04T08:53:08.4367662Z * [new branch] xmfan/ca_fix_polyfills -> origin/xmfan/ca_fix_polyfills 2025-12-04T08:53:08.4367725Z * [new branch] xmfan/ca_jan3 -> origin/xmfan/ca_jan3 2025-12-04T08:53:08.4367791Z * [new branch] xmfan/ca_jun18 -> origin/xmfan/ca_jun18 2025-12-04T08:53:08.4367857Z * [new branch] 
xmfan/ca_jun24 -> origin/xmfan/ca_jun24 2025-12-04T08:53:08.4367922Z * [new branch] xmfan/ca_nested -> origin/xmfan/ca_nested 2025-12-04T08:53:08.4367989Z * [new branch] xmfan/ca_overhead -> origin/xmfan/ca_overhead 2025-12-04T08:53:08.4368082Z * [new branch] xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451 2025-12-04T08:53:08.4368153Z * [new branch] xmfan/cacu_jun18 -> origin/xmfan/cacu_jun18 2025-12-04T08:53:08.4368222Z * [new branch] xmfan/cacu_jun19 -> origin/xmfan/cacu_jun19 2025-12-04T08:53:08.4368287Z * [new branch] xmfan/cacu_jun4 -> origin/xmfan/cacu_jun4 2025-12-04T08:53:08.4368370Z * [new branch] xmfan/disable_duck_shape -> origin/xmfan/disable_duck_shape 2025-12-04T08:53:08.4368467Z * [new branch] xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough 2025-12-04T08:53:08.4368619Z * [new branch] xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:53:08.4368766Z * [new branch] xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 2025-12-04T08:53:08.4368838Z * [new branch] xmfan/single_step -> origin/xmfan/single_step 2025-12-04T08:53:08.4368902Z * [new branch] xmfan/sth_0829 -> origin/xmfan/sth_0829 2025-12-04T08:53:08.4368965Z * [new branch] xmfan/test -> origin/xmfan/test 2025-12-04T08:53:08.4369051Z * [new branch] yguo/debug-0226-constexpr -> origin/yguo/debug-0226-constexpr 2025-12-04T08:53:08.4369129Z * [new branch] yguo/new_latest_changes -> origin/yguo/new_latest_changes 2025-12-04T08:53:08.4369222Z * [new branch] yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes 2025-12-04T08:53:08.4369288Z * [new branch] yiming/bootcamp -> origin/yiming/bootcamp 2025-12-04T08:53:08.4369391Z * [new branch] yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop 2025-12-04T08:53:08.4369454Z * [new branch] yolo-llama3 -> origin/yolo-llama3 
2025-12-04T08:53:08.4369526Z * [new branch] zainr/canary-test -> origin/zainr/canary-test 2025-12-04T08:53:08.4369612Z * [new branch] zainr/cleanup-gh-runners -> origin/zainr/cleanup-gh-runners 2025-12-04T08:53:08.4369692Z * [new branch] zainr/pull-migration-c -> origin/zainr/pull-migration-c 2025-12-04T08:53:08.4369754Z * [new branch] zainr/test2 -> origin/zainr/test2 2025-12-04T08:53:08.4369828Z * [new branch] zasdfgbnm-patch-3 -> origin/zasdfgbnm-patch-3 2025-12-04T08:53:08.4369887Z * [new branch] zb2p -> origin/zb2p 2025-12-04T08:53:08.4369969Z * [new branch] zeros-and-scatter-part2 -> origin/zeros-and-scatter-part2 2025-12-04T08:53:08.4370056Z * [new branch] zhxchen17/ci/vllm_lora_oom -> origin/zhxchen17/ci/vllm_lora_oom 2025-12-04T08:53:08.4370157Z * [new branch] zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom 2025-12-04T08:53:08.4370260Z * [new branch] zhxchen17/ci/vllm_pin -> origin/zhxchen17/ci/vllm_pin 2025-12-04T08:53:08.4370412Z * [new branch] zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards 2025-12-04T08:53:08.4370511Z * [new branch] zhxchen17/export/call_override -> origin/zhxchen17/export/call_override 2025-12-04T08:53:08.4370598Z * [new branch] zhxchen17/export/codemod1 -> origin/zhxchen17/export/codemod1 2025-12-04T08:53:08.4370687Z * [new branch] zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return 2025-12-04T08:53:08.4370816Z * [new branch] zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn 2025-12-04T08:53:08.4370914Z * [new branch] zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check 2025-12-04T08:53:08.4371000Z * [new branch] zhxchen17/precompile/aoti -> origin/zhxchen17/precompile/aoti 2025-12-04T08:53:08.4371097Z * [new branch] zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals 2025-12-04T08:53:08.4371218Z * [new branch] zhxchen17/precompile/inductor_guards -> 
origin/zhxchen17/precompile/inductor_guards 2025-12-04T08:53:08.4371290Z * [new branch] zhxchen17/scratch/0 -> origin/zhxchen17/scratch/0 2025-12-04T08:53:08.4371394Z * [new branch] zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update 2025-12-04T08:53:08.4371475Z * [new branch] zhxhcen17/moodycamel -> origin/zhxhcen17/moodycamel 2025-12-04T08:53:08.4371549Z * [new branch] zxiiro/build-times -> origin/zxiiro/build-times 2025-12-04T08:53:08.4371620Z * [new branch] zxiiro/c7i.2xlarge -> origin/zxiiro/c7i.2xlarge 2025-12-04T08:53:08.4371699Z * [new branch] zxiiro/c7i.2xlarge.h100 -> origin/zxiiro/c7i.2xlarge.h100 2025-12-04T08:53:08.4371762Z * [new branch] zxiiro/main -> origin/zxiiro/main 2025-12-04T08:53:08.4371827Z * [new branch] zxiiro/risc64 -> origin/zxiiro/risc64 2025-12-04T08:53:08.4371919Z * [new branch] zxiiro/test-multicloud-arc -> origin/zxiiro/test-multicloud-arc 2025-12-04T08:53:08.4371979Z * [new tag] ciflow/dynamo/169525 -> ciflow/dynamo/169525 2025-12-04T08:53:08.4372049Z t [tag update] ciflow/inductor/167647 -> ciflow/inductor/167647 2025-12-04T08:53:08.4372115Z t [tag update] ciflow/inductor/168266 -> ciflow/inductor/168266 2025-12-04T08:53:08.4372181Z t [tag update] ciflow/inductor/169535 -> ciflow/inductor/169535 2025-12-04T08:53:08.4372242Z * [new tag] ciflow/trunk/165728 -> ciflow/trunk/165728 2025-12-04T08:53:08.4372301Z * [new tag] ciflow/trunk/169048 -> ciflow/trunk/169048 2025-12-04T08:53:08.4372361Z * [new tag] ciflow/trunk/169125 -> ciflow/trunk/169125 2025-12-04T08:53:08.4372420Z * [new tag] ciflow/trunk/169555 -> ciflow/trunk/169555 2025-12-04T08:53:08.4372480Z * [new tag] ciflow/xpu/169555 -> ciflow/xpu/169555 2025-12-04T08:53:08.6387137Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:53:08.6524026Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:08.6529328Z ##[endgroup] 2025-12-04T08:53:08.6529838Z ##[group]Determining the checkout info 
2025-12-04T08:53:08.6530616Z ##[endgroup] 2025-12-04T08:53:08.6535751Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:53:08.6634119Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:53:08.6658918Z ##[group]Checking out the ref 2025-12-04T08:53:08.6660361Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:08.6955940Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T08:53:08.6962090Z ##[endgroup] 2025-12-04T08:53:08.6962267Z ##[group]Setting up auth for fetching submodules 2025-12-04T08:53:08.6966790Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:53:08.6993063Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T08:53:08.7014884Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T08:53:08.7030897Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T08:53:08.7053456Z ##[endgroup] 2025-12-04T08:53:08.7053616Z ##[group]Fetching submodules 2025-12-04T08:53:08.7055263Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T08:53:08.7266037Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T08:53:08.7278616Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T08:53:08.7291276Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T08:53:08.7310389Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T08:53:08.7322270Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T08:53:08.7337397Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:08.7349437Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T08:53:08.7364532Z Synchronizing submodule url for 'third_party/aiter' 
2025-12-04T08:53:08.7376068Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:08.7392054Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T08:53:08.7405294Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-12-04T08:53:08.7419219Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T08:53:08.7429582Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T08:53:08.7440351Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T08:53:08.7450649Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T08:53:08.7468307Z Synchronizing submodule url for 'third_party/fbgemm' 2025-12-04T08:53:08.7487801Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:08.7498982Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:08.7513921Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:08.7525066Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:08.7537770Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:08.7548169Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:08.7568626Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T08:53:08.7582945Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T08:53:08.7605829Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:08.7621810Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:08.7638055Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T08:53:08.7650446Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T08:53:08.7662218Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 
2025-12-04T08:53:08.7673342Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T08:53:08.7685095Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T08:53:08.7696682Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T08:53:08.7710194Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:08.7733171Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T08:53:08.7744608Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T08:53:08.7756878Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:08.7770602Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:08.7785939Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:08.7797780Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:08.7808449Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:08.7821494Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:08.7833367Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:08.7843615Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:08.7852614Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:08.7865705Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:08.7876099Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:08.7886097Z Synchronizing submodule url for 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:08.7904128Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:08.7921988Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:08.7932496Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:08.7944553Z Synchronizing submodule url for 'third_party/kleidiai' 2025-12-04T08:53:08.7959903Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T08:53:08.7978258Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T08:53:08.7990448Z Synchronizing submodule url for 'third_party/onnx' 2025-12-04T08:53:08.8018274Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:08.8038468Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T08:53:08.8051406Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:08.8063210Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:08.8074115Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:08.8085375Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:08.8095016Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:08.8104237Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:08.8113082Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:08.8133288Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:08.8145669Z 
Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:08.8158799Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:08.8180340Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T08:53:08.8192052Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T08:53:08.8206920Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:08.8218864Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:08.8233661Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T08:53:08.8247522Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T08:53:08.8261008Z Synchronizing submodule url for 'third_party/pybind11' 2025-12-04T08:53:08.8277140Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-12-04T08:53:08.8288590Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T08:53:08.8301120Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T08:53:08.8314374Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:08.8324989Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:08.8336391Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:08.8354149Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:08.8366117Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:08.8393407Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T08:53:08.8676219Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:53:08.8749827Z Submodule path 'third_party/FP16': checked out 
'4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:53:08.8822316Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:53:08.8941926Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:53:08.9016735Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:53:08.9082725Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:53:09.2456182Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:53:09.2636846Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:53:09.2835316Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:53:09.2949179Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:53:09.3166313Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:53:09.3226213Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:53:09.3888899Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:53:09.3985566Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:53:09.4116670Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:53:09.4824858Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:53:09.5150647Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:53:09.6957552Z Submodule path 
'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:53:09.7623498Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:53:10.1937495Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:53:10.2163547Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:10.2237932Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:53:10.2817208Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:53:10.2938623Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:53:10.3133397Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:53:10.3249409Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:53:10.3346769Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:53:10.3513957Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:53:10.3733360Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:53:10.3841864Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:53:10.4019525Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:10.4093129Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:53:10.5647427Z 
Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:53:10.5749035Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:53:10.5833479Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:53:10.5926948Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:53:10.6005012Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:53:10.6080658Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 2025-12-04T08:53:10.6157979Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:53:10.6216150Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:53:10.6282026Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:53:10.6350375Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:53:10.6415906Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:10.6500179Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:53:10.6569877Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 
'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:53:10.6634090Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:53:10.6723376Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:53:10.6780217Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:53:10.6857640Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:53:10.6926105Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:10.6997996Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:53:10.7071423Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:53:10.7176539Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:53:10.8896744Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:53:10.9094708Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:53:10.9210056Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:53:10.9273971Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:53:10.9340977Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 
2025-12-04T08:53:10.9390791Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:53:10.9473852Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:53:10.9531024Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:53:10.9571853Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:53:10.9640300Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:53:10.9720227Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:53:10.9792809Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:53:10.9939985Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:53:11.0018525Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:53:11.1359873Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:53:11.1449924Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:53:11.1667446Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:53:11.1721137Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:53:11.1811316Z Submodule path 
'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:53:11.1991258Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:53:11.2217776Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:53:11.2470004Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:53:11.2591219Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:53:11.2767439Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:53:11.2875806Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:53:11.3166169Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:53:11.3314261Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:53:11.3387938Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:53:11.3424640Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:53:11.3654974Z Entering 'android/libs/fbjni' 2025-12-04T08:53:11.3682069Z Entering 'third_party/FP16' 2025-12-04T08:53:11.3711583Z Entering 'third_party/FXdiv' 2025-12-04T08:53:11.3735801Z Entering 'third_party/NNPACK' 2025-12-04T08:53:11.3756892Z Entering 'third_party/NVTX' 2025-12-04T08:53:11.3777805Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:11.3799626Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:11.3833495Z Entering 'third_party/aiter' 2025-12-04T08:53:11.3858412Z Entering 'third_party/aiter/3rdparty/composable_kernel' 
2025-12-04T08:53:11.3879739Z Entering 'third_party/benchmark' 2025-12-04T08:53:11.3909133Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:11.3941713Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:11.3964284Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:11.3987042Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:11.4010674Z Entering 'third_party/cutlass' 2025-12-04T08:53:11.4045134Z Entering 'third_party/fbgemm' 2025-12-04T08:53:11.4067325Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:11.4091670Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:11.4113629Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:11.4134072Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:11.4162583Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:11.4181727Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:11.4204956Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:11.4226107Z Entering 'third_party/flash-attention' 2025-12-04T08:53:11.4247301Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:11.4270676Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:11.4298528Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:11.4322030Z Entering 'third_party/fmt' 2025-12-04T08:53:11.4341920Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:11.4362400Z Entering 'third_party/gloo' 2025-12-04T08:53:11.4382831Z Entering 'third_party/googletest' 2025-12-04T08:53:11.4404765Z Entering 'third_party/ideep' 2025-12-04T08:53:11.4427911Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:11.4448944Z Entering 'third_party/ittapi' 2025-12-04T08:53:11.4469655Z Entering 'third_party/kineto' 2025-12-04T08:53:11.4491271Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:11.4512660Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 
2025-12-04T08:53:11.4533627Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:11.4566897Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:11.4595257Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:11.4616642Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:11.4637494Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:11.4655837Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:11.4681127Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:11.4700608Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:11.4724051Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:11.4744705Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:11.4771109Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:11.4804302Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:11.4830187Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:11.4864029Z Entering 'third_party/kleidiai' 2025-12-04T08:53:11.4887019Z Entering 'third_party/mimalloc' 2025-12-04T08:53:11.4913209Z Entering 'third_party/nlohmann' 2025-12-04T08:53:11.4940443Z Entering 'third_party/onnx' 2025-12-04T08:53:11.4969515Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:11.4993928Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:11.5023620Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:11.5046407Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 
2025-12-04T08:53:11.5069027Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:11.5097107Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:11.5125601Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:11.5146585Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:11.5169349Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:11.5190091Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:11.5210614Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:11.5234210Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:11.5267324Z Entering 'third_party/pocketfft' 2025-12-04T08:53:11.5292763Z Entering 'third_party/protobuf' 2025-12-04T08:53:11.5330045Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:11.5356243Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:11.5381845Z Entering 'third_party/psimd' 2025-12-04T08:53:11.5400624Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:11.5421481Z Entering 'third_party/pybind11' 2025-12-04T08:53:11.5445272Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:11.5467344Z Entering 'third_party/sleef' 2025-12-04T08:53:11.5489094Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:11.5512338Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:11.5542659Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:11.5573218Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:11.5596765Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:11.5620372Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:11.5656698Z ##[endgroup] 2025-12-04T08:53:11.5656910Z ##[group]Persisting credentials for 
submodules 2025-12-04T08:53:11.5664517Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T08:53:11.5863588Z Entering 'android/libs/fbjni' 2025-12-04T08:53:11.5892168Z Entering 'third_party/FP16' 2025-12-04T08:53:11.5920151Z Entering 'third_party/FXdiv' 2025-12-04T08:53:11.5953376Z Entering 'third_party/NNPACK' 2025-12-04T08:53:11.5979639Z Entering 'third_party/NVTX' 2025-12-04T08:53:11.6011789Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:11.6040775Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:11.6081525Z Entering 'third_party/aiter' 2025-12-04T08:53:11.6114174Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:11.6147981Z Entering 'third_party/benchmark' 2025-12-04T08:53:11.6178002Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:11.6209776Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:11.6238971Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:11.6269658Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:11.6295858Z Entering 'third_party/cutlass' 2025-12-04T08:53:11.6327240Z Entering 'third_party/fbgemm' 2025-12-04T08:53:11.6357425Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:11.6381356Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:11.6407871Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:11.6435684Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:11.6475789Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:11.6503560Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:11.6527572Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:11.6556600Z Entering 'third_party/flash-attention' 2025-12-04T08:53:11.6581770Z Entering 'third_party/flash-attention/csrc/composable_kernel' 
2025-12-04T08:53:11.6608288Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:11.6634977Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:11.6666839Z Entering 'third_party/fmt' 2025-12-04T08:53:11.6695157Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:11.6725002Z Entering 'third_party/gloo' 2025-12-04T08:53:11.6752547Z Entering 'third_party/googletest' 2025-12-04T08:53:11.6779963Z Entering 'third_party/ideep' 2025-12-04T08:53:11.6804983Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:11.6834419Z Entering 'third_party/ittapi' 2025-12-04T08:53:11.6860952Z Entering 'third_party/kineto' 2025-12-04T08:53:11.6889105Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:11.6917093Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:11.6950937Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:11.6975993Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:11.6999125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:11.7021730Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:11.7051675Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:11.7079227Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:11.7112696Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:11.7144486Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:11.7174765Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:11.7199125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:11.7229105Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:11.7263078Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:11.7290596Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:11.7319674Z Entering 'third_party/kleidiai' 2025-12-04T08:53:11.7352019Z Entering 'third_party/mimalloc' 2025-12-04T08:53:11.7380751Z Entering 'third_party/nlohmann' 2025-12-04T08:53:11.7413117Z Entering 'third_party/onnx' 2025-12-04T08:53:11.7450642Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:11.7479384Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:11.7509934Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:11.7534140Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:11.7565483Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:11.7591692Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:11.7617064Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:11.7641930Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:11.7670673Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:11.7700337Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:11.7737765Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:11.7767168Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:11.7802783Z Entering 'third_party/pocketfft' 2025-12-04T08:53:11.7827534Z Entering 'third_party/protobuf' 2025-12-04T08:53:11.7857113Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:11.7881445Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:11.7913472Z Entering 'third_party/psimd' 
2025-12-04T08:53:11.7940835Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:11.7968899Z Entering 'third_party/pybind11' 2025-12-04T08:53:11.7998303Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:11.8027284Z Entering 'third_party/sleef' 2025-12-04T08:53:11.8059319Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:11.8093965Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:11.8120515Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:11.8147052Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:11.8176165Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:11.8202182Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:11.8246261Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:53:11.8424140Z Entering 'android/libs/fbjni' 2025-12-04T08:53:11.8455538Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:53:11.8463945Z Entering 'third_party/FP16' 2025-12-04T08:53:11.8488851Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:53:11.8498361Z Entering 'third_party/FXdiv' 2025-12-04T08:53:11.8533966Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:53:11.8544389Z Entering 'third_party/NNPACK' 2025-12-04T08:53:11.8576942Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:53:11.8591589Z Entering 'third_party/NVTX' 2025-12-04T08:53:11.8622857Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:53:11.8635191Z Entering 
'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:11.8661576Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:53:11.8674307Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:11.8697017Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:53:11.8713248Z Entering 'third_party/aiter' 2025-12-04T08:53:11.8731314Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:53:11.8739718Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:11.8759926Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:53:11.8773881Z Entering 'third_party/benchmark' 2025-12-04T08:53:11.8802447Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:11.8813903Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:11.8836416Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:53:11.8849322Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:11.8873028Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:53:11.8883473Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:11.8904791Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:53:11.8917410Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:11.8940834Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:53:11.8952746Z Entering 'third_party/cutlass' 2025-12-04T08:53:11.8972631Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:53:11.8989159Z Entering 
'third_party/fbgemm' 2025-12-04T08:53:11.9012752Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:53:11.9024855Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:11.9050609Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:53:11.9059442Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:11.9092485Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:53:11.9113689Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:11.9138534Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:53:11.9154444Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:11.9189197Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:53:11.9203563Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:11.9226222Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:53:11.9235866Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:11.9257067Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:53:11.9267498Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:11.9288929Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:53:11.9302421Z Entering 'third_party/flash-attention' 2025-12-04T08:53:11.9325633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:53:11.9343064Z 
Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:11.9366557Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:53:11.9379613Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:11.9401237Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:53:11.9422322Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:11.9447973Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:53:11.9462324Z Entering 'third_party/fmt' 2025-12-04T08:53:11.9490374Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:11.9502034Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:11.9523627Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:53:11.9534623Z Entering 'third_party/gloo' 2025-12-04T08:53:11.9562377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:53:11.9577083Z Entering 'third_party/googletest' 2025-12-04T08:53:11.9605995Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:11.9616608Z Entering 'third_party/ideep' 2025-12-04T08:53:11.9644469Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:53:11.9657228Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:11.9683604Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:53:11.9701888Z Entering 'third_party/ittapi' 2025-12-04T08:53:11.9734087Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 
2025-12-04T08:53:11.9746032Z Entering 'third_party/kineto' 2025-12-04T08:53:11.9771382Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:53:11.9782777Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:11.9807585Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:53:11.9821008Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:11.9848142Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:53:11.9860063Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:11.9889533Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:53:11.9902017Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:11.9924889Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:11.9937297Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:11.9958745Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:53:11.9972811Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:12.0001163Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:53:12.0015151Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:12.0036812Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:53:12.0046414Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:12.0077500Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:12.0093880Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:12.0115743Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:53:12.0125234Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:12.0149356Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:53:12.0161395Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:12.0187399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:12.0197114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.0216253Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:12.0227417Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:12.0250797Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:12.0272177Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:12.0293433Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:53:12.0303008Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:12.0324717Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:53:12.0338101Z Entering 'third_party/kleidiai' 2025-12-04T08:53:12.0361052Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:53:12.0371946Z Entering 'third_party/mimalloc' 2025-12-04T08:53:12.0393804Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:53:12.0405810Z Entering 'third_party/nlohmann' 2025-12-04T08:53:12.0432378Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:53:12.0444413Z Entering 'third_party/onnx' 2025-12-04T08:53:12.0468872Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:53:12.0492973Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:12.0515304Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:12.0532211Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:12.0559830Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:53:12.0572053Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:12.0592375Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:12.0603795Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:12.0629124Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:12.0640295Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:12.0666377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:53:12.0678572Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:12.0705757Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:53:12.0717992Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:12.0744875Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:53:12.0756620Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:12.0781252Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:53:12.0792259Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:12.0814721Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:12.0824198Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.0844877Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:12.0855193Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:12.0873596Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:12.0883783Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:12.0907167Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:53:12.0925633Z Entering 'third_party/pocketfft' 2025-12-04T08:53:12.0947887Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:53:12.0957940Z Entering 'third_party/protobuf' 2025-12-04T08:53:12.0975765Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:53:12.0985892Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:12.1010322Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:12.1021836Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:12.1045098Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:12.1058232Z Entering 'third_party/psimd' 2025-12-04T08:53:12.1081659Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:53:12.1092868Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:12.1114999Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 
2025-12-04T08:53:12.1126301Z Entering 'third_party/pybind11' 2025-12-04T08:53:12.1146480Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:12.1159119Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:12.1182871Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:53:12.1193889Z Entering 'third_party/sleef' 2025-12-04T08:53:12.1212527Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:53:12.1222414Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:12.1245639Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:53:12.1256381Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:12.1280910Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:12.1292013Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:12.1314474Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:53:12.1325627Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:12.1346162Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:53:12.1356468Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:12.1377873Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:12.1388277Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:12.1411261Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config 
remote.origin.url 2025-12-04T08:53:12.1611834Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:53:12.1780621Z Entering 'android/libs/fbjni' 2025-12-04T08:53:12.1804170Z Entering 'third_party/FP16' 2025-12-04T08:53:12.1825516Z Entering 'third_party/FXdiv' 2025-12-04T08:53:12.1849970Z Entering 'third_party/NNPACK' 2025-12-04T08:53:12.1871300Z Entering 'third_party/NVTX' 2025-12-04T08:53:12.1890394Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:12.1910993Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:12.1939254Z Entering 'third_party/aiter' 2025-12-04T08:53:12.1960353Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:12.1996828Z Entering 'third_party/benchmark' 2025-12-04T08:53:12.2026236Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:12.2050816Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:12.2079224Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:12.2104393Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:12.2124052Z Entering 'third_party/cutlass' 2025-12-04T08:53:12.2149938Z Entering 'third_party/fbgemm' 2025-12-04T08:53:12.2174071Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:12.2195575Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:12.2222879Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:12.2246351Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:12.2271292Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:12.2291609Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:12.2315811Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:12.2343542Z Entering 'third_party/flash-attention' 2025-12-04T08:53:12.2367576Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:12.2390801Z Entering 'third_party/flash-attention/csrc/cutlass' 
2025-12-04T08:53:12.2414265Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:12.2441573Z Entering 'third_party/fmt' 2025-12-04T08:53:12.2468329Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:12.2491923Z Entering 'third_party/gloo' 2025-12-04T08:53:12.2515027Z Entering 'third_party/googletest' 2025-12-04T08:53:12.2536643Z Entering 'third_party/ideep' 2025-12-04T08:53:12.2559862Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:12.2588751Z Entering 'third_party/ittapi' 2025-12-04T08:53:12.2612624Z Entering 'third_party/kineto' 2025-12-04T08:53:12.2633927Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:12.2654829Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:12.2675545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:12.2697441Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:12.2720530Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:12.2741267Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:12.2770746Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:12.2790718Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:12.2812239Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:12.2839439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:12.2859259Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:12.2878828Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.2903005Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 
2025-12-04T08:53:12.2928781Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:12.2952392Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:12.2975907Z Entering 'third_party/kleidiai' 2025-12-04T08:53:12.2996849Z Entering 'third_party/mimalloc' 2025-12-04T08:53:12.3020960Z Entering 'third_party/nlohmann' 2025-12-04T08:53:12.3042777Z Entering 'third_party/onnx' 2025-12-04T08:53:12.3070768Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:12.3097397Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:12.3122160Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:12.3142933Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:12.3164684Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:12.3182464Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:12.3202505Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:12.3223753Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:12.3244891Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:12.3269132Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.3293641Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:12.3314740Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:12.3348892Z Entering 'third_party/pocketfft' 2025-12-04T08:53:12.3370534Z Entering 'third_party/protobuf' 2025-12-04T08:53:12.3393229Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:12.3417001Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:12.3443292Z Entering 'third_party/psimd' 2025-12-04T08:53:12.3468985Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:12.3498766Z Entering 
'third_party/pybind11' 2025-12-04T08:53:12.3526676Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:12.3554687Z Entering 'third_party/sleef' 2025-12-04T08:53:12.3585056Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:12.3612929Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:12.3642064Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:12.3664910Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:12.3689813Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:12.3720124Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:12.3765170Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:53:12.3977867Z Entering 'android/libs/fbjni' 2025-12-04T08:53:12.4006141Z Entering 'third_party/FP16' 2025-12-04T08:53:12.4028372Z Entering 'third_party/FXdiv' 2025-12-04T08:53:12.4049868Z Entering 'third_party/NNPACK' 2025-12-04T08:53:12.4075526Z Entering 'third_party/NVTX' 2025-12-04T08:53:12.4111603Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:12.4135404Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:12.4172391Z Entering 'third_party/aiter' 2025-12-04T08:53:12.4199062Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:12.4226574Z Entering 'third_party/benchmark' 2025-12-04T08:53:12.4251385Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:12.4279120Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:12.4303478Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:12.4326111Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:12.4349734Z Entering 'third_party/cutlass' 2025-12-04T08:53:12.4379372Z Entering 'third_party/fbgemm' 2025-12-04T08:53:12.4405217Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:12.4428380Z Entering 'third_party/fbgemm/external/composable_kernel' 
2025-12-04T08:53:12.4450947Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:12.4472446Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:12.4494979Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:12.4516945Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:12.4542715Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:12.4566028Z Entering 'third_party/flash-attention' 2025-12-04T08:53:12.4591869Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:12.4618730Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:12.4645495Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:12.4666217Z Entering 'third_party/fmt' 2025-12-04T08:53:12.4687994Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:12.4711343Z Entering 'third_party/gloo' 2025-12-04T08:53:12.4733891Z Entering 'third_party/googletest' 2025-12-04T08:53:12.4755779Z Entering 'third_party/ideep' 2025-12-04T08:53:12.4776781Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:12.4800017Z Entering 'third_party/ittapi' 2025-12-04T08:53:12.4825034Z Entering 'third_party/kineto' 2025-12-04T08:53:12.4848466Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:12.4869579Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:12.4897395Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:12.4925505Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:12.4947472Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:12.4967653Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:12.4993029Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:12.5016807Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:12.5050478Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:12.5071113Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:12.5097281Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:12.5119857Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.5142243Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:12.5170408Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:12.5191613Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:12.5219241Z Entering 'third_party/kleidiai' 2025-12-04T08:53:12.5241896Z Entering 'third_party/mimalloc' 2025-12-04T08:53:12.5263458Z Entering 'third_party/nlohmann' 2025-12-04T08:53:12.5288506Z Entering 'third_party/onnx' 2025-12-04T08:53:12.5315923Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:12.5341035Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:12.5363433Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:12.5382808Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:12.5403215Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:12.5427234Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:12.5448741Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:12.5469897Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:12.5489689Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:12.5508819Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:12.5528605Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:12.5551089Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:12.5581969Z Entering 'third_party/pocketfft' 2025-12-04T08:53:12.5601739Z Entering 'third_party/protobuf' 2025-12-04T08:53:12.5622731Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:12.5644967Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:12.5668569Z Entering 'third_party/psimd' 2025-12-04T08:53:12.5690131Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:12.5714103Z Entering 'third_party/pybind11' 2025-12-04T08:53:12.5735809Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:12.5756823Z Entering 'third_party/sleef' 2025-12-04T08:53:12.5780929Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:12.5801153Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:12.5821190Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:12.5839433Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:12.5865863Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:12.5888148Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:12.5929179Z ##[endgroup] 2025-12-04T08:53:12.6075530Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:53:12.6163493Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:12.6300639Z ##[group]Run actions/checkout@v4 2025-12-04T08:53:12.6300782Z with: 2025-12-04T08:53:12.6300923Z ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:12.6301064Z fetch-depth: 0 2025-12-04T08:53:12.6301166Z submodules: recursive 2025-12-04T08:53:12.6301276Z show-progress: false 2025-12-04T08:53:12.6301430Z repository: pytorch/pytorch 2025-12-04T08:53:12.6301810Z token: *** 2025-12-04T08:53:12.6301903Z 
ssh-strict: true 2025-12-04T08:53:12.6302001Z ssh-user: git 2025-12-04T08:53:12.6302097Z persist-credentials: true 2025-12-04T08:53:12.6302205Z clean: true 2025-12-04T08:53:12.6302310Z sparse-checkout-cone-mode: true 2025-12-04T08:53:12.6302430Z fetch-tags: false 2025-12-04T08:53:12.6302526Z lfs: false 2025-12-04T08:53:12.6302618Z set-safe-directory: true 2025-12-04T08:53:12.6302723Z env: 2025-12-04T08:53:12.6302821Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:12.6302922Z ##[endgroup] 2025-12-04T08:53:12.6777034Z Syncing repository: pytorch/pytorch 2025-12-04T08:53:12.6777286Z ##[group]Getting Git version info 2025-12-04T08:53:12.6777439Z Working directory is '/home/runner/_work/pytorch/pytorch' 2025-12-04T08:53:12.6790990Z [command]/usr/bin/git version 2025-12-04T08:53:12.6823071Z git version 2.52.0 2025-12-04T08:53:12.6844678Z ##[endgroup] 2025-12-04T08:53:12.6850527Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/164bc5ef-a63a-4d28-b86e-b31cea925baa/.gitconfig' 2025-12-04T08:53:12.6856332Z Temporarily overriding HOME='/home/runner/_work/_temp/164bc5ef-a63a-4d28-b86e-b31cea925baa' before making global git config changes 2025-12-04T08:53:12.6856663Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T08:53:12.6859314Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T08:53:12.6888848Z [command]/usr/bin/git config --local --get remote.origin.url 2025-12-04T08:53:12.6904741Z https://github.com/pytorch/pytorch 2025-12-04T08:53:12.6919343Z ##[group]Removing previously created refs, to avoid conflicts 2025-12-04T08:53:12.6922988Z [command]/usr/bin/git rev-parse --symbolic-full-name --verify --quiet HEAD 2025-12-04T08:53:12.6946421Z HEAD 2025-12-04T08:53:12.6983640Z ##[endgroup] 2025-12-04T08:53:12.6985698Z [command]/usr/bin/git submodule status 2025-12-04T08:53:12.7196788Z 7e1e1fe3858c63c251c637ae41a20de425dde96f android/libs/fbjni (v0.1.0-12-g7e1e1fe) 
2025-12-04T08:53:12.7265394Z 4dfe081cf6bcd15db339cf2680b9281b8451eeb3 third_party/FP16 (4dfe081) 2025-12-04T08:53:12.7324731Z b408327ac2a15ec3e43352421954f5b1967701d1 third_party/FXdiv (b408327) 2025-12-04T08:53:12.7380318Z c07e3a0400713d546e0dea2d5466dd22ea389c73 third_party/NNPACK (c07e3a0) 2025-12-04T08:53:12.7413034Z 3ebbc93ded7285963bff932c678fa367eb393ba6 third_party/NVTX (v3.1.0-313-g3ebbc93) 2025-12-04T08:53:12.7464627Z 1d8f600fd424278486eade7ed3e877c99f0846b1 third_party/VulkanMemoryAllocator (v2.1.0-982-g1d8f600) 2025-12-04T08:53:12.7762781Z 51a0103656eff6fc9bfd39a4597923c4b542c883 third_party/XNNPACK (remotes/origin/ds/ndk-1243-g51a0103656) 2025-12-04T08:53:12.7799707Z 01aae101b9e5e94d6c16a9514c9fb8df99c93150 third_party/aiter (v0.1.1-92-g01aae101) 2025-12-04T08:53:12.7819994Z 299e5928955cc62af9968370293b916f5130916f third_party/benchmark (v1.9.3) 2025-12-04T08:53:12.7872695Z 7fe50dc3da2069d6645d9deb8c017a876472a977 third_party/composable_kernel (rocm-6.4.3-459-g7fe50dc3d) 2025-12-04T08:53:12.7958691Z 89c932f313c6437c38f2982869beacc89c2f2246 third_party/cpp-httplib (v0.26.0) 2025-12-04T08:53:12.8047084Z f858c30bcb16f8effd5ff46996f0514539e17abc third_party/cpuinfo (f858c30) 2025-12-04T08:53:12.8079058Z 0b1577c8c83401237d601d0d0db5210506705396 third_party/cudnn_frontend (v0.5-61-g0b1577c) 2025-12-04T08:53:12.8147516Z f88806b1e31dfa579842638740216dd41fc6c588 third_party/cutlass (v4.3.1) 2025-12-04T08:53:12.8182314Z c0b988d39a9e47c794d699f29930ed4d7c7e13a4 third_party/fbgemm (v1.4.0-rc1-2-gc0b988d39) 2025-12-04T08:53:12.8235797Z 979702c87a8713a8e0a5e9fee122b90d2ef13be5 third_party/flash-attention (v2.7.4) 2025-12-04T08:53:12.8252904Z a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757 third_party/flatbuffers (v24.12.23) 2025-12-04T08:53:12.8503074Z 407c905e45ad75fc29bf0f9bb7c5c2fd3475976f third_party/fmt (12.1.0) 2025-12-04T08:53:12.8583504Z 3fb5c176c17c765a3492cd2f0321b0dab712f350 third_party/gemmlowp/gemmlowp (remotes/origin/revert-87-master-135-g3fb5c17) 
2025-12-04T08:53:12.8679112Z 54cbae0d3a67fa890b4c3d9ee162b7860315e341 third_party/gloo (remotes/origin/gh/c-p-i-o/1/base-37-g54cbae0) 2025-12-04T08:53:12.8829009Z 52eb8108c5bdec04579160ae17225d66034bd723 third_party/googletest (release-1.8.0-3544-g52eb8108) 2025-12-04T08:53:12.8906532Z 719d8e6cd7f7a0e01b155657526d693acf97c2b3 third_party/ideep (pytorch-rls-v3.7.1) 2025-12-04T08:53:12.8968154Z dec1d23ca65ab069d225dfe40dea14f455170959 third_party/ittapi (v3.25.5) 2025-12-04T08:53:12.9103925Z 31f85df8fbd89c188f14ef10f1ec65379786b943 third_party/kineto (heads/main) 2025-12-04T08:53:12.9122730Z d7770c89632329a9914ef1a90289917597639cbe third_party/kleidiai (v1.15.0) 2025-12-04T08:53:12.9138714Z fbd8b99c2b828428947d70fdc046bb55609be93e third_party/mimalloc (v2.2.4) 2025-12-04T08:53:12.9156750Z 55f93686c01528224f448c19128836e7df245f72 third_party/nlohmann (v3.12.0) 2025-12-04T08:53:12.9363071Z e709452ef2bbc1d113faf678c24e6d3467696e83 third_party/onnx (v1.18.0) 2025-12-04T08:53:12.9384977Z a799f4aed9c94b765dcdaabaeab7d5e7e2310878 third_party/opentelemetry-cpp (v1.14.2) 2025-12-04T08:53:12.9405490Z 0fa0ef591e38c2758e3184c6c23e497b9f732ffa third_party/pocketfft (release_for_eigen-40-g0fa0ef5) 2025-12-04T08:53:12.9619035Z d1eca4e4b421cd2997495c4b4e65cea6be4e9b8a third_party/protobuf (v3.7.0-rc.2-1279-gd1eca4e4b) 2025-12-04T08:53:12.9676925Z 072586a71b55b7f8c584153d223e95687148a900 third_party/psimd (heads/master) 2025-12-04T08:53:12.9721102Z 4fe0e1e183925bf8cfa6aae24237e724a96479b8 third_party/pthreadpool (0.1-144-g4fe0e1e) 2025-12-04T08:53:12.9742150Z f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8 third_party/pybind11 (v3.0.1) 2025-12-04T08:53:12.9803803Z f45429b087dd7d5bc78bb40dc7cf06425c252d67 third_party/python-peachpy (remotes/origin/pre-generated) 2025-12-04T08:53:12.9855901Z 5a1d179df9cf652951b59010a2d2075372d67f68 third_party/sleef (3.8) 2025-12-04T08:53:12.9905823Z 2b4cd91092d335a697416b2a3cb398283246849d third_party/tensorpipe (heads/main) 2025-12-04T08:53:12.9918934Z 
##[group]Cleaning the repository 2025-12-04T08:53:12.9924627Z [command]/usr/bin/git clean -ffdx 2025-12-04T08:53:13.0046002Z [command]/usr/bin/git reset --hard HEAD 2025-12-04T08:53:13.0698710Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T08:53:13.0756443Z ##[endgroup] 2025-12-04T08:53:13.0758868Z ##[group]Disabling automatic garbage collection 2025-12-04T08:53:13.0764466Z [command]/usr/bin/git config --local gc.auto 0 2025-12-04T08:53:13.0788203Z ##[endgroup] 2025-12-04T08:53:13.0788439Z ##[group]Setting up auth 2025-12-04T08:53:13.0791759Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T08:53:13.0814677Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T08:53:13.1013486Z Entering 'android/libs/fbjni' 2025-12-04T08:53:13.1037690Z Entering 'third_party/FP16' 2025-12-04T08:53:13.1066060Z Entering 'third_party/FXdiv' 2025-12-04T08:53:13.1087160Z Entering 'third_party/NNPACK' 2025-12-04T08:53:13.1114296Z Entering 'third_party/NVTX' 2025-12-04T08:53:13.1154313Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:13.1180386Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:13.1219399Z Entering 'third_party/aiter' 2025-12-04T08:53:13.1246421Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:13.1278124Z Entering 'third_party/benchmark' 2025-12-04T08:53:13.1303016Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:13.1339459Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:13.1365169Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:13.1387736Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:13.1417676Z Entering 'third_party/cutlass' 2025-12-04T08:53:13.1448870Z Entering 'third_party/fbgemm' 2025-12-04T08:53:13.1475329Z Entering 'third_party/fbgemm/external/asmjit' 
2025-12-04T08:53:13.1502605Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:13.1530023Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:13.1559304Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:13.1598206Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:13.1629989Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:13.1660418Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:13.1699816Z Entering 'third_party/flash-attention' 2025-12-04T08:53:13.1730330Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:13.1757782Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:13.1804108Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:13.1834971Z Entering 'third_party/fmt' 2025-12-04T08:53:13.1860066Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:13.1884294Z Entering 'third_party/gloo' 2025-12-04T08:53:13.1908938Z Entering 'third_party/googletest' 2025-12-04T08:53:13.1932993Z Entering 'third_party/ideep' 2025-12-04T08:53:13.1956915Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:13.1984041Z Entering 'third_party/ittapi' 2025-12-04T08:53:13.2018640Z Entering 'third_party/kineto' 2025-12-04T08:53:13.2044537Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:13.2068125Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:13.2101827Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:13.2128895Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:13.2151820Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:13.2188496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:13.2216091Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 
2025-12-04T08:53:13.2245295Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:13.2271722Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:13.2292251Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:13.2312138Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:13.2335191Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.2359784Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.2387301Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:13.2417706Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:13.2450351Z Entering 'third_party/kleidiai' 2025-12-04T08:53:13.2487586Z Entering 'third_party/mimalloc' 2025-12-04T08:53:13.2513113Z Entering 'third_party/nlohmann' 2025-12-04T08:53:13.2539058Z Entering 'third_party/onnx' 2025-12-04T08:53:13.2578299Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:13.2605983Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:13.2630822Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:13.2655221Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:13.2676586Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:13.2708696Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:13.2730982Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:13.2757649Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:13.2785389Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:13.2817646Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.2841849Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.2871835Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:13.2904073Z Entering 'third_party/pocketfft' 2025-12-04T08:53:13.2935958Z Entering 'third_party/protobuf' 2025-12-04T08:53:13.2963441Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:13.2987317Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:13.3017113Z Entering 'third_party/psimd' 2025-12-04T08:53:13.3039756Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:13.3060759Z Entering 'third_party/pybind11' 2025-12-04T08:53:13.3084660Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:13.3113971Z Entering 'third_party/sleef' 2025-12-04T08:53:13.3143237Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:13.3169966Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:13.3200844Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:13.3237833Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:13.3266050Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:13.3289580Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:13.3331855Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T08:53:13.3349927Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3360993Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T08:53:13.3385673Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T08:53:13.3560480Z Entering 'android/libs/fbjni' 
2025-12-04T08:53:13.3576550Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3596285Z Entering 'third_party/FP16' 2025-12-04T08:53:13.3610061Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3628624Z Entering 'third_party/FXdiv' 2025-12-04T08:53:13.3642083Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3660329Z Entering 'third_party/NNPACK' 2025-12-04T08:53:13.3677437Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3712514Z Entering 'third_party/NVTX' 2025-12-04T08:53:13.3732065Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3757856Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:13.3774317Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3798078Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:13.3812637Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3842153Z Entering 'third_party/aiter' 2025-12-04T08:53:13.3856797Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3875738Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:13.3889319Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3916179Z Entering 'third_party/benchmark' 2025-12-04T08:53:13.3934474Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3953667Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:13.3970959Z http.https://github.com/.extraheader 2025-12-04T08:53:13.3994994Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:13.4011861Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4031709Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:13.4046412Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4064730Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:13.4080079Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4107026Z Entering 'third_party/cutlass' 2025-12-04T08:53:13.4124112Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4159601Z Entering 'third_party/fbgemm' 2025-12-04T08:53:13.4174535Z 
http.https://github.com/.extraheader 2025-12-04T08:53:13.4192476Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:13.4214858Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4235184Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:13.4254794Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4281497Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:13.4300552Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4317371Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:13.4336601Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4365095Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:13.4380540Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4400403Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:13.4418598Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4436371Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:13.4450119Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4473212Z Entering 'third_party/flash-attention' 2025-12-04T08:53:13.4490826Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4516050Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:13.4532588Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4562429Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:13.4582791Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4613808Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:13.4629475Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4650962Z Entering 'third_party/fmt' 2025-12-04T08:53:13.4667564Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4687405Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:13.4703929Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4723001Z Entering 'third_party/gloo' 2025-12-04T08:53:13.4737938Z http.https://github.com/.extraheader 
2025-12-04T08:53:13.4758489Z Entering 'third_party/googletest' 2025-12-04T08:53:13.4773631Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4791955Z Entering 'third_party/ideep' 2025-12-04T08:53:13.4806385Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4826065Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:13.4840555Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4865952Z Entering 'third_party/ittapi' 2025-12-04T08:53:13.4879878Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4898641Z Entering 'third_party/kineto' 2025-12-04T08:53:13.4913188Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4934809Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:13.4953533Z http.https://github.com/.extraheader 2025-12-04T08:53:13.4974001Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:13.4988623Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5009120Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:13.5025107Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5047558Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:13.5063821Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5082913Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:13.5097333Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5120976Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:13.5134418Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5155979Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:13.5170076Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5190228Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:13.5206154Z 
http.https://github.com/.extraheader 2025-12-04T08:53:13.5225408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:13.5239463Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5257735Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:13.5272063Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5297826Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:13.5314963Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5332305Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.5347160Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5368155Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.5384251Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5410056Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:13.5423672Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5442255Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:13.5455972Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5475212Z Entering 'third_party/kleidiai' 2025-12-04T08:53:13.5488456Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5509014Z Entering 'third_party/mimalloc' 2025-12-04T08:53:13.5522878Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5541210Z Entering 'third_party/nlohmann' 2025-12-04T08:53:13.5559717Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5577956Z Entering 'third_party/onnx' 2025-12-04T08:53:13.5592299Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5619880Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:13.5634335Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5656214Z Entering 'third_party/opentelemetry-cpp' 
2025-12-04T08:53:13.5671437Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5693037Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:13.5707139Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5726680Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:13.5742129Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5761754Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:13.5776139Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5794645Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:13.5808431Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5824685Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:13.5838349Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5857054Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:13.5871351Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5887619Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:13.5902605Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5919734Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.5935591Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5956774Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.5970499Z http.https://github.com/.extraheader 2025-12-04T08:53:13.5991946Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:13.6003886Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6029693Z Entering 'third_party/pocketfft' 2025-12-04T08:53:13.6043202Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6061583Z Entering 'third_party/protobuf' 2025-12-04T08:53:13.6075155Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6092144Z Entering 
'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:13.6108140Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6127841Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:13.6139758Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6160531Z Entering 'third_party/psimd' 2025-12-04T08:53:13.6172887Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6190169Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:13.6203651Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6220312Z Entering 'third_party/pybind11' 2025-12-04T08:53:13.6232335Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6249512Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:13.6262350Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6284918Z Entering 'third_party/sleef' 2025-12-04T08:53:13.6298759Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6317575Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:13.6331734Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6359596Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:13.6374380Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6392698Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:13.6404295Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6421821Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:13.6436079Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6456092Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:13.6468834Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6486308Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:13.6501339Z http.https://github.com/.extraheader 2025-12-04T08:53:13.6538444Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.6562705Z [command]/usr/bin/git submodule foreach --recursive git config --local 
--show-origin --name-only --get-regexp remote.origin.url 2025-12-04T08:53:13.6715305Z Entering 'android/libs/fbjni' 2025-12-04T08:53:13.6725249Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:53:13.6736885Z Entering 'third_party/FP16' 2025-12-04T08:53:13.6748192Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:53:13.6757430Z Entering 'third_party/FXdiv' 2025-12-04T08:53:13.6769090Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:53:13.6779495Z Entering 'third_party/NNPACK' 2025-12-04T08:53:13.6790426Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:53:13.6802346Z Entering 'third_party/NVTX' 2025-12-04T08:53:13.6812950Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:53:13.6822377Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:13.6832994Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:53:13.6845087Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:13.6856245Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:53:13.6873374Z Entering 'third_party/aiter' 2025-12-04T08:53:13.6883758Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:53:13.6893898Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:13.6904852Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:53:13.6918312Z Entering 'third_party/benchmark' 2025-12-04T08:53:13.6928603Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config 
remote.origin.url 2025-12-04T08:53:13.6937752Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:13.6947533Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:53:13.6960623Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:13.6971071Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:53:13.6979733Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:13.6990037Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:53:13.6999176Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:13.7008939Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:53:13.7019376Z Entering 'third_party/cutlass' 2025-12-04T08:53:13.7029747Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:53:13.7042095Z Entering 'third_party/fbgemm' 2025-12-04T08:53:13.7061987Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:53:13.7072192Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:13.7085767Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:53:13.7105001Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:13.7126165Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:53:13.7141827Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:13.7155214Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:53:13.7168107Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:13.7179148Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:53:13.7196198Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:13.7208206Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:53:13.7219533Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:13.7237842Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:53:13.7245420Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:13.7259234Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:53:13.7274499Z Entering 'third_party/flash-attention' 2025-12-04T08:53:13.7286577Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:53:13.7296566Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:13.7307776Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:53:13.7325082Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:13.7337999Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:53:13.7358574Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:13.7372850Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:53:13.7381918Z Entering 'third_party/fmt' 2025-12-04T08:53:13.7394410Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:13.7408142Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:13.7423008Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:53:13.7434036Z Entering 'third_party/gloo' 2025-12-04T08:53:13.7447865Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:53:13.7457657Z Entering 'third_party/googletest' 2025-12-04T08:53:13.7471377Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.7481367Z Entering 'third_party/ideep' 2025-12-04T08:53:13.7494516Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:53:13.7503721Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:13.7514210Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:53:13.7528949Z Entering 'third_party/ittapi' 2025-12-04T08:53:13.7542980Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:53:13.7553944Z Entering 'third_party/kineto' 2025-12-04T08:53:13.7568123Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:53:13.7580417Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:13.7591136Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:53:13.7601346Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:13.7612788Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:53:13.7632055Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:13.7634403Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:53:13.7644333Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:13.7655054Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:13.7664519Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:13.7674945Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:53:13.7683518Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:13.7695387Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:53:13.7711741Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:13.7723816Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:53:13.7732377Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:13.7740925Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.7749533Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:13.7764506Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:53:13.7777660Z 
Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:13.7791136Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:53:13.7800889Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:13.7815453Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:13.7828181Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.7838507Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:13.7847644Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.7858154Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:13.7869799Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:13.7881906Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:53:13.7892990Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:13.7903916Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.7915054Z Entering 'third_party/kleidiai' 2025-12-04T08:53:13.7927791Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 
2025-12-04T08:53:13.7937724Z Entering 'third_party/mimalloc' 2025-12-04T08:53:13.7947695Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:53:13.7962205Z Entering 'third_party/nlohmann' 2025-12-04T08:53:13.7977985Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:53:13.7988464Z Entering 'third_party/onnx' 2025-12-04T08:53:13.8001150Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:53:13.8017393Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:13.8031372Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:13.8046045Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:13.8064477Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:53:13.8074985Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:13.8087468Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:13.8098505Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:13.8112386Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.8121612Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:13.8132691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:53:13.8142393Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:13.8153844Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:53:13.8163347Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:13.8174177Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:53:13.8187111Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:13.8198370Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:53:13.8211001Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:13.8223521Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:13.8236120Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:13.8251037Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:13.8262729Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:13.8279697Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:13.8293592Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:13.8309096Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:53:13.8327455Z Entering 'third_party/pocketfft' 2025-12-04T08:53:13.8338278Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config 
remote.origin.url 2025-12-04T08:53:13.8347891Z Entering 'third_party/protobuf' 2025-12-04T08:53:13.8362847Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:53:13.8373414Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:13.8383367Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:13.8392848Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:13.8405877Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.8420107Z Entering 'third_party/psimd' 2025-12-04T08:53:13.8430876Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:53:13.8443917Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:13.8456434Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:53:13.8467103Z Entering 'third_party/pybind11' 2025-12-04T08:53:13.8480585Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:13.8490779Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:13.8501580Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:53:13.8514745Z Entering 'third_party/sleef' 2025-12-04T08:53:13.8528262Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:53:13.8539861Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:13.8555320Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:53:13.8571364Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:13.8588324Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:13.8599387Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:13.8611389Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:53:13.8627027Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:13.8639272Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:53:13.8650281Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:13.8660721Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:13.8667822Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:13.8678284Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:53:13.8714451Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8735970Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8752851Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8768823Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8782726Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8799583Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8812720Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8827512Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8841020Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8853768Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8866439Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8885809Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8902162Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8923398Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8938648Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8952010Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8964522Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8979710Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.8993705Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9007965Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9021118Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9035479Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9049016Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9062462Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9077012Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9090943Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9105888Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9118654Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9135322Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9149852Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9164057Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9177996Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9190681Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9205513Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9225350Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9241132Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9259692Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9274405Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9287627Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9302010Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9316154Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9331947Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9345873Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9360964Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9378831Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9393384Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9407807Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9423815Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9438031Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9452422Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9466541Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9482219Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9494806Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9508051Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9522044Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9535179Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9550213Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9564999Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9585571Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9608457Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T08:53:13.9622875Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9638388Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9652989Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9668060Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9686233Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9700742Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9715293Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9729485Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9743440Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T08:53:13.9760590Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9775457Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9790964Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9805548Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9826912Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9842559Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9858659Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9873507Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9888704Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9903185Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T08:53:13.9920809Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9935495Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T08:53:13.9957810Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:53:13.9980023Z ##[endgroup] 2025-12-04T08:53:13.9980281Z ##[group]Fetching the repository 2025-12-04T08:53:13.9983823Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/* 2025-12-04T08:53:15.2653707Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object} 2025-12-04T08:53:15.2788956Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:15.2794788Z ##[endgroup] 2025-12-04T08:53:15.2795153Z ##[group]Determining the checkout info 2025-12-04T08:53:15.2796845Z ##[endgroup] 2025-12-04T08:53:15.2802364Z [command]/usr/bin/git sparse-checkout disable 2025-12-04T08:53:15.3112880Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2025-12-04T08:53:15.3135873Z ##[group]Checking out the ref 2025-12-04T08:53:15.3137821Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:15.3741094Z HEAD is now at ffd9b0fb4355 Resolve collective autotuning test failure on arm (#168919) 2025-12-04T08:53:15.3746910Z ##[endgroup] 2025-12-04T08:53:15.3747164Z ##[group]Setting up auth for fetching submodules 2025-12-04T08:53:15.3756301Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic *** 2025-12-04T08:53:15.3797771Z 
[command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf 2025-12-04T08:53:15.3821497Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com: 2025-12-04T08:53:15.3847027Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com: 2025-12-04T08:53:15.3862614Z ##[endgroup] 2025-12-04T08:53:15.3862847Z ##[group]Fetching submodules 2025-12-04T08:53:15.3864264Z [command]/usr/bin/git submodule sync --recursive 2025-12-04T08:53:15.4079227Z Synchronizing submodule url for 'android/libs/fbjni' 2025-12-04T08:53:15.4096460Z Synchronizing submodule url for 'third_party/FP16' 2025-12-04T08:53:15.4110207Z Synchronizing submodule url for 'third_party/FXdiv' 2025-12-04T08:53:15.4134513Z Synchronizing submodule url for 'third_party/NNPACK' 2025-12-04T08:53:15.4168074Z Synchronizing submodule url for 'third_party/NVTX' 2025-12-04T08:53:15.4213606Z Synchronizing submodule url for 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:15.4230662Z Synchronizing submodule url for 'third_party/XNNPACK' 2025-12-04T08:53:15.4275690Z Synchronizing submodule url for 'third_party/aiter' 2025-12-04T08:53:15.4288211Z Synchronizing submodule url for 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:15.4308907Z Synchronizing submodule url for 'third_party/benchmark' 2025-12-04T08:53:15.4321401Z Synchronizing submodule url for 'third_party/composable_kernel' 2025-12-04T08:53:15.4340671Z Synchronizing submodule url for 'third_party/cpp-httplib' 2025-12-04T08:53:15.4353137Z Synchronizing submodule url for 'third_party/cpuinfo' 2025-12-04T08:53:15.4364325Z Synchronizing submodule url for 'third_party/cudnn_frontend' 2025-12-04T08:53:15.4379833Z Synchronizing submodule url for 'third_party/cutlass' 2025-12-04T08:53:15.4393693Z Synchronizing submodule url for 'third_party/fbgemm' 2025-12-04T08:53:15.4409169Z Synchronizing submodule url for 'third_party/fbgemm/external/asmjit' 
2025-12-04T08:53:15.4422642Z Synchronizing submodule url for 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:15.4443379Z Synchronizing submodule url for 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:15.4452661Z Synchronizing submodule url for 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:15.4469514Z Synchronizing submodule url for 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:15.4480207Z Synchronizing submodule url for 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:15.4490542Z Synchronizing submodule url for 'third_party/fbgemm/external/json' 2025-12-04T08:53:15.4508402Z Synchronizing submodule url for 'third_party/flash-attention' 2025-12-04T08:53:15.4535984Z Synchronizing submodule url for 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:15.4553727Z Synchronizing submodule url for 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:15.4570606Z Synchronizing submodule url for 'third_party/flatbuffers' 2025-12-04T08:53:15.4582752Z Synchronizing submodule url for 'third_party/fmt' 2025-12-04T08:53:15.4596462Z Synchronizing submodule url for 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:15.4607418Z Synchronizing submodule url for 'third_party/gloo' 2025-12-04T08:53:15.4618514Z Synchronizing submodule url for 'third_party/googletest' 2025-12-04T08:53:15.4628791Z Synchronizing submodule url for 'third_party/ideep' 2025-12-04T08:53:15.4639053Z Synchronizing submodule url for 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:15.4659472Z Synchronizing submodule url for 'third_party/ittapi' 2025-12-04T08:53:15.4669426Z Synchronizing submodule url for 'third_party/kineto' 2025-12-04T08:53:15.4679161Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:15.4702172Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:15.4713319Z Synchronizing submodule url for 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:15.4738417Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:15.4758731Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:15.4773934Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:15.4808877Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:15.4820266Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:15.4830073Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:15.4843205Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:15.4871122Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:15.4892110Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:15.4913347Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:15.4930533Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:15.4942611Z Synchronizing submodule url for 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:15.4956477Z Synchronizing submodule url for 'third_party/kleidiai' 2025-12-04T08:53:15.4966833Z Synchronizing submodule url for 'third_party/mimalloc' 2025-12-04T08:53:15.4976290Z Synchronizing submodule url for 'third_party/nlohmann' 2025-12-04T08:53:15.4985649Z Synchronizing submodule url for 'third_party/onnx' 
2025-12-04T08:53:15.5012077Z Synchronizing submodule url for 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:15.5026726Z Synchronizing submodule url for 'third_party/opentelemetry-cpp' 2025-12-04T08:53:15.5041748Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:15.5050916Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:15.5061763Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:15.5073550Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:15.5086558Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:15.5100040Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:15.5110120Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:15.5121514Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:15.5134410Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:15.5148583Z Synchronizing submodule url for 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:15.5167813Z Synchronizing submodule url for 'third_party/pocketfft' 2025-12-04T08:53:15.5178397Z Synchronizing submodule url for 'third_party/protobuf' 2025-12-04T08:53:15.5193304Z Synchronizing submodule url for 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:15.5203578Z Synchronizing submodule url for 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:15.5218766Z Synchronizing submodule url for 'third_party/psimd' 2025-12-04T08:53:15.5230446Z Synchronizing submodule url for 'third_party/pthreadpool' 2025-12-04T08:53:15.5248659Z Synchronizing 
submodule url for 'third_party/pybind11' 2025-12-04T08:53:15.5263717Z Synchronizing submodule url for 'third_party/python-peachpy' 2025-12-04T08:53:15.5275243Z Synchronizing submodule url for 'third_party/sleef' 2025-12-04T08:53:15.5286120Z Synchronizing submodule url for 'third_party/tensorpipe' 2025-12-04T08:53:15.5298286Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:15.5316874Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:15.5327543Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:15.5347316Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:15.5358400Z Synchronizing submodule url for 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:15.5395994Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive 2025-12-04T08:53:15.5683873Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2025-12-04T08:53:15.5744028Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2025-12-04T08:53:15.5806334Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2025-12-04T08:53:15.5876251Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2025-12-04T08:53:15.5979875Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6' 2025-12-04T08:53:15.6036194Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1' 2025-12-04T08:53:15.6204376Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883' 2025-12-04T08:53:15.6361283Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150' 2025-12-04T08:53:15.6580630Z Submodule path 
'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf' 2025-12-04T08:53:15.6657419Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f' 2025-12-04T08:53:15.6848768Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:53:15.6916088Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246' 2025-12-04T08:53:15.7004833Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc' 2025-12-04T08:53:15.7078188Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396' 2025-12-04T08:53:15.7199841Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588' 2025-12-04T08:53:15.7307106Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4' 2025-12-04T08:53:15.7364473Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea' 2025-12-04T08:53:15.7574559Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977' 2025-12-04T08:53:15.7652623Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349' 2025-12-04T08:53:15.7769330Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8' 2025-12-04T08:53:15.7840427Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:15.7889938Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691' 2025-12-04T08:53:15.7971799Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03' 2025-12-04T08:53:15.8078201Z Submodule 
path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2025-12-04T08:53:15.8265198Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2025-12-04T08:53:15.8391044Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2025-12-04T08:53:15.8511485Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757' 2025-12-04T08:53:15.8605295Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f' 2025-12-04T08:53:15.8667128Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2025-12-04T08:53:15.8739043Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341' 2025-12-04T08:53:15.8812233Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:15.8868333Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3' 2025-12-04T08:53:15.9075067Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d' 2025-12-04T08:53:15.9151249Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959' 2025-12-04T08:53:15.9223740Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943' 2025-12-04T08:53:15.9313940Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1' 2025-12-04T08:53:15.9391275Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9' 2025-12-04T08:53:15.9464656Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400' 
2025-12-04T08:53:15.9546886Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2025-12-04T08:53:15.9617436Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067' 2025-12-04T08:53:15.9708077Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4' 2025-12-04T08:53:15.9764071Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446' 2025-12-04T08:53:15.9838281Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:15.9926289Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5' 2025-12-04T08:53:15.9974627Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150' 2025-12-04T08:53:16.0052956Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a' 2025-12-04T08:53:16.0150090Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159' 2025-12-04T08:53:16.0209751Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:53:16.0271068Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21' 2025-12-04T08:53:16.0321506Z Submodule path 
'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723' 2025-12-04T08:53:16.0394178Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe' 2025-12-04T08:53:16.0478537Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e' 2025-12-04T08:53:16.0611396Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72' 2025-12-04T08:53:16.0804925Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83' 2025-12-04T08:53:16.0887958Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4' 2025-12-04T08:53:16.1004205Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878' 2025-12-04T08:53:16.1118644Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2' 2025-12-04T08:53:16.1181490Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1' 2025-12-04T08:53:16.1241409Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa' 2025-12-04T08:53:16.1364464Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d' 2025-12-04T08:53:16.1424414Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce' 2025-12-04T08:53:16.1478030Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5' 2025-12-04T08:53:16.1565374Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d' 2025-12-04T08:53:16.1674129Z 
Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4' 2025-12-04T08:53:16.1737112Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2025-12-04T08:53:16.1902076Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50' 2025-12-04T08:53:16.1977536Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa' 2025-12-04T08:53:16.2184796Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2025-12-04T08:53:16.2247619Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2025-12-04T08:53:16.2335311Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2025-12-04T08:53:16.2402858Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2025-12-04T08:53:16.2463149Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8' 2025-12-04T08:53:16.2537220Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8' 2025-12-04T08:53:16.2611583Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67' 2025-12-04T08:53:16.2696171Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68' 2025-12-04T08:53:16.2777572Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d' 2025-12-04T08:53:16.2845885Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2025-12-04T08:53:16.2906615Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked 
out '910b55815be16109f04f4180e9adee14fb4ce281' 2025-12-04T08:53:16.3072598Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b' 2025-12-04T08:53:16.3166204Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2025-12-04T08:53:16.3226896Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2025-12-04T08:53:16.3253366Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2025-12-04T08:53:16.3472450Z Entering 'android/libs/fbjni' 2025-12-04T08:53:16.3497065Z Entering 'third_party/FP16' 2025-12-04T08:53:16.3528374Z Entering 'third_party/FXdiv' 2025-12-04T08:53:16.3560423Z Entering 'third_party/NNPACK' 2025-12-04T08:53:16.3580867Z Entering 'third_party/NVTX' 2025-12-04T08:53:16.3635056Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:16.3678641Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:16.3725110Z Entering 'third_party/aiter' 2025-12-04T08:53:16.3749704Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:16.3783085Z Entering 'third_party/benchmark' 2025-12-04T08:53:16.3816369Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:16.3841980Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:16.3862608Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:16.3882857Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:16.3901739Z Entering 'third_party/cutlass' 2025-12-04T08:53:16.3931659Z Entering 'third_party/fbgemm' 2025-12-04T08:53:16.3959345Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:16.3996329Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:16.4021931Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:16.4044342Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:16.4090889Z Entering 
'third_party/fbgemm/external/googletest' 2025-12-04T08:53:16.4109595Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:16.4139314Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:16.4161323Z Entering 'third_party/flash-attention' 2025-12-04T08:53:16.4182418Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:16.4207728Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:16.4244047Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:16.4264910Z Entering 'third_party/fmt' 2025-12-04T08:53:16.4283759Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:16.4302510Z Entering 'third_party/gloo' 2025-12-04T08:53:16.4334714Z Entering 'third_party/googletest' 2025-12-04T08:53:16.4357094Z Entering 'third_party/ideep' 2025-12-04T08:53:16.4381435Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:16.4406010Z Entering 'third_party/ittapi' 2025-12-04T08:53:16.4428995Z Entering 'third_party/kineto' 2025-12-04T08:53:16.4458658Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:16.4484859Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:16.4513915Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:16.4542481Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:16.4569365Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:16.4602516Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:16.4623209Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:16.4646412Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:16.4666304Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:16.4689265Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:16.4715755Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:16.4743057Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:16.4777693Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:16.4807291Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:16.4834548Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:16.4855980Z Entering 'third_party/kleidiai' 2025-12-04T08:53:16.4894261Z Entering 'third_party/mimalloc' 2025-12-04T08:53:16.4911096Z Entering 'third_party/nlohmann' 2025-12-04T08:53:16.4945589Z Entering 'third_party/onnx' 2025-12-04T08:53:16.5000496Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:16.5052270Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:16.5077819Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:16.5099516Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:16.5120565Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:16.5141372Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:16.5168944Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:16.5190693Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:16.5212051Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:16.5242680Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:16.5268150Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:16.5292914Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 
2025-12-04T08:53:16.5321727Z Entering 'third_party/pocketfft' 2025-12-04T08:53:16.5353302Z Entering 'third_party/protobuf' 2025-12-04T08:53:16.5384557Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:16.5412190Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:16.5436724Z Entering 'third_party/psimd' 2025-12-04T08:53:16.5467573Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:16.5505581Z Entering 'third_party/pybind11' 2025-12-04T08:53:16.5527956Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:16.5548892Z Entering 'third_party/sleef' 2025-12-04T08:53:16.5568913Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:16.5588859Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:16.5611230Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:16.5630838Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:16.5659642Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:16.5679577Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:16.5720393Z ##[endgroup] 2025-12-04T08:53:16.5720614Z ##[group]Persisting credentials for submodules 2025-12-04T08:53:16.5726069Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :" 2025-12-04T08:53:16.5957020Z Entering 'android/libs/fbjni' 2025-12-04T08:53:16.5983643Z url.https://github.com/.insteadof 2025-12-04T08:53:16.5983801Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6003701Z Entering 'third_party/FP16' 2025-12-04T08:53:16.6017498Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6017778Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6036421Z Entering 'third_party/FXdiv' 2025-12-04T08:53:16.6051068Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6051309Z 
url.https://github.com/.insteadof 2025-12-04T08:53:16.6067950Z Entering 'third_party/NNPACK' 2025-12-04T08:53:16.6087084Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6087313Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6112607Z Entering 'third_party/NVTX' 2025-12-04T08:53:16.6128530Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6128766Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6151380Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:16.6169052Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6169273Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6194532Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:16.6208274Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6208497Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6233961Z Entering 'third_party/aiter' 2025-12-04T08:53:16.6252134Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6252349Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6275676Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:16.6295589Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6295792Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6318510Z Entering 'third_party/benchmark' 2025-12-04T08:53:16.6332106Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6332302Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6351473Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:16.6374297Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6374472Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6404204Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:16.6419010Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6419200Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6435730Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:16.6449988Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6450167Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6471636Z Entering 
'third_party/cudnn_frontend' 2025-12-04T08:53:16.6485084Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6485258Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6508517Z Entering 'third_party/cutlass' 2025-12-04T08:53:16.6522380Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6522557Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6560135Z Entering 'third_party/fbgemm' 2025-12-04T08:53:16.6576221Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6576392Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6597603Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:16.6612889Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6613060Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6630754Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:16.6645709Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6645887Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6664552Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:16.6678340Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6678517Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6696221Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:16.6709482Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6709657Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6731264Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:16.6745300Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6745464Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6781384Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:16.6802770Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6803135Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6830571Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:16.6844767Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6845203Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6871585Z Entering 
'third_party/flash-attention' 2025-12-04T08:53:16.6884391Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6884564Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6906810Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:16.6927376Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6927537Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6947692Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:16.6962331Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6962498Z url.https://github.com/.insteadof 2025-12-04T08:53:16.6986750Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:16.7005455Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7005628Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7025196Z Entering 'third_party/fmt' 2025-12-04T08:53:16.7038275Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7038445Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7058978Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:16.7075557Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7075730Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7094781Z Entering 'third_party/gloo' 2025-12-04T08:53:16.7114523Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7114693Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7131079Z Entering 'third_party/googletest' 2025-12-04T08:53:16.7143215Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7143537Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7163517Z Entering 'third_party/ideep' 2025-12-04T08:53:16.7179565Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7179738Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7208655Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:16.7235437Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7235718Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7261524Z Entering 'third_party/ittapi' 2025-12-04T08:53:16.7275511Z 
url.https://github.com/.insteadof 2025-12-04T08:53:16.7275736Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7300712Z Entering 'third_party/kineto' 2025-12-04T08:53:16.7317255Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7317466Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7337218Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:16.7370569Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7370777Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7396994Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:16.7425031Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7425218Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7452036Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:16.7469044Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7469231Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7489190Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:16.7502342Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7502519Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7533118Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:16.7557292Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7557462Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7575802Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:16.7595752Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7595976Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7617672Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:16.7631524Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7631688Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7660511Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:16.7696710Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7696863Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7716618Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:16.7727815Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7727976Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7749283Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:16.7763227Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7763530Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7782872Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:16.7795981Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7796179Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7816183Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:16.7840471Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7840619Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7872034Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:16.7886343Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7886500Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7908873Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:16.7927435Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7927595Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7945843Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:16.7961431Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7961581Z url.https://github.com/.insteadof 2025-12-04T08:53:16.7979857Z Entering 'third_party/kleidiai' 2025-12-04T08:53:16.7993827Z url.https://github.com/.insteadof 
2025-12-04T08:53:16.7993987Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8011609Z Entering 'third_party/mimalloc' 2025-12-04T08:53:16.8024055Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8024189Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8041761Z Entering 'third_party/nlohmann' 2025-12-04T08:53:16.8055490Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8055625Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8079752Z Entering 'third_party/onnx' 2025-12-04T08:53:16.8107514Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8139412Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8139579Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:16.8166643Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8166787Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8193530Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:16.8210576Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8210841Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8235197Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:16.8250759Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8250932Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8268161Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:16.8290368Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8290510Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8312709Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:16.8329140Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8329349Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8346482Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:16.8360147Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8360335Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8379004Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 
2025-12-04T08:53:16.8398694Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8398840Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8442852Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:16.8463859Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8463996Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8493875Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:16.8507445Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8507574Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8529456Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:16.8543528Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8543656Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8562518Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:16.8577556Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8577684Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8605124Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:16.8632232Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8632360Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8680659Z Entering 'third_party/pocketfft' 2025-12-04T08:53:16.8698475Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8698616Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8721501Z Entering 'third_party/protobuf' 2025-12-04T08:53:16.8740375Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8740506Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8760019Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:16.8779263Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8779403Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8799868Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:16.8814615Z url.https://github.com/.insteadof 
2025-12-04T08:53:16.8814743Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8833988Z Entering 'third_party/psimd' 2025-12-04T08:53:16.8849071Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8852400Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8879936Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:16.8901139Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8901340Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8918111Z Entering 'third_party/pybind11' 2025-12-04T08:53:16.8931152Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8931280Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8962096Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:16.8983070Z url.https://github.com/.insteadof 2025-12-04T08:53:16.8983214Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9001619Z Entering 'third_party/sleef' 2025-12-04T08:53:16.9016962Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9040066Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9040179Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:16.9055850Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9056038Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9074625Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:16.9089027Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9089183Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9122132Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:16.9153813Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9153972Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9183983Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:16.9206229Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9206384Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9223661Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:16.9237556Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9237727Z 
url.https://github.com/.insteadof 2025-12-04T08:53:16.9264078Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:16.9280561Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9280723Z url.https://github.com/.insteadof 2025-12-04T08:53:16.9317818Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url" 2025-12-04T08:53:16.9519725Z Entering 'android/libs/fbjni' 2025-12-04T08:53:16.9542811Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T08:53:16.9552313Z Entering 'third_party/FP16' 2025-12-04T08:53:16.9580600Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T08:53:16.9590856Z Entering 'third_party/FXdiv' 2025-12-04T08:53:16.9613559Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T08:53:16.9637405Z Entering 'third_party/NNPACK' 2025-12-04T08:53:16.9672999Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T08:53:16.9683777Z Entering 'third_party/NVTX' 2025-12-04T08:53:16.9705092Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T08:53:16.9715366Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:16.9735910Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T08:53:16.9745356Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:16.9769087Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T08:53:16.9784891Z Entering 'third_party/aiter' 2025-12-04T08:53:16.9804117Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T08:53:16.9825554Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:16.9848823Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T08:53:16.9863976Z Entering 'third_party/benchmark' 2025-12-04T08:53:16.9888998Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:16.9909512Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:16.9932685Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T08:53:16.9947340Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:16.9968386Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T08:53:16.9978707Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:17.0010652Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T08:53:17.0022351Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:17.0043411Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T08:53:17.0059486Z Entering 'third_party/cutlass' 2025-12-04T08:53:17.0084492Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T08:53:17.0112130Z Entering 'third_party/fbgemm' 2025-12-04T08:53:17.0143133Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T08:53:17.0169137Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:17.0206115Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T08:53:17.0221672Z Entering 
'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:17.0250011Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T08:53:17.0266745Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:17.0302529Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T08:53:17.0318735Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:17.0342418Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T08:53:17.0364105Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:17.0398936Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T08:53:17.0411622Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:17.0435969Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T08:53:17.0449823Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:17.0474338Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T08:53:17.0498803Z Entering 'third_party/flash-attention' 2025-12-04T08:53:17.0526871Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T08:53:17.0538827Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:17.0564430Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T08:53:17.0580937Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:17.0619676Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T08:53:17.0647684Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:17.0673988Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T08:53:17.0688251Z Entering 'third_party/fmt' 2025-12-04T08:53:17.0716264Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:17.0727976Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:17.0755918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T08:53:17.0767281Z Entering 'third_party/gloo' 2025-12-04T08:53:17.0789944Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T08:53:17.0799438Z Entering 'third_party/googletest' 2025-12-04T08:53:17.0821876Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.0831861Z Entering 'third_party/ideep' 2025-12-04T08:53:17.0852031Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T08:53:17.0862152Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:17.0882537Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T08:53:17.0897852Z Entering 'third_party/ittapi' 2025-12-04T08:53:17.0918558Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T08:53:17.0928956Z Entering 'third_party/kineto' 2025-12-04T08:53:17.0952918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T08:53:17.0963625Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:17.0987109Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T08:53:17.0996325Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:17.1018633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T08:53:17.1029696Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:17.1069909Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T08:53:17.1084233Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:17.1106844Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T08:53:17.1122237Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:17.1155346Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T08:53:17.1166266Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:17.1195418Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T08:53:17.1209480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:17.1236373Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T08:53:17.1247336Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:17.1273127Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.1284068Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:17.1313571Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T08:53:17.1326406Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:17.1354609Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T08:53:17.1365909Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:17.1386216Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:17.1395567Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.1435022Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:17.1449031Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.1470342Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:17.1488851Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:17.1525152Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T08:53:17.1538270Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:17.1573248Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.1588842Z Entering 'third_party/kleidiai' 2025-12-04T08:53:17.1619529Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T08:53:17.1633330Z Entering 'third_party/mimalloc' 2025-12-04T08:53:17.1655978Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T08:53:17.1665729Z Entering 'third_party/nlohmann' 2025-12-04T08:53:17.1687423Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T08:53:17.1701290Z Entering 'third_party/onnx' 2025-12-04T08:53:17.1721494Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T08:53:17.1738083Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:17.1758165Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:17.1770892Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:17.1791602Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T08:53:17.1802528Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:17.1824304Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:17.1840336Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:17.1861381Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.1870761Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:17.1897053Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T08:53:17.1915209Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:17.1936691Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T08:53:17.1950535Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:17.1969886Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T08:53:17.1990045Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:17.2018316Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T08:53:17.2028805Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:17.2051322Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T08:53:17.2062608Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.2091936Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T08:53:17.2106544Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.2132758Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T08:53:17.2146302Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:17.2176463Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T08:53:17.2201409Z Entering 'third_party/pocketfft' 2025-12-04T08:53:17.2240119Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T08:53:17.2250480Z Entering 'third_party/protobuf' 2025-12-04T08:53:17.2274468Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T08:53:17.2286312Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:17.2313152Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T08:53:17.2327429Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:17.2355220Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.2368615Z Entering 'third_party/psimd' 2025-12-04T08:53:17.2394218Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T08:53:17.2405251Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:17.2431580Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T08:53:17.2441972Z Entering 'third_party/pybind11' 2025-12-04T08:53:17.2461750Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:17.2472547Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:17.2501942Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T08:53:17.2518996Z Entering 'third_party/sleef' 2025-12-04T08:53:17.2547106Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T08:53:17.2561719Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:17.2585880Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T08:53:17.2595765Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:17.2618687Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T08:53:17.2633140Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:17.2661400Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T08:53:17.2671256Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:17.2691285Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T08:53:17.2707967Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:17.2735848Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T08:53:17.2748939Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:17.2776040Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T08:53:17.3024440Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2025-12-04T08:53:17.3239820Z Entering 'android/libs/fbjni' 2025-12-04T08:53:17.3265465Z Entering 'third_party/FP16' 
2025-12-04T08:53:17.3288749Z Entering 'third_party/FXdiv' 2025-12-04T08:53:17.3310003Z Entering 'third_party/NNPACK' 2025-12-04T08:53:17.3331092Z Entering 'third_party/NVTX' 2025-12-04T08:53:17.3359613Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:17.3396586Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:17.3427298Z Entering 'third_party/aiter' 2025-12-04T08:53:17.3462624Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:17.3495886Z Entering 'third_party/benchmark' 2025-12-04T08:53:17.3522433Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:17.3557834Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:17.3587832Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:17.3615845Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:17.3643728Z Entering 'third_party/cutlass' 2025-12-04T08:53:17.3672440Z Entering 'third_party/fbgemm' 2025-12-04T08:53:17.3701722Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:17.3726474Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:17.3757434Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:17.3780000Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:17.3803916Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:17.3824471Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T08:53:17.3846970Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:17.3891303Z Entering 'third_party/flash-attention' 2025-12-04T08:53:17.3940322Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:17.3974508Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:17.4010982Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:17.4034854Z Entering 'third_party/fmt' 2025-12-04T08:53:17.4055077Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:17.4075279Z Entering 'third_party/gloo' 2025-12-04T08:53:17.4117541Z Entering 'third_party/googletest' 
2025-12-04T08:53:17.4139655Z Entering 'third_party/ideep' 2025-12-04T08:53:17.4171556Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:17.4207380Z Entering 'third_party/ittapi' 2025-12-04T08:53:17.4228600Z Entering 'third_party/kineto' 2025-12-04T08:53:17.4251753Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:17.4272824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:17.4294617Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:17.4317661Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:17.4337308Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:17.4359517Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:17.4390562Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:17.4424537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:17.4445496Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:17.4467549Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:17.4499559Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:17.4525357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.4552533Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.4578512Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:17.4603908Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:17.4626591Z Entering 'third_party/kleidiai' 2025-12-04T08:53:17.4650526Z Entering 'third_party/mimalloc' 
2025-12-04T08:53:17.4671905Z Entering 'third_party/nlohmann' 2025-12-04T08:53:17.4693235Z Entering 'third_party/onnx' 2025-12-04T08:53:17.4722575Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:17.4745589Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:17.4769190Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:17.4788197Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:17.4808920Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:17.4828934Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:17.4855073Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:17.4884852Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:17.4916181Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:17.4935355Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.4959626Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.4986815Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:17.5020423Z Entering 'third_party/pocketfft' 2025-12-04T08:53:17.5039904Z Entering 'third_party/protobuf' 2025-12-04T08:53:17.5065392Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:17.5086806Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:17.5109083Z Entering 'third_party/psimd' 2025-12-04T08:53:17.5141096Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:17.5167348Z Entering 'third_party/pybind11' 2025-12-04T08:53:17.5191163Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:17.5218533Z Entering 'third_party/sleef' 2025-12-04T08:53:17.5244326Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:17.5266072Z Entering 'third_party/tensorpipe/third_party/googletest' 
2025-12-04T08:53:17.5285572Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:17.5309295Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:17.5341215Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:17.5372541Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:17.5430441Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2025-12-04T08:53:17.5626752Z Entering 'android/libs/fbjni' 2025-12-04T08:53:17.5663584Z Entering 'third_party/FP16' 2025-12-04T08:53:17.5687509Z Entering 'third_party/FXdiv' 2025-12-04T08:53:17.5713174Z Entering 'third_party/NNPACK' 2025-12-04T08:53:17.5738208Z Entering 'third_party/NVTX' 2025-12-04T08:53:17.5767062Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T08:53:17.5799144Z Entering 'third_party/XNNPACK' 2025-12-04T08:53:17.5831470Z Entering 'third_party/aiter' 2025-12-04T08:53:17.5853220Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T08:53:17.5880005Z Entering 'third_party/benchmark' 2025-12-04T08:53:17.5901146Z Entering 'third_party/composable_kernel' 2025-12-04T08:53:17.5926326Z Entering 'third_party/cpp-httplib' 2025-12-04T08:53:17.5947768Z Entering 'third_party/cpuinfo' 2025-12-04T08:53:17.5980557Z Entering 'third_party/cudnn_frontend' 2025-12-04T08:53:17.6006910Z Entering 'third_party/cutlass' 2025-12-04T08:53:17.6038291Z Entering 'third_party/fbgemm' 2025-12-04T08:53:17.6064277Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T08:53:17.6095247Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T08:53:17.6131010Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T08:53:17.6152647Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T08:53:17.6182035Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T08:53:17.6206744Z Entering 'third_party/fbgemm/external/hipify_torch' 
2025-12-04T08:53:17.6243095Z Entering 'third_party/fbgemm/external/json' 2025-12-04T08:53:17.6267914Z Entering 'third_party/flash-attention' 2025-12-04T08:53:17.6307291Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T08:53:17.6330026Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T08:53:17.6357990Z Entering 'third_party/flatbuffers' 2025-12-04T08:53:17.6387323Z Entering 'third_party/fmt' 2025-12-04T08:53:17.6411890Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T08:53:17.6444000Z Entering 'third_party/gloo' 2025-12-04T08:53:17.6476171Z Entering 'third_party/googletest' 2025-12-04T08:53:17.6508266Z Entering 'third_party/ideep' 2025-12-04T08:53:17.6532363Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T08:53:17.6566013Z Entering 'third_party/ittapi' 2025-12-04T08:53:17.6589996Z Entering 'third_party/kineto' 2025-12-04T08:53:17.6616039Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T08:53:17.6633133Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T08:53:17.6651553Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T08:53:17.6678527Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T08:53:17.6700484Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T08:53:17.6721748Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T08:53:17.6740726Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T08:53:17.6759110Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T08:53:17.6779691Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T08:53:17.6798148Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T08:53:17.6828248Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T08:53:17.6869158Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.6900247Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.6937371Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T08:53:17.6967507Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T08:53:17.6992125Z Entering 'third_party/kleidiai' 2025-12-04T08:53:17.7019913Z Entering 'third_party/mimalloc' 2025-12-04T08:53:17.7046604Z Entering 'third_party/nlohmann' 2025-12-04T08:53:17.7073604Z Entering 'third_party/onnx' 2025-12-04T08:53:17.7104103Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T08:53:17.7129306Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T08:53:17.7152229Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T08:53:17.7173037Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T08:53:17.7195016Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T08:53:17.7222135Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T08:53:17.7264317Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T08:53:17.7300527Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T08:53:17.7327078Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T08:53:17.7361346Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T08:53:17.7388584Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T08:53:17.7415951Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T08:53:17.7451599Z Entering 'third_party/pocketfft' 2025-12-04T08:53:17.7493461Z Entering 
'third_party/protobuf' 2025-12-04T08:53:17.7519564Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T08:53:17.7542788Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T08:53:17.7580674Z Entering 'third_party/psimd' 2025-12-04T08:53:17.7605669Z Entering 'third_party/pthreadpool' 2025-12-04T08:53:17.7637842Z Entering 'third_party/pybind11' 2025-12-04T08:53:17.7662010Z Entering 'third_party/python-peachpy' 2025-12-04T08:53:17.7686613Z Entering 'third_party/sleef' 2025-12-04T08:53:17.7709187Z Entering 'third_party/tensorpipe' 2025-12-04T08:53:17.7735413Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T08:53:17.7762790Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T08:53:17.7783704Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T08:53:17.7805086Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T08:53:17.7822302Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T08:53:17.7869088Z ##[endgroup] 2025-12-04T08:53:17.8053376Z [command]/usr/bin/git log -1 --format=%H 2025-12-04T08:53:17.8127870Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:17.8248790Z Prepare all required actions 2025-12-04T08:53:17.8249063Z Getting action download info 2025-12-04T08:53:18.0994415Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076) 2025-12-04T08:53:18.8771487Z ##[group]Run ./.github/actions/setup-rocm 2025-12-04T08:53:18.8771631Z env: 2025-12-04T08:53:18.8771718Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.8771823Z ##[endgroup] 2025-12-04T08:53:18.8785925Z ##[group]Run dpkg -l | grep -E " rocm" 2025-12-04T08:53:18.8786088Z dpkg -l | grep -E " rocm" 2025-12-04T08:53:18.8790586Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:18.8790731Z env: 2025-12-04T08:53:18.8790816Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.8790919Z 
##[endgroup] 2025-12-04T08:53:18.8852619Z ii rocm-cmake 0.14.0.60401-83~22.04 amd64 rocm-cmake built using CMake 2025-12-04T08:53:18.8852912Z ii rocm-core 6.4.1.60401-83~22.04 amd64 ROCm Runtime software stack 2025-12-04T08:53:18.8853130Z ii rocm-dbgapi 0.77.2.60401-83~22.04 amd64 Library to provide AMD GPU debugger API 2025-12-04T08:53:18.8853415Z ii rocm-debug-agent 2.0.4.60401-83~22.04 amd64 Radeon Open Compute Debug Agent (ROCdebug-agent) 2025-12-04T08:53:18.8853675Z ii rocm-dev 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T08:53:18.8853912Z ii rocm-device-libs 1.0.0.60401-83~22.04 amd64 Radeon Open Compute - device libraries 2025-12-04T08:53:18.8854120Z ii rocm-gdb 15.2.60401-83~22.04 amd64 ROCgdb 2025-12-04T08:53:18.8854311Z ii rocm-llvm 19.0.0.25184.60401-83~22.04 amd64 ROCm core compiler 2025-12-04T08:53:18.8854521Z ii rocm-opencl 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T08:53:18.8854955Z ii rocm-opencl-dev 2.0.0.60401-83~22.04 amd64 clr built using CMake 2025-12-04T08:53:18.8855223Z ii rocm-smi-lib 7.5.0.60401-83~22.04 amd64 AMD System Management libraries 2025-12-04T08:53:18.8855619Z ii rocm-utils 6.4.1.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime software stack 2025-12-04T08:53:18.8855859Z ii rocminfo 1.0.0.60401-83~22.04 amd64 Radeon Open Compute (ROCm) Runtime rocminfo tool 2025-12-04T08:53:18.8869919Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:53:18.8870218Z # ignore expansion of "docker ps -q" since it could be empty 2025-12-04T08:53:18.8870383Z # shellcheck disable=SC2046 2025-12-04T08:53:18.8870528Z docker stop $(docker ps -q) || true 2025-12-04T08:53:18.8870665Z # Prune all stopped containers. 
2025-12-04T08:53:18.8870798Z docker container prune -f 2025-12-04T08:53:18.8875256Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:18.8875407Z env: 2025-12-04T08:53:18.8875497Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.8875622Z ##[endgroup] 2025-12-04T08:53:18.9144843Z docker: 'docker stop' requires at least 1 argument 2025-12-04T08:53:18.9145099Z 2025-12-04T08:53:18.9145238Z Usage: docker stop [OPTIONS] CONTAINER [CONTAINER...] 2025-12-04T08:53:18.9145426Z 2025-12-04T08:53:18.9145551Z See 'docker stop --help' for more information 2025-12-04T08:53:18.9247316Z Total reclaimed space: 0B 2025-12-04T08:53:18.9274649Z ##[group]Run cat /etc/os-release || true 2025-12-04T08:53:18.9274896Z cat /etc/os-release || true 2025-12-04T08:53:18.9275086Z cat /etc/apt/sources.list.d/rocm.list || true 2025-12-04T08:53:18.9275754Z cat /opt/rocm/.info/version || true 2025-12-04T08:53:18.9275919Z whoami 2025-12-04T08:53:18.9281442Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:18.9281598Z env: 2025-12-04T08:53:18.9281699Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.9281814Z ##[endgroup] 2025-12-04T08:53:18.9315673Z PRETTY_NAME="Ubuntu 22.04.5 LTS" 2025-12-04T08:53:18.9315822Z NAME="Ubuntu" 2025-12-04T08:53:18.9315915Z VERSION_ID="22.04" 2025-12-04T08:53:18.9316017Z VERSION="22.04.5 LTS (Jammy Jellyfish)" 2025-12-04T08:53:18.9316142Z VERSION_CODENAME=jammy 2025-12-04T08:53:18.9316242Z ID=ubuntu 2025-12-04T08:53:18.9316322Z ID_LIKE=debian 2025-12-04T08:53:18.9316446Z HOME_URL="https://www.ubuntu.com/" 2025-12-04T08:53:18.9316575Z SUPPORT_URL="https://help.ubuntu.com/" 2025-12-04T08:53:18.9316730Z BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" 2025-12-04T08:53:18.9316942Z PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" 2025-12-04T08:53:18.9317126Z UBUNTU_CODENAME=jammy 2025-12-04T08:53:18.9323797Z deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] 
https://repo.radeon.com/rocm/apt/6.4.1 jammy main 2025-12-04T08:53:18.9333604Z 6.4.1-83 2025-12-04T08:53:18.9365147Z runner 2025-12-04T08:53:18.9382282Z ##[group]Run dpkg -l | grep -E " amdgpu" 2025-12-04T08:53:18.9382487Z dpkg -l | grep -E " amdgpu" 2025-12-04T08:53:18.9386663Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:18.9386808Z env: 2025-12-04T08:53:18.9386894Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.9386996Z ##[endgroup] 2025-12-04T08:53:18.9470926Z ii amdgpu-core 1:6.4.60401-2164967.22.04 all Core meta package for unified amdgpu driver. 2025-12-04T08:53:18.9471186Z ii amdgpu-install 6.4.60401-2164967.22.04 all AMDGPU driver repository and installer 2025-12-04T08:53:18.9495292Z ##[group]Run rocm-smi 2025-12-04T08:53:18.9495471Z rocm-smi 2025-12-04T08:53:18.9500385Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:18.9500578Z env: 2025-12-04T08:53:18.9500687Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:18.9500812Z ##[endgroup] 2025-12-04T08:53:19.0166996Z 2025-12-04T08:53:19.0167014Z 2025-12-04T08:53:19.0167787Z ============================================ ROCm System Management Interface ============================================ 2025-12-04T08:53:19.0171716Z ====================================================== Concise Info ====================================================== 2025-12-04T08:53:19.0171978Z Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% 2025-12-04T08:53:19.0172505Z  (DID, GUID) (Junction) (Socket) (Mem, Compute, ID)  2025-12-04T08:53:19.0172722Z ========================================================================================================================== 2025-12-04T08:53:19.0173368Z 0 4 0x74a5, 61326 27.0°C 117.0W NPS1, SPX, 0 N/A 900Mhz 0% manual 1000.0W 0% 0% 2025-12-04T08:53:19.0173565Z ========================================================================================================================== 2025-12-04T08:53:19.0173726Z 
================================================== End of ROCm SMI Log =================================================== 2025-12-04T08:53:19.0249745Z ##[group]Run rocminfo 2025-12-04T08:53:19.0249890Z rocminfo 2025-12-04T08:53:19.0253983Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.0254124Z env: 2025-12-04T08:53:19.0254212Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.0254315Z ##[endgroup] 2025-12-04T08:53:19.0833359Z ROCk module version 6.12.12 is loaded 2025-12-04T08:53:19.0833680Z ===================== 2025-12-04T08:53:19.0833941Z HSA System Attributes 2025-12-04T08:53:19.0834069Z ===================== 2025-12-04T08:53:19.0834680Z Runtime Version: 1.15 2025-12-04T08:53:19.0834822Z Runtime Ext Version: 1.7 2025-12-04T08:53:19.0834966Z System Timestamp Freq.: 1000.000000MHz 2025-12-04T08:53:19.0835200Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T08:53:19.0835444Z Machine Model: LARGE 2025-12-04T08:53:19.0835666Z System Endianness: LITTLE 2025-12-04T08:53:19.0835837Z Mwaitx: DISABLED 2025-12-04T08:53:19.0835993Z XNACK enabled: NO 2025-12-04T08:53:19.0836138Z DMAbuf Support: YES 2025-12-04T08:53:19.0836280Z VMM Support: YES 2025-12-04T08:53:19.0836361Z 2025-12-04T08:53:19.0836405Z ========== 2025-12-04T08:53:19.0836535Z HSA Agents 2025-12-04T08:53:19.0836647Z ========== 2025-12-04T08:53:19.0836765Z ******* 2025-12-04T08:53:19.0836889Z Agent 1 2025-12-04T08:53:19.0837003Z ******* 2025-12-04T08:53:19.0837154Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T08:53:19.0837363Z Uuid: CPU-XX 2025-12-04T08:53:19.0837558Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T08:53:19.0837763Z Vendor Name: CPU 2025-12-04T08:53:19.0837944Z Feature: None specified 2025-12-04T08:53:19.0838133Z Profile: FULL_PROFILE 2025-12-04T08:53:19.0838330Z Float Round Mode: NEAR 2025-12-04T08:53:19.0838523Z Max Queue Number: 0(0x0) 2025-12-04T08:53:19.0838708Z Queue Min Size: 0(0x0) 
2025-12-04T08:53:19.0838901Z Queue Max Size: 0(0x0) 2025-12-04T08:53:19.0839096Z Queue Type: MULTI 2025-12-04T08:53:19.0839268Z Node: 0 2025-12-04T08:53:19.0839450Z Device Type: CPU 2025-12-04T08:53:19.0839618Z Cache Info: 2025-12-04T08:53:19.0839769Z L1: 49152(0xc000) KB 2025-12-04T08:53:19.0840080Z Chip ID: 0(0x0) 2025-12-04T08:53:19.0840268Z ASIC Revision: 0(0x0) 2025-12-04T08:53:19.0840454Z Cacheline Size: 64(0x40) 2025-12-04T08:53:19.0840643Z Max Clock Freq. (MHz): 3300 2025-12-04T08:53:19.0840825Z BDFID: 0 2025-12-04T08:53:19.0840999Z Internal Node ID: 0 2025-12-04T08:53:19.0841193Z Compute Unit: 64 2025-12-04T08:53:19.0841376Z SIMDs per CU: 0 2025-12-04T08:53:19.0841561Z Shader Engines: 0 2025-12-04T08:53:19.0841756Z Shader Arrs. per Eng.: 0 2025-12-04T08:53:19.0841949Z WatchPts on Addr. Ranges:1 2025-12-04T08:53:19.0842121Z Memory Properties: 2025-12-04T08:53:19.0842256Z Features: None 2025-12-04T08:53:19.0842384Z Pool Info: 2025-12-04T08:53:19.0842529Z Pool 1 2025-12-04T08:53:19.0842664Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T08:53:19.0842818Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T08:53:19.0842969Z Allocatable: TRUE 2025-12-04T08:53:19.0843132Z Alloc Granule: 4KB 2025-12-04T08:53:19.0843384Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0843550Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0843707Z Accessible by all: TRUE 2025-12-04T08:53:19.0843847Z Pool 2 2025-12-04T08:53:19.0843981Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T08:53:19.0844132Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T08:53:19.0844283Z Allocatable: TRUE 2025-12-04T08:53:19.0844442Z Alloc Granule: 4KB 2025-12-04T08:53:19.0844601Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0844766Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0844926Z Accessible by all: TRUE 2025-12-04T08:53:19.0845065Z Pool 3 2025-12-04T08:53:19.0845198Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T08:53:19.0845342Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T08:53:19.0845493Z 
Allocatable: TRUE 2025-12-04T08:53:19.0845654Z Alloc Granule: 4KB 2025-12-04T08:53:19.0845813Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0845977Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0846134Z Accessible by all: TRUE 2025-12-04T08:53:19.0846269Z Pool 4 2025-12-04T08:53:19.0846397Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T08:53:19.0846541Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T08:53:19.0846696Z Allocatable: TRUE 2025-12-04T08:53:19.0846855Z Alloc Granule: 4KB 2025-12-04T08:53:19.0847012Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0847176Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0847336Z Accessible by all: TRUE 2025-12-04T08:53:19.0847509Z ISA Info: 2025-12-04T08:53:19.0847613Z ******* 2025-12-04T08:53:19.0847708Z Agent 2 2025-12-04T08:53:19.0847806Z ******* 2025-12-04T08:53:19.0847923Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T08:53:19.0848065Z Uuid: CPU-XX 2025-12-04T08:53:19.0848218Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T08:53:19.0848380Z Vendor Name: CPU 2025-12-04T08:53:19.0848532Z Feature: None specified 2025-12-04T08:53:19.0848682Z Profile: FULL_PROFILE 2025-12-04T08:53:19.0848832Z Float Round Mode: NEAR 2025-12-04T08:53:19.0848984Z Max Queue Number: 0(0x0) 2025-12-04T08:53:19.0849144Z Queue Min Size: 0(0x0) 2025-12-04T08:53:19.0849289Z Queue Max Size: 0(0x0) 2025-12-04T08:53:19.0849438Z Queue Type: MULTI 2025-12-04T08:53:19.0849580Z Node: 1 2025-12-04T08:53:19.0849718Z Device Type: CPU 2025-12-04T08:53:19.0849854Z Cache Info: 2025-12-04T08:53:19.0849965Z L1: 49152(0xc000) KB 2025-12-04T08:53:19.0850135Z Chip ID: 0(0x0) 2025-12-04T08:53:19.0850282Z ASIC Revision: 0(0x0) 2025-12-04T08:53:19.0850448Z Cacheline Size: 64(0x40) 2025-12-04T08:53:19.0850599Z Max Clock Freq. 
(MHz): 3300 2025-12-04T08:53:19.0850741Z BDFID: 0 2025-12-04T08:53:19.0850885Z Internal Node ID: 1 2025-12-04T08:53:19.0851034Z Compute Unit: 64 2025-12-04T08:53:19.0851177Z SIMDs per CU: 0 2025-12-04T08:53:19.0851320Z Shader Engines: 0 2025-12-04T08:53:19.0851474Z Shader Arrs. per Eng.: 0 2025-12-04T08:53:19.0851626Z WatchPts on Addr. Ranges:1 2025-12-04T08:53:19.0851763Z Memory Properties: 2025-12-04T08:53:19.0851875Z Features: None 2025-12-04T08:53:19.0851978Z Pool Info: 2025-12-04T08:53:19.0852077Z Pool 1 2025-12-04T08:53:19.0852200Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T08:53:19.0852348Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T08:53:19.0852497Z Allocatable: TRUE 2025-12-04T08:53:19.0852646Z Alloc Granule: 4KB 2025-12-04T08:53:19.0852806Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0852966Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0853117Z Accessible by all: TRUE 2025-12-04T08:53:19.0853317Z Pool 2 2025-12-04T08:53:19.0853441Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T08:53:19.0853583Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T08:53:19.0853722Z Allocatable: TRUE 2025-12-04T08:53:19.0853868Z Alloc Granule: 4KB 2025-12-04T08:53:19.0854022Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0854219Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0854368Z Accessible by all: TRUE 2025-12-04T08:53:19.0854498Z Pool 3 2025-12-04T08:53:19.0854621Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T08:53:19.0854759Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T08:53:19.0854899Z Allocatable: TRUE 2025-12-04T08:53:19.0855047Z Alloc Granule: 4KB 2025-12-04T08:53:19.0855201Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0855357Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0855507Z Accessible by all: TRUE 2025-12-04T08:53:19.0855637Z Pool 4 2025-12-04T08:53:19.0855759Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T08:53:19.0855900Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T08:53:19.0856041Z Allocatable: TRUE 2025-12-04T08:53:19.0856191Z 
Alloc Granule: 4KB 2025-12-04T08:53:19.0856344Z Alloc Recommended Granule:4KB 2025-12-04T08:53:19.0856500Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0856648Z Accessible by all: TRUE 2025-12-04T08:53:19.0856779Z ISA Info: 2025-12-04T08:53:19.0856935Z ******* 2025-12-04T08:53:19.0857028Z Agent 3 2025-12-04T08:53:19.0857119Z ******* 2025-12-04T08:53:19.0857223Z Name: gfx942 2025-12-04T08:53:19.0857359Z Uuid: GPU-3e378e7265318491 2025-12-04T08:53:19.0857508Z Marketing Name: AMD Instinct MI325X 2025-12-04T08:53:19.0857655Z Vendor Name: AMD 2025-12-04T08:53:19.0857803Z Feature: KERNEL_DISPATCH 2025-12-04T08:53:19.0857947Z Profile: BASE_PROFILE 2025-12-04T08:53:19.0858090Z Float Round Mode: NEAR 2025-12-04T08:53:19.0858237Z Max Queue Number: 128(0x80) 2025-12-04T08:53:19.0858382Z Queue Min Size: 64(0x40) 2025-12-04T08:53:19.0858527Z Queue Max Size: 131072(0x20000) 2025-12-04T08:53:19.0858669Z Queue Type: MULTI 2025-12-04T08:53:19.0858802Z Node: 2 2025-12-04T08:53:19.0858938Z Device Type: GPU 2025-12-04T08:53:19.0859071Z Cache Info: 2025-12-04T08:53:19.0859178Z L1: 32(0x20) KB 2025-12-04T08:53:19.0859305Z L2: 4096(0x1000) KB 2025-12-04T08:53:19.0859428Z L3: 262144(0x40000) KB 2025-12-04T08:53:19.0859556Z Chip ID: 29861(0x74a5) 2025-12-04T08:53:19.0859695Z ASIC Revision: 1(0x1) 2025-12-04T08:53:19.0859840Z Cacheline Size: 128(0x80) 2025-12-04T08:53:19.0859990Z Max Clock Freq. (MHz): 2100 2025-12-04T08:53:19.0860127Z BDFID: 25856 2025-12-04T08:53:19.0860266Z Internal Node ID: 2 2025-12-04T08:53:19.0860414Z Compute Unit: 304 2025-12-04T08:53:19.0860555Z SIMDs per CU: 4 2025-12-04T08:53:19.0860731Z Shader Engines: 32 2025-12-04T08:53:19.0860880Z Shader Arrs. per Eng.: 1 2025-12-04T08:53:19.0861030Z WatchPts on Addr. 
Ranges:4 2025-12-04T08:53:19.0861186Z Coherent Host Access: FALSE 2025-12-04T08:53:19.0861323Z Memory Properties: 2025-12-04T08:53:19.0861433Z Features: KERNEL_DISPATCH 2025-12-04T08:53:19.0861573Z Fast F16 Operation: TRUE 2025-12-04T08:53:19.0861725Z Wavefront Size: 64(0x40) 2025-12-04T08:53:19.0861872Z Workgroup Max Size: 1024(0x400) 2025-12-04T08:53:19.0862011Z Workgroup Max Size per Dimension: 2025-12-04T08:53:19.0862129Z x 1024(0x400) 2025-12-04T08:53:19.0862253Z y 1024(0x400) 2025-12-04T08:53:19.0862379Z z 1024(0x400) 2025-12-04T08:53:19.0862511Z Max Waves Per CU: 32(0x20) 2025-12-04T08:53:19.0862660Z Max Work-item Per CU: 2048(0x800) 2025-12-04T08:53:19.0862810Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T08:53:19.0862938Z Grid Max Size per Dimension: 2025-12-04T08:53:19.0863054Z x 4294967295(0xffffffff) 2025-12-04T08:53:19.0863175Z y 4294967295(0xffffffff) 2025-12-04T08:53:19.0863378Z z 4294967295(0xffffffff) 2025-12-04T08:53:19.0863519Z Max fbarriers/Workgrp: 32 2025-12-04T08:53:19.0869746Z Packet Processor uCode:: 185 2025-12-04T08:53:19.0869910Z SDMA engine uCode:: 24 2025-12-04T08:53:19.0870069Z IOMMU Support:: None 2025-12-04T08:53:19.0870199Z Pool Info: 2025-12-04T08:53:19.0870301Z Pool 1 2025-12-04T08:53:19.0870429Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T08:53:19.0870573Z Size: 268419072(0xfffc000) KB 2025-12-04T08:53:19.0870719Z Allocatable: TRUE 2025-12-04T08:53:19.0870867Z Alloc Granule: 4KB 2025-12-04T08:53:19.0871033Z Alloc Recommended Granule:2048KB 2025-12-04T08:53:19.0871191Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0871343Z Accessible by all: FALSE 2025-12-04T08:53:19.0871476Z Pool 2 2025-12-04T08:53:19.0871601Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T08:53:19.0871746Z Size: 268419072(0xfffc000) KB 2025-12-04T08:53:19.0871892Z Allocatable: TRUE 2025-12-04T08:53:19.0872037Z Alloc Granule: 4KB 2025-12-04T08:53:19.0872191Z Alloc Recommended Granule:2048KB 2025-12-04T08:53:19.0872346Z Alloc Alignment: 4KB 
2025-12-04T08:53:19.0872523Z Accessible by all: FALSE 2025-12-04T08:53:19.0872651Z Pool 3 2025-12-04T08:53:19.0872774Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T08:53:19.0872911Z Size: 268419072(0xfffc000) KB 2025-12-04T08:53:19.0873052Z Allocatable: TRUE 2025-12-04T08:53:19.0873199Z Alloc Granule: 4KB 2025-12-04T08:53:19.0873492Z Alloc Recommended Granule:2048KB 2025-12-04T08:53:19.0873648Z Alloc Alignment: 4KB 2025-12-04T08:53:19.0873799Z Accessible by all: FALSE 2025-12-04T08:53:19.0873929Z Pool 4 2025-12-04T08:53:19.0874046Z Segment: GROUP 2025-12-04T08:53:19.0874179Z Size: 64(0x40) KB 2025-12-04T08:53:19.0874320Z Allocatable: FALSE 2025-12-04T08:53:19.0874472Z Alloc Granule: 0KB 2025-12-04T08:53:19.0874625Z Alloc Recommended Granule:0KB 2025-12-04T08:53:19.0874780Z Alloc Alignment: 0KB 2025-12-04T08:53:19.0874933Z Accessible by all: FALSE 2025-12-04T08:53:19.0875069Z ISA Info: 2025-12-04T08:53:19.0875166Z ISA 1 2025-12-04T08:53:19.0875290Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T08:53:19.0875449Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T08:53:19.0875604Z Profiles: HSA_PROFILE_BASE 2025-12-04T08:53:19.0875756Z Default Rounding Mode: NEAR 2025-12-04T08:53:19.0875911Z Default Rounding Mode: NEAR 2025-12-04T08:53:19.0876099Z Fast f16: TRUE 2025-12-04T08:53:19.0876244Z Workgroup Max Size: 1024(0x400) 2025-12-04T08:53:19.0876384Z Workgroup Max Size per Dimension: 2025-12-04T08:53:19.0876511Z x 1024(0x400) 2025-12-04T08:53:19.0876635Z y 1024(0x400) 2025-12-04T08:53:19.0876761Z z 1024(0x400) 2025-12-04T08:53:19.0876897Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T08:53:19.0877033Z Grid Max Size per Dimension: 2025-12-04T08:53:19.0877148Z x 4294967295(0xffffffff) 2025-12-04T08:53:19.0877270Z y 4294967295(0xffffffff) 2025-12-04T08:53:19.0877392Z z 4294967295(0xffffffff) 2025-12-04T08:53:19.0877535Z FBarrier Max Size: 32 2025-12-04T08:53:19.0877662Z ISA 2 2025-12-04T08:53:19.0877797Z Name: 
amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 2025-12-04T08:53:19.0877961Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T08:53:19.0878114Z Profiles: HSA_PROFILE_BASE 2025-12-04T08:53:19.0878267Z Default Rounding Mode: NEAR 2025-12-04T08:53:19.0878421Z Default Rounding Mode: NEAR 2025-12-04T08:53:19.0878568Z Fast f16: TRUE 2025-12-04T08:53:19.0878716Z Workgroup Max Size: 1024(0x400) 2025-12-04T08:53:19.0878854Z Workgroup Max Size per Dimension: 2025-12-04T08:53:19.0878978Z x 1024(0x400) 2025-12-04T08:53:19.0879102Z y 1024(0x400) 2025-12-04T08:53:19.0879225Z z 1024(0x400) 2025-12-04T08:53:19.0879359Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T08:53:19.0879488Z Grid Max Size per Dimension: 2025-12-04T08:53:19.0879604Z x 4294967295(0xffffffff) 2025-12-04T08:53:19.0879766Z y 4294967295(0xffffffff) 2025-12-04T08:53:19.0879887Z z 4294967295(0xffffffff) 2025-12-04T08:53:19.0880025Z FBarrier Max Size: 32 2025-12-04T08:53:19.0880154Z *** Done *** 2025-12-04T08:53:19.0914182Z ##[group]Run ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T08:53:19.0914477Z ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx') 2025-12-04T08:53:19.0914890Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-12-04T08:53:19.0915232Z if [[ $ngpu -eq 0 ]]; then 2025-12-04T08:53:19.0915444Z  echo "Error: Failed to detect any GPUs on the runner" 2025-12-04T08:53:19.0915665Z  echo "$msg" 2025-12-04T08:53:19.0915804Z  exit 1 2025-12-04T08:53:19.0915955Z fi 2025-12-04T08:53:19.0919860Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.0920053Z env: 2025-12-04T08:53:19.0920209Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.0920338Z ##[endgroup] 2025-12-04T08:53:19.1616342Z ##[group]Run pytorch/pytorch/.github/actions/diskspace-cleanup@main 2025-12-04T08:53:19.1616519Z with: 2025-12-04T08:53:19.1616620Z diskspace-cutoff: 70 2025-12-04T08:53:19.1616722Z env: 2025-12-04T08:53:19.1616816Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.1616918Z ##[endgroup] 2025-12-04T08:53:19.1638412Z ##[group]Run set -ex 2025-12-04T08:53:19.1638559Z set -ex 2025-12-04T08:53:19.1638790Z diskspace_cutoff=70 2025-12-04T08:53:19.1638942Z docker_root_dir=$(docker info -f '{{.DockerRootDir}}') 2025-12-04T08:53:19.1639106Z if [ ! -d "$docker_root_dir" ]; then 2025-12-04T08:53:19.1639304Z  echo "Docker root directory ($docker_root_dir) does not exist. Skipping disk space check." 2025-12-04T08:53:19.1639509Z  exit 0 2025-12-04T08:53:19.1639604Z fi 2025-12-04T08:53:19.1639765Z diskspace=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T08:53:19.1640094Z msg="Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified" 2025-12-04T08:53:19.1640380Z if [[ "$diskspace" -ge "$diskspace_cutoff" ]] ; then 2025-12-04T08:53:19.1640528Z  docker system prune -af 2025-12-04T08:53:19.1640733Z  diskspace_new=$(df -H --output=pcent ${docker_root_dir} | sed -n 2p | sed 's/%//' | sed 's/ //') 2025-12-04T08:53:19.1640955Z  if [[ "$diskspace_new" -gt "$diskspace_cutoff" ]] ; then 2025-12-04T08:53:19.1641118Z  diskspace_cutoff_int=$((diskspace_cutoff + 0)) 2025-12-04T08:53:19.1641279Z  difference=$((100 - diskspace_cutoff_int)) 2025-12-04T08:53:19.1641491Z  echo "Error: Available diskspace is less than $difference percent. Not enough diskspace." 2025-12-04T08:53:19.1641686Z  echo "$msg" 2025-12-04T08:53:19.1641794Z  exit 1 2025-12-04T08:53:19.1641889Z  else 2025-12-04T08:53:19.1642004Z  difference=$((diskspace - diskspace_new)) 2025-12-04T08:53:19.1642167Z  echo "Diskspace saved: $difference percent" 2025-12-04T08:53:19.1642294Z  fi 2025-12-04T08:53:19.1642383Z fi 2025-12-04T08:53:19.1646556Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.1646694Z env: 2025-12-04T08:53:19.1646792Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.1646895Z ##[endgroup] 2025-12-04T08:53:19.1669687Z + diskspace_cutoff=70 2025-12-04T08:53:19.1677395Z ++ docker info -f '{{.DockerRootDir}}' 2025-12-04T08:53:19.2109251Z + docker_root_dir=/home/runner/docker-data 2025-12-04T08:53:19.2109464Z + '[' '!' -d /home/runner/docker-data ']' 2025-12-04T08:53:19.2117185Z ++ df -H --output=pcent /home/runner/docker-data 2025-12-04T08:53:19.2117966Z ++ sed -n 2p 2025-12-04T08:53:19.2124063Z ++ sed 's/ //' 2025-12-04T08:53:19.2124189Z ++ sed s/%// 2025-12-04T08:53:19.2138048Z + diskspace=' 5' 2025-12-04T08:53:19.2139071Z + msg='Please file an issue on pytorch/pytorch reporting the faulty runner. 
Include a link to the runner logs so the runner can be identified' 2025-12-04T08:53:19.2139448Z + [[ 5 -ge 70 ]] 2025-12-04T08:53:19.2174573Z ##[group]Run RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T08:53:19.2174791Z RUNNER_ARTIFACT_DIR="${RUNNER_TEMP}/artifacts" 2025-12-04T08:53:19.2174964Z rm -rf "${RUNNER_ARTIFACT_DIR}" 2025-12-04T08:53:19.2175103Z mkdir -p "${RUNNER_ARTIFACT_DIR}" 2025-12-04T08:53:19.2175272Z echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}" 2025-12-04T08:53:19.2175429Z  2025-12-04T08:53:19.2175555Z RUNNER_TEST_RESULTS_DIR="${RUNNER_TEMP}/test-results" 2025-12-04T08:53:19.2175728Z rm -rf "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T08:53:19.2175865Z mkdir -p "${RUNNER_TEST_RESULTS_DIR}" 2025-12-04T08:53:19.2176042Z echo "RUNNER_TEST_RESULTS_DIR=${RUNNER_TEST_RESULTS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T08:53:19.2176205Z  2025-12-04T08:53:19.2176302Z RUNNER_DOCS_DIR="${RUNNER_TEMP}/docs" 2025-12-04T08:53:19.2176432Z rm -rf "${RUNNER_DOCS_DIR}" 2025-12-04T08:53:19.2176550Z mkdir -p "${RUNNER_DOCS_DIR}" 2025-12-04T08:53:19.2176708Z echo "RUNNER_DOCS_DIR=${RUNNER_DOCS_DIR}" >> "${GITHUB_ENV}" 2025-12-04T08:53:19.2180881Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.2181029Z env: 2025-12-04T08:53:19.2181122Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.2181223Z ##[endgroup] 2025-12-04T08:53:19.2316090Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:19.2316313Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:19.2316529Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:19.2320330Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.2320474Z env: 2025-12-04T08:53:19.2320571Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.2320708Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:19.2320881Z RUNNER_TEST_RESULTS_DIR: 
/home/runner/_work/_temp/test-results 2025-12-04T08:53:19.2321050Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:19.2321174Z ##[endgroup] 2025-12-04T08:53:19.2372959Z ##[group]Run # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T08:53:19.2373235Z # All GPUs are visible to the runner; visibility, if needed, will be set by run_test.py. 2025-12-04T08:53:19.2373487Z # Add render group for container creation. 2025-12-04T08:53:19.2373654Z render_gid=`cat /etc/group | grep render | cut -d: -f3` 2025-12-04T08:53:19.2373861Z # Ensure GPU isolation if pod is part of kubernetes setup with DEVICE_FLAG. 2025-12-04T08:53:19.2374058Z if [ -f "/etc/podinfo/gha-render-devices" ]; then 2025-12-04T08:53:19.2374219Z  DEVICE_FLAG=$(cat /etc/podinfo/gha-render-devices) 2025-12-04T08:53:19.2374353Z else 2025-12-04T08:53:19.2374451Z  DEVICE_FLAG="--device /dev/dri" 2025-12-04T08:53:19.2374561Z fi 2025-12-04T08:53:19.2374739Z # The --group-add daemon and --group-add bin are needed in the Ubuntu 24.04 and Almalinux OSs respectively. 2025-12-04T08:53:19.2375007Z # This is due to the device files (/dev/kfd & /dev/dri) being owned by video group on bare metal. 2025-12-04T08:53:19.2375257Z # This video group ID maps to subgid 1 inside the docker image due to the /etc/subgid entries. 2025-12-04T08:53:19.2375518Z # The group name corresponding to group ID 1 can change depending on the OS, so both are necessary. 
2025-12-04T08:53:19.2376094Z echo "GPU_FLAG=--device=/dev/mem --device=/dev/kfd $DEVICE_FLAG --group-add video --group-add $render_gid --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host" >> "${GITHUB_ENV}" 2025-12-04T08:53:19.2378782Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:19.2378917Z env: 2025-12-04T08:53:19.2379003Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.2379129Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:19.2379300Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:19.2379457Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:19.2379580Z ##[endgroup] 2025-12-04T08:53:19.2492752Z ##[group]Run aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 2025-12-04T08:53:19.2492956Z with: 2025-12-04T08:53:19.2493111Z role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_s3_and_ecr_read_only 2025-12-04T08:53:19.2493356Z aws-region: us-east-1 2025-12-04T08:53:19.2493465Z role-duration-seconds: 18000 2025-12-04T08:53:19.2493588Z audience: sts.amazonaws.com 2025-12-04T08:53:19.2493818Z env: 2025-12-04T08:53:19.2493915Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:19.2494052Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:19.2494223Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:19.2494389Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:19.2494868Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:19.2495226Z ##[endgroup] 2025-12-04T08:53:19.5867856Z Assuming role with OIDC 2025-12-04T08:53:19.9246025Z Authenticated as assumedRoleId AROAUPVRELQNLLCOPFEJR:GitHubActions 2025-12-04T08:53:20.0189526Z ##[group]Run 
aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076 2025-12-04T08:53:20.0189739Z with: 2025-12-04T08:53:20.0189848Z mask-password: true 2025-12-04T08:53:20.0189963Z registry-type: private 2025-12-04T08:53:20.0190084Z skip-logout: false 2025-12-04T08:53:20.0190193Z env: 2025-12-04T08:53:20.0190298Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:20.0190452Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:20.0190644Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:20.0190818Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:20.0191250Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:20.0191659Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:20.0191782Z AWS_REGION: us-east-1 2025-12-04T08:53:20.0192185Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:20.0192356Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:20.0194954Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:20.0195059Z ##[endgroup] 2025-12-04T08:53:20.4366063Z Logging into registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1158544Z ##[group]Run env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:21.1158909Z env | grep '^GITHUB' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:21.1159196Z env | grep '^CI' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:21.1159484Z env | grep '^RUNNER' >> "${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" 2025-12-04T08:53:21.1167103Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:21.1167276Z env: 2025-12-04T08:53:21.1167389Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:21.1167550Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:21.1167758Z RUNNER_TEST_RESULTS_DIR: 
/home/runner/_work/_temp/test-results 2025-12-04T08:53:21.1168087Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:21.1168529Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:21.1168961Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:21.1169097Z AWS_REGION: us-east-1 2025-12-04T08:53:21.1169342Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:21.1169531Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:21.1172119Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:21.1172236Z ##[endgroup] 2025-12-04T08:53:21.1352966Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main 2025-12-04T08:53:21.1353162Z with: 2025-12-04T08:53:21.1353491Z docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1353811Z use-custom-docker-registry: true 2025-12-04T08:53:21.1353944Z docker-build-dir: .ci/docker 2025-12-04T08:53:21.1354070Z docker-build-script: ./build.sh 2025-12-04T08:53:21.1354194Z working-directory: . 
2025-12-04T08:53:21.1354341Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1354500Z force-push: false 2025-12-04T08:53:21.1354601Z env: 2025-12-04T08:53:21.1354697Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:21.1354838Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:21.1355020Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:21.1355201Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:21.1355594Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:21.1355969Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:21.1356089Z AWS_REGION: us-east-1 2025-12-04T08:53:21.1356306Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:21.1356465Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:21.1358721Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:21.1358831Z ##[endgroup] 2025-12-04T08:53:21.1367337Z ##[group]Run set -ex 2025-12-04T08:53:21.1367473Z set -ex 2025-12-04T08:53:21.1367573Z  2025-12-04T08:53:21.1367732Z # If the docker build directory or the build script doesn't exist, the action will 2025-12-04T08:53:21.1367986Z # gracefully return the docker image name as it is. 
Pulling docker image in Linux 2025-12-04T08:53:21.1368206Z # job could then download the pre-built image as usual 2025-12-04T08:53:21.1368465Z if [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then 2025-12-04T08:53:21.1368711Z  echo "skip=false" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1368854Z else 2025-12-04T08:53:21.1368972Z  echo "skip=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1369152Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1369321Z  2025-12-04T08:53:21.1369527Z  echo "Not using custom ECR registry. Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..." 2025-12-04T08:53:21.1369754Z  exit 0 2025-12-04T08:53:21.1369844Z fi 2025-12-04T08:53:21.1369930Z  2025-12-04T08:53:21.1370066Z if [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then 2025-12-04T08:53:21.1370289Z  # The docker image name already includes the ECR prefix and tag, so we can just 2025-12-04T08:53:21.1370492Z  # use it as it is, but first let's extract the tag 2025-12-04T08:53:21.1370677Z  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}') 2025-12-04T08:53:21.1370996Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1371180Z  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1371331Z else 2025-12-04T08:53:21.1371441Z  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then 2025-12-04T08:53:21.1371589Z  CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:} 2025-12-04T08:53:21.1371738Z  DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*} 2025-12-04T08:53:21.1371866Z  fi 2025-12-04T08:53:21.1372101Z  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}") 2025-12-04T08:53:21.1372328Z  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1372566Z  echo 
"docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1372820Z  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1372982Z fi 2025-12-04T08:53:21.1377083Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:21.1377229Z env: 2025-12-04T08:53:21.1377324Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:21.1377459Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:21.1377637Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:21.1377805Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:21.1378193Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:21.1378564Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:21.1378682Z AWS_REGION: us-east-1 2025-12-04T08:53:21.1378818Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:21.1378973Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:21.1381284Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:21.1381391Z REPO_NAME: pytorch 2025-12-04T08:53:21.1381665Z DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1381955Z DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:53:21.1382073Z DOCKER_BUILD_SCRIPT: ./build.sh 2025-12-04T08:53:21.1382224Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1382380Z USE_CUSTOM_DOCKER_REGISTRY: true 2025-12-04T08:53:21.1382501Z CUSTOM_TAG_PREFIX: 2025-12-04T08:53:21.1382603Z ##[endgroup] 2025-12-04T08:53:21.1400388Z + [[ -d .ci/docker ]] 2025-12-04T08:53:21.1400533Z + [[ -f .ci/docker/./build.sh ]] 2025-12-04T08:53:21.1400660Z + [[ true == \t\r\u\e ]] 2025-12-04T08:53:21.1400771Z + echo skip=false 
2025-12-04T08:53:21.1401153Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]] 2025-12-04T08:53:21.1406317Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1406609Z ++ awk -F '[:,]' '{print $2}' 2025-12-04T08:53:21.1414369Z + DOCKER_TAG=pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1414661Z + echo docker-tag=pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1415049Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1444139Z ##[group]Run set +e 2025-12-04T08:53:21.1444306Z set +e 2025-12-04T08:53:21.1444425Z set -x 2025-12-04T08:53:21.1444541Z  2025-12-04T08:53:21.1444655Z login() { 2025-12-04T08:53:21.1445029Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:53:21.1445264Z } 2025-12-04T08:53:21.1445369Z  2025-12-04T08:53:21.1445474Z retry () { 2025-12-04T08:53:21.1445604Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:53:21.1445753Z } 2025-12-04T08:53:21.1445854Z  2025-12-04T08:53:21.1445970Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:53:21.1446109Z  2025-12-04T08:53:21.1446226Z START_TIME=$(date +%s) 2025-12-04T08:53:21.1446371Z # Wait up to 120 minutes 2025-12-04T08:53:21.1446681Z while [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do 2025-12-04T08:53:21.1446903Z  # Check if image already exists, if it does then skip building it 2025-12-04T08:53:21.1447125Z  if docker manifest inspect "${DOCKER_IMAGE}"; then 2025-12-04T08:53:21.1447290Z  exit 0 2025-12-04T08:53:21.1447420Z  
fi 2025-12-04T08:53:21.1447527Z  2025-12-04T08:53:21.1447694Z  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can 2025-12-04T08:53:21.1447988Z  # use this to differentiate between the Docker build and regular build jobs. For the 2025-12-04T08:53:21.1448266Z  # latter, it will wait for the Docker images to become available before continuing 2025-12-04T08:53:21.1448501Z  if [ "${DOCKER_PUSH:-false}" == "true" ]; then 2025-12-04T08:53:21.1448686Z  # It's a Docker build job, let's build the image 2025-12-04T08:53:21.1448840Z  break 2025-12-04T08:53:21.1448961Z  else 2025-12-04T08:53:21.1449122Z  # It's a regular build job, wait for the image to become available 2025-12-04T08:53:21.1449315Z  sleep 300 2025-12-04T08:53:21.1449436Z  fi 2025-12-04T08:53:21.1449544Z done 2025-12-04T08:53:21.1449652Z  2025-12-04T08:53:21.1449824Z # NB: This part requires a full checkout. Otherwise, the merge base will 2025-12-04T08:53:21.1450066Z # be empty. The default action would be to continue rebuild the image 2025-12-04T08:53:21.1450268Z if [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then 2025-12-04T08:53:21.1450450Z  # if we're on the base branch then use the parent commit 2025-12-04T08:53:21.1450615Z  MERGE_BASE=$(git rev-parse HEAD~) 2025-12-04T08:53:21.1450744Z else 2025-12-04T08:53:21.1450882Z  # otherwise we're on a PR, so use the most recent base commit 2025-12-04T08:53:21.1451075Z  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION") 2025-12-04T08:53:21.1451223Z fi 2025-12-04T08:53:21.1451315Z  2025-12-04T08:53:21.1451422Z if [[ -z "${MERGE_BASE}" ]]; then 2025-12-04T08:53:21.1451571Z  echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1451711Z  2025-12-04T08:53:21.1451896Z  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..." 2025-12-04T08:53:21.1452103Z  exit 0 2025-12-04T08:53:21.1452203Z fi 2025-12-04T08:53:21.1452300Z  2025-12-04T08:53:21.1452447Z if ! 
git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then 2025-12-04T08:53:21.1452695Z  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit" 2025-12-04T08:53:21.1452907Z  exit 1 2025-12-04T08:53:21.1453003Z fi 2025-12-04T08:53:21.1453091Z  2025-12-04T08:53:21.1453236Z PREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}") 2025-12-04T08:53:21.1453618Z # If no image exists but the hash is the same as the previous hash then we should error out here 2025-12-04T08:53:21.1453839Z if [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then 2025-12-04T08:53:21.1454119Z  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch" 2025-12-04T08:53:21.1454395Z  echo " Will re-build docker image to store in local cache, TTS may be longer" 2025-12-04T08:53:21.1454568Z fi 2025-12-04T08:53:21.1454662Z  2025-12-04T08:53:21.1454773Z echo "rebuild=true" >> "${GITHUB_OUTPUT}" 2025-12-04T08:53:21.1458299Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:21.1458452Z env: 2025-12-04T08:53:21.1458550Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:21.1458691Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:21.1458913Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:21.1459085Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:21.1459476Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:21.1459861Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:21.1460136Z AWS_REGION: us-east-1 2025-12-04T08:53:21.1460350Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:21.1460504Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:21.1462828Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:21.1462944Z 
DOCKER_BUILD_DIR: .ci/docker 2025-12-04T08:53:21.1463090Z BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T08:53:21.1463450Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1463808Z DOCKER_TAG: pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:21.1464040Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1464195Z DOCKER_PUSH: 2025-12-04T08:53:21.1464297Z ##[endgroup] 2025-12-04T08:53:21.1477141Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1477331Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1479060Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:21.1479268Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:21.1479554Z /home/runner/_work/_temp/25d8a7bb-8bc0-4e4c-970c-ff498fdf6b88.sh: line 5: aws: command not found 2025-12-04T08:53:21.2058579Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:21.2068041Z + sleep 1 2025-12-04T08:53:22.2094094Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:22.2097502Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:22.2098172Z /home/runner/_work/_temp/25d8a7bb-8bc0-4e4c-970c-ff498fdf6b88.sh: line 5: aws: command not found 2025-12-04T08:53:22.2099907Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:22.2197420Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:22.2207846Z + sleep 2 2025-12-04T08:53:24.2221346Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:24.2225703Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:24.2226281Z /home/runner/_work/_temp/25d8a7bb-8bc0-4e4c-970c-ff498fdf6b88.sh: line 5: aws: 
command not found 2025-12-04T08:53:24.2226907Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:24.2328295Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:24.2341941Z ++ date +%s 2025-12-04T08:53:24.2350834Z + START_TIME=1764838404 2025-12-04T08:53:24.2355886Z ++ date +%s 2025-12-04T08:53:24.2364209Z + [[ 1764831204 -lt 1764838404 ]] 2025-12-04T08:53:24.2364806Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:25.6054827Z { 2025-12-04T08:53:25.6055670Z "schemaVersion": 2, 2025-12-04T08:53:25.6056353Z "mediaType": "application/vnd.docker.distribution.manifest.v2+json", 2025-12-04T08:53:25.6056845Z "config": { 2025-12-04T08:53:25.6057222Z "mediaType": "application/vnd.docker.container.image.v1+json", 2025-12-04T08:53:25.6057663Z "size": 30522, 2025-12-04T08:53:25.6058130Z "digest": "sha256:79498ef00fdf8abfcde955fd685c3a7412c33ca80383b5905abfdc3c70621215" 2025-12-04T08:53:25.6058615Z }, 2025-12-04T08:53:25.6058840Z "layers": [ 2025-12-04T08:53:25.6059066Z { 2025-12-04T08:53:25.6059427Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6059867Z "size": 30594402, 2025-12-04T08:53:25.6060715Z "digest": "sha256:02de03a7213b62b792ec66a7efb8c86c4117ca00fb8651facf8ecfe33044b485" 2025-12-04T08:53:25.6061290Z }, 2025-12-04T08:53:25.6061508Z { 2025-12-04T08:53:25.6061857Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6062287Z "size": 1554, 2025-12-04T08:53:25.6062726Z "digest": "sha256:3a5718b5258e28918133dd74ea64bd506b2c15530a2fa8a72c45c5b0d8f7c7b0" 2025-12-04T08:53:25.6063202Z }, 2025-12-04T08:53:25.6063477Z { 2025-12-04T08:53:25.6063784Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6064097Z "size": 335779211, 2025-12-04T08:53:25.6064433Z "digest": 
"sha256:bf3aa22776924a41b55849f0f30cb22af45d41da1177a9d682cf94cde99d8f98" 2025-12-04T08:53:25.6064782Z }, 2025-12-04T08:53:25.6064936Z { 2025-12-04T08:53:25.6065187Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6065496Z "size": 704, 2025-12-04T08:53:25.6065814Z "digest": "sha256:9d58e5257cefd43e8226153d71d28a865253662146aa9fce9a9f95af67b497fa" 2025-12-04T08:53:25.6066167Z }, 2025-12-04T08:53:25.6066329Z { 2025-12-04T08:53:25.6066579Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6066886Z "size": 1770, 2025-12-04T08:53:25.6067194Z "digest": "sha256:fde80a64553533a56c032d4bc388837e7d4631a0424d1bfe135703165b67fd4d" 2025-12-04T08:53:25.6067542Z }, 2025-12-04T08:53:25.6067696Z { 2025-12-04T08:53:25.6067941Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6068248Z "size": 485, 2025-12-04T08:53:25.6068559Z "digest": "sha256:6931c5f20e80e481e4f484471ff3a02878b4f8c54a9a5a4717213fdaa35c0bff" 2025-12-04T08:53:25.6068899Z }, 2025-12-04T08:53:25.6069053Z { 2025-12-04T08:53:25.6069303Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6069614Z "size": 120663474, 2025-12-04T08:53:25.6069954Z "digest": "sha256:170ea6d3edd62991e37d2e6ebe53dfcd4601f5d42e8f9720af5f8db5fc267856" 2025-12-04T08:53:25.6070305Z }, 2025-12-04T08:53:25.6070460Z { 2025-12-04T08:53:25.6070711Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6071018Z "size": 4433, 2025-12-04T08:53:25.6071334Z "digest": "sha256:dc8487f6c81cac00fa33031f8d3481e2c3634c4f064a9c4c36b87b41e78bc9fb" 2025-12-04T08:53:25.6071692Z }, 2025-12-04T08:53:25.6071845Z { 2025-12-04T08:53:25.6072096Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6072402Z "size": 1755, 2025-12-04T08:53:25.6072709Z "digest": "sha256:9748c5348f39a11c960c49fd9219fdea1c23e612ed11a02d71501424defc80f5" 2025-12-04T08:53:25.6073054Z }, 
2025-12-04T08:53:25.6073207Z { 2025-12-04T08:53:25.6073507Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6073812Z "size": 724, 2025-12-04T08:53:25.6074139Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T08:53:25.6074591Z }, 2025-12-04T08:53:25.6074776Z { 2025-12-04T08:53:25.6075106Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6075362Z "size": 3378352584, 2025-12-04T08:53:25.6075600Z "digest": "sha256:af88f886884fe6f1a1992efb7ce8473901f795eef69caa199443f3e076fdfd5b" 2025-12-04T08:53:25.6075928Z }, 2025-12-04T08:53:25.6076061Z { 2025-12-04T08:53:25.6076251Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6076465Z "size": 396, 2025-12-04T08:53:25.6076694Z "digest": "sha256:32fbb88555c4195c45c7008cf92e389d67acc79a7e382503003ef93bcb886afe" 2025-12-04T08:53:25.6076938Z }, 2025-12-04T08:53:25.6077050Z { 2025-12-04T08:53:25.6077228Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6077443Z "size": 80171601, 2025-12-04T08:53:25.6077680Z "digest": "sha256:3231e1ab814b143b244037c540b637be259085834865ac43b1ed2b6f6ad631e1" 2025-12-04T08:53:25.6077920Z }, 2025-12-04T08:53:25.6078145Z { 2025-12-04T08:53:25.6078315Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6078542Z "size": 787, 2025-12-04T08:53:25.6078768Z "digest": "sha256:80061bf5dcbb9a4e38ac865a9cdc0a615bb294e3e6bfa357a6d515dcf3f54abc" 2025-12-04T08:53:25.6079020Z }, 2025-12-04T08:53:25.6079134Z { 2025-12-04T08:53:25.6079321Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6079538Z "size": 106, 2025-12-04T08:53:25.6079763Z "digest": "sha256:6e9524f4518ec02b47ff12c55b6b6afbc65b3f4be59072e2afe20c2c87522549" 2025-12-04T08:53:25.6080006Z }, 2025-12-04T08:53:25.6080118Z { 2025-12-04T08:53:25.6080299Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6080513Z "size": 1495, 2025-12-04T08:53:25.6080734Z "digest": "sha256:ce919d4bf5eeff71d49b160a16603117225530497c3905e02224227d11e2ff88" 2025-12-04T08:53:25.6080977Z }, 2025-12-04T08:53:25.6081086Z { 2025-12-04T08:53:25.6081274Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6081489Z "size": 548601195, 2025-12-04T08:53:25.6081725Z "digest": "sha256:47681e3e6f37423139a5c86549ffbb43e4f258344b0461208f5821263da152e9" 2025-12-04T08:53:25.6081961Z }, 2025-12-04T08:53:25.6082073Z { 2025-12-04T08:53:25.6082254Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6082455Z "size": 162, 2025-12-04T08:53:25.6082628Z "digest": "sha256:cb70fe22c9ebacebfe8402519059c8a66da6d5a77979e4c0ecdb3a762bebe357" 2025-12-04T08:53:25.6082824Z }, 2025-12-04T08:53:25.6082913Z { 2025-12-04T08:53:25.6083056Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6083223Z "size": 104, 2025-12-04T08:53:25.6083440Z "digest": "sha256:17858e829c8cfe9a7e22516e03ad5273d8cf5c50f58edb10ff60c74e15c8e1f6" 2025-12-04T08:53:25.6083638Z }, 2025-12-04T08:53:25.6083729Z { 2025-12-04T08:53:25.6083879Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6084055Z "size": 724, 2025-12-04T08:53:25.6084236Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T08:53:25.6084425Z }, 2025-12-04T08:53:25.6084515Z { 2025-12-04T08:53:25.6084659Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6084834Z "size": 196, 2025-12-04T08:53:25.6085013Z "digest": "sha256:a63f3b4eed1157bcb3c51b64196e74e9f10d1f923652b02fd433c6ed993597ff" 2025-12-04T08:53:25.6085209Z }, 2025-12-04T08:53:25.6085292Z { 2025-12-04T08:53:25.6085434Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6085602Z "size": 2584, 
2025-12-04T08:53:25.6085788Z "digest": "sha256:10ab3d1afbc4cb2d3ced8f3e0072c0b1dd124dcadcf68b95fadf8a7a9f663860" 2025-12-04T08:53:25.6085987Z }, 2025-12-04T08:53:25.6086071Z { 2025-12-04T08:53:25.6086210Z + exit 0 2025-12-04T08:53:25.6086357Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6086529Z "size": 7652105336, 2025-12-04T08:53:25.6086713Z "digest": "sha256:98ca88b5095b449a2f2d753a21217856271912fbe51c2d99f928a2196f4097d5" 2025-12-04T08:53:25.6086903Z }, 2025-12-04T08:53:25.6086985Z { 2025-12-04T08:53:25.6087126Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6087339Z "size": 135, 2025-12-04T08:53:25.6087516Z "digest": "sha256:025c90839a58c768b3cc444e48cae67c1a5b2c85320ad8827231f0ba390cf9aa" 2025-12-04T08:53:25.6087703Z }, 2025-12-04T08:53:25.6087792Z { 2025-12-04T08:53:25.6087933Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6088102Z "size": 104, 2025-12-04T08:53:25.6088274Z "digest": "sha256:9255df5942ae69fee24f8074314f451d5d2f1ca71b6c777274297fd43a0032d8" 2025-12-04T08:53:25.6088468Z }, 2025-12-04T08:53:25.6088558Z { 2025-12-04T08:53:25.6088698Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6088909Z "size": 612, 2025-12-04T08:53:25.6089082Z "digest": "sha256:f71ca9d4ed1c4ca8177602f3cb0db83d9787ea6c258a8ef203387b308ff3e0f0" 2025-12-04T08:53:25.6089277Z }, 2025-12-04T08:53:25.6089367Z { 2025-12-04T08:53:25.6089502Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6089679Z "size": 838191953, 2025-12-04T08:53:25.6089864Z "digest": "sha256:d02b47b56ca7f3598f5943d4fdc7139d5e3d3bc82d49185cedf9817dd55fc75c" 2025-12-04T08:53:25.6090056Z }, 2025-12-04T08:53:25.6090145Z { 2025-12-04T08:53:25.6090289Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6090459Z "size": 111, 2025-12-04T08:53:25.6090630Z "digest": 
"sha256:40279492aea7bc8fb650842b495912195621c21b14cef4c717a9e0a9fc535131" 2025-12-04T08:53:25.6090819Z }, 2025-12-04T08:53:25.6090909Z { 2025-12-04T08:53:25.6091052Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6091222Z "size": 1556, 2025-12-04T08:53:25.6091405Z "digest": "sha256:33a27ce74abd7e32a03a564fc45005bc75904b53ad516f18d47facbeb2f2794e" 2025-12-04T08:53:25.6091597Z }, 2025-12-04T08:53:25.6091684Z { 2025-12-04T08:53:25.6091824Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6091991Z "size": 107, 2025-12-04T08:53:25.6092175Z "digest": "sha256:6b66ed335d1d8df6140caba76d9c2babed83bb37962e1e638825d49e67184fa5" 2025-12-04T08:53:25.6092365Z }, 2025-12-04T08:53:25.6092451Z { 2025-12-04T08:53:25.6092592Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6092762Z "size": 166, 2025-12-04T08:53:25.6092935Z "digest": "sha256:9f010fa04118bfee2d7b4481e6badb714032bde0652b04151a6599e57e1bd91b" 2025-12-04T08:53:25.6093127Z }, 2025-12-04T08:53:25.6093204Z { 2025-12-04T08:53:25.6093370Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6093523Z "size": 3702493, 2025-12-04T08:53:25.6093691Z "digest": "sha256:6c64d5e8bb6ae6ef4e3f1d316429d8b14a6e8a1fb410fb83b96c8bbd4a0a095c" 2025-12-04T08:53:25.6093864Z }, 2025-12-04T08:53:25.6093941Z { 2025-12-04T08:53:25.6094065Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6094217Z "size": 107, 2025-12-04T08:53:25.6094379Z "digest": "sha256:c20ea058f549f5f5538c95c5e0da23afbbc9fb7ffc1987d126fe684eeed743f5" 2025-12-04T08:53:25.6094557Z }, 2025-12-04T08:53:25.6094634Z { 2025-12-04T08:53:25.6094758Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6094911Z "size": 829, 2025-12-04T08:53:25.6095066Z "digest": "sha256:3c4fd2d54638a1336d39769fe36041aa6d186a8dea0e7096b8d8a7068ba0d3c0" 2025-12-04T08:53:25.6095238Z }, 
2025-12-04T08:53:25.6095316Z { 2025-12-04T08:53:25.6095441Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6095592Z "size": 26673844, 2025-12-04T08:53:25.6095762Z "digest": "sha256:964ebac3d7a95c64ea7f0d828cd58e6244cc955e9a099a2525079ecf64026e3f" 2025-12-04T08:53:25.6095935Z }, 2025-12-04T08:53:25.6096011Z { 2025-12-04T08:53:25.6096136Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6096287Z "size": 104, 2025-12-04T08:53:25.6096445Z "digest": "sha256:2aaa7210673fc5bd15d36e54ee5c3fb495d1eafa1cb8d686054ccedb1c37bfc8" 2025-12-04T08:53:25.6096654Z }, 2025-12-04T08:53:25.6096731Z { 2025-12-04T08:53:25.6096855Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6097006Z "size": 424, 2025-12-04T08:53:25.6097162Z "digest": "sha256:fa273daa00371a98ed668535e14b8cc3cb425feba0b601b3e3c72314d0234312" 2025-12-04T08:53:25.6097334Z }, 2025-12-04T08:53:25.6097411Z { 2025-12-04T08:53:25.6097534Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6097686Z "size": 19279582, 2025-12-04T08:53:25.6097853Z "digest": "sha256:d931a62fd2408369decfa0e6eac11768e35d0ffddee87d769c82aaf1ad7e2899" 2025-12-04T08:53:25.6098065Z }, 2025-12-04T08:53:25.6098143Z { 2025-12-04T08:53:25.6098268Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6098422Z "size": 826, 2025-12-04T08:53:25.6098577Z "digest": "sha256:d3573d61c28e1400840260d3c2c786c9e104f6558162beac799e55b6f5c1e747" 2025-12-04T08:53:25.6098752Z }, 2025-12-04T08:53:25.6098826Z { 2025-12-04T08:53:25.6098950Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6099102Z "size": 724, 2025-12-04T08:53:25.6099256Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T08:53:25.6099428Z }, 2025-12-04T08:53:25.6099506Z { 2025-12-04T08:53:25.6099632Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6099783Z "size": 149, 2025-12-04T08:53:25.6099938Z "digest": "sha256:f9b32f08c49055dd61bd359d5f42f6adb9e5a183c2821d97d11572dd7ce1e91f" 2025-12-04T08:53:25.6100109Z }, 2025-12-04T08:53:25.6100192Z { 2025-12-04T08:53:25.6100316Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6100466Z "size": 136, 2025-12-04T08:53:25.6100619Z "digest": "sha256:3a0206399d60f6e8897f78c8e8f81b59d51969a329ef45485d28ae19607ca72c" 2025-12-04T08:53:25.6100787Z }, 2025-12-04T08:53:25.6100869Z { 2025-12-04T08:53:25.6100996Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6101151Z "size": 140, 2025-12-04T08:53:25.6101304Z "digest": "sha256:386f322edd1c1c275126bab065c22fcd3950916c1fb8491a21a7f5c358af599a" 2025-12-04T08:53:25.6101475Z }, 2025-12-04T08:53:25.6101553Z { 2025-12-04T08:53:25.6101675Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6101828Z "size": 32, 2025-12-04T08:53:25.6101985Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:53:25.6102157Z }, 2025-12-04T08:53:25.6102235Z { 2025-12-04T08:53:25.6102365Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6102518Z "size": 223, 2025-12-04T08:53:25.6102673Z "digest": "sha256:bbe49df30697f6959cd958299909d9255cd54663ce2e9e2c2d378f8f9dfe8345" 2025-12-04T08:53:25.6102845Z }, 2025-12-04T08:53:25.6102921Z { 2025-12-04T08:53:25.6103047Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6103213Z "size": 346, 2025-12-04T08:53:25.6103414Z "digest": "sha256:d6630aa6f375b12cb7471c5b60eb32e02ff8d70adf4497e061d6c15fead186c7" 2025-12-04T08:53:25.6103598Z }, 2025-12-04T08:53:25.6103683Z { 2025-12-04T08:53:25.6103813Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6103974Z "size": 88302, 
2025-12-04T08:53:25.6104137Z "digest": "sha256:6d807afc1309592c99c7d77af3874afb54c1718377fe721ac0cc616f59d291b9" 2025-12-04T08:53:25.6104315Z }, 2025-12-04T08:53:25.6104401Z { 2025-12-04T08:53:25.6104537Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6104696Z "size": 106, 2025-12-04T08:53:25.6104854Z "digest": "sha256:60b679430e4e0b7690392dfe4f5dc417847f7a3ba2b761ce747b66d412e1d956" 2025-12-04T08:53:25.6105030Z }, 2025-12-04T08:53:25.6105115Z { 2025-12-04T08:53:25.6105249Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6105451Z "size": 1671, 2025-12-04T08:53:25.6105615Z "digest": "sha256:3992ae84f9eda1c5c52fa96b1f1d0fc3f93c661c5cf0b971a504a260c290da49" 2025-12-04T08:53:25.6105788Z }, 2025-12-04T08:53:25.6105865Z { 2025-12-04T08:53:25.6105991Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6106205Z "size": 724, 2025-12-04T08:53:25.6106542Z "digest": "sha256:8539cc3f8d8a138501ed0255c0cd7ec491bc0add9e4a62095f1c0f9533daa1cc" 2025-12-04T08:53:25.6106754Z }, 2025-12-04T08:53:25.6106878Z { 2025-12-04T08:53:25.6107063Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6107295Z "size": 138, 2025-12-04T08:53:25.6107482Z "digest": "sha256:62d400609f9c38fce4745f72372423072ba0f142b3c03775ccb317f6c5240966" 2025-12-04T08:53:25.6118849Z }, 2025-12-04T08:53:25.6118940Z { 2025-12-04T08:53:25.6119082Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6119254Z "size": 119, 2025-12-04T08:53:25.6119418Z "digest": "sha256:7e7b097490967d568331cc9f8afdd02422fe101c6364ec5e12dba2970991e533" 2025-12-04T08:53:25.6119598Z }, 2025-12-04T08:53:25.6119680Z { 2025-12-04T08:53:25.6119814Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6119978Z "size": 6231259865, 2025-12-04T08:53:25.6120153Z "digest": "sha256:7dcdbd8421cb17aaa5d0cb965ddf94e196cb364e762b12ab78024cb25e3b6bcd" 
2025-12-04T08:53:25.6120337Z }, 2025-12-04T08:53:25.6120420Z { 2025-12-04T08:53:25.6120551Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6120711Z "size": 174, 2025-12-04T08:53:25.6120869Z "digest": "sha256:cbb12613719bab9f179968227f9fb8881251992804e460b9a9e1c00f3ac4a0c5" 2025-12-04T08:53:25.6121045Z }, 2025-12-04T08:53:25.6121128Z { 2025-12-04T08:53:25.6121259Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6121418Z "size": 1896, 2025-12-04T08:53:25.6121585Z "digest": "sha256:e87038dce9bc8e13bd64006847d30ddcaf77455256c4985fccfec83f82d4b925" 2025-12-04T08:53:25.6121762Z }, 2025-12-04T08:53:25.6121847Z { 2025-12-04T08:53:25.6121979Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6122139Z "size": 162783968, 2025-12-04T08:53:25.6122306Z "digest": "sha256:e4606b636f96f1c80f4be26aeb9d6f5f990f6149789c2de160451c5ac76a467d" 2025-12-04T08:53:25.6122483Z }, 2025-12-04T08:53:25.6122565Z { 2025-12-04T08:53:25.6122695Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6122855Z "size": 302, 2025-12-04T08:53:25.6123015Z "digest": "sha256:6f2a5d33b946e561219b9968769773e36ce1d28bee8c62eff652098b7825fc79" 2025-12-04T08:53:25.6123189Z }, 2025-12-04T08:53:25.6123307Z { 2025-12-04T08:53:25.6123437Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6123596Z "size": 32, 2025-12-04T08:53:25.6123760Z "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1" 2025-12-04T08:53:25.6123941Z }, 2025-12-04T08:53:25.6124020Z { 2025-12-04T08:53:25.6124151Z "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6124312Z "size": 108, 2025-12-04T08:53:25.6124473Z "digest": "sha256:a4f2bf2f19e63b91d46f2d9cf11a25c657517a6835996404da1e79a09d918b0e" 2025-12-04T08:53:25.6124650Z }, 2025-12-04T08:53:25.6124733Z { 2025-12-04T08:53:25.6124865Z "mediaType": 
"application/vnd.docker.image.rootfs.diff.tar.gzip", 2025-12-04T08:53:25.6125025Z "size": 54145661, 2025-12-04T08:53:25.6125197Z "digest": "sha256:1ae00acdac56cbc6d3f81b3c5d854a2b77c30d458b0fbe18c5935145364484f0" 2025-12-04T08:53:25.6125375Z } 2025-12-04T08:53:25.6125458Z ] 2025-12-04T08:53:25.6125543Z } 2025-12-04T08:53:25.6141212Z ##[group]Run set -eux 2025-12-04T08:53:25.6141341Z set -eux 2025-12-04T08:53:25.6141511Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have 2025-12-04T08:53:25.6141989Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true 2025-12-04T08:53:25.6146449Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:25.6146605Z env: 2025-12-04T08:53:25.6146705Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:25.6146848Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:25.6147028Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:25.6147198Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:25.6147626Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:25.6147998Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:25.6148126Z AWS_REGION: us-east-1 2025-12-04T08:53:25.6148327Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:25.6148485Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:25.6150767Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:25.6150877Z ##[endgroup] 2025-12-04T08:53:25.6172505Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token 2025-12-04T08:53:25.6172791Z /home/runner/_work/_temp/6164ff79-fa6c-47f3-9d8a-a21472f14c00.sh: line 3: aws: command not found 
2025-12-04T08:53:25.6173949Z + jq --raw-output .SecretString 2025-12-04T08:53:25.6174514Z + jq -r .docker_hub_readonly_token 2025-12-04T08:53:25.6174875Z + docker login --username pytorchbot --password-stdin 2025-12-04T08:53:25.6270631Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:25.6276859Z + true 2025-12-04T08:53:25.6334546Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2025-12-04T08:53:25.6334732Z with: 2025-12-04T08:53:25.6334997Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:25.6335332Z docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:25.6335486Z env: 2025-12-04T08:53:25.6335581Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:25.6335719Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:25.6335895Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:25.6336063Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:25.6336447Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:25.6336821Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:25.6336938Z AWS_REGION: us-east-1 2025-12-04T08:53:25.6337139Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:25.6337298Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:25.6339617Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:25.6339967Z ##[endgroup] 2025-12-04T08:53:25.6346576Z ##[group]Run set -x 2025-12-04T08:53:25.6346696Z set -x 2025-12-04T08:53:25.6346789Z set +e 2025-12-04T08:53:25.6346881Z  2025-12-04T08:53:25.6346974Z login() { 2025-12-04T08:53:25.6347161Z  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1" 2025-12-04T08:53:25.6347353Z } 
2025-12-04T08:53:25.6347441Z  2025-12-04T08:53:25.6347531Z retry () { 2025-12-04T08:53:25.6347665Z  $* || (sleep 1 && $*) || (sleep 2 && $*) 2025-12-04T08:53:25.6347788Z } 2025-12-04T08:53:25.6347872Z  2025-12-04T08:53:25.6347971Z retry login "${DOCKER_REGISTRY}" 2025-12-04T08:53:25.6348091Z  2025-12-04T08:53:25.6348273Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024') 2025-12-04T08:53:25.6348626Z echo "Compressed size of image in MB: ${IMAGE_SIZE}" 2025-12-04T08:53:25.6348770Z  2025-12-04T08:53:25.6348855Z set -e 2025-12-04T08:53:25.6348990Z # ignore output since only exit code is used for conditional 2025-12-04T08:53:25.6349174Z # only pull docker image if it's not available locally 2025-12-04T08:53:25.6349378Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2025-12-04T08:53:25.6349564Z  retry docker pull "${DOCKER_IMAGE}" 2025-12-04T08:53:25.6349686Z fi 2025-12-04T08:53:25.6353636Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T08:53:25.6353778Z env: 2025-12-04T08:53:25.6353870Z GIT_DEFAULT_BRANCH: main 2025-12-04T08:53:25.6354008Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T08:53:25.6354181Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T08:53:25.6354353Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T08:53:25.6354731Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T08:53:25.6355101Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T08:53:25.6355215Z AWS_REGION: us-east-1 2025-12-04T08:53:25.6355357Z AWS_ACCESS_KEY_ID: *** 2025-12-04T08:53:25.6355507Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T08:53:25.6357822Z AWS_SESSION_TOKEN: *** 2025-12-04T08:53:25.6358094Z DOCKER_IMAGE: 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:25.6358482Z DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:25.6358636Z ##[endgroup] 2025-12-04T08:53:25.6377896Z + set +e 2025-12-04T08:53:25.6378236Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:25.6378418Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:25.6382125Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:25.6383813Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:25.6384365Z /home/runner/_work/_temp/bce571b1-1441-435c-9202-ef26b9a79d85.sh: line 5: aws: command not found 2025-12-04T08:53:25.6450202Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:25.6458216Z + sleep 1 2025-12-04T08:53:26.6469072Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:26.6474030Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:26.6474856Z /home/runner/_work/_temp/bce571b1-1441-435c-9202-ef26b9a79d85.sh: line 5: aws: command not found 2025-12-04T08:53:26.6475608Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:26.6564829Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:26.6576828Z + sleep 2 2025-12-04T08:53:28.6592496Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:28.6596515Z + aws ecr get-login-password --region us-east-1 2025-12-04T08:53:28.6597057Z /home/runner/_work/_temp/bce571b1-1441-435c-9202-ef26b9a79d85.sh: line 5: aws: command not found 2025-12-04T08:53:28.6597672Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T08:53:28.6691746Z Error: Cannot perform an interactive login from a non TTY device 2025-12-04T08:53:28.6708166Z ++ docker manifest 
inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:28.6708877Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024' 2025-12-04T08:53:30.0410905Z + IMAGE_SIZE=18579.916069984436 2025-12-04T08:53:30.0411425Z + echo 'Compressed size of image in MB: 18579.916069984436' 2025-12-04T08:53:30.0411812Z + set -e 2025-12-04T08:53:30.0413067Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:30.0414072Z Compressed size of image in MB: 18579.916069984436 2025-12-04T08:53:30.0537819Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:30.0538467Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T08:53:31.0939106Z pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image 2025-12-04T08:53:31.0939789Z 02de03a7213b: Pulling fs layer 2025-12-04T08:53:31.0940114Z 3a5718b5258e: Pulling fs layer 2025-12-04T08:53:31.0940384Z bf3aa2277692: Pulling fs layer 2025-12-04T08:53:31.0940632Z 9d58e5257cef: Pulling fs layer 2025-12-04T08:53:31.0940887Z fde80a645535: Pulling fs layer 2025-12-04T08:53:31.0941129Z 6931c5f20e80: Pulling fs layer 2025-12-04T08:53:31.0941370Z 170ea6d3edd6: Pulling fs layer 2025-12-04T08:53:31.0941616Z dc8487f6c81c: Pulling fs layer 2025-12-04T08:53:31.0941851Z 9748c5348f39: Pulling fs layer 2025-12-04T08:53:31.0942090Z 8539cc3f8d8a: Pulling fs layer 2025-12-04T08:53:31.0942335Z af88f886884f: Pulling fs layer 2025-12-04T08:53:31.0942574Z 32fbb88555c4: Pulling fs layer 2025-12-04T08:53:31.0942809Z fde80a645535: Waiting 2025-12-04T08:53:31.0943039Z 3231e1ab814b: 
Pulling fs layer 2025-12-04T08:53:31.0943364Z 80061bf5dcbb: Pulling fs layer 2025-12-04T08:53:31.0943606Z 6e9524f4518e: Pulling fs layer 2025-12-04T08:53:31.0943843Z ce919d4bf5ee: Pulling fs layer 2025-12-04T08:53:31.0944088Z 47681e3e6f37: Pulling fs layer 2025-12-04T08:53:31.0945424Z 6931c5f20e80: Waiting 2025-12-04T08:53:31.0945652Z cb70fe22c9eb: Pulling fs layer 2025-12-04T08:53:31.0945894Z 17858e829c8c: Pulling fs layer 2025-12-04T08:53:31.0946131Z 170ea6d3edd6: Waiting 2025-12-04T08:53:31.0946356Z a63f3b4eed11: Pulling fs layer 2025-12-04T08:53:31.0946596Z dc8487f6c81c: Waiting 2025-12-04T08:53:31.0946816Z 10ab3d1afbc4: Pulling fs layer 2025-12-04T08:53:31.0947050Z 9748c5348f39: Waiting 2025-12-04T08:53:31.0947270Z 98ca88b5095b: Pulling fs layer 2025-12-04T08:53:31.0947512Z 025c90839a58: Pulling fs layer 2025-12-04T08:53:31.0947751Z 9255df5942ae: Pulling fs layer 2025-12-04T08:53:31.0947983Z 8539cc3f8d8a: Waiting 2025-12-04T08:53:31.0948207Z f71ca9d4ed1c: Pulling fs layer 2025-12-04T08:53:31.0948449Z d02b47b56ca7: Pulling fs layer 2025-12-04T08:53:31.0948689Z 40279492aea7: Pulling fs layer 2025-12-04T08:53:31.0948920Z af88f886884f: Waiting 2025-12-04T08:53:31.0949142Z 33a27ce74abd: Pulling fs layer 2025-12-04T08:53:31.0949379Z 6b66ed335d1d: Pulling fs layer 2025-12-04T08:53:31.0956245Z 9f010fa04118: Pulling fs layer 2025-12-04T08:53:31.0956461Z 32fbb88555c4: Waiting 2025-12-04T08:53:31.0956640Z 6c64d5e8bb6a: Pulling fs layer 2025-12-04T08:53:31.0956823Z c20ea058f549: Pulling fs layer 2025-12-04T08:53:31.0957001Z 3c4fd2d54638: Pulling fs layer 2025-12-04T08:53:31.0957169Z 3231e1ab814b: Waiting 2025-12-04T08:53:31.0957354Z 964ebac3d7a9: Pulling fs layer 2025-12-04T08:53:31.0957532Z 2aaa7210673f: Pulling fs layer 2025-12-04T08:53:31.0957695Z 80061bf5dcbb: Waiting 2025-12-04T08:53:31.0957856Z fa273daa0037: Pulling fs layer 2025-12-04T08:53:31.0958024Z 6e9524f4518e: Waiting 2025-12-04T08:53:31.0958182Z d931a62fd240: Pulling fs layer 2025-12-04T08:53:31.0958354Z 
d3573d61c28e: Pulling fs layer 2025-12-04T08:53:31.0958516Z ce919d4bf5ee: Waiting 2025-12-04T08:53:31.0958676Z f9b32f08c490: Pulling fs layer 2025-12-04T08:53:31.0958847Z 3a0206399d60: Pulling fs layer 2025-12-04T08:53:31.0959019Z 386f322edd1c: Pulling fs layer 2025-12-04T08:53:31.0959190Z 4f4fb700ef54: Pulling fs layer 2025-12-04T08:53:31.0959371Z bbe49df30697: Pulling fs layer 2025-12-04T08:53:31.0959539Z d6630aa6f375: Pulling fs layer 2025-12-04T08:53:31.0959704Z 47681e3e6f37: Waiting 2025-12-04T08:53:31.0959882Z 6d807afc1309: Pulling fs layer 2025-12-04T08:53:31.0960214Z 60b679430e4e: Pulling fs layer 2025-12-04T08:53:31.0960369Z 3992ae84f9ed: Pulling fs layer 2025-12-04T08:53:31.0960521Z 62d400609f9c: Pulling fs layer 2025-12-04T08:53:31.0960796Z 7e7b09749096: Pulling fs layer 2025-12-04T08:53:31.0961884Z 7dcdbd8421cb: Pulling fs layer 2025-12-04T08:53:31.0962317Z cbb12613719b: Pulling fs layer 2025-12-04T08:53:31.0962616Z e87038dce9bc: Pulling fs layer 2025-12-04T08:53:31.0962891Z a63f3b4eed11: Waiting 2025-12-04T08:53:31.0963178Z e4606b636f96: Pulling fs layer 2025-12-04T08:53:31.0963505Z 10ab3d1afbc4: Waiting 2025-12-04T08:53:31.0963790Z 6f2a5d33b946: Pulling fs layer 2025-12-04T08:53:31.0964048Z 025c90839a58: Waiting 2025-12-04T08:53:31.0964282Z 98ca88b5095b: Waiting 2025-12-04T08:53:31.0964514Z f71ca9d4ed1c: Waiting 2025-12-04T08:53:31.0964809Z a4f2bf2f19e6: Pulling fs layer 2025-12-04T08:53:31.0965077Z 1ae00acdac56: Pulling fs layer 2025-12-04T08:53:31.0965334Z 40279492aea7: Waiting 2025-12-04T08:53:31.0965563Z d02b47b56ca7: Waiting 2025-12-04T08:53:31.0965809Z 33a27ce74abd: Waiting 2025-12-04T08:53:31.0966038Z 9f010fa04118: Waiting 2025-12-04T08:53:31.0966259Z 6c64d5e8bb6a: Waiting 2025-12-04T08:53:31.0966482Z f9b32f08c490: Waiting 2025-12-04T08:53:31.0966703Z c20ea058f549: Waiting 2025-12-04T08:53:31.0966926Z 3c4fd2d54638: Waiting 2025-12-04T08:53:31.0967151Z fa273daa0037: Waiting 2025-12-04T08:53:31.0967373Z d931a62fd240: Waiting 
2025-12-04T08:53:31.0967593Z 9d58e5257cef: Waiting 2025-12-04T08:53:31.0967815Z 9255df5942ae: Waiting 2025-12-04T08:53:31.0968038Z 6b66ed335d1d: Waiting 2025-12-04T08:53:31.0968260Z 6d807afc1309: Waiting 2025-12-04T08:53:31.0968481Z d3573d61c28e: Waiting 2025-12-04T08:53:31.0968705Z bbe49df30697: Waiting 2025-12-04T08:53:31.0968923Z 62d400609f9c: Waiting 2025-12-04T08:53:31.0969859Z d6630aa6f375: Waiting 2025-12-04T08:53:31.0970085Z e4606b636f96: Waiting 2025-12-04T08:53:31.0970303Z cbb12613719b: Waiting 2025-12-04T08:53:31.0970525Z e87038dce9bc: Waiting 2025-12-04T08:53:31.0970754Z 7dcdbd8421cb: Waiting 2025-12-04T08:53:31.0970987Z 386f322edd1c: Waiting 2025-12-04T08:53:31.0971213Z a4f2bf2f19e6: Waiting 2025-12-04T08:53:31.0971430Z 3992ae84f9ed: Waiting 2025-12-04T08:53:31.0971654Z 3a0206399d60: Waiting 2025-12-04T08:53:31.0971876Z 1ae00acdac56: Waiting 2025-12-04T08:53:31.0972096Z 60b679430e4e: Waiting 2025-12-04T08:53:31.0972313Z 4f4fb700ef54: Waiting 2025-12-04T08:53:31.0972537Z 6f2a5d33b946: Waiting 2025-12-04T08:53:31.0972756Z 2aaa7210673f: Waiting 2025-12-04T08:53:31.6987997Z 3a5718b5258e: Verifying Checksum 2025-12-04T08:53:31.6988368Z 3a5718b5258e: Download complete 2025-12-04T08:53:32.2813102Z 9d58e5257cef: Download complete 2025-12-04T08:53:32.7608650Z 02de03a7213b: Verifying Checksum 2025-12-04T08:53:32.7609033Z 02de03a7213b: Download complete 2025-12-04T08:53:32.8595336Z fde80a645535: Verifying Checksum 2025-12-04T08:53:32.8595632Z fde80a645535: Download complete 2025-12-04T08:53:33.2898974Z 02de03a7213b: Pull complete 2025-12-04T08:53:33.3063182Z 3a5718b5258e: Pull complete 2025-12-04T08:53:33.3302207Z 6931c5f20e80: Verifying Checksum 2025-12-04T08:53:33.3302436Z 6931c5f20e80: Download complete 2025-12-04T08:53:33.9066657Z dc8487f6c81c: Verifying Checksum 2025-12-04T08:53:33.9067090Z dc8487f6c81c: Download complete 2025-12-04T08:53:34.5136084Z 9748c5348f39: Verifying Checksum 2025-12-04T08:53:34.5136469Z 9748c5348f39: Download complete 
2025-12-04T08:53:35.1212093Z 8539cc3f8d8a: Verifying Checksum 2025-12-04T08:53:35.1212457Z 8539cc3f8d8a: Download complete 2025-12-04T08:53:36.6731998Z 170ea6d3edd6: Verifying Checksum 2025-12-04T08:53:36.6732461Z 170ea6d3edd6: Download complete 2025-12-04T08:53:37.3191885Z 32fbb88555c4: Download complete 2025-12-04T08:53:40.9267573Z 3231e1ab814b: Verifying Checksum 2025-12-04T08:53:40.9268021Z 3231e1ab814b: Download complete 2025-12-04T08:53:41.5946129Z 80061bf5dcbb: Verifying Checksum 2025-12-04T08:53:41.5946584Z 80061bf5dcbb: Download complete 2025-12-04T08:53:42.2021221Z 6e9524f4518e: Verifying Checksum 2025-12-04T08:53:42.2021394Z 6e9524f4518e: Download complete 2025-12-04T08:53:42.8066832Z ce919d4bf5ee: Verifying Checksum 2025-12-04T08:53:42.8067095Z ce919d4bf5ee: Download complete 2025-12-04T08:53:44.3132579Z bf3aa2277692: Download complete 2025-12-04T08:53:44.9179731Z cb70fe22c9eb: Verifying Checksum 2025-12-04T08:53:44.9180062Z cb70fe22c9eb: Download complete 2025-12-04T08:53:45.5680032Z 17858e829c8c: Download complete 2025-12-04T08:53:46.2416404Z a63f3b4eed11: Download complete 2025-12-04T08:53:46.8530199Z 10ab3d1afbc4: Verifying Checksum 2025-12-04T08:53:46.8530411Z 10ab3d1afbc4: Download complete 2025-12-04T08:53:48.6080152Z bf3aa2277692: Pull complete 2025-12-04T08:53:48.6297258Z 9d58e5257cef: Pull complete 2025-12-04T08:53:48.6367045Z fde80a645535: Pull complete 2025-12-04T08:53:48.6426443Z 6931c5f20e80: Pull complete 2025-12-04T08:53:49.8627263Z 170ea6d3edd6: Pull complete 2025-12-04T08:53:49.8667924Z dc8487f6c81c: Pull complete 2025-12-04T08:53:49.8717614Z 9748c5348f39: Pull complete 2025-12-04T08:53:49.8795592Z 8539cc3f8d8a: Pull complete 2025-12-04T08:56:47.8701080Z 47681e3e6f37: Verifying Checksum 2025-12-04T08:56:47.8701400Z 47681e3e6f37: Download complete 2025-12-04T08:56:48.5320603Z 025c90839a58: Download complete 2025-12-04T08:56:49.1504640Z 9255df5942ae: Download complete 2025-12-04T08:56:49.7551737Z f71ca9d4ed1c: Download complete 
2025-12-04T08:57:16.8078349Z d02b47b56ca7: Verifying Checksum 2025-12-04T08:57:16.8081815Z d02b47b56ca7: Download complete 2025-12-04T08:57:17.3861588Z 40279492aea7: Download complete 2025-12-04T08:57:17.9555976Z 33a27ce74abd: Verifying Checksum 2025-12-04T08:57:17.9556286Z 33a27ce74abd: Download complete 2025-12-04T08:57:18.5290983Z 6b66ed335d1d: Download complete 2025-12-04T08:57:19.1218967Z 9f010fa04118: Download complete 2025-12-04T08:57:20.2919863Z 6c64d5e8bb6a: Verifying Checksum 2025-12-04T08:57:20.2920995Z 6c64d5e8bb6a: Download complete 2025-12-04T08:57:20.8932217Z c20ea058f549: Verifying Checksum 2025-12-04T08:57:20.8932713Z c20ea058f549: Download complete 2025-12-04T08:57:21.4673195Z 3c4fd2d54638: Verifying Checksum 2025-12-04T08:57:21.4673414Z 3c4fd2d54638: Download complete 2025-12-04T08:57:23.2000293Z 964ebac3d7a9: Verifying Checksum 2025-12-04T08:57:23.8000246Z 2aaa7210673f: Download complete 2025-12-04T08:57:24.3704440Z fa273daa0037: Verifying Checksum 2025-12-04T08:57:24.3704882Z fa273daa0037: Download complete 2025-12-04T08:57:25.8344428Z d931a62fd240: Verifying Checksum 2025-12-04T08:57:25.8344792Z d931a62fd240: Download complete 2025-12-04T08:57:26.4288746Z d3573d61c28e: Verifying Checksum 2025-12-04T08:57:26.4289087Z d3573d61c28e: Download complete 2025-12-04T08:57:27.0097935Z f9b32f08c490: Download complete 2025-12-04T08:57:27.5826179Z 3a0206399d60: Download complete 2025-12-04T08:57:28.1592124Z 386f322edd1c: Download complete 2025-12-04T08:57:28.4603400Z 4f4fb700ef54: Verifying Checksum 2025-12-04T08:57:28.4603747Z 4f4fb700ef54: Download complete 2025-12-04T08:57:29.0505372Z bbe49df30697: Verifying Checksum 2025-12-04T08:57:29.0505845Z bbe49df30697: Download complete 2025-12-04T08:57:29.6167128Z d6630aa6f375: Verifying Checksum 2025-12-04T08:57:29.6167558Z d6630aa6f375: Download complete 2025-12-04T08:57:30.3377643Z 6d807afc1309: Verifying Checksum 2025-12-04T08:57:30.3377878Z 6d807afc1309: Download complete 2025-12-04T08:57:30.9185612Z 
60b679430e4e: Download complete 2025-12-04T08:57:31.5322962Z 3992ae84f9ed: Verifying Checksum 2025-12-04T08:57:31.5323165Z 3992ae84f9ed: Download complete 2025-12-04T08:57:32.1202197Z 62d400609f9c: Verifying Checksum 2025-12-04T08:57:32.1202628Z 62d400609f9c: Download complete 2025-12-04T08:57:32.7045672Z 7e7b09749096: Verifying Checksum 2025-12-04T08:57:32.7045905Z 7e7b09749096: Download complete 2025-12-04T09:04:41.8516417Z af88f886884f: Verifying Checksum 2025-12-04T09:04:41.8518981Z af88f886884f: Download complete 2025-12-04T09:04:42.4633783Z cbb12613719b: Verifying Checksum 2025-12-04T09:04:42.4634165Z cbb12613719b: Download complete 2025-12-04T09:04:43.0450929Z e87038dce9bc: Download complete 2025-12-04T09:04:47.6396886Z e4606b636f96: Verifying Checksum 2025-12-04T09:04:47.6398280Z e4606b636f96: Download complete 2025-12-04T09:04:48.2447197Z 6f2a5d33b946: Verifying Checksum 2025-12-04T09:04:48.2447654Z 6f2a5d33b946: Download complete 2025-12-04T09:04:48.8335998Z a4f2bf2f19e6: Verifying Checksum 2025-12-04T09:04:48.8336434Z a4f2bf2f19e6: Download complete 2025-12-04T09:04:51.0596882Z 1ae00acdac56: Verifying Checksum 2025-12-04T09:04:51.0597143Z 1ae00acdac56: Download complete 2025-12-04T09:05:04.2177754Z af88f886884f: Pull complete 2025-12-04T09:05:04.2219233Z 32fbb88555c4: Pull complete 2025-12-04T09:05:04.8455726Z 3231e1ab814b: Pull complete 2025-12-04T09:05:04.8490664Z 80061bf5dcbb: Pull complete 2025-12-04T09:05:04.8541486Z 6e9524f4518e: Pull complete 2025-12-04T09:05:04.8609826Z ce919d4bf5ee: Pull complete 2025-12-04T09:05:08.2291704Z 47681e3e6f37: Pull complete 2025-12-04T09:05:08.2327544Z cb70fe22c9eb: Pull complete 2025-12-04T09:05:08.2389843Z 17858e829c8c: Pull complete 2025-12-04T09:05:08.2508451Z a63f3b4eed11: Pull complete 2025-12-04T09:05:08.2545893Z 10ab3d1afbc4: Pull complete 2025-12-04T09:25:37.1359631Z 7dcdbd8421cb: Verifying Checksum 2025-12-04T09:25:37.1359858Z 7dcdbd8421cb: Download complete 2025-12-04T09:29:35.9215630Z 98ca88b5095b: 
Verifying Checksum 2025-12-04T09:29:35.9215920Z 98ca88b5095b: Download complete 2025-12-04T09:30:20.9096908Z 98ca88b5095b: Pull complete 2025-12-04T09:30:20.9145421Z 025c90839a58: Pull complete 2025-12-04T09:30:20.9198763Z 9255df5942ae: Pull complete 2025-12-04T09:30:20.9236702Z f71ca9d4ed1c: Pull complete 2025-12-04T09:30:25.5885772Z d02b47b56ca7: Pull complete 2025-12-04T09:30:25.5931760Z 40279492aea7: Pull complete 2025-12-04T09:30:25.5979779Z 33a27ce74abd: Pull complete 2025-12-04T09:30:25.6027185Z 6b66ed335d1d: Pull complete 2025-12-04T09:30:25.6066661Z 9f010fa04118: Pull complete 2025-12-04T09:30:25.6383470Z 6c64d5e8bb6a: Pull complete 2025-12-04T09:30:25.6414054Z c20ea058f549: Pull complete 2025-12-04T09:30:25.6450672Z 3c4fd2d54638: Pull complete 2025-12-04T09:30:25.8666572Z 964ebac3d7a9: Pull complete 2025-12-04T09:30:25.8705314Z 2aaa7210673f: Pull complete 2025-12-04T09:30:25.8760594Z fa273daa0037: Pull complete 2025-12-04T09:30:25.9721728Z d931a62fd240: Pull complete 2025-12-04T09:30:25.9760236Z d3573d61c28e: Pull complete 2025-12-04T09:30:25.9859575Z f9b32f08c490: Pull complete 2025-12-04T09:30:25.9894342Z 3a0206399d60: Pull complete 2025-12-04T09:30:25.9930844Z 386f322edd1c: Pull complete 2025-12-04T09:30:25.9969657Z 4f4fb700ef54: Pull complete 2025-12-04T09:30:26.0020081Z bbe49df30697: Pull complete 2025-12-04T09:30:26.0063723Z d6630aa6f375: Pull complete 2025-12-04T09:30:26.0114067Z 6d807afc1309: Pull complete 2025-12-04T09:30:26.0154333Z 60b679430e4e: Pull complete 2025-12-04T09:30:26.0188369Z 3992ae84f9ed: Pull complete 2025-12-04T09:30:26.0274996Z 62d400609f9c: Pull complete 2025-12-04T09:30:26.0310589Z 7e7b09749096: Pull complete 2025-12-04T09:31:05.1269125Z 7dcdbd8421cb: Pull complete 2025-12-04T09:31:05.1312122Z cbb12613719b: Pull complete 2025-12-04T09:31:05.1358712Z e87038dce9bc: Pull complete 2025-12-04T09:31:07.5838229Z e4606b636f96: Pull complete 2025-12-04T09:31:07.5878877Z 6f2a5d33b946: Pull complete 2025-12-04T09:31:07.5962451Z 
a4f2bf2f19e6: Pull complete 2025-12-04T09:31:08.2234142Z 1ae00acdac56: Pull complete 2025-12-04T09:31:08.2251001Z Digest: sha256:f0728d30af94602d09207f794eb469a578a6cd97e72880fb3f401801d2f4acc6 2025-12-04T09:31:08.2254619Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:31:08.2258954Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:31:08.2323673Z Prepare all required actions 2025-12-04T09:31:08.2345760Z ##[group]Run ./.github/actions/get-workflow-job-id 2025-12-04T09:31:08.2346022Z with: 2025-12-04T09:31:08.2346520Z github-token: *** 2025-12-04T09:31:08.2346680Z env: 2025-12-04T09:31:08.2346836Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:31:08.2347369Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:31:08.2347672Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:31:08.2347945Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:31:08.2348590Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:31:08.2349226Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:31:08.2349422Z AWS_REGION: us-east-1 2025-12-04T09:31:08.2349626Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:31:08.2349920Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:31:08.2353411Z AWS_SESSION_TOKEN: *** 2025-12-04T09:31:08.2353535Z ##[endgroup] 2025-12-04T09:31:08.2363823Z ##[group]Run set -eux 2025-12-04T09:31:08.2363943Z set -eux 2025-12-04T09:31:08.2364115Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}" 2025-12-04T09:31:08.2368388Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 
2025-12-04T09:31:08.2368537Z env: 2025-12-04T09:31:08.2368632Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:31:08.2368769Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:31:08.2368948Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:31:08.2369114Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:31:08.2369499Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:31:08.2369871Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:31:08.2369988Z AWS_REGION: us-east-1 2025-12-04T09:31:08.2370175Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:31:08.2370328Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:31:08.2372622Z AWS_SESSION_TOKEN: *** 2025-12-04T09:31:08.2372778Z GITHUB_TOKEN: *** 2025-12-04T09:31:08.2372875Z ##[endgroup] 2025-12-04T09:31:08.2392195Z + python3 .github/scripts/get_workflow_job_id.py 19922812470 linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr 2025-12-04T09:31:09.3130324Z Setting output job-id=57116139284 2025-12-04T09:31:09.3130751Z Setting output job-name=linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:31:09.3253146Z Prepare all required actions 2025-12-04T09:31:09.3253393Z Getting action download info 2025-12-04T09:31:09.7091766Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6) 2025-12-04T09:31:10.9075859Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093) 2025-12-04T09:31:11.9495508Z ##[group]Run ./.github/actions/download-build-artifacts 2025-12-04T09:31:11.9495678Z with: 2025-12-04T09:31:11.9495793Z name: linux-noble-rocm-py3.12-mi300 2025-12-04T09:31:11.9495931Z s3-bucket: gha-artifacts 2025-12-04T09:31:11.9496055Z env: 2025-12-04T09:31:11.9496154Z 
GIT_DEFAULT_BRANCH: main 2025-12-04T09:31:11.9496296Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:31:11.9496474Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:31:11.9496645Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:31:11.9497063Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:31:11.9497437Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:31:11.9497556Z AWS_REGION: us-east-1 2025-12-04T09:31:11.9497736Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:31:11.9497894Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:31:11.9500230Z AWS_SESSION_TOKEN: *** 2025-12-04T09:31:11.9500336Z ##[endgroup] 2025-12-04T09:31:11.9522177Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:31:11.9522485Z with: 2025-12-04T09:31:11.9522600Z name: linux-noble-rocm-py3.12-mi300 2025-12-04T09:31:11.9522742Z s3-bucket: gha-artifacts 2025-12-04T09:31:11.9522858Z region: us-east-1 2025-12-04T09:31:11.9522957Z env: 2025-12-04T09:31:11.9523062Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:31:11.9523199Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:31:11.9523447Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:31:11.9523616Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:31:11.9524010Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:31:11.9524393Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:31:11.9524514Z AWS_REGION: us-east-1 2025-12-04T09:31:11.9524703Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:31:11.9524862Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:31:11.9527171Z AWS_SESSION_TOKEN: *** 
2025-12-04T09:31:11.9527280Z ##[endgroup] 2025-12-04T09:31:12.1770112Z (node:17195) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:31:12.1770495Z 2025-12-04T09:31:12.1770666Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:31:12.1771115Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:31:12.1771537Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:31:12.4538757Z Found 1 objects with prefix pytorch/pytorch/19922812470/linux-noble-rocm-py3.12-mi300/ 2025-12-04T09:31:12.4539166Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:32:52.3445280Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/artifacts.zip 2025-12-04T09:32:52.3450947Z Artifact download has finished successfully 2025-12-04T09:32:52.3689444Z ##[group]Run unzip -o artifacts.zip 2025-12-04T09:32:52.3689628Z unzip -o artifacts.zip 2025-12-04T09:32:52.3693876Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:32:52.3694034Z env: 2025-12-04T09:32:52.3694137Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:52.3694281Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:32:52.3694639Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:32:52.3694818Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:32:52.3695206Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:32:52.3695575Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:32:52.3695697Z AWS_REGION: us-east-1 2025-12-04T09:32:52.3695870Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:32:52.3696026Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:32:52.3698324Z AWS_SESSION_TOKEN: *** 
2025-12-04T09:32:52.3698429Z ##[endgroup] 2025-12-04T09:32:52.3746445Z Archive: artifacts.zip 2025-12-04T09:32:52.3747416Z creating: dist/ 2025-12-04T09:32:52.3829843Z inflating: dist/.ninja_log 2025-12-04T09:32:55.3234687Z inflating: dist/torch-2.10.0a0+gitffd9b0f-cp312-cp312-linux_x86_64.whl 2025-12-04T09:32:55.3238507Z creating: build/ 2025-12-04T09:32:55.3239042Z creating: build/custom_test_artifacts/ 2025-12-04T09:32:55.3239428Z creating: build/custom_test_artifacts/custom-op-build/ 2025-12-04T09:32:55.3239696Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2025-12-04T09:32:55.3240005Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:32:55.3240343Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:32:55.3240670Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/ 2025-12-04T09:32:55.3241683Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:32:55.3242024Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:32:55.3242351Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:32:55.3242754Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:32:55.3243174Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:32:55.3243582Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:32:55.3243933Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:32:55.3244294Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:32:55.3244692Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:32:55.3245099Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:32:55.3245471Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:32:55.3245885Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:32:55.3246310Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:32:55.3246678Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:32:55.3246991Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:32:55.3247293Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2025-12-04T09:32:55.3247619Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2025-12-04T09:32:55.3247984Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2025-12-04T09:32:55.3248377Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2025-12-04T09:32:55.3248892Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2025-12-04T09:32:55.3249244Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2025-12-04T09:32:55.3249592Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2025-12-04T09:32:55.3249884Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2025-12-04T09:32:55.3250179Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2025-12-04T09:32:55.3250469Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2025-12-04T09:32:55.3250760Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2025-12-04T09:32:55.3259000Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2025-12-04T09:32:55.3365891Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2025-12-04T09:32:55.3366266Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.d 2025-12-04T09:32:55.3366807Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2025-12-04T09:32:55.3367123Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2025-12-04T09:32:55.3367459Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2025-12-04T09:32:55.3367774Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2025-12-04T09:32:55.3368143Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2025-12-04T09:32:55.3368446Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2025-12-04T09:32:55.3368749Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2025-12-04T09:32:55.3369061Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2025-12-04T09:32:55.3369359Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2025-12-04T09:32:55.3369650Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2025-12-04T09:32:55.3380512Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 
2025-12-04T09:32:55.3424243Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2025-12-04T09:32:55.3424626Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.d 2025-12-04T09:32:55.3424941Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:32:55.3425256Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:32:55.3425530Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2025-12-04T09:32:55.3425787Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2025-12-04T09:32:55.3426330Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2025-12-04T09:32:55.3426590Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_outer_vec.cc 2025-12-04T09:32:55.3426849Z inflating: build/custom_test_artifacts/custom-op-build/hipblaslt_test_vec_ext.cc 2025-12-04T09:32:55.3427627Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2025-12-04T09:32:55.3427949Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2025-12-04T09:32:55.3428499Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2025-12-04T09:32:55.3520811Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2025-12-04T09:32:55.3550020Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2025-12-04T09:32:55.3550383Z creating: build/custom_test_artifacts/jit-hook-build/ 2025-12-04T09:32:55.3550682Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2025-12-04T09:32:55.3551023Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:32:55.3552670Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:32:55.3553095Z creating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/ 2025-12-04T09:32:55.3553565Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:32:55.3553967Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:32:55.3554366Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:32:55.3554835Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:32:55.3555311Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:32:55.3555735Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:32:55.3556159Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:32:55.3556560Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:32:55.3557207Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:32:55.3557697Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:32:55.3558147Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:32:55.3559203Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:32:55.3559840Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:32:55.3560232Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:32:55.3560536Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:32:55.3560850Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 
2025-12-04T09:32:55.3561217Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2025-12-04T09:32:55.3561599Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2025-12-04T09:32:55.3562016Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2025-12-04T09:32:55.3562419Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2025-12-04T09:32:55.3562788Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2025-12-04T09:32:55.3563192Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2025-12-04T09:32:55.3563622Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2025-12-04T09:32:55.3564016Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2025-12-04T09:32:55.3564402Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2025-12-04T09:32:55.3564790Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2025-12-04T09:32:55.3574035Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2025-12-04T09:32:55.3610198Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2025-12-04T09:32:55.3610549Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.d 2025-12-04T09:32:55.3610896Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:32:55.3611224Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:32:55.3611508Z extracting: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2025-12-04T09:32:55.3611778Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2025-12-04T09:32:55.3612237Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2025-12-04T09:32:55.3612515Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_outer_vec.cc 2025-12-04T09:32:55.3613122Z inflating: build/custom_test_artifacts/jit-hook-build/hipblaslt_test_vec_ext.cc 2025-12-04T09:32:55.3613656Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2025-12-04T09:32:55.3614004Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2025-12-04T09:32:55.3614369Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2025-12-04T09:32:55.3635036Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2025-12-04T09:32:55.3635257Z creating: build/custom_test_artifacts/custom-backend-build/ 2025-12-04T09:32:55.3635477Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2025-12-04T09:32:55.3635786Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/ 2025-12-04T09:32:55.3637634Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml 2025-12-04T09:32:55.3637920Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/ 2025-12-04T09:32:55.3638205Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake 2025-12-04T09:32:55.3638492Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/ 2025-12-04T09:32:55.3638768Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/ 2025-12-04T09:32:55.3639551Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c 2025-12-04T09:32:55.3640345Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out 2025-12-04T09:32:55.3640661Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake 2025-12-04T09:32:55.3640953Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/ 2025-12-04T09:32:55.3641247Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/ 2025-12-04T09:32:55.3642235Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp 2025-12-04T09:32:55.3642863Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out 2025-12-04T09:32:55.3643197Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake 2025-12-04T09:32:55.3644290Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin 2025-12-04T09:32:55.3645012Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin 2025-12-04T09:32:55.3645333Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/ 2025-12-04T09:32:55.3645583Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2025-12-04T09:32:55.3645840Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2025-12-04T09:32:55.3646190Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2025-12-04T09:32:55.3646493Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2025-12-04T09:32:55.3646831Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2025-12-04T09:32:55.3647162Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2025-12-04T09:32:55.3647478Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2025-12-04T09:32:55.3647795Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2025-12-04T09:32:55.3648113Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2025-12-04T09:32:55.3648434Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2025-12-04T09:32:55.3648749Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2025-12-04T09:32:55.3649061Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2025-12-04T09:32:55.3649839Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2025-12-04T09:32:55.3713889Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2025-12-04T09:32:55.3714274Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.d 2025-12-04T09:32:55.3714580Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2025-12-04T09:32:55.3714915Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2025-12-04T09:32:55.3715271Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2025-12-04T09:32:55.3715645Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2025-12-04T09:32:55.3716009Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 
2025-12-04T09:32:55.3716361Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2025-12-04T09:32:55.3716695Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2025-12-04T09:32:55.3717085Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2025-12-04T09:32:55.3717412Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2025-12-04T09:32:55.3717737Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2025-12-04T09:32:55.3728055Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2025-12-04T09:32:55.3757638Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2025-12-04T09:32:55.3757992Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.d 2025-12-04T09:32:55.3758317Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2025-12-04T09:32:55.3758627Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2025-12-04T09:32:55.3758909Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2025-12-04T09:32:55.3759213Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2025-12-04T09:32:55.3759565Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2025-12-04T09:32:55.3759834Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_outer_vec.cc 2025-12-04T09:32:55.3760096Z inflating: build/custom_test_artifacts/custom-backend-build/hipblaslt_test_vec_ext.cc 2025-12-04T09:32:55.3760916Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2025-12-04T09:32:55.3761234Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2025-12-04T09:32:55.3761475Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2025-12-04T09:32:55.3815765Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2025-12-04T09:32:55.3836621Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2025-12-04T09:32:55.3836810Z creating: build/lib/ 2025-12-04T09:32:55.3881949Z inflating: build/lib/libprotobuf-lite.a 2025-12-04T09:32:55.4126309Z inflating: build/lib/libprotobuf.a 2025-12-04T09:32:55.4400286Z inflating: build/lib/libprotoc.a 2025-12-04T09:32:55.4405843Z inflating: build/lib/libpthreadpool.a 2025-12-04T09:32:55.4410192Z inflating: build/lib/libcpuinfo.a 2025-12-04T09:32:55.4414383Z inflating: build/lib/libcpuinfo_internals.a 2025-12-04T09:32:55.4414856Z inflating: build/lib/libclog.a 2025-12-04T09:32:55.4425284Z inflating: build/lib/libpytorch_qnnpack.a 2025-12-04T09:32:55.4426132Z inflating: build/lib/libnnpack_reference_layers.a 2025-12-04T09:32:55.4527711Z inflating: build/lib/libmicrokernels-prod.a 2025-12-04T09:32:55.4537544Z inflating: build/lib/libnnpack.a 2025-12-04T09:32:55.5007326Z inflating: build/lib/libmicrokernels-all.a 2025-12-04T09:32:55.5045268Z inflating: build/lib/libgtest.a 2025-12-04T09:32:55.5054550Z inflating: build/lib/libgmock.a 2025-12-04T09:32:55.5054720Z inflating: build/lib/libgtest_main.a 2025-12-04T09:32:55.5054867Z inflating: build/lib/libgmock_main.a 2025-12-04T09:32:55.5104675Z inflating: build/lib/libXNNPACK.a 2025-12-04T09:32:55.5146424Z inflating: build/lib/libbenchmark.a 2025-12-04T09:32:55.5146603Z inflating: build/lib/libbenchmark_main.a 2025-12-04T09:32:55.5182882Z inflating: build/lib/libasmjit.a 2025-12-04T09:32:55.5183056Z inflating: build/lib/libjitprofiling.a 2025-12-04T09:32:55.5812029Z inflating: build/lib/libfbgemm.a 
2025-12-04T09:32:55.5816265Z inflating: build/lib/libittnotify.a 2025-12-04T09:32:55.5832960Z inflating: build/lib/libtensorpipe_uv.a 2025-12-04T09:32:55.6128858Z inflating: build/lib/libtensorpipe.a 2025-12-04T09:32:55.6194697Z inflating: build/lib/libgloo.a 2025-12-04T09:32:55.6220466Z inflating: build/lib/libonnx_proto.a 2025-12-04T09:32:55.6440449Z inflating: build/lib/libgloo_hip.a 2025-12-04T09:32:55.6834347Z inflating: build/lib/libonnx.a 2025-12-04T09:32:55.6844926Z inflating: build/lib/libfmt.a 2025-12-04T09:32:56.2487358Z inflating: build/lib/libdnnl.a 2025-12-04T09:32:56.2657697Z inflating: build/lib/libkineto.a 2025-12-04T09:32:56.2722720Z inflating: build/lib/libc10.so 2025-12-04T09:32:56.2723423Z inflating: build/lib/libcaffe2_nvrtc.so 2025-12-04T09:32:56.2724265Z inflating: build/lib/libtorch_global_deps.so 2025-12-04T09:32:56.2749826Z inflating: build/lib/libc10_hip.so 2025-12-04T09:32:56.3018340Z inflating: build/lib/libfbgemm_genai.a 2025-12-04T09:32:58.0103703Z inflating: build/lib/libtorch_cpu.so 2025-12-04T09:32:58.0106083Z inflating: build/lib/libshm.so 2025-12-04T09:32:58.8437599Z inflating: build/lib/libtorch_hip.so 2025-12-04T09:32:58.8438211Z inflating: build/lib/libtorch.so 2025-12-04T09:32:58.8449196Z inflating: build/lib/libjitbackend_test.so 2025-12-04T09:32:58.8490168Z inflating: build/lib/libtorchbind_test.so 2025-12-04T09:32:58.8507682Z inflating: build/lib/libbackend_with_compiler.so 2025-12-04T09:32:58.8521632Z inflating: build/lib/libaoti_custom_ops.so 2025-12-04T09:32:58.9901295Z inflating: build/lib/libtorch_python.so 2025-12-04T09:32:58.9922709Z inflating: build/lib/libnnapi_backend.so 2025-12-04T09:32:58.9924045Z creating: build/bin/ 2025-12-04T09:32:58.9924361Z creating: build/bin/CMakeFiles/ 2025-12-04T09:32:58.9924652Z inflating: build/bin/cmake_install.cmake 2025-12-04T09:32:58.9924920Z inflating: build/bin/CTestTestfile.cmake 2025-12-04T09:32:59.0191587Z inflating: build/bin/protoc-3.13.0.0 2025-12-04T09:32:59.0444200Z 
inflating: build/bin/protoc 2025-12-04T09:32:59.0477184Z inflating: build/bin/c10_AllocatorConfig_test 2025-12-04T09:32:59.0507880Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2025-12-04T09:32:59.0539406Z inflating: build/bin/c10_DeviceGuard_test 2025-12-04T09:32:59.0571140Z inflating: build/bin/c10_Device_test 2025-12-04T09:32:59.0607175Z inflating: build/bin/c10_DispatchKeySet_test 2025-12-04T09:32:59.0640299Z inflating: build/bin/c10_Scalar_test 2025-12-04T09:32:59.0670323Z inflating: build/bin/c10_StreamGuard_test 2025-12-04T09:32:59.0704791Z inflating: build/bin/c10_SymInt_test 2025-12-04T09:32:59.0737812Z inflating: build/bin/c10_InlineDeviceGuard_test 2025-12-04T09:32:59.0771831Z inflating: build/bin/c10_SizesAndStrides_test 2025-12-04T09:32:59.0802009Z inflating: build/bin/c10_ConstexprCrc_test 2025-12-04T09:32:59.0844053Z inflating: build/bin/c10_cow_test 2025-12-04T09:32:59.0877994Z inflating: build/bin/c10_InlineStreamGuard_test 2025-12-04T09:32:59.0908267Z inflating: build/bin/c10_ArrayRef_test 2025-12-04T09:32:59.0940518Z inflating: build/bin/c10_Bitset_test 2025-12-04T09:32:59.0970829Z inflating: build/bin/c10_DeadlockDetection_test 2025-12-04T09:32:59.1005427Z inflating: build/bin/c10_Enumerate_test 2025-12-04T09:32:59.1036678Z inflating: build/bin/c10_Half_test 2025-12-04T09:32:59.1070644Z inflating: build/bin/c10_LeftRight_test 2025-12-04T09:32:59.1103207Z inflating: build/bin/c10_NetworkFlow_test 2025-12-04T09:32:59.1135536Z inflating: build/bin/c10_IntrusiveList_test 2025-12-04T09:32:59.1166164Z inflating: build/bin/c10_Synchronized_test 2025-12-04T09:32:59.1196508Z inflating: build/bin/c10_Semaphore_test 2025-12-04T09:32:59.1228232Z inflating: build/bin/c10_TypeIndex_test 2025-12-04T09:32:59.1261709Z inflating: build/bin/c10_ThreadLocal_test 2025-12-04T09:32:59.1293320Z inflating: build/bin/c10_accumulate_test 2025-12-04T09:32:59.1327130Z inflating: build/bin/c10_bfloat16_test 2025-12-04T09:32:59.1357786Z inflating: 
build/bin/c10_bit_cast_test 2025-12-04T09:32:59.1392499Z inflating: build/bin/c10_complex_math_test 2025-12-04T09:32:59.1424495Z inflating: build/bin/c10_exception_test 2025-12-04T09:32:59.1454743Z inflating: build/bin/c10_error_test 2025-12-04T09:32:59.1488295Z inflating: build/bin/c10_complex_test 2025-12-04T09:32:59.1519468Z inflating: build/bin/c10_flags_test 2025-12-04T09:32:59.1550441Z inflating: build/bin/c10_generic_math_test 2025-12-04T09:32:59.1585063Z inflating: build/bin/c10_logging_test 2025-12-04T09:32:59.1615580Z inflating: build/bin/c10_nofatal_test 2025-12-04T09:32:59.1646796Z inflating: build/bin/c10_irange_test 2025-12-04T09:32:59.1736434Z inflating: build/bin/c10_intrusive_ptr_test 2025-12-04T09:32:59.1769243Z inflating: build/bin/c10_lazy_test 2025-12-04T09:32:59.1813962Z inflating: build/bin/c10_optional_test 2025-12-04T09:32:59.1846423Z inflating: build/bin/c10_registry_test 2025-12-04T09:32:59.1880737Z inflating: build/bin/c10_string_util_test 2025-12-04T09:32:59.1912099Z inflating: build/bin/c10_ssize_test 2025-12-04T09:32:59.1999515Z inflating: build/bin/c10_small_vector_test 2025-12-04T09:32:59.2037037Z inflating: build/bin/c10_ordered_preserving_dict_test 2025-12-04T09:32:59.2067078Z inflating: build/bin/c10_string_view_test 2025-12-04T09:32:59.2097868Z inflating: build/bin/c10_tempfile_test 2025-12-04T09:32:59.2124590Z inflating: build/bin/c10_intrusive_ptr_benchmark 2025-12-04T09:32:59.2158453Z inflating: build/bin/c10_typeid_test 2025-12-04T09:32:59.2188563Z inflating: build/bin/c10_hip_HIPAssertionsTest_1_var_test 2025-12-04T09:32:59.2218705Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_stream 2025-12-04T09:32:59.2248782Z inflating: build/bin/c10_hip_HIPAssertionsTest_catches_thread_and_block_and_device 2025-12-04T09:32:59.2278822Z inflating: build/bin/c10_hip_HIPAssertionsTest_from_2_processes 2025-12-04T09:32:59.2308786Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_blocks_and_threads 
2025-12-04T09:32:59.2338816Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_multiple_blocks 2025-12-04T09:32:59.2368840Z inflating: build/bin/c10_hip_HIPAssertionsTest_multiple_writes_from_same_block 2025-12-04T09:32:59.2398933Z inflating: build/bin/c10_hip_HIPTest 2025-12-04T09:32:59.2726155Z inflating: build/bin/vec_test_all_types_DEFAULT 2025-12-04T09:32:59.3067713Z inflating: build/bin/vec_test_all_types_AVX512 2025-12-04T09:32:59.3429306Z inflating: build/bin/vec_test_all_types_AVX2 2025-12-04T09:32:59.3490552Z inflating: build/bin/test_aoti_abi_check 2025-12-04T09:32:59.3523556Z inflating: build/bin/test_vec_half_DEFAULT 2025-12-04T09:32:59.3556685Z inflating: build/bin/test_vec_half_AVX512 2025-12-04T09:32:59.3589729Z inflating: build/bin/test_vec_half_AVX2 2025-12-04T09:32:59.3624423Z inflating: build/bin/BackoffTest 2025-12-04T09:32:59.3659688Z inflating: build/bin/FileStoreTest 2025-12-04T09:32:59.3697206Z inflating: build/bin/TCPStoreTest 2025-12-04T09:32:59.3732733Z inflating: build/bin/HashStoreTest 2025-12-04T09:32:59.3776687Z inflating: build/bin/ProcessGroupGlooTest 2025-12-04T09:32:59.3778185Z inflating: build/bin/example_allreduce 2025-12-04T09:32:59.3780456Z inflating: build/bin/torch_shm_manager 2025-12-04T09:32:59.3816540Z inflating: build/bin/static_runtime_bench 2025-12-04T09:32:59.3972283Z inflating: build/bin/static_runtime_test 2025-12-04T09:32:59.4019207Z inflating: build/bin/Dict_test 2025-12-04T09:32:59.4051493Z inflating: build/bin/Dimname_test 2025-12-04T09:32:59.4090955Z inflating: build/bin/MaybeOwned_test 2025-12-04T09:32:59.4125927Z inflating: build/bin/NamedTensor_test 2025-12-04T09:32:59.4162014Z inflating: build/bin/apply_utils_test 2025-12-04T09:32:59.4197949Z inflating: build/bin/atest 2025-12-04T09:32:59.4236939Z inflating: build/bin/basic 2025-12-04T09:32:59.4270404Z inflating: build/bin/broadcast_test 2025-12-04T09:32:59.4301968Z inflating: build/bin/cpu_allocator_test 2025-12-04T09:32:59.4337440Z 
inflating: build/bin/cpu_generator_test 2025-12-04T09:32:59.4369673Z inflating: build/bin/cpu_profiling_allocator_test 2025-12-04T09:32:59.4424924Z inflating: build/bin/cpu_rng_test 2025-12-04T09:32:59.4456880Z inflating: build/bin/dlconvertor_test 2025-12-04T09:32:59.4492362Z inflating: build/bin/extension_backend_test 2025-12-04T09:32:59.4526637Z inflating: build/bin/half_test 2025-12-04T09:32:59.4585394Z inflating: build/bin/ivalue_test 2025-12-04T09:32:59.4615578Z inflating: build/bin/lazy_tensor_test 2025-12-04T09:32:59.4648132Z inflating: build/bin/math_kernel_test 2025-12-04T09:32:59.4680686Z inflating: build/bin/memory_format_test 2025-12-04T09:32:59.4713620Z inflating: build/bin/memory_overlapping_test 2025-12-04T09:32:59.4746367Z inflating: build/bin/mobile_memory_cleanup 2025-12-04T09:32:59.4780498Z inflating: build/bin/native_test 2025-12-04T09:32:59.4812047Z inflating: build/bin/operator_name_test 2025-12-04T09:32:59.4843746Z inflating: build/bin/operators_test 2025-12-04T09:32:59.4875660Z inflating: build/bin/packedtensoraccessor_test 2025-12-04T09:32:59.4916554Z inflating: build/bin/pow_test 2025-12-04T09:32:59.4951188Z inflating: build/bin/quantized_test 2025-12-04T09:32:59.4982489Z inflating: build/bin/reduce_ops_test 2025-12-04T09:32:59.5013570Z inflating: build/bin/reportMemoryUsage_test 2025-12-04T09:32:59.5047733Z inflating: build/bin/scalar_tensor_test 2025-12-04T09:32:59.5079580Z inflating: build/bin/stride_properties_test 2025-12-04T09:32:59.5114656Z inflating: build/bin/scalar_test 2025-12-04T09:32:59.5146562Z inflating: build/bin/StorageUtils_test 2025-12-04T09:32:59.5180155Z inflating: build/bin/type_ptr_test 2025-12-04T09:32:59.5211217Z inflating: build/bin/thread_init_test 2025-12-04T09:32:59.5259323Z inflating: build/bin/tensor_iterator_test 2025-12-04T09:32:59.5292582Z inflating: build/bin/test_parallel 2025-12-04T09:32:59.5331448Z inflating: build/bin/type_test 2025-12-04T09:32:59.5363667Z inflating: build/bin/undefined_tensor_test 
2025-12-04T09:32:59.5394225Z inflating: build/bin/verify_api_visibility 2025-12-04T09:32:59.5425720Z inflating: build/bin/weakref_test 2025-12-04T09:32:59.5468592Z inflating: build/bin/legacy_vmap_test 2025-12-04T09:32:59.5500489Z inflating: build/bin/wrapdim_test 2025-12-04T09:32:59.5532337Z inflating: build/bin/xla_tensor_test 2025-12-04T09:32:59.5568583Z inflating: build/bin/IListRef_test 2025-12-04T09:32:59.5630381Z inflating: build/bin/List_test 2025-12-04T09:32:59.5700463Z inflating: build/bin/kernel_function_legacy_test 2025-12-04T09:32:59.5740604Z inflating: build/bin/KernelFunction_test 2025-12-04T09:32:59.5797477Z inflating: build/bin/kernel_function_test 2025-12-04T09:32:59.5857946Z inflating: build/bin/kernel_lambda_test 2025-12-04T09:32:59.5932152Z inflating: build/bin/kernel_lambda_legacy_test 2025-12-04T09:32:59.5969060Z inflating: build/bin/kernel_stackbased_test 2025-12-04T09:32:59.6000702Z inflating: build/bin/CppSignature_test 2025-12-04T09:32:59.6057227Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2025-12-04T09:32:59.6087446Z inflating: build/bin/op_allowlist_test 2025-12-04T09:32:59.6264885Z inflating: build/bin/op_registration_test 2025-12-04T09:32:59.6298825Z inflating: build/bin/backend_fallback_test 2025-12-04T09:32:59.6329299Z inflating: build/bin/hip_complex_math_test 2025-12-04T09:32:59.6370114Z inflating: build/bin/inline_container_test 2025-12-04T09:32:59.6402738Z inflating: build/bin/hip_complex_test 2025-12-04T09:32:59.6435452Z inflating: build/bin/hip_apply_test 2025-12-04T09:32:59.6466098Z inflating: build/bin/hip_distributions_test 2025-12-04T09:32:59.6496336Z inflating: build/bin/hip_generator_test 2025-12-04T09:32:59.6526492Z inflating: build/bin/hip_half_test 2025-12-04T09:32:59.6556774Z inflating: build/bin/hip_integer_divider_test 2025-12-04T09:32:59.6587002Z inflating: build/bin/hip_optional_test 2025-12-04T09:32:59.6617182Z inflating: build/bin/hip_packedtensoraccessor_test 2025-12-04T09:32:59.6649044Z 
inflating: build/bin/hip_dlconvertor_test 2025-12-04T09:32:59.6679396Z inflating: build/bin/hip_vectorized_test 2025-12-04T09:32:59.7302659Z inflating: build/bin/test_jit 2025-12-04T09:32:59.7502282Z inflating: build/bin/test_lazy 2025-12-04T09:32:59.7536103Z inflating: build/bin/test_dist_autograd 2025-12-04T09:32:59.7577668Z inflating: build/bin/test_cpp_rpc 2025-12-04T09:32:59.8240839Z inflating: build/bin/test_api 2025-12-04T09:32:59.8241681Z inflating: build/bin/parallel_benchmark 2025-12-04T09:32:59.8242076Z creating: .additional_ci_files/ 2025-12-04T09:32:59.8278598Z inflating: .additional_ci_files/test-times.json 2025-12-04T09:32:59.8412029Z inflating: .additional_ci_files/test-class-times.json 2025-12-04T09:32:59.8438205Z ##[group]Run rm artifacts.zip 2025-12-04T09:32:59.8438389Z rm artifacts.zip 2025-12-04T09:32:59.8442770Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:32:59.8442929Z env: 2025-12-04T09:32:59.8443029Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:59.8443314Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:32:59.8443504Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:32:59.8443682Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:32:59.8444086Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:32:59.8444490Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:32:59.8444615Z AWS_REGION: us-east-1 2025-12-04T09:32:59.8444790Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:32:59.8444969Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:32:59.8447346Z AWS_SESSION_TOKEN: *** 2025-12-04T09:32:59.8447457Z ##[endgroup] 2025-12-04T09:32:59.9444860Z ##[group]Run df -H 2025-12-04T09:32:59.9445006Z df -H 2025-12-04T09:32:59.9447788Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 
2025-12-04T09:32:59.9447985Z env: 2025-12-04T09:32:59.9448167Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:32:59.9448340Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:32:59.9448580Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:32:59.9448806Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:32:59.9449228Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:32:59.9449637Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:32:59.9449875Z AWS_REGION: us-east-1 2025-12-04T09:32:59.9450031Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:32:59.9450189Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:32:59.9452437Z AWS_SESSION_TOKEN: *** 2025-12-04T09:32:59.9452542Z ##[endgroup] 2025-12-04T09:32:59.9881177Z Filesystem Size Used Avail Use% Mounted on 2025-12-04T09:32:59.9881588Z overlay 16T 799G 15T 6% / 2025-12-04T09:32:59.9881974Z tmpfs 68M 0 68M 0% /dev 2025-12-04T09:32:59.9882292Z /dev/md0 16T 799G 15T 6% /run 2025-12-04T09:32:59.9882603Z shm 68M 4.1k 68M 1% /dev/shm 2025-12-04T09:32:59.9883123Z amdprj2-k8s_2 5.5T 120G 5.4T 3% /home/runner/pytorch-data 2025-12-04T09:32:59.9883784Z tmpfs 3.3T 13k 3.3T 1% /run/secrets/kubernetes.io/serviceaccount 2025-12-04T09:32:59.9884194Z tmpfs 1.7T 0 1.7T 0% /proc/acpi 2025-12-04T09:32:59.9884526Z tmpfs 1.7T 0 1.7T 0% /proc/scsi 2025-12-04T09:32:59.9884855Z tmpfs 1.7T 0 1.7T 0% /sys/firmware 2025-12-04T09:32:59.9885242Z tmpfs 1.7T 0 1.7T 0% /sys/devices/virtual/powercap 2025-12-04T09:32:59.9911412Z Prepare all required actions 2025-12-04T09:32:59.9911666Z Getting action download info 2025-12-04T09:33:00.3900909Z ##[group]Run ./.github/actions/download-td-artifacts 2025-12-04T09:33:00.3901183Z with: 2025-12-04T09:33:00.3901352Z env: 2025-12-04T09:33:00.3901517Z GIT_DEFAULT_BRANCH: main 
2025-12-04T09:33:00.3901754Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:00.3902060Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:00.3902350Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:00.3903046Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:00.3903756Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:00.3903980Z AWS_REGION: us-east-1 2025-12-04T09:33:00.3904307Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:00.3904618Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:00.3908365Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:00.3908502Z ##[endgroup] 2025-12-04T09:33:00.3926539Z ##[group]Run seemethere/download-artifact-s3@v4 2025-12-04T09:33:00.3926674Z with: 2025-12-04T09:33:00.3926767Z name: td_results 2025-12-04T09:33:00.3926871Z s3-bucket: gha-artifacts 2025-12-04T09:33:00.3926982Z region: us-east-1 2025-12-04T09:33:00.3927078Z env: 2025-12-04T09:33:00.3927170Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:00.3927311Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:00.3927499Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:00.3927671Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:00.3928059Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:00.3928436Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:00.3928554Z AWS_REGION: us-east-1 2025-12-04T09:33:00.3928684Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:00.3928893Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:00.3931183Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:00.3931287Z ##[endgroup] 
2025-12-04T09:33:00.6232108Z (node:17234) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023. 2025-12-04T09:33:00.6232446Z 2025-12-04T09:33:00.6232596Z Please migrate your code to use AWS SDK for JavaScript (v3). 2025-12-04T09:33:00.6232951Z For more information, check the migration guide at https://a.co/7PzMCcy 2025-12-04T09:33:00.6233489Z (Use `node --trace-warnings ...` to show where the warning was created) 2025-12-04T09:33:00.8781549Z Found 1 objects with prefix pytorch/pytorch/19922812470/td_results/ 2025-12-04T09:33:00.8782376Z Starting download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:33:01.2200631Z Finished download (1/1): /home/runner/_work/pytorch/pytorch/td_results.json 2025-12-04T09:33:01.2204567Z Artifact download has finished successfully 2025-12-04T09:33:01.2491068Z ##[group]Run mkdir -p .additional_ci_files 2025-12-04T09:33:01.2491241Z mkdir -p .additional_ci_files 2025-12-04T09:33:01.2491413Z mv td_results.json .additional_ci_files/td_results.json || true 2025-12-04T09:33:01.2495812Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:01.2495965Z env: 2025-12-04T09:33:01.2496065Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:01.2496206Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:01.2496384Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:01.2496555Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:01.2497130Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:01.2497522Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:01.2497637Z AWS_REGION: us-east-1 2025-12-04T09:33:01.2497836Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:01.2497991Z AWS_SECRET_ACCESS_KEY: *** 
2025-12-04T09:33:01.2500266Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:01.2500372Z ##[endgroup] 2025-12-04T09:33:01.2558954Z ##[group]Run .github/scripts/parse_ref.py 2025-12-04T09:33:01.2559135Z .github/scripts/parse_ref.py 2025-12-04T09:33:01.2565215Z shell: /usr/bin/bash -e {0} 2025-12-04T09:33:01.2565333Z env: 2025-12-04T09:33:01.2565460Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:01.2565603Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:01.2565785Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:01.2565961Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:01.2566369Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:01.2566744Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:01.2566865Z AWS_REGION: us-east-1 2025-12-04T09:33:01.2567060Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:01.2567243Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:01.2569516Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:01.2569626Z ##[endgroup] 2025-12-04T09:33:01.2658820Z Setting output branch=main 2025-12-04T09:33:01.2725208Z Prepare all required actions 2025-12-04T09:33:01.2725440Z Getting action download info 2025-12-04T09:33:01.4689686Z ##[group]Run ./.github/actions/filter-test-configs 2025-12-04T09:33:01.4689833Z with: 2025-12-04T09:33:01.4690046Z github-token: *** 2025-12-04T09:33:01.4691716Z test-matrix: {"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", 
"shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:33:01.4693824Z job-name: linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:33:01.4694034Z env: 2025-12-04T09:33:01.4694128Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:01.4694265Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:01.4694439Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:01.4694604Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:01.4694988Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:01.4695356Z AWS_DEFAULT_REGION: us-east-1 
2025-12-04T09:33:01.4695482Z AWS_REGION: us-east-1 2025-12-04T09:33:01.4695607Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:01.4695756Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:01.4698028Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:01.4698132Z ##[endgroup] 2025-12-04T09:33:01.4713621Z ##[group]Run nick-fields/retry@v3.0.0 2025-12-04T09:33:01.4713747Z with: 2025-12-04T09:33:01.4713838Z shell: bash 2025-12-04T09:33:01.4713937Z timeout_minutes: 10 2025-12-04T09:33:01.4714045Z max_attempts: 5 2025-12-04T09:33:01.4714146Z retry_wait_seconds: 30 2025-12-04T09:33:01.4714446Z command: set -eux # PyYAML 6.0 doesn't work with MacOS x86 anymore # This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2 python3 -m pip install requests==2.27.1 pyyaml==6.0.2 2025-12-04T09:33:01.4714755Z polling_interval_seconds: 1 2025-12-04T09:33:01.4714873Z warning_on_retry: true 2025-12-04T09:33:01.4714984Z continue_on_error: false 2025-12-04T09:33:01.4715093Z env: 2025-12-04T09:33:01.4715189Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:01.4715327Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:01.4715510Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:01.4715679Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:01.4716199Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:01.4716573Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:01.4716693Z AWS_REGION: us-east-1 2025-12-04T09:33:01.4716829Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:01.4716986Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:01.4719266Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:01.4719423Z GITHUB_TOKEN: *** 2025-12-04T09:33:01.4719530Z ##[endgroup] 2025-12-04T09:33:01.5108076Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2 
2025-12-04T09:33:01.6538968Z Defaulting to user installation because normal site-packages is not writeable 2025-12-04T09:33:01.7533460Z Collecting requests==2.27.1 2025-12-04T09:33:01.7899937Z Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB) 2025-12-04T09:33:01.8008129Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.1/63.1 KB 5.9 MB/s eta 0:00:00 2025-12-04T09:33:01.9013595Z Collecting pyyaml==6.0.2 2025-12-04T09:33:01.9083355Z Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB) 2025-12-04T09:33:01.9299469Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 36.2 MB/s eta 0:00:00 2025-12-04T09:33:02.0284098Z Collecting charset-normalizer~=2.0.0 2025-12-04T09:33:02.0335331Z Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB) 2025-12-04T09:33:02.0607286Z Collecting urllib3<1.27,>=1.21.1 2025-12-04T09:33:02.0659390Z Downloading urllib3-1.26.20-py2.py3-none-any.whl (144 kB) 2025-12-04T09:33:02.0677760Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 144.2/144.2 KB 165.3 MB/s eta 0:00:00 2025-12-04T09:33:02.1372958Z Collecting idna<4,>=2.5 2025-12-04T09:33:02.1425352Z Downloading idna-3.11-py3-none-any.whl (71 kB) 2025-12-04T09:33:02.1440930Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.0/71.0 KB 114.9 MB/s eta 0:00:00 2025-12-04T09:33:02.1613712Z Collecting certifi>=2017.4.17 2025-12-04T09:33:02.1664870Z Downloading certifi-2025.11.12-py3-none-any.whl (159 kB) 2025-12-04T09:33:02.1681687Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.4/159.4 KB 215.6 MB/s eta 0:00:00 2025-12-04T09:33:02.2215550Z Installing collected packages: urllib3, pyyaml, idna, charset-normalizer, certifi, requests 2025-12-04T09:33:02.3136244Z WARNING: The script normalizer is installed in '/home/runner/.local/bin' which is not on PATH. 2025-12-04T09:33:02.3137100Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2025-12-04T09:33:02.3307289Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 idna-3.11 pyyaml-6.0.2 requests-2.27.1 urllib3-1.26.20 2025-12-04T09:33:02.5101193Z Command completed after 1 attempt(s). 2025-12-04T09:33:02.5167439Z ##[group]Run set -x 2025-12-04T09:33:02.5167584Z set -x 2025-12-04T09:33:02.5167684Z  2025-12-04T09:33:02.5167899Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:33:02.5168123Z # in runner workspace 2025-12-04T09:33:02.5168327Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py" 2025-12-04T09:33:02.5172504Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:02.5172658Z env: 2025-12-04T09:33:02.5172757Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:02.5172906Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:02.5173086Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:02.5173484Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:02.5173957Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:02.5174330Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:02.5174449Z AWS_REGION: us-east-1 2025-12-04T09:33:02.5174762Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:02.5174918Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:02.5177178Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:02.5177290Z ##[endgroup] 2025-12-04T09:33:02.5200201Z + python3 /home/runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py 2025-12-04T09:33:02.5289890Z Setting output branch=main 2025-12-04T09:33:02.5329183Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:33:02.5329405Z echo "Workflow: ${GITHUB_WORKFLOW}" 2025-12-04T09:33:02.5329560Z echo "Job name: ${JOB_NAME}" 
2025-12-04T09:33:02.5329696Z  2025-12-04T09:33:02.5329887Z # Use relative path here as this could be checked out anywhere, not necessarily 2025-12-04T09:33:02.5330097Z # in runner workspace 2025-12-04T09:33:02.5330290Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \ 2025-12-04T09:33:02.5330504Z  --workflow "${GITHUB_WORKFLOW}" \ 2025-12-04T09:33:02.5330677Z  --job-name "${JOB_NAME}" \ 2025-12-04T09:33:02.5332677Z  --test-matrix "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": 
"linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}]}" \ 2025-12-04T09:33:02.5334983Z  --selected-test-configs "" \ 2025-12-04T09:33:02.5335141Z  --pr-number "${PR_NUMBER}" \ 2025-12-04T09:33:02.5335292Z  --tag "${TAG}" \ 2025-12-04T09:33:02.5335433Z  --event-name "${EVENT_NAME}" \ 2025-12-04T09:33:02.5335585Z  --schedule "${SCHEDULE}" \ 2025-12-04T09:33:02.5335730Z  --branch "${HEAD_BRANCH}" 2025-12-04T09:33:02.5339938Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:02.5340086Z env: 2025-12-04T09:33:02.5340181Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:02.5340319Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:02.5340504Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:02.5340673Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:02.5341056Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:02.5341423Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:02.5341556Z AWS_REGION: us-east-1 2025-12-04T09:33:02.5341732Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:02.5341885Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:02.5344325Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:02.5344534Z GITHUB_TOKEN: *** 2025-12-04T09:33:02.5344726Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:33:02.5344930Z PR_NUMBER: 2025-12-04T09:33:02.5345024Z TAG: 2025-12-04T09:33:02.5345114Z EVENT_NAME: schedule 2025-12-04T09:33:02.5345219Z SCHEDULE: 29 8 * * * 2025-12-04T09:33:02.5345319Z HEAD_BRANCH: main 2025-12-04T09:33:02.5345416Z ##[endgroup] 2025-12-04T09:33:02.5367247Z Workflow: rocm-mi300 2025-12-04T09:33:02.5367477Z Job name: linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, 
mem_leak_check) 2025-12-04T09:33:03.2015361Z Setting output keep-going=True 2025-12-04T09:33:03.2015781Z Setting output ci-verbose-test-logs=False 2025-12-04T09:33:03.2016177Z Setting output ci-test-showlocals=False 2025-12-04T09:33:03.2016541Z Setting output ci-no-test-timeout=False 2025-12-04T09:33:03.2016888Z Setting output ci-no-td=False 2025-12-04T09:33:03.2017246Z Setting output ci-td-distributed=False 2025-12-04T09:33:03.2017676Z Setting output is-unstable=False 2025-12-04T09:33:03.2018024Z Setting output reenabled-issues= 2025-12-04T09:33:03.2024047Z Setting output test-matrix={"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, 
"num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": 
"default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}]} 2025-12-04T09:33:03.2029310Z Setting output is-test-matrix-empty=False 2025-12-04T09:33:03.2137651Z ##[group]Run echo "Filtered matrix:" 2025-12-04T09:33:03.2137841Z echo "Filtered matrix:" 2025-12-04T09:33:03.2141779Z echo "{"include": [{"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 1, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 2, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", 
"mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 3, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 4, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 5, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "mem_leak_check": "mem_leak_check", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": 
"linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests", "mem_leak_check": "mem_leak_check"}, {"config": "default", "shard": 6, "num_shards": 6, "runner": "linux.rocm.gpu.gfx942.1.b", "rerun_disabled_tests": "rerun_disabled_tests"}]}" 2025-12-04T09:33:03.2145851Z  2025-12-04T09:33:03.2145947Z echo 2025-12-04T09:33:03.2146068Z echo "Is the current job unstable? False" 2025-12-04T09:33:03.2146206Z  2025-12-04T09:33:03.2146296Z echo 2025-12-04T09:33:03.2146415Z echo "Is keep-going label set? True" 2025-12-04T09:33:03.2146543Z  2025-12-04T09:33:03.2146631Z echo 2025-12-04T09:33:03.2146735Z echo "Reenabled issues? " 2025-12-04T09:33:03.2151125Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:03.2151277Z env: 2025-12-04T09:33:03.2151379Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:03.2151564Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:03.2151744Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:03.2151915Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:03.2152301Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:03.2152671Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:03.2152793Z AWS_REGION: us-east-1 2025-12-04T09:33:03.2152983Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:03.2153141Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:03.2155645Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:03.2155754Z ##[endgroup] 2025-12-04T09:33:03.2172317Z Filtered matrix: 2025-12-04T09:33:03.2176804Z {include: [{config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: 
default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 1, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 2, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 3, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 4, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: 
mem_leak_check}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 5, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, mem_leak_check: mem_leak_check, rerun_disabled_tests: rerun_disabled_tests}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests, mem_leak_check: mem_leak_check}, {config: default, shard: 6, num_shards: 6, runner: linux.rocm.gpu.gfx942.1.b, rerun_disabled_tests: rerun_disabled_tests}]} 2025-12-04T09:33:03.2180692Z 2025-12-04T09:33:03.2180744Z Is the current job unstable? False 2025-12-04T09:33:03.2180838Z 2025-12-04T09:33:03.2180886Z Is keep-going label set? True 2025-12-04T09:33:03.2181030Z 2025-12-04T09:33:03.2181072Z Reenabled issues? 
2025-12-04T09:33:03.2201046Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:33:03.2201253Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}" 2025-12-04T09:33:03.2203829Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:03.2203983Z env: 2025-12-04T09:33:03.2204083Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:03.2204224Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:03.2204406Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:03.2204578Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:03.2204970Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:03.2205344Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:03.2205468Z AWS_REGION: us-east-1 2025-12-04T09:33:03.2205624Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:03.2205828Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:03.2208111Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:03.2208221Z JOB_TIMEOUT: 600 2025-12-04T09:33:03.2208324Z ##[endgroup] 2025-12-04T09:33:03.2247971Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:33:03.2248237Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:33:03.2248454Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2025-12-04T09:33:03.2253129Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T09:33:03.2253377Z env: 2025-12-04T09:33:03.2253505Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:03.2253676Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:03.2253892Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:03.2254097Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:03.2254570Z GPU_FLAG: --device=/dev/mem 
--device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:03.2255048Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:03.2255190Z AWS_REGION: us-east-1 2025-12-04T09:33:03.2255401Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:03.2255587Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:03.2258095Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:03.2258209Z ##[endgroup] 2025-12-04T09:33:03.2329178Z ##[group]Run set -x 2025-12-04T09:33:03.2329391Z set -x 2025-12-04T09:33:03.2329507Z  2025-12-04T09:33:03.2329623Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2025-12-04T09:33:03.2329792Z  TEST_COMMAND=.ci/pytorch/multigpu-test.sh 2025-12-04T09:33:03.2329965Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2025-12-04T09:33:03.2330179Z  TEST_COMMAND=.ci/caffe2/test.sh 2025-12-04T09:33:03.2330307Z else 2025-12-04T09:33:03.2330417Z  TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:33:03.2330541Z fi 2025-12-04T09:33:03.2330633Z  2025-12-04T09:33:03.2330771Z # detached container should get cleaned up by teardown_ec2_linux 2025-12-04T09:33:03.2330979Z # TODO: Stop building test binaries as part of the build phase 2025-12-04T09:33:03.2331158Z # Used for GPU_FLAG since that doesn't play nice 2025-12-04T09:33:03.2331354Z # shellcheck disable=SC2086,SC2090 2025-12-04T09:33:03.2331495Z container_name=$(docker run \ 2025-12-04T09:33:03.2331662Z  ${GPU_FLAG:-} \ 2025-12-04T09:33:03.2331897Z  -e BUILD_ENVIRONMENT \ 2025-12-04T09:33:03.2332021Z  -e PR_NUMBER \ 2025-12-04T09:33:03.2332135Z  -e GITHUB_ACTIONS \ 2025-12-04T09:33:03.2332252Z  -e GITHUB_REPOSITORY \ 2025-12-04T09:33:03.2332374Z  -e GITHUB_WORKFLOW \ 2025-12-04T09:33:03.2332491Z  -e GITHUB_JOB \ 2025-12-04T09:33:03.2332599Z  -e GITHUB_RUN_ID \ 2025-12-04T09:33:03.2332712Z  -e GITHUB_RUN_NUMBER \ 2025-12-04T09:33:03.2332832Z  -e GITHUB_RUN_ATTEMPT \ 2025-12-04T09:33:03.2332950Z  -e JOB_ID \ 
2025-12-04T09:33:03.2333054Z  -e JOB_NAME \ 2025-12-04T09:33:03.2333162Z  -e BASE_SHA \ 2025-12-04T09:33:03.2333317Z  -e BRANCH \ 2025-12-04T09:33:03.2333424Z  -e SHA1 \ 2025-12-04T09:33:03.2333531Z  -e AWS_DEFAULT_REGION \ 2025-12-04T09:33:03.2333648Z  -e IN_WHEEL_TEST \ 2025-12-04T09:33:03.2333760Z  -e SHARD_NUMBER \ 2025-12-04T09:33:03.2333878Z  -e TEST_CONFIG \ 2025-12-04T09:33:03.2333988Z  -e NUM_TEST_SHARDS \ 2025-12-04T09:33:03.2334106Z  -e REENABLED_ISSUES \ 2025-12-04T09:33:03.2334224Z  -e CONTINUE_THROUGH_ERROR \ 2025-12-04T09:33:03.2334347Z  -e VERBOSE_TEST_LOGS \ 2025-12-04T09:33:03.2334464Z  -e TEST_SHOWLOCALS \ 2025-12-04T09:33:03.2334580Z  -e NO_TEST_TIMEOUT \ 2025-12-04T09:33:03.2334689Z  -e NO_TD \ 2025-12-04T09:33:03.2334805Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2025-12-04T09:33:03.2334946Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \ 2025-12-04T09:33:03.2335087Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \ 2025-12-04T09:33:03.2335222Z  -e TESTS_TO_INCLUDE \ 2025-12-04T09:33:03.2335343Z  -e HUGGING_FACE_HUB_TOKEN \ 2025-12-04T09:33:03.2335469Z  -e DASHBOARD_TAG \ 2025-12-04T09:33:03.2335614Z  --env-file="${RUNNER_TEMP}/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:33:03.2335776Z  --ulimit stack=10485760:83886080 \ 2025-12-04T09:33:03.2335900Z  --ulimit core=0 \ 2025-12-04T09:33:03.2336034Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2025-12-04T09:33:03.2336185Z  --security-opt seccomp=unconfined \ 2025-12-04T09:33:03.2336320Z  --cap-add=SYS_PTRACE \ 2025-12-04T09:33:03.2336441Z  --shm-size="8g" \ 2025-12-04T09:33:03.2336547Z  --tty \ 2025-12-04T09:33:03.2336647Z  --detach \ 2025-12-04T09:33:03.2336756Z  --name="${container_name}" \ 2025-12-04T09:33:03.2336879Z  --user jenkins \ 2025-12-04T09:33:03.2337017Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2025-12-04T09:33:03.2337169Z  -w /var/lib/jenkins/workspace \ 2025-12-04T09:33:03.2337361Z  "${DOCKER_IMAGE}" 2025-12-04T09:33:03.2337466Z ) 2025-12-04T09:33:03.2337568Z # save container name for 
later step 2025-12-04T09:33:03.2337730Z echo "CONTAINER_NAME=${container_name}" >> "$GITHUB_ENV" 2025-12-04T09:33:03.2337999Z # jenkins user does not have write permission to mounted workspace; work-around by copying within container to jenkins home 2025-12-04T09:33:03.2338345Z docker exec -t "${container_name}" sh -c "cd .. && cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && ${TEST_COMMAND}" 2025-12-04T09:33:03.2341125Z shell: /usr/bin/bash -e {0} 2025-12-04T09:33:03.2341237Z env: 2025-12-04T09:33:03.2341328Z GIT_DEFAULT_BRANCH: main 2025-12-04T09:33:03.2341463Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T09:33:03.2341641Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T09:33:03.2341803Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T09:33:03.2342189Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T09:33:03.2342599Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T09:33:03.2342715Z AWS_REGION: us-east-1 2025-12-04T09:33:03.2342850Z AWS_ACCESS_KEY_ID: *** 2025-12-04T09:33:03.2343000Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T09:33:03.2345478Z AWS_SESSION_TOKEN: *** 2025-12-04T09:33:03.2345607Z BUILD_ENVIRONMENT: linux-noble-rocm-py3.12-mi300 2025-12-04T09:33:03.2345742Z PR_NUMBER: 2025-12-04T09:33:03.2345844Z GITHUB_REPOSITORY: pytorch/pytorch 2025-12-04T09:33:03.2345970Z GITHUB_WORKFLOW: rocm-mi300 2025-12-04T09:33:03.2346082Z GITHUB_JOB: test 2025-12-04T09:33:03.2346179Z GITHUB_RUN_ID: 19922812470 2025-12-04T09:33:03.2346290Z GITHUB_RUN_NUMBER: 14122 2025-12-04T09:33:03.2346399Z GITHUB_RUN_ATTEMPT: 1 2025-12-04T09:33:03.2346499Z JOB_ID: 57116139284 2025-12-04T09:33:03.2346693Z JOB_NAME: linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:33:03.2346903Z 
BRANCH: main 2025-12-04T09:33:03.2347013Z SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:03.2347164Z BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:03.2347295Z TEST_CONFIG: default 2025-12-04T09:33:03.2347393Z SHARD_NUMBER: 2 2025-12-04T09:33:03.2347486Z NUM_TEST_SHARDS: 6 2025-12-04T09:33:03.2347585Z REENABLED_ISSUES: 2025-12-04T09:33:03.2347687Z CONTINUE_THROUGH_ERROR: True 2025-12-04T09:33:03.2347802Z VERBOSE_TEST_LOGS: False 2025-12-04T09:33:03.2347910Z TEST_SHOWLOCALS: False 2025-12-04T09:33:03.2348012Z NO_TEST_TIMEOUT: False 2025-12-04T09:33:03.2348113Z NO_TD: False 2025-12-04T09:33:03.2348374Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:33:03.2348665Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1 2025-12-04T09:33:03.2348796Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0 2025-12-04T09:33:03.2348918Z TESTS_TO_INCLUDE: 2025-12-04T09:33:03.2349017Z DASHBOARD_TAG: 2025-12-04T09:33:03.2349163Z HUGGING_FACE_HUB_TOKEN: *** 2025-12-04T09:33:03.2349273Z ##[endgroup] 2025-12-04T09:33:03.2360724Z + [[ default == \m\u\l\t\i\g\p\u ]] 2025-12-04T09:33:03.2360866Z + [[ linux-noble-rocm-py3.12-mi300 == *onnx* ]] 2025-12-04T09:33:03.2361017Z + TEST_COMMAND=.ci/pytorch/test.sh 2025-12-04T09:33:03.2367074Z +++ nproc --ignore=2 2025-12-04T09:33:03.2375584Z ++ docker run --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e 
VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e MAX_JOBS=126 -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e TESTS_TO_INCLUDE -e HUGGING_FACE_HUB_TOKEN -e DASHBOARD_TAG --env-file=/home/runner/_work/_temp/github_env_19922812470 --ulimit stack=10485760:83886080 --ulimit core=0 --env-file=/tmp/github_env_19922812470 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --shm-size=8g --tty --detach --name= --user jenkins -v /home/runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-noble-rocm-n-py3-f0cd68561080d537ef3d3d6f81b25a6416ad600a 2025-12-04T09:33:03.4376416Z + container_name=155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T09:33:03.4377139Z + echo CONTAINER_NAME=155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T09:33:03.4378220Z + docker exec -t 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 sh -c 'cd .. 
&& cp -R workspace pytorch && cd pytorch && pip install dist/*.whl && .ci/pytorch/test.sh' 2025-12-04T09:33:11.8250686Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp312-cp312-linux_x86_64.whl 2025-12-04T09:33:12.3846160Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.18.0) 2025-12-04T09:33:12.3846623Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (4.12.2) 2025-12-04T09:33:12.3849364Z Requirement already satisfied: setuptools in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (78.1.1) 2025-12-04T09:33:12.3849788Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (1.13.3) 2025-12-04T09:33:12.3852299Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (2.8.8) 2025-12-04T09:33:12.3853663Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (3.1.6) 2025-12-04T09:33:12.3854706Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from torch==2.10.0a0+gitffd9b0f) (2025.10.0) 2025-12-04T09:33:12.3900208Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f) (1.3.0) 2025-12-04T09:33:12.3919312Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.12/lib/python3.12/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f) (3.0.3) 2025-12-04T09:33:12.5330450Z Installing collected packages: torch 2025-12-04T09:33:18.3731756Z Successfully installed torch-2.10.0a0+gitffd9b0f 2025-12-04T09:33:18.4221218Z + export TERM=vt100 2025-12-04T09:33:18.4221788Z 
+ TERM=vt100 2025-12-04T09:33:18.4227443Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:33:18.4237246Z + source .ci/pytorch/common.sh 2025-12-04T09:33:18.4242001Z +++ dirname .ci/pytorch/common.sh 2025-12-04T09:33:18.4252337Z ++ source .ci/pytorch/common_utils.sh 2025-12-04T09:33:18.4254815Z +++ declare -f -t trap_add 2025-12-04T09:33:18.4259340Z ++ set -ex -o pipefail 2025-12-04T09:33:18.4259594Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T09:33:18.4259852Z ++ unset HIP_PLATFORM 2025-12-04T09:33:18.4260058Z ++ export PYTORCH_TEST_WITH_ROCM=1 2025-12-04T09:33:18.4260290Z ++ PYTORCH_TEST_WITH_ROCM=1 2025-12-04T09:33:18.4260516Z ++ BUILD_TEST_LIBTORCH=0 2025-12-04T09:33:18.4264702Z ++ dirname .ci/pytorch/test.sh 2025-12-04T09:33:18.4279134Z + source .ci/pytorch/common-build.sh 2025-12-04T09:33:18.4280451Z ++ [[ linux-noble-rocm-py3.12-mi300 != *win-* ]] 2025-12-04T09:33:18.4290379Z ++++ dirname .ci/pytorch/common-build.sh 2025-12-04T09:33:18.4300410Z +++ cd .ci/pytorch 2025-12-04T09:33:18.4300656Z +++ pwd -P 2025-12-04T09:33:18.4302577Z ++ script_dir=/var/lib/jenkins/pytorch/.ci/pytorch 2025-12-04T09:33:18.4303002Z ++ [[ linux-noble-rocm-py3.12-mi300 == *-pch* ]] 2025-12-04T09:33:18.4303450Z ++ which sccache 2025-12-04T09:33:18.4317524Z ++ [[ -z '' ]] 2025-12-04T09:33:18.4317709Z ++ unset SCCACHE_BUCKET 2025-12-04T09:33:18.4317876Z ++ unset SCCACHE_REGION 2025-12-04T09:33:18.4318043Z ++ sccache --stop-server 2025-12-04T09:33:18.4339536Z ++ true 2025-12-04T09:33:18.4339686Z ++ rm -f /var/lib/jenkins/sccache_error.log 2025-12-04T09:33:18.4352959Z ++ trap_add sccache_epilogue EXIT 2025-12-04T09:33:18.4353133Z ++ trap_add_cmd=sccache_epilogue 2025-12-04T09:33:18.4353328Z ++ shift 2025-12-04T09:33:18.4353463Z ++ for trap_add_name in "$@" 2025-12-04T09:33:18.4361457Z ++++ trap -p EXIT 2025-12-04T09:33:18.4364034Z +++ eval 'extract_trap_cmd ' 2025-12-04T09:33:18.4364190Z ++++ extract_trap_cmd 2025-12-04T09:33:18.4364340Z ++++ printf '%s\n' '' 
2025-12-04T09:33:18.4364493Z +++ printf '%s\n' sccache_epilogue 2025-12-04T09:33:18.4366958Z ++ trap -- ' 2025-12-04T09:33:18.4367092Z sccache_epilogue' EXIT 2025-12-04T09:33:18.4367340Z ++ [[ -n '' ]] 2025-12-04T09:33:18.4367498Z ++ [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T09:33:18.4368793Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2025-12-04T09:33:18.4368982Z ++ SCCACHE_IDLE_TIMEOUT=0 2025-12-04T09:33:18.4369127Z ++ sccache --start-server 2025-12-04T09:33:18.4392981Z sccache: Starting the server... 2025-12-04T09:33:18.4595432Z sccache: Listening on address 127.0.0.1:4226 2025-12-04T09:33:18.4604716Z ++ sccache --zero-stats 2025-12-04T09:33:18.4624980Z Statistics zeroed. 2025-12-04T09:33:18.4626843Z ++ which ccache 2025-12-04T09:33:18.4636405Z + [[ linux-noble-rocm-py3.12-mi300 != *rocm* ]] 2025-12-04T09:33:18.4636593Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T09:33:18.4636761Z + echo 'Environment variables:' 2025-12-04T09:33:18.4636907Z Environment variables: 2025-12-04T09:33:18.4637033Z + env 2025-12-04T09:33:18.4644771Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T09:33:18.4644984Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:33:18.4645167Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 2025-12-04T09:33:18.4645393Z HOSTNAME=linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr 2025-12-04T09:33:18.4645702Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4645964Z GITHUB_ACTION=__run_2 2025-12-04T09:33:18.4646109Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:33:18.4646257Z GITHUB_RUN_NUMBER=14122 2025-12-04T09:33:18.4646445Z TEST_CONFIG=default 2025-12-04T09:33:18.4646607Z RUNNER_NAME=linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr 2025-12-04T09:33:18.4646801Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:33:18.4646957Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T09:33:18.4647129Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 
2025-12-04T09:33:18.4647317Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T09:33:18.4647473Z GITHUB_REF_TYPE=branch 2025-12-04T09:33:18.4647627Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4647970Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:33:18.4648453Z *** 2025-12-04T09:33:18.4648572Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:33:18.4648715Z GITHUB_ACTIONS=true 2025-12-04T09:33:18.4648857Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4649044Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4649301Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm-mi300.yml@refs/heads/main 2025-12-04T09:33:18.4649531Z UCC_HOME=/usr 2025-12-04T09:33:18.4649660Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T09:33:18.4649805Z VERBOSE_TEST_LOGS=False 2025-12-04T09:33:18.4649937Z GITHUB_REF=refs/heads/main 2025-12-04T09:33:18.4650067Z RUNNER_OS=Linux 2025-12-04T09:33:18.4650180Z SHARD_NUMBER=2 2025-12-04T09:33:18.4650303Z GITHUB_REF_PROTECTED=true 2025-12-04T09:33:18.4650550Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T09:33:18.4650688Z HOME=/var/lib/jenkins 2025-12-04T09:33:18.4650850Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:33:18.4651016Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:33:18.4651180Z RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T09:33:18.4651337Z LANG=C.UTF-8 2025-12-04T09:33:18.4651475Z UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T09:33:18.4651651Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T09:33:18.4651831Z RUNNER_TRACKING_ID=github_32e96445-69ce-4c9e-80fb-f05222fc8831 2025-12-04T09:33:18.4652017Z RUNNER_ARCH=X64 2025-12-04T09:33:18.4652147Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T09:33:18.4652292Z NUM_TEST_SHARDS=6 2025-12-04T09:33:18.4652409Z UCX_HOME=/usr 2025-12-04T09:33:18.4652645Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4653031Z 
JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:33:18.4653356Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T09:33:18.4653591Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4653953Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:33:18.4654151Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:33:18.4654341Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T09:33:18.4654538Z DASHBOARD_TAG= 2025-12-04T09:33:18.4654653Z GITHUB_RUN_ID=19922812470 2025-12-04T09:33:18.4654907Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4655186Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T09:33:18.4655323Z PR_NUMBER= 2025-12-04T09:33:18.4655433Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:33:18.4655562Z ANACONDA_PYTHON_VERSION=3.12 2025-12-04T09:33:18.4655723Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:33:18.4655897Z TERM=vt100 2025-12-04T09:33:18.4656004Z INSTALLED_VISION=yes 2025-12-04T09:33:18.4656127Z BRANCH=main 2025-12-04T09:33:18.4656248Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:33:18.4656382Z TESTS_TO_INCLUDE= 2025-12-04T09:33:18.4656576Z GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T09:33:18.4656804Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:33:18.4656951Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T09:33:18.4657111Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T09:33:18.4657253Z REENABLED_ISSUES= 2025-12-04T09:33:18.4657346Z SHLVL=1 2025-12-04T09:33:18.4657432Z MAX_JOBS=126 2025-12-04T09:33:18.4657562Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T09:33:18.4657718Z GITHUB_ACTOR_ID=97764156 2025-12-04T09:33:18.4657835Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 
2025-12-04T09:33:18.4658001Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4658157Z GITHUB_REF_NAME=main 2025-12-04T09:33:18.4658259Z ROCM_PATH=/opt/rocm 2025-12-04T09:33:18.4658355Z GITHUB_JOB=test 2025-12-04T09:33:18.4658453Z NO_TEST_TIMEOUT=False 2025-12-04T09:33:18.4658563Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:33:18.4658684Z LC_ALL=C.UTF-8 2025-12-04T09:33:18.4658782Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:33:18.4658902Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T09:33:18.4659034Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:33:18.4659147Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:33:18.4659509Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:33:18.4659866Z GITHUB_BASE_REF= 2025-12-04T09:33:18.4659960Z CI=true 2025-12-04T09:33:18.4660055Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:33:18.4660170Z JOB_ID=57116139284 2025-12-04T09:33:18.4660264Z GITHUB_HEAD_REF= 2025-12-04T09:33:18.4660396Z GITHUB_ACTION_REF= 2025-12-04T09:33:18.4660494Z TEST_SHOWLOCALS=False 2025-12-04T09:33:18.4660602Z GITHUB_WORKFLOW=rocm-mi300 2025-12-04T09:33:18.4660720Z DEBIAN_FRONTEND=noninteractive 2025-12-04T09:33:18.4660931Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4661143Z NO_TD=False 2025-12-04T09:33:18.4661236Z OLDPWD=/var/lib/jenkins 2025-12-04T09:33:18.4661338Z _=/usr/bin/env 2025-12-04T09:33:18.4661469Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2025-12-04T09:33:18.4715598Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch 2025-12-04T09:33:18.4717064Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/bin 2025-12-04T09:33:18.4717550Z + 
TORCH_LIB_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/lib 2025-12-04T09:33:18.4717920Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/test 2025-12-04T09:33:18.4718216Z + BUILD_DIR=build 2025-12-04T09:33:18.4718467Z + BUILD_RENAMED_DIR=build_renamed 2025-12-04T09:33:18.4718671Z + BUILD_BIN_DIR=build/bin 2025-12-04T09:33:18.4719386Z + SHARD_NUMBER=2 2025-12-04T09:33:18.4719551Z + NUM_TEST_SHARDS=6 2025-12-04T09:33:18.4719752Z + export TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:33:18.4719959Z + TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:33:18.4720145Z + export VALGRIND=ON 2025-12-04T09:33:18.4720307Z + VALGRIND=ON 2025-12-04T09:33:18.4720489Z + [[ linux-noble-rocm-py3.12-mi300 == *clang9* ]] 2025-12-04T09:33:18.4720735Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-12-04T09:33:18.4720939Z + detect_cuda_arch 2025-12-04T09:33:18.4721122Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T09:33:18.4721367Z + [[ linux-noble-rocm-py3.12-mi300 == *s390x* ]] 2025-12-04T09:33:18.4721573Z + [[ 0 == \1 ]] 2025-12-04T09:33:18.4721717Z + [[ True == \1 ]] 2025-12-04T09:33:18.4721891Z + [[ linux-noble-rocm-py3.12-mi300 != *bazel* ]] 2025-12-04T09:33:18.4722117Z ++ realpath build/custom_test_artifacts 2025-12-04T09:33:18.4728629Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/pytorch/build/custom_test_artifacts 2025-12-04T09:33:18.4728894Z + [[ -n '' ]] 2025-12-04T09:33:18.4729024Z + echo 'Environment variables' 2025-12-04T09:33:18.4729192Z Environment variables 2025-12-04T09:33:18.4729323Z + env 2025-12-04T09:33:18.4736957Z GITHUB_WORKSPACE=/home/runner/_work/pytorch/pytorch 2025-12-04T09:33:18.4737179Z CONTINUE_THROUGH_ERROR=True 2025-12-04T09:33:18.4737364Z BUILD_ENVIRONMENT=linux-noble-rocm-py3.12-mi300 2025-12-04T09:33:18.4737613Z HOSTNAME=linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr 2025-12-04T09:33:18.4737961Z GITHUB_PATH=/home/runner/_work/_temp/_runner_file_commands/add_path_e0a4d36b-f240-4938-9929-3bd1938fb146 
2025-12-04T09:33:18.4738252Z GITHUB_ACTION=__run_2 2025-12-04T09:33:18.4738399Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 2025-12-04T09:33:18.4738553Z GITHUB_RUN_NUMBER=14122 2025-12-04T09:33:18.4738702Z TEST_CONFIG=default 2025-12-04T09:33:18.4738877Z RUNNER_NAME=linux.rocm.gpu.gfx942.1.b-gwk9b-runner-shkfr 2025-12-04T09:33:18.4739071Z GITHUB_REPOSITORY_OWNER_ID=21003710 2025-12-04T09:33:18.4739240Z AWS_DEFAULT_REGION=us-east-1 2025-12-04T09:33:18.4739417Z RUNNER_ARTIFACT_DIR=/home/runner/_work/_temp/artifacts 2025-12-04T09:33:18.4739611Z GITHUB_TRIGGERING_ACTOR=pytorchmergebot 2025-12-04T09:33:18.4739830Z GITHUB_REF_TYPE=branch 2025-12-04T09:33:18.4739985Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4740327Z HUGGING_FACE_HUB_TOKEN=*** 2025-12-04T09:33:18.4740533Z *** 2025-12-04T09:33:18.4740650Z GITHUB_REPOSITORY_ID=65600975 2025-12-04T09:33:18.4740794Z GITHUB_ACTIONS=true 2025-12-04T09:33:18.4740941Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4741128Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4741394Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/rocm-mi300.yml@refs/heads/main 2025-12-04T09:33:18.4741621Z UCC_HOME=/usr 2025-12-04T09:33:18.4741750Z TORCH_SERIALIZATION_DEBUG=1 2025-12-04T09:33:18.4742081Z RUNNER_ENVIRONMENT=self-hosted 2025-12-04T09:33:18.4742232Z VERBOSE_TEST_LOGS=False 2025-12-04T09:33:18.4742373Z GITHUB_REF=refs/heads/main 2025-12-04T09:33:18.4742501Z RUNNER_OS=Linux 2025-12-04T09:33:18.4742626Z SHARD_NUMBER=2 2025-12-04T09:33:18.4742745Z GITHUB_REF_PROTECTED=true 2025-12-04T09:33:18.4742889Z RUNNER_MANUALLY_TRAP_SIG=1 2025-12-04T09:33:18.4743024Z HOME=/var/lib/jenkins 2025-12-04T09:33:18.4743178Z GITHUB_API_URL=https://api.github.com 2025-12-04T09:33:18.4743400Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2025-12-04T09:33:18.4743566Z RUNNER_DOCS_DIR=/home/runner/_work/_temp/docs 2025-12-04T09:33:18.4743725Z LANG=C.UTF-8 2025-12-04T09:33:18.4743874Z 
UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e 2025-12-04T09:33:18.4744054Z PYTORCH_TEST_WITH_ROCM=1 2025-12-04T09:33:18.4744241Z RUNNER_TRACKING_ID=github_32e96445-69ce-4c9e-80fb-f05222fc8831 2025-12-04T09:33:18.4744435Z RUNNER_ARCH=X64 2025-12-04T09:33:18.4744566Z RUNNER_TEMP=/home/runner/_work/_temp 2025-12-04T09:33:18.4744726Z NUM_TEST_SHARDS=6 2025-12-04T09:33:18.4744890Z UCX_HOME=/usr 2025-12-04T09:33:18.4745128Z GITHUB_STATE=/home/runner/_work/_temp/_runner_file_commands/save_state_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4745579Z JOB_NAME=linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1.b, mem_leak_check) 2025-12-04T09:33:18.4745852Z MAGMA_HOME=/opt/rocm/magma 2025-12-04T09:33:18.4746089Z GITHUB_ENV=/home/runner/_work/_temp/_runner_file_commands/set_env_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4746423Z GITHUB_EVENT_PATH=/home/runner/_work/_temp/_github_workflow/event.json 2025-12-04T09:33:18.4746634Z GITHUB_EVENT_NAME=schedule 2025-12-04T09:33:18.4746838Z GITHUB_ACTIONS_RUNNER_EXTRA_USER_AGENT=actions-runner-controller/0.12.1 2025-12-04T09:33:18.4747049Z DASHBOARD_TAG= 2025-12-04T09:33:18.4747171Z GITHUB_RUN_ID=19922812470 2025-12-04T09:33:18.4747444Z GITHUB_STEP_SUMMARY=/home/runner/_work/_temp/_runner_file_commands/step_summary_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4747737Z GITHUB_ACTOR=pytorchmergebot 2025-12-04T09:33:18.4747877Z PR_NUMBER= 2025-12-04T09:33:18.4747993Z GITHUB_RUN_ATTEMPT=1 2025-12-04T09:33:18.4748136Z VALGRIND=ON 2025-12-04T09:33:18.4748268Z ANACONDA_PYTHON_VERSION=3.12 2025-12-04T09:33:18.4748412Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2025-12-04T09:33:18.4748552Z TERM=vt100 2025-12-04T09:33:18.4748648Z INSTALLED_VISION=yes 2025-12-04T09:33:18.4748753Z BRANCH=main 2025-12-04T09:33:18.4748856Z OPENSSL_ROOT_DIR=/opt/openssl 2025-12-04T09:33:18.4748977Z TESTS_TO_INCLUDE= 2025-12-04T09:33:18.4749148Z 
GITHUB_ACTION_PATH=/home/runner/_work/pytorch/pytorch/./.github/actions/setup-rocm 2025-12-04T09:33:18.4749353Z GITHUB_SERVER_URL=https://github.com 2025-12-04T09:33:18.4749504Z PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx950;gfx1100 2025-12-04T09:33:18.4749666Z UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77 2025-12-04T09:33:18.4749812Z REENABLED_ISSUES= 2025-12-04T09:33:18.4749912Z SHLVL=1 2025-12-04T09:33:18.4750003Z MAX_JOBS=126 2025-12-04T09:33:18.4750141Z RUNNER_TEST_RESULTS_DIR=/home/runner/_work/_temp/test-results 2025-12-04T09:33:18.4750306Z GITHUB_ACTOR_ID=97764156 2025-12-04T09:33:18.4750435Z RUNNER_TOOL_CACHE=/home/runner/_work/_tool 2025-12-04T09:33:18.4750612Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 2025-12-04T09:33:18.4750771Z GITHUB_REF_NAME=main 2025-12-04T09:33:18.4750875Z ROCM_PATH=/opt/rocm 2025-12-04T09:33:18.4750979Z GITHUB_JOB=test 2025-12-04T09:33:18.4751084Z NO_TEST_TIMEOUT=False 2025-12-04T09:33:18.4751201Z GITHUB_REPOSITORY=pytorch/pytorch 2025-12-04T09:33:18.4751324Z LC_ALL=C.UTF-8 2025-12-04T09:33:18.4751425Z GITHUB_RETENTION_DAYS=90 2025-12-04T09:33:18.4751554Z RUNNER_WORKSPACE=/home/runner/_work/pytorch 2025-12-04T09:33:18.4751692Z OPENSSL_DIR=/opt/openssl 2025-12-04T09:33:18.4751811Z GITHUB_ACTION_REPOSITORY= 2025-12-04T09:33:18.4752236Z PATH=/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:33:18.4752618Z GITHUB_BASE_REF= 2025-12-04T09:33:18.4752718Z CI=true 2025-12-04T09:33:18.4752821Z GITHUB_REPOSITORY_OWNER=pytorch 2025-12-04T09:33:18.4752943Z JOB_ID=57116139284 2025-12-04T09:33:18.4753046Z GITHUB_HEAD_REF= 2025-12-04T09:33:18.4753146Z GITHUB_ACTION_REF= 2025-12-04T09:33:18.4753317Z TEST_SHOWLOCALS=False 2025-12-04T09:33:18.4753433Z GITHUB_WORKFLOW=rocm-mi300 2025-12-04T09:33:18.4753555Z DEBIAN_FRONTEND=noninteractive 
2025-12-04T09:33:18.4753776Z GITHUB_OUTPUT=/home/runner/_work/_temp/_runner_file_commands/set_output_e0a4d36b-f240-4938-9929-3bd1938fb146 2025-12-04T09:33:18.4753996Z NO_TD=False 2025-12-04T09:33:18.4754089Z OLDPWD=/var/lib/jenkins 2025-12-04T09:33:18.4754198Z _=/usr/bin/env 2025-12-04T09:33:18.4754300Z + echo 'Testing pytorch' 2025-12-04T09:33:18.4754411Z Testing pytorch 2025-12-04T09:33:18.4754516Z + export LANG=C.UTF-8 2025-12-04T09:33:18.4754622Z + LANG=C.UTF-8 2025-12-04T09:33:18.4754721Z + PR_NUMBER= 2025-12-04T09:33:18.4754834Z + [[ default == \d\e\f\a\u\l\t ]] 2025-12-04T09:33:18.4754964Z + export CUDA_VISIBLE_DEVICES=0 2025-12-04T09:33:18.4755159Z + CUDA_VISIBLE_DEVICES=0 2025-12-04T09:33:18.4755292Z + export HIP_VISIBLE_DEVICES=0 2025-12-04T09:33:18.4755413Z + HIP_VISIBLE_DEVICES=0 2025-12-04T09:33:18.4755538Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2025-12-04T09:33:18.4755670Z + [[ default == \s\l\o\w ]] 2025-12-04T09:33:18.4755848Z + [[ linux-noble-rocm-py3.12-mi300 == *slow-gradcheck* ]] 2025-12-04T09:33:18.4756034Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T09:33:18.4756196Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T09:33:18.4756350Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:33:18.4756496Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2025-12-04T09:33:18.4756631Z + [[ default == *crossref* ]] 2025-12-04T09:33:18.4756777Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T09:33:18.4756911Z + export VALGRIND=OFF 2025-12-04T09:33:18.4757027Z + VALGRIND=OFF 2025-12-04T09:33:18.4757125Z + rocminfo 2025-12-04T09:33:18.4857807Z ROCk module version 6.12.12 is loaded 2025-12-04T09:33:18.5238867Z ===================== 2025-12-04T09:33:18.5239053Z HSA System Attributes 2025-12-04T09:33:18.5240041Z ===================== 2025-12-04T09:33:18.5240459Z Runtime Version: 1.18 2025-12-04T09:33:18.5240694Z Runtime Ext Version: 1.14 2025-12-04T09:33:18.5240933Z System Timestamp Freq.: 1000.000000MHz 
2025-12-04T09:33:18.5241329Z Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) 2025-12-04T09:33:18.5241750Z Machine Model: LARGE 2025-12-04T09:33:18.5242108Z System Endianness: LITTLE 2025-12-04T09:33:18.5242383Z Mwaitx: DISABLED 2025-12-04T09:33:18.5242606Z XNACK enabled: NO 2025-12-04T09:33:18.5242825Z DMAbuf Support: YES 2025-12-04T09:33:18.5243040Z VMM Support: YES 2025-12-04T09:33:18.5243173Z 2025-12-04T09:33:18.5243376Z ========== 2025-12-04T09:33:18.5243583Z HSA Agents 2025-12-04T09:33:18.5243793Z ========== 2025-12-04T09:33:18.5243983Z ******* 2025-12-04T09:33:18.5244170Z Agent 1 2025-12-04T09:33:18.5244375Z ******* 2025-12-04T09:33:18.5244667Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5244978Z Uuid: CPU-XX 2025-12-04T09:33:18.5245291Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5245598Z Vendor Name: CPU 2025-12-04T09:33:18.5245886Z Feature: None specified 2025-12-04T09:33:18.5246148Z Profile: FULL_PROFILE 2025-12-04T09:33:18.5246415Z Float Round Mode: NEAR 2025-12-04T09:33:18.5247266Z Max Queue Number: 0(0x0) 2025-12-04T09:33:18.5247527Z Queue Min Size: 0(0x0) 2025-12-04T09:33:18.5247826Z Queue Max Size: 0(0x0) 2025-12-04T09:33:18.5248085Z Queue Type: MULTI 2025-12-04T09:33:18.5248329Z Node: 0 2025-12-04T09:33:18.5248576Z Device Type: CPU 2025-12-04T09:33:18.5248807Z Cache Info: 2025-12-04T09:33:18.5249013Z L1: 49152(0xc000) KB 2025-12-04T09:33:18.5249257Z Chip ID: 0(0x0) 2025-12-04T09:33:18.5249511Z ASIC Revision: 0(0x0) 2025-12-04T09:33:18.5249772Z Cacheline Size: 64(0x40) 2025-12-04T09:33:18.5250041Z Max Clock Freq. (MHz): 3300 2025-12-04T09:33:18.5250307Z BDFID: 0 2025-12-04T09:33:18.5250557Z Internal Node ID: 0 2025-12-04T09:33:18.5250988Z Compute Unit: 64 2025-12-04T09:33:18.5251243Z SIMDs per CU: 0 2025-12-04T09:33:18.5251500Z Shader Engines: 0 2025-12-04T09:33:18.5251781Z Shader Arrs. per Eng.: 0 2025-12-04T09:33:18.5252059Z WatchPts on Addr. 
Ranges:1 2025-12-04T09:33:18.5252307Z Memory Properties: 2025-12-04T09:33:18.5252507Z Features: None 2025-12-04T09:33:18.5252700Z Pool Info: 2025-12-04T09:33:18.5252884Z Pool 1 2025-12-04T09:33:18.5253122Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T09:33:18.5253455Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T09:33:18.5253718Z Allocatable: TRUE 2025-12-04T09:33:18.5253990Z Alloc Granule: 4KB 2025-12-04T09:33:18.5254282Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5254559Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5254845Z Accessible by all: TRUE 2025-12-04T09:33:18.5255079Z Pool 2 2025-12-04T09:33:18.5255308Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T09:33:18.5255567Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T09:33:18.5255824Z Allocatable: TRUE 2025-12-04T09:33:18.5256128Z Alloc Granule: 4KB 2025-12-04T09:33:18.5256400Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5256644Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5256854Z Accessible by all: TRUE 2025-12-04T09:33:18.5257050Z Pool 3 2025-12-04T09:33:18.5257231Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T09:33:18.5257427Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T09:33:18.5257622Z Allocatable: TRUE 2025-12-04T09:33:18.5257824Z Alloc Granule: 4KB 2025-12-04T09:33:18.5258033Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5258246Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5258452Z Accessible by all: TRUE 2025-12-04T09:33:18.5258634Z Pool 4 2025-12-04T09:33:18.5258915Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T09:33:18.5259114Z Size: 1584734456(0x5e7520f8) KB 2025-12-04T09:33:18.5259309Z Allocatable: TRUE 2025-12-04T09:33:18.5259513Z Alloc Granule: 4KB 2025-12-04T09:33:18.5259723Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5259931Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5260136Z Accessible by all: TRUE 2025-12-04T09:33:18.5260318Z ISA Info: 2025-12-04T09:33:18.5260456Z ******* 2025-12-04T09:33:18.5260593Z Agent 2 2025-12-04T09:33:18.5260720Z ******* 
2025-12-04T09:33:18.5260880Z Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5261102Z Uuid: CPU-XX 2025-12-04T09:33:18.5261307Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5261556Z Vendor Name: CPU 2025-12-04T09:33:18.5261759Z Feature: None specified 2025-12-04T09:33:18.5261955Z Profile: FULL_PROFILE 2025-12-04T09:33:18.5262157Z Float Round Mode: NEAR 2025-12-04T09:33:18.5262357Z Max Queue Number: 0(0x0) 2025-12-04T09:33:18.5262556Z Queue Min Size: 0(0x0) 2025-12-04T09:33:18.5262786Z Queue Max Size: 0(0x0) 2025-12-04T09:33:18.5262998Z Queue Type: MULTI 2025-12-04T09:33:18.5263179Z Node: 1 2025-12-04T09:33:18.5263458Z Device Type: CPU 2025-12-04T09:33:18.5263631Z Cache Info: 2025-12-04T09:33:18.5263788Z L1: 49152(0xc000) KB 2025-12-04T09:33:18.5263973Z Chip ID: 0(0x0) 2025-12-04T09:33:18.5264218Z ASIC Revision: 0(0x0) 2025-12-04T09:33:18.5264433Z Cacheline Size: 64(0x40) 2025-12-04T09:33:18.5264648Z Max Clock Freq. (MHz): 3300 2025-12-04T09:33:18.5264840Z BDFID: 0 2025-12-04T09:33:18.5265034Z Internal Node ID: 1 2025-12-04T09:33:18.5265232Z Compute Unit: 64 2025-12-04T09:33:18.5265424Z SIMDs per CU: 0 2025-12-04T09:33:18.5265622Z Shader Engines: 0 2025-12-04T09:33:18.5265827Z Shader Arrs. per Eng.: 0 2025-12-04T09:33:18.5266034Z WatchPts on Addr. 
Ranges:1 2025-12-04T09:33:18.5266225Z Memory Properties: 2025-12-04T09:33:18.5266368Z Features: None 2025-12-04T09:33:18.5266507Z Pool Info: 2025-12-04T09:33:18.5266649Z Pool 1 2025-12-04T09:33:18.5266826Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T09:33:18.5267006Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T09:33:18.5267189Z Allocatable: TRUE 2025-12-04T09:33:18.5267354Z Alloc Granule: 4KB 2025-12-04T09:33:18.5267527Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5267699Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5267911Z Accessible by all: TRUE 2025-12-04T09:33:18.5268059Z Pool 2 2025-12-04T09:33:18.5268200Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T09:33:18.5268358Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T09:33:18.5268515Z Allocatable: TRUE 2025-12-04T09:33:18.5268679Z Alloc Granule: 4KB 2025-12-04T09:33:18.5268851Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5269023Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5269190Z Accessible by all: TRUE 2025-12-04T09:33:18.5269336Z Pool 3 2025-12-04T09:33:18.5269475Z Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED 2025-12-04T09:33:18.5269635Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T09:33:18.5269789Z Allocatable: TRUE 2025-12-04T09:33:18.5270095Z Alloc Granule: 4KB 2025-12-04T09:33:18.5270267Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5270437Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5270604Z Accessible by all: TRUE 2025-12-04T09:33:18.5270749Z Pool 4 2025-12-04T09:33:18.5270887Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T09:33:18.5271043Z Size: 1585355616(0x5e7e9b60) KB 2025-12-04T09:33:18.5271198Z Allocatable: TRUE 2025-12-04T09:33:18.5271363Z Alloc Granule: 4KB 2025-12-04T09:33:18.5271541Z Alloc Recommended Granule:4KB 2025-12-04T09:33:18.5271714Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5271886Z Accessible by all: TRUE 2025-12-04T09:33:18.5272033Z ISA Info: 2025-12-04T09:33:18.5272144Z ******* 2025-12-04T09:33:18.5272247Z Agent 3 2025-12-04T09:33:18.5272353Z ******* 
2025-12-04T09:33:18.5272475Z Name: gfx942 2025-12-04T09:33:18.5272625Z Uuid: GPU-3e378e7265318491 2025-12-04T09:33:18.5272786Z Marketing Name: AMD Radeon Graphics 2025-12-04T09:33:18.5272953Z Vendor Name: AMD 2025-12-04T09:33:18.5273111Z Feature: KERNEL_DISPATCH 2025-12-04T09:33:18.5273309Z Profile: BASE_PROFILE 2025-12-04T09:33:18.5273474Z Float Round Mode: NEAR 2025-12-04T09:33:18.5273638Z Max Queue Number: 128(0x80) 2025-12-04T09:33:18.5273805Z Queue Min Size: 64(0x40) 2025-12-04T09:33:18.5273965Z Queue Max Size: 131072(0x20000) 2025-12-04T09:33:18.5274126Z Queue Type: MULTI 2025-12-04T09:33:18.5274278Z Node: 2 2025-12-04T09:33:18.5274431Z Device Type: GPU 2025-12-04T09:33:18.5274573Z Cache Info: 2025-12-04T09:33:18.5274700Z L1: 32(0x20) KB 2025-12-04T09:33:18.5274842Z L2: 4096(0x1000) KB 2025-12-04T09:33:18.5274979Z L3: 262144(0x40000) KB 2025-12-04T09:33:18.5275163Z Chip ID: 29861(0x74a5) 2025-12-04T09:33:18.5275322Z ASIC Revision: 1(0x1) 2025-12-04T09:33:18.5275488Z Cacheline Size: 128(0x80) 2025-12-04T09:33:18.5275654Z Max Clock Freq. (MHz): 2100 2025-12-04T09:33:18.5275808Z BDFID: 25856 2025-12-04T09:33:18.5275964Z Internal Node ID: 2 2025-12-04T09:33:18.5276125Z Compute Unit: 304 2025-12-04T09:33:18.5276282Z SIMDs per CU: 4 2025-12-04T09:33:18.5276442Z Shader Engines: 32 2025-12-04T09:33:18.5276608Z Shader Arrs. per Eng.: 1 2025-12-04T09:33:18.5276777Z WatchPts on Addr. 
Ranges:4 2025-12-04T09:33:18.5276948Z Coherent Host Access: FALSE 2025-12-04T09:33:18.5277123Z Memory Properties: 2025-12-04T09:33:18.5277252Z Features: KERNEL_DISPATCH 2025-12-04T09:33:18.5277441Z Fast F16 Operation: TRUE 2025-12-04T09:33:18.5277604Z Wavefront Size: 64(0x40) 2025-12-04T09:33:18.5277766Z Workgroup Max Size: 1024(0x400) 2025-12-04T09:33:18.5277940Z Workgroup Max Size per Dimension: 2025-12-04T09:33:18.5278088Z x 1024(0x400) 2025-12-04T09:33:18.5278227Z y 1024(0x400) 2025-12-04T09:33:18.5278361Z z 1024(0x400) 2025-12-04T09:33:18.5278510Z Max Waves Per CU: 32(0x20) 2025-12-04T09:33:18.5278673Z Max Work-item Per CU: 2048(0x800) 2025-12-04T09:33:18.5278840Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T09:33:18.5278986Z Grid Max Size per Dimension: 2025-12-04T09:33:18.5279112Z x 2147483647(0x7fffffff) 2025-12-04T09:33:18.5279252Z y 65535(0xffff) 2025-12-04T09:33:18.5279386Z z 65535(0xffff) 2025-12-04T09:33:18.5279539Z Max fbarriers/Workgrp: 32 2025-12-04T09:33:18.5279762Z Packet Processor uCode:: 185 2025-12-04T09:33:18.5279931Z SDMA engine uCode:: 24 2025-12-04T09:33:18.5280095Z IOMMU Support:: None 2025-12-04T09:33:18.5280238Z Pool Info: 2025-12-04T09:33:18.5280352Z Pool 1 2025-12-04T09:33:18.5280494Z Segment: GLOBAL; FLAGS: COARSE GRAINED 2025-12-04T09:33:18.5280658Z Size: 268419072(0xfffc000) KB 2025-12-04T09:33:18.5280824Z Allocatable: TRUE 2025-12-04T09:33:18.5280991Z Alloc Granule: 4KB 2025-12-04T09:33:18.5281164Z Alloc Recommended Granule:2048KB 2025-12-04T09:33:18.5281335Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5281501Z Accessible by all: FALSE 2025-12-04T09:33:18.5281649Z Pool 2 2025-12-04T09:33:18.5281788Z Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED 2025-12-04T09:33:18.5281946Z Size: 268419072(0xfffc000) KB 2025-12-04T09:33:18.5282130Z Allocatable: TRUE 2025-12-04T09:33:18.5282293Z Alloc Granule: 4KB 2025-12-04T09:33:18.5282463Z Alloc Recommended Granule:2048KB 2025-12-04T09:33:18.5282676Z Alloc Alignment: 4KB 
2025-12-04T09:33:18.5282895Z Accessible by all: FALSE 2025-12-04T09:33:18.5283042Z Pool 3 2025-12-04T09:33:18.5283190Z Segment: GLOBAL; FLAGS: FINE GRAINED 2025-12-04T09:33:18.5283385Z Size: 268419072(0xfffc000) KB 2025-12-04T09:33:18.5283538Z Allocatable: TRUE 2025-12-04T09:33:18.5283695Z Alloc Granule: 4KB 2025-12-04T09:33:18.5283864Z Alloc Recommended Granule:2048KB 2025-12-04T09:33:18.5284033Z Alloc Alignment: 4KB 2025-12-04T09:33:18.5284197Z Accessible by all: FALSE 2025-12-04T09:33:18.5284342Z Pool 4 2025-12-04T09:33:18.5284473Z Segment: GROUP 2025-12-04T09:33:18.5284621Z Size: 64(0x40) KB 2025-12-04T09:33:18.5284811Z Allocatable: FALSE 2025-12-04T09:33:18.5284974Z Alloc Granule: 0KB 2025-12-04T09:33:18.5285141Z Alloc Recommended Granule:0KB 2025-12-04T09:33:18.5285313Z Alloc Alignment: 0KB 2025-12-04T09:33:18.5285479Z Accessible by all: FALSE 2025-12-04T09:33:18.5285625Z ISA Info: 2025-12-04T09:33:18.5285739Z ISA 1 2025-12-04T09:33:18.5285880Z Name: amdgcn-amd-amdhsa--gfx942:sramecc+:xnack- 2025-12-04T09:33:18.5286054Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T09:33:18.5286231Z Profiles: HSA_PROFILE_BASE 2025-12-04T09:33:18.5286398Z Default Rounding Mode: NEAR 2025-12-04T09:33:18.5286572Z Default Rounding Mode: NEAR 2025-12-04T09:33:18.5286732Z Fast f16: TRUE 2025-12-04T09:33:18.5286892Z Workgroup Max Size: 1024(0x400) 2025-12-04T09:33:18.5287044Z Workgroup Max Size per Dimension: 2025-12-04T09:33:18.5287184Z x 1024(0x400) 2025-12-04T09:33:18.5287326Z y 1024(0x400) 2025-12-04T09:33:18.5287465Z z 1024(0x400) 2025-12-04T09:33:18.5287615Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T09:33:18.5287764Z Grid Max Size per Dimension: 2025-12-04T09:33:18.5287895Z x 2147483647(0x7fffffff) 2025-12-04T09:33:18.5288039Z y 65535(0xffff) 2025-12-04T09:33:18.5288180Z z 65535(0xffff) 2025-12-04T09:33:18.5288332Z FBarrier Max Size: 32 2025-12-04T09:33:18.5288475Z ISA 2 2025-12-04T09:33:18.5288625Z Name: amdgcn-amd-amdhsa--gfx9-4-generic:sramecc+:xnack- 
2025-12-04T09:33:18.5288809Z Machine Models: HSA_MACHINE_MODEL_LARGE 2025-12-04T09:33:18.5288978Z Profiles: HSA_PROFILE_BASE 2025-12-04T09:33:18.5289148Z Default Rounding Mode: NEAR 2025-12-04T09:33:18.5289313Z Default Rounding Mode: NEAR 2025-12-04T09:33:18.5289469Z Fast f16: TRUE 2025-12-04T09:33:18.5289662Z Workgroup Max Size: 1024(0x400) 2025-12-04T09:33:18.5289816Z Workgroup Max Size per Dimension: 2025-12-04T09:33:18.5289950Z x 1024(0x400) 2025-12-04T09:33:18.5290089Z y 1024(0x400) 2025-12-04T09:33:18.5290224Z z 1024(0x400) 2025-12-04T09:33:18.5290368Z Grid Max Size: 4294967295(0xffffffff) 2025-12-04T09:33:18.5290511Z Grid Max Size per Dimension: 2025-12-04T09:33:18.5290636Z x 2147483647(0x7fffffff) 2025-12-04T09:33:18.5290770Z y 65535(0xffff) 2025-12-04T09:33:18.5290905Z z 65535(0xffff) 2025-12-04T09:33:18.5291059Z FBarrier Max Size: 32 2025-12-04T09:33:18.5291202Z *** Done *** 2025-12-04T09:33:18.5299193Z + rocminfo 2025-12-04T09:33:18.5300476Z + grep -E 'Name:.*\sgfx|Marketing' 2025-12-04T09:33:18.5793956Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5794648Z Marketing Name: AMD EPYC 9575F 64-Core Processor 2025-12-04T09:33:18.5795013Z Name: gfx942 2025-12-04T09:33:18.5795349Z Marketing Name: AMD Radeon Graphics 2025-12-04T09:33:18.5859625Z + MAYBE_ROCM=rocm/ 2025-12-04T09:33:18.5859890Z + [[ linux-noble-rocm-py3.12-mi300 == *xpu* ]] 2025-12-04T09:33:18.5860177Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-12-04T09:33:18.5860436Z + pip_install ninja==1.10.2 2025-12-04T09:33:18.5860719Z + pip_install_pkg='python3 -m pip install --progress-bar off' 2025-12-04T09:33:18.5861053Z + python3 -m pip install --progress-bar off ninja==1.10.2 2025-12-04T09:33:18.7706823Z Collecting ninja==1.10.2 2025-12-04T09:33:18.7957433Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB) 2025-12-04T09:33:18.8031291Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl 
(108 kB) 2025-12-04T09:33:18.8963513Z Installing collected packages: ninja 2025-12-04T09:33:18.8963805Z Attempting uninstall: ninja 2025-12-04T09:33:18.8976199Z Found existing installation: ninja 1.11.1.4 2025-12-04T09:33:18.8985504Z Uninstalling ninja-1.11.1.4: 2025-12-04T09:33:18.9010876Z Successfully uninstalled ninja-1.11.1.4 2025-12-04T09:33:18.9093236Z Successfully installed ninja-1.10.2 2025-12-04T09:33:18.9385470Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:33:18.9386779Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/rocm/llvm/bin:/opt/rocm/opencl/bin:/opt/rocm/hip/bin:/opt/rocm/hcc/bin:/opt/rocm/bin:/opt/conda/envs/py_3.12/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2025-12-04T09:33:18.9387536Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-12-04T09:33:18.9387811Z + [[ linux-noble-rocm-py3.12-mi300 == *asan* ]] 2025-12-04T09:33:18.9388066Z + [[ linux-noble-rocm-py3.12-mi300 == *-debug* ]] 2025-12-04T09:33:18.9388328Z + [[ linux-noble-rocm-py3.12-mi300 != *-bazel-* ]] 2025-12-04T09:33:18.9388707Z + echo 'We are not in debug mode: linux-noble-rocm-py3.12-mi300. Expect the assertion to pass' 2025-12-04T09:33:18.9389144Z We are not in debug mode: linux-noble-rocm-py3.12-mi300. 
Expect the assertion to pass 2025-12-04T09:33:18.9389577Z + cd test 2025-12-04T09:33:18.9389845Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)' 2025-12-04T09:33:19.8576003Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2025-12-04T09:33:19.8576217Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2025-12-04T09:33:19.8576375Z + [[ default == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]] 2025-12-04T09:33:19.8578241Z + DYNAMO_BENCHMARK_FLAGS=() 2025-12-04T09:33:19.8578396Z + [[ default == *pr_time_benchmarks* ]] 2025-12-04T09:33:19.8578535Z + [[ default == *dynamo_eager* ]] 2025-12-04T09:33:19.8578656Z + [[ default == *aot_eager* ]] 2025-12-04T09:33:19.8578768Z + [[ default == *aot_inductor* ]] 2025-12-04T09:33:19.8578895Z + [[ default == *max_autotune_inductor* ]] 2025-12-04T09:33:19.8579020Z + [[ default == *inductor* ]] 2025-12-04T09:33:19.8579133Z + [[ default == *dynamic* ]] 2025-12-04T09:33:19.8579247Z + [[ default == *cpu* ]] 2025-12-04T09:33:19.8579350Z + [[ default == *xpu* ]] 2025-12-04T09:33:19.8579480Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda) 2025-12-04T09:33:19.8589081Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-12-04T09:33:19.8589245Z + [[ linux-noble-rocm-py3.12-mi300 == *-bazel-* ]] 2025-12-04T09:33:19.8592006Z + cd test 2025-12-04T09:33:19.8592310Z + python -c 'import torch; print(torch.__config__.show())' 2025-12-04T09:33:20.6044311Z PyTorch built with: 2025-12-04T09:33:20.6044561Z - GCC 11.5 2025-12-04T09:33:20.6044709Z - C++ Version: 201703 2025-12-04T09:33:20.6044997Z - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:33:20.6045756Z - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:33:20.6046001Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:33:20.6046172Z - LAPACK is enabled (usually provided by MKL) 2025-12-04T09:33:20.6046327Z - NNPACK is enabled 2025-12-04T09:33:20.6046462Z - CPU capability usage: AVX512 2025-12-04T09:33:20.6046605Z - HIP Runtime 7.1.25424 2025-12-04T09:33:20.6046731Z - MIOpen 3.5.1 2025-12-04T09:33:20.6046841Z - Magma 2.9.0 2025-12-04T09:33:20.6048783Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=35b7a9a26c5923d98aebaa41a031dae21788a9ee, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_FBGEMM_GENAI -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=ON, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 2025-12-04T09:33:20.6050805Z 2025-12-04T09:33:20.8301957Z + cd test 2025-12-04T09:33:20.8302297Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2025-12-04T09:33:21.5267977Z ATen/Parallel: 2025-12-04T09:33:21.5268188Z at::get_num_threads() : 128 2025-12-04T09:33:21.5268362Z at::get_num_interop_threads() : 128 2025-12-04T09:33:21.5268521Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2025-12-04T09:33:21.5268648Z omp_get_max_threads() : 128 2025-12-04T09:33:21.5268878Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications 2025-12-04T09:33:21.5269100Z mkl_get_max_threads() : 128 2025-12-04T09:33:21.5269258Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d) 2025-12-04T09:33:21.5269432Z std::thread::hardware_concurrency() : 128 2025-12-04T09:33:21.5269561Z Environment variables: 2025-12-04T09:33:21.5269671Z OMP_NUM_THREADS : [not set] 2025-12-04T09:33:21.5269782Z MKL_NUM_THREADS : [not set] 2025-12-04T09:33:21.5269894Z ATen parallel backend: OpenMP 2025-12-04T09:33:21.5269969Z 2025-12-04T09:33:21.7640928Z + [[ default == *numpy_2* ]] 2025-12-04T09:33:21.7641297Z + [[ linux-noble-rocm-py3.12-mi300 == *aarch64* ]] 2025-12-04T09:33:21.7641497Z + [[ default == *backward* ]] 2025-12-04T09:33:21.7641664Z + [[ default == *libtorch_agnostic_targetting* ]] 2025-12-04T09:33:21.7641841Z + [[ default == *xla* ]] 2025-12-04T09:33:21.7649866Z + [[ default == *vllm* ]] 2025-12-04T09:33:21.7650031Z + [[ default == *executorch* ]] 2025-12-04T09:33:21.7650193Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2025-12-04T09:33:21.7650358Z + [[ default == \q\u\a\n\t\i\z\a\t\i\o\n ]] 2025-12-04T09:33:21.7650545Z + [[ linux-noble-rocm-py3.12-mi300 == *libtorch* ]] 2025-12-04T09:33:21.7650730Z + [[ default == distributed ]] 2025-12-04T09:33:21.7650885Z + [[ default == *operator_benchmark* ]] 2025-12-04T09:33:21.7651052Z + [[ default == *operator_microbenchmark* ]] 2025-12-04T09:33:21.7651244Z + [[ default == *attention_microbenchmark* ]] 2025-12-04T09:33:21.7651409Z + [[ default == *inductor_distributed* ]] 2025-12-04T09:33:21.7651579Z + [[ default == *inductor-halide* ]] 2025-12-04T09:33:21.7651744Z + [[ default == *inductor-pallas* ]] 2025-12-04T09:33:21.7651907Z + [[ default == *inductor-triton-cpu* ]] 2025-12-04T09:33:21.7652241Z + [[ default == *inductor-micro-benchmark* ]] 
2025-12-04T09:33:21.7652423Z + [[ default == *aoti_cross_compile_for_windows* ]] 2025-12-04T09:33:21.7652602Z + [[ default == *huggingface* ]] 2025-12-04T09:33:21.7652744Z + [[ default == *timm* ]] 2025-12-04T09:33:21.7652881Z + [[ default == cachebench ]] 2025-12-04T09:33:21.7653032Z + [[ default == verify_cachebench ]] 2025-12-04T09:33:21.7653180Z + [[ default == *torchbench* ]] 2025-12-04T09:33:21.7653377Z + [[ default == *inductor_cpp_wrapper* ]] 2025-12-04T09:33:21.7653538Z + [[ default == *inductor_core* ]] 2025-12-04T09:33:21.7653688Z + [[ default == *inductor* ]] 2025-12-04T09:33:21.7653828Z + [[ default == *einops* ]] 2025-12-04T09:33:21.7653966Z + [[ default == *dynamo_core* ]] 2025-12-04T09:33:21.7654118Z + [[ default == *dynamo_wrapped* ]] 2025-12-04T09:33:21.7654298Z + [[ linux-noble-rocm-py3.12-mi300 == *rocm* ]] 2025-12-04T09:33:21.7654459Z + [[ -n '' ]] 2025-12-04T09:33:21.7654571Z + [[ 2 == 1 ]] 2025-12-04T09:33:21.7654690Z + [[ 2 == 2 ]] 2025-12-04T09:33:21.7654807Z + [[ 6 -gt 1 ]] 2025-12-04T09:33:21.7654926Z + install_torchvision 2025-12-04T09:33:21.7655060Z + local orig_preload 2025-12-04T09:33:21.7655182Z + local commit 2025-12-04T09:33:21.7655309Z ++ get_pinned_commit vision 2025-12-04T09:33:21.7655461Z ++ cat .github/ci_commit_pins/vision.txt 2025-12-04T09:33:21.7661301Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:21.7661435Z + orig_preload= 2025-12-04T09:33:21.7661530Z + '[' -n '' ']' 2025-12-04T09:33:21.7661636Z + [[ linux-noble-rocm-py3.12-mi300 == *cuda* ]] 2025-12-04T09:33:21.7661921Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision 2025-12-04T09:33:21.7662255Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:21.7662473Z + local wheel_dir=dist/vision 2025-12-04T09:33:21.7662590Z + local found_whl=0 2025-12-04T09:33:21.7662699Z + for file in "${wheel_dir}"/*.whl 
2025-12-04T09:33:21.7662827Z + [[ -f dist/vision/*.whl ]] 2025-12-04T09:33:21.7662939Z + '[' 0 == 0 ']' 2025-12-04T09:33:21.7663200Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:21.9077513Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:21.9078642Z Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-uqy5ta6a 2025-12-04T09:33:21.9097450Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-uqy5ta6a 2025-12-04T09:33:28.2085522Z Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e' 2025-12-04T09:33:28.2099834Z Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:28.8586818Z Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e 2025-12-04T09:33:30.4882198Z Preparing metadata (pyproject.toml) ... done 2025-12-04T09:33:30.4902121Z Building wheels for collected packages: torchvision 2025-12-04T09:34:28.1503896Z Building wheel for torchvision (pyproject.toml) ... 
done 2025-12-04T09:34:28.1519685Z Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl size=1814540 sha256=2cd76d959f7c7ee90f1b19103171072d812c92e81d84994da60d51f70152a80b 2025-12-04T09:34:28.1520596Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/22/df/b5/2cdf6bb6a10c31c47b56cf4d0441cf0ee834f1c9dee15fb9d9 2025-12-04T09:34:28.1541801Z Successfully built torchvision 2025-12-04T09:34:28.2051825Z + for file in "${wheel_dir}"/*.whl 2025-12-04T09:34:28.2053026Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl 2025-12-04T09:34:28.2053799Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl') 2025-12-04T09:34:28.2054280Z + local args 2025-12-04T09:34:28.2054728Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl == *\ * ]] 2025-12-04T09:34:28.2055143Z + for path in "${args[@]}" 2025-12-04T09:34:28.2055544Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl' 2025-12-04T09:34:28.2056090Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl 2025-12-04T09:34:28.2056721Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl 2025-12-04T09:34:28.3495663Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp312-cp312-linux_x86_64.whl 2025-12-04T09:34:28.3541543Z Installing collected packages: torchvision 2025-12-04T09:34:28.5764553Z Successfully installed torchvision-0.25.0a0+617079d 2025-12-04T09:34:28.6032916Z + '[' -n '' ']' 2025-12-04T09:34:28.6033246Z + test_python_shard 2 2025-12-04T09:34:28.6033571Z + [[ -z 6 ]] 2025-12-04T09:34:28.6034241Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 2 6 --verbose --upload-artifacts-while-running 
2025-12-04T09:34:30.1982496Z Excluding inductor/test_max_autotune on ROCm 2025-12-04T09:34:30.1982819Z Excluding test_cuda_nvml_based_avail on ROCm 2025-12-04T09:34:30.5614785Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json 2025-12-04T09:34:30.9159119Z Ignoring disabled issues: [''] 2025-12-04T09:34:30.9208935Z Found test times from artifacts 2025-12-04T09:34:30.9385383Z Found test times from artifacts 2025-12-04T09:34:30.9391374Z Running all tests 2025-12-04T09:34:30.9589964Z Running parallel tests on 1 processes 2025-12-04T09:34:30.9593143Z Name: tests to run (est. time: 171.66min) 2025-12-04T09:34:30.9593388Z Serial tests (94): 2025-12-04T09:34:30.9593511Z inductor/test_aot_inductor 1/3 2025-12-04T09:34:30.9593686Z inductor/test_torchinductor_dynamic_shapes 4/5 2025-12-04T09:34:30.9593842Z inductor/test_torchinductor_opinfo 2/10 2025-12-04T09:34:30.9593984Z inductor/test_torchinductor_opinfo 8/10 2025-12-04T09:34:30.9594121Z inductor/test_cpu_repro 2/4 2025-12-04T09:34:30.9594261Z inductor/test_mkldnn_pattern_matcher 2/2 2025-12-04T09:34:30.9594396Z inductor/test_layout_optim 1/1 2025-12-04T09:34:30.9594531Z inductor/test_cuda_select_algorithm 1/1 2025-12-04T09:34:30.9594672Z functorch/test_eager_transforms 1/1 2025-12-04T09:34:30.9594808Z test_sparse_semi_structured 1/1 2025-12-04T09:34:30.9594943Z inductor/test_aot_inductor_arrayref 1/2 2025-12-04T09:34:30.9595417Z inductor/test_compile_subprocess 1/3 2025-12-04T09:34:30.9595560Z inductor/test_multi_kernel 1/1 2025-12-04T09:34:30.9595686Z inductor/test_loop_ordering 1/1 2025-12-04T09:34:30.9595819Z dynamo/test_functions 1/1 2025-12-04T09:34:30.9595945Z dynamo/test_regional_inductor 1/1 2025-12-04T09:34:30.9596077Z inductor/test_inplace_padding 1/1 2025-12-04T09:34:30.9596225Z inductor/test_fp8 1/1 2025-12-04T09:34:30.9596343Z inductor/test_pad_mm 1/1 2025-12-04T09:34:30.9596462Z dynamo/test_utils 1/1 
2025-12-04T09:34:30.9596577Z inductor/test_mps_basic 1/1 2025-12-04T09:34:30.9596706Z inductor/test_external_callables 1/1 2025-12-04T09:34:30.9596858Z export/test_export_training_ir_to_run_decomp 1/1 2025-12-04T09:34:30.9597007Z inductor/test_async_compile 1/1 2025-12-04T09:34:30.9597137Z inductor/test_compiled_optimizers 2/2 2025-12-04T09:34:30.9597271Z inductor/test_control_flow 4/4 2025-12-04T09:34:30.9597400Z inductor/test_minifier_isolate 1/1 2025-12-04T09:34:30.9597532Z test_matmul_cuda 1/1 2025-12-04T09:34:30.9597642Z test_ops 3/5 2025-12-04T09:34:30.9597744Z test_decomp 4/11 2025-12-04T09:34:30.9597970Z test_decomp 10/11 2025-12-04T09:34:30.9598085Z nn/test_multihead_attention 1/1 2025-12-04T09:34:30.9598219Z higher_order_ops/test_invoke_quant 1/1 2025-12-04T09:34:30.9598356Z higher_order_ops/test_local_map 1/1 2025-12-04T09:34:30.9598492Z higher_order_ops/test_invoke_subgraph 1/1 2025-12-04T09:34:30.9598622Z test_utils 1/1 2025-12-04T09:34:30.9598736Z profiler/test_memory_profiler 1/1 2025-12-04T09:34:30.9598867Z functorch/test_aotdispatch 1/1 2025-12-04T09:34:30.9598986Z test_fx 2/3 2025-12-04T09:34:30.9599089Z functorch/test_ops 2/5 2025-12-04T09:34:30.9599202Z nn/test_pruning 1/1 2025-12-04T09:34:30.9599314Z optim/test_lrscheduler 1/1 2025-12-04T09:34:30.9599439Z profiler/test_cpp_thread 1/1 2025-12-04T09:34:30.9599568Z profiler/test_execution_trace 1/1 2025-12-04T09:34:30.9599700Z profiler/test_profiler_tree 1/1 2025-12-04T09:34:30.9599830Z profiler/test_python_tracer 1/1 2025-12-04T09:34:30.9599958Z profiler/test_record_function 1/1 2025-12-04T09:34:30.9600093Z profiler/test_torch_tidy 1/1 2025-12-04T09:34:30.9600213Z test_accelerator 1/1 2025-12-04T09:34:30.9600332Z test_appending_byte_serializer 1/1 2025-12-04T09:34:30.9600464Z test_as_strided 1/1 2025-12-04T09:34:30.9600572Z test_autocast 1/1 2025-12-04T09:34:30.9600685Z test_bundled_inputs 1/1 2025-12-04T09:34:30.9600807Z test_comparison_utils 1/1 2025-12-04T09:34:30.9600925Z 
test_compile_benchmark_util 1/1 2025-12-04T09:34:30.9601045Z test_complex 1/1 2025-12-04T09:34:30.9601156Z test_cpp_api_parity 1/1 2025-12-04T09:34:30.9601285Z test_fx_passes 1/1 2025-12-04T09:34:30.9601393Z test_fx_reinplace_pass 1/1 2025-12-04T09:34:30.9601507Z test_hop_infra 1/1 2025-12-04T09:34:30.9601604Z test_hub 1/1 2025-12-04T09:34:30.9601702Z test_jit_autocast 1/1 2025-12-04T09:34:30.9601816Z test_jit_disabled 1/1 2025-12-04T09:34:30.9601921Z test_jit_fuser_te 1/1 2025-12-04T09:34:30.9602022Z test_mkldnn 1/1 2025-12-04T09:34:30.9602125Z test_nestedtensor 2/2 2025-12-04T09:34:30.9602246Z test_rename_privateuse1_to_existing_device 1/1 2025-12-04T09:34:30.9602383Z test_scaled_matmul_cuda 1/1 2025-12-04T09:34:30.9602498Z test_scatter_gather_ops 1/1 2025-12-04T09:34:30.9602609Z test_schema_check 1/1 2025-12-04T09:34:30.9602714Z test_stateless 1/1 2025-12-04T09:34:30.9602815Z test_subclass 1/1 2025-12-04T09:34:30.9602915Z test_sympy_utils 1/1 2025-12-04T09:34:30.9603020Z test_tensorboard 1/1 2025-12-04T09:34:30.9603129Z test_utils_config_module 1/1 2025-12-04T09:34:30.9603238Z test_view_ops 1/1 2025-12-04T09:34:30.9603400Z test_xnnpack_integration 1/1 2025-12-04T09:34:30.9603527Z torch_np/numpy_tests/lib/test_arraypad 1/1 2025-12-04T09:34:30.9603672Z torch_np/numpy_tests/lib/test_arraysetops 1/1 2025-12-04T09:34:30.9603868Z torch_np/numpy_tests/lib/test_function_base 1/1 2025-12-04T09:34:30.9604016Z torch_np/numpy_tests/lib/test_histograms 1/1 2025-12-04T09:34:30.9604170Z torch_np/numpy_tests/lib/test_index_tricks 1/1 2025-12-04T09:34:30.9604316Z torch_np/numpy_tests/lib/test_shape_base_ 1/1 2025-12-04T09:34:30.9604458Z torch_np/numpy_tests/lib/test_twodim_base 1/1 2025-12-04T09:34:30.9604599Z torch_np/numpy_tests/lib/test_type_check 1/1 2025-12-04T09:34:30.9604742Z torch_np/numpy_tests/linalg/test_linalg 1/1 2025-12-04T09:34:30.9604870Z torch_np/test_basic 1/1 2025-12-04T09:34:30.9604984Z torch_np/test_binary_ufuncs 1/1 2025-12-04T09:34:30.9605103Z 
torch_np/test_dtype 1/1 2025-12-04T09:34:30.9605216Z torch_np/test_function_base 1/1 2025-12-04T09:34:30.9605336Z torch_np/test_indexing 1/1 2025-12-04T09:34:30.9605452Z torch_np/test_reductions 1/1 2025-12-04T09:34:30.9605570Z typing/test_python_operators 1/1 2025-12-04T09:34:30.9605689Z xpu/test_conv 1/1 2025-12-04T09:34:30.9605792Z Parallel tests (0): 2025-12-04T09:34:30.9605900Z Name: excluded (est. time: 0.0min) 2025-12-04T09:34:30.9606015Z Serial tests (0): 2025-12-04T09:34:30.9606153Z Parallel tests (0): 2025-12-04T09:34:30.9606330Z Running inductor/test_aot_inductor 1/3 ... [2025-12-04 09:34:30.959522][2244855.226190993] 2025-12-04T09:34:30.9606520Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T09:34:30.9606922Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:34:30.959799] 2025-12-04T09:45:04.8711189Z 2025-12-04T09:45:04.8712357Z inductor/test_aot_inductor 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_1.3_4808bd0e5a98ed60_.log 2025-12-04T09:45:04.8771581Z Running 306 items in this shard: test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_add_complex_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_addmm_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_constant_tensor_name_collision_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aoti_debug_printer_cpp_kernel_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_boolean_indexing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_buffer_mutation_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_mismatched_branch_output_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_predicate_on_cpu_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_symint_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_replace_view_ops_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_device_moved_constant_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_duplicated_params_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fake_tensor_device_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_free_inactive_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_int_list_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_mmaped_weights_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_weight_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_masked_select_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_misaligned_input_1_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pad_non_zero_memory_leak_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_poi_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_squeeze_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_pytree_inputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quanatized_int8_linear_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_quantized_linear_bias_none_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_replace_unbacked_symbol_with_backed_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_reuse_kernel_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_runtime_checks_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_multi_arch_embed_kernel_binary_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_stft_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_False_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_reinterpret_view_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_with_none_input_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_mutated_autotuning_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbounded_expr_substitutions_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_update_constant_buffer_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_simple_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_conv_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_outer_code_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_with_cudagraphs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_add_complex_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_addmm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_amp_fallback_random_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_constant_tensor_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_aoti_debug_printer_codegen_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_assert_async_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_buffer_mutation_3_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_nested_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_non_tensor_predicates_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicate_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_use_buffers_from_outer_scope_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_original_fqn_and_dtype_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_device_moved_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_dynamic_smem_above_default_limit_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fill__fallback_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_fqn_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_free_inactive_buffer_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_on_disk_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_linear_dynamic_maxautotune_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_non_default_gpu_device_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_on_gpu_device1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_permute_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_pytree_inputs_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_complex_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_runtime_checks_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_with_unbacked_add_and_mul_expr_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_symint_item_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_extern_kernel_arg_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_reinterpret_view_mem_leak_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_sympy_expr_arg_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_sym_expr_cond_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_grid_with_backed_symbols_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_zero_size_weight_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_amp_fallback_random_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aot_inductor_consts_cpp_build_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_debug_printer_fp8_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aoti_user_defined_triton_kernel_profiling_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotuning_args_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_4_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_codegen_int_array_var_fix_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_cpu_predicate_cuda_operands_max_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_multiple_outputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_with_replace_view_ops_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_original_fqn_and_dtype_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_constant_type_propagation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_smem_above_default_limit_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_mem_leak_fix_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_foreach_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fqn_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_grid_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_masked_select_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_missing_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_mixed_device_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nested_tensor_from_jagged_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_non_contiguous_output_alias_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_none_args_aot_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_misaligned_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pad_non_zero_memory_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_proxy_executor_permute_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_pytree_inputs_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_replace_unbacked_symbol_with_backed_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_reuse_kernel_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_fp8_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_embed_kernel_binary_False_max_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_small_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_stride_with_unbacked_expr_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sympy_cpp_printer_min_max_minmax1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_autotuning_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_input_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_next_power_of_2_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbounded_expr_substitutions_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_using_model_name_for_files_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_mixed_device_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_outer_buffers_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_parameters_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_no_triton_profiler_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps 2025-12-04T09:45:04.8818704Z 2025-12-04T09:45:04.8818834Z Finished inductor/test_aot_inductor 1/3 ... [2025-12-04 09:45:04.871237][2245489.137907856], took 10.57min 2025-12-04T09:45:04.8819230Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T09:45:07.1403195Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T09:45:07.1405072Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T09:45:07.1405420Z Uploading artifacts took 0.00 seconds 2025-12-04T09:45:07.1406583Z Running inductor/test_torchinductor_dynamic_shapes 4/5 ... [2025-12-04 09:45:07.140521][2245491.407187568] 2025-12-04T09:45:07.1406921Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T09:45:07.1429670Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 09:45:07.140831] 2025-12-04T09:59:35.9318604Z 2025-12-04T09:59:35.9319437Z inductor/test_torchinductor_dynamic_shapes 4/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_4.5_20c2bd9be3dc89ee_.log 2025-12-04T09:59:35.9386885Z Running 371 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool2d_low_prec_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_float_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_inplace_permuted_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adding_tensor_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_support_out_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_assert_alignment_op_name_fail_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_computed_offsets_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int64_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_nd_tiling_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_empty_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_check_stack_no_cycles_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_fallback_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_config_option_dont_assume_alignment_cudagraphs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_config_option_dont_assume_alignment_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_consecutive_split_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_1d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_fill_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_nd_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_channels_last_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_with_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_cpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_gpu_tensor_cpp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumprod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_no_mask_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_multi_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_data_type_propogation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_deterministic_codegen_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_device_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_precision_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_presicion_accuracy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_prim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtype_mismatch_issue_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_bfloat16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int16_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expand_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_expanded_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fill1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fill2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float32_to_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fuse_large_params_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fuse_tiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_misaligned_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_abs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_deterministic_fallback_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_fallback1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inf_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inner_reduction_detection_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_mixed_dtype_ops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_grid_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_grid_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_offset_pointwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_triton_kernel_wrapper_functional_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logcumsumexp_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_long_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_masked_fill_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward6_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mixed_mm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mixed_mm3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mm_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mul_softmax_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_prime_size_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_var_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutations_loop_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_needs_contiguous_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_no_mega_fusion_during_lowering_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nonzero_unbacked_refinement_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_single_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_y1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erf_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_log_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_ndtr_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_scaled_modified_bessel_k0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_scaled_modified_bessel_k1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_sinc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_polar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randint_int64_mod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_no_ops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_copy_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_replication_pad_errors_with_bool_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_resize_as_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sgn_extremal_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_shape_prop_torch_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sign_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_simplify_loops_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_single_elem_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_loop_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_one_kernel_persist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sqrt_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_squeeze_varargs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_std_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sum_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tanh_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_transpose_add_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_transposed_propagates_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_triu_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_floordiv_simplify_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsqueeze_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_bicubic2d_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_matmul_4bit_fp32_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool1d_argmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex9_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex_strided_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_const_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_inplace_permuted_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_alexnet_prefix_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aliased_buffer_reuse_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_allow_reuse_active_if_under_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_allow_reuse_disable_if_exceed_peak_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_support_str_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin_with_duplicates_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_to_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_alignment_op_name_pass_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int32_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_batch_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_neg_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_empty_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_empty_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cauchy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_check_stack_no_cycles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_compar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_cudagraphs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_backward_channels_last_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_functional_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_inference_heuristics_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cummin_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumprod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_no_mask_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_scan_op_multi_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dense_mask_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_diagonal_copy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dist_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div7_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_trivial_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtype_mismatch_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_bfloat16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_bfloat16_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_bag_byte_unpack_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_erfc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_erfinv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_with_return_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fmod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fmod_zero_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_functionalize_rng_wrappers_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_generate_rand_fp8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_pad_dynamic_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardswish_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_remainder_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_deterministic_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_indirect_load_broadcast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_multiple_specializations_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kernel_names_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_l1_loss_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_grid_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_layer_norm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_sliced_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linspace3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_list_clearing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_regional_compile_repeated_blocks_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_recompile_on_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_True_descending_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_new_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_norm_constant_overflow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_output_strides_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_view_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_t_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_erfcx_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_expit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_expm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammainc_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_hermite_polynomial_he_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0e_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_ndtri_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_round_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_spherical_bessel_j0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randn_generator_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_slice_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_view_default_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reuse_buffers_with_aliasing_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scaled_dot_product_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_add3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_bf16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_shape_padding_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sin_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_single_elem_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_single_elem_indirect_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sizehint_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice1_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_special_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumprod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_low_prec_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_strided_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor_index_slice_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tmp_not_defined_issue2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tmp_not_defined_issue3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transposed_propagates_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triu_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_uint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vdd_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vectorized_ops_masked_var_novec_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_constant_fold_uniform_value_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_zeros_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op10_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op1_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op8_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op9_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sym_stride_lowering_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sym_sum_unbacked_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_save_data_dependent_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_reduction_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_dynamic_cuda
2025-12-04T09:59:35.9447880Z
2025-12-04T09:59:35.9448065Z Finished inductor/test_torchinductor_dynamic_shapes 4/5 ... [2025-12-04 09:59:35.932041][2246360.198710125], took 14.48min
2025-12-04T09:59:35.9448488Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T09:59:35.9448850Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T09:59:35.9449098Z Running inductor/test_torchinductor_opinfo 2/10 ... [2025-12-04 09:59:35.937971][2246360.204643889]
2025-12-04T09:59:35.9449309Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T09:59:35.9449724Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=2', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:59:35.938147]
2025-12-04T10:07:08.6734080Z
2025-12-04T10:07:08.6735530Z inductor/test_torchinductor_opinfo 2/10 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_2.10_89524e7cd1d7ac1a_.log
2025-12-04T10:07:08.6794752Z Running 363 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmatmul___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcmul_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bucketize_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumulative_trapezoid_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_no_rounding_mode_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eq_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fft_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_power_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_add_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_inner_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isneginf_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_unary_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logaddexp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matrix_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_median_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_movedim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_multinomial_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_zeros_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_group_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_kl_div_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool1d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pdist_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_number_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_quantile_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize__cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sgn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_with_dtype_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_erfcx_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_svd_lowrank_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensordot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_uint16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_float32 2025-12-04T10:07:08.6853769Z 2025-12-04T10:07:08.6853911Z Finished inductor/test_torchinductor_opinfo 2/10 ... [2025-12-04 10:07:08.667282][2246812.933951226], took 7.55min 2025-12-04T10:07:08.6854325Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:07:08.6854698Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:07:08.6854929Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T10:07:08.6855113Z Uploading artifacts took 0.00 seconds 2025-12-04T10:07:08.6855318Z Running inductor/test_torchinductor_opinfo 8/10 ... [2025-12-04 10:07:08.672946][2246812.939618024] 2025-12-04T10:07:08.6855534Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:07:08.6855950Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=8', '--num-shards=10', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:07:08.673143] 2025-12-04T10:15:54.5654919Z 2025-12-04T10:15:54.5655994Z inductor/test_torchinductor_opinfo 8/10 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_8.10_d915845c16223ad7_.log 2025-12-04T10:15:54.5721335Z Running 349 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___ror___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_decomposed_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_any_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_arange_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argwhere_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_2d_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_or_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bool_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_shapes_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_byte_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cauchy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_column_stack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_complex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_embed_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_einsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_strided_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_equal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfc_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfftn_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_float_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmod_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_3d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_histc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isfinite_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isinf_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isposinf_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_item_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kthvalue_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lcm_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cond_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eig_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_householder_product_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_power_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_and_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_not_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logspace_tensor_overload_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mH_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumsum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_var_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_maximum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_dropout_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_instance_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_trilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_layer_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_prelu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_nuc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_in_place_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rand_like_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_roll_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rot90_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsub_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_to_size_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tanh_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trace_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tril_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triu_indices_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_consecutive_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unique_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_like_cuda_int64 2025-12-04T10:15:54.5776916Z 2025-12-04T10:15:54.5777055Z Finished inductor/test_torchinductor_opinfo 8/10 ... [2025-12-04 10:15:54.565798][2247338.832466603], took 8.76min 2025-12-04T10:15:54.5777459Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:15:54.5777815Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:15:54.5778038Z Running inductor/test_cpu_repro 2/4 ... 
[2025-12-04 10:15:54.571676][2247338.838348717] 2025-12-04T10:15:54.5778223Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:15:54.5778613Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=2', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:54.571862] 2025-12-04T10:23:13.5394253Z 2025-12-04T10:23:13.5395251Z inductor/test_cpu_repro 2/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_2.4_85ab7db17e10bdde_.log 2025-12-04T10:23:13.5440746Z Running 179 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_argmax_argmin_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_broadcast_scalar_cpp_tile_2d_kernel, test/inductor/test_cpu_repro.py::CPUReproTests::test_channels_last_view_as_complex, test/inductor/test_cpu_repro.py::CPUReproTests::test_complex_memory_overlap, test/inductor/test_cpu_repro.py::CPUReproTests::test_concat_inner_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_constant_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_stride_constraints, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int32_to_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_cpp_kernel_profile, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_maxpool2d_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_disabled_amp_is_inference_True, test/inductor/test_cpu_repro.py::CPUReproTests::test_dropout, test/inductor/test_cpu_repro.py::CPUReproTests::test_embedding_vec_bf16, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_for_loop_collapsed, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_15,3,13, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_frexp, test/inductor/test_cpu_repro.py::CPUReproTests::test_horizontal_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_in_out_buffer, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_propagation_issue_102065, test/inductor/test_cpu_repro.py::CPUReproTests::test_inplace_add_alpha, test/inductor/test_cpu_repro.py::CPUReproTests::test_invalid_index_of_empty_tensor, test/inductor/test_cpu_repro.py::CPUReproTests::test_ir_node_str, test/inductor/test_cpu_repro.py::CPUReproTests::test_issue122380, test/inductor/test_cpu_repro.py::CPUReproTests::test_load_half, test/inductor/test_cpu_repro.py::CPUReproTests::test_local_buffer_with_line_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_logical_op_store_to_lowp_data_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_masked_load_int64_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_max_reduction_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_maxpool2d_with_pre_loop_collapse_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_memory_copy_with_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_meta_device, test/inductor/test_cpu_repro.py::CPUReproTests::test_mkl_linear, test/inductor/test_cpu_repro.py::CPUReproTests::test_multihead_attention_cpu, test/inductor/test_cpu_repro.py::CPUReproTests::test_new_vec_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_param_assign_wrapped, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_index_with_constant_stride, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_reduction_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_parallel_num_threads, test/inductor/test_cpu_repro.py::CPUReproTests::test_pow_cos, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduce_with_masked, test/inductor/test_cpu_repro.py::CPUReproTests::test_reduction_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_redundant_to_node_elimination_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_with_inf_value, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_require_stride_order_non_owning, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_share_local_buffers_in_outer_loop_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_sigmoid_with_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_softmax_with_zero_dim, test/inductor/test_cpu_repro.py::CPUReproTests::test_store_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_load_decomposed_dequant_add_relu_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_float_bool, test/inductor/test_cpu_repro.py::CPUReproTests::test_torch_logit, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_sum2d_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_bitwise, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_indirect_load_cse_cache, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_logical, test/inductor/test_cpu_repro.py::CPUReproTests::test_view_dtype
2025-12-04T10:23:13.5476804Z
2025-12-04T10:23:13.5476919Z Finished inductor/test_cpu_repro 2/4 ... [2025-12-04 10:23:13.539369][2247777.806038871], took 7.32min
2025-12-04T10:23:13.5477370Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T10:23:13.5477723Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T10:23:13.5477966Z Running inductor/test_mkldnn_pattern_matcher 2/2 ... [2025-12-04 10:23:13.545601][2247777.812274753]
2025-12-04T10:23:13.5478167Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T10:23:13.5478575Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:23:13.545779]
2025-12-04T10:26:59.8291787Z
2025-12-04T10:26:59.8292549Z inductor/test_mkldnn_pattern_matcher 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_2.2_03ef1d2b40b1572b_.log
2025-12-04T10:26:59.8320278Z Running 138 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_add_scalar, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_failed_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_pass_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False,
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_add_bias, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_fp32, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_input_non_contiguous_3D_wo_bias, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_multi_linear_share_same_input, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add_relu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardswish, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardtanh, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_silu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv1d_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_broadcast_shapes_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_int8_mixed_bf16_use_autocast, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_False_is_qat_True_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_False_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_xpu_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_dynamic_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_fp8_inductor_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_int8_mixed_bf16_xpu, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous_use_autocast, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_sum_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_113440_issue_1, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_113440_issue_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_99842_issue, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_True_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int4_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int8, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_linear_unary_dynamic_shapes, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_qat_bn_conv2d 2025-12-04T10:26:59.8346798Z 2025-12-04T10:26:59.8346946Z Finished inductor/test_mkldnn_pattern_matcher 2/2 ... [2025-12-04 10:26:59.829008][2248004.095678383], took 3.77min 2025-12-04T10:26:59.8347360Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:26:59.8351384Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:26:59.8354034Z Running inductor/test_layout_optim 1/1 ... [2025-12-04 10:26:59.835288][2248004.101959075] 2025-12-04T10:26:59.8354238Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:26:59.8356101Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_layout_optim.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:26:59.835510] 2025-12-04T10:27:05.2061113Z 2025-12-04T10:27:05.2062264Z inductor/test_layout_optim 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_layout_optim_1.1_9416772a88a4e836_.log 2025-12-04T10:27:05.2063142Z Running 0 items in this shard: 2025-12-04T10:27:05.2063636Z 2025-12-04T10:27:05.2063983Z Finished inductor/test_layout_optim 1/1 ... [2025-12-04 10:27:05.205759][2248009.472428278], took 0.09min 2025-12-04T10:27:05.2065187Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:27:05.2120366Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:27:05.2123523Z Running inductor/test_cuda_select_algorithm 1/1 ... [2025-12-04 10:27:05.212077][2248009.478750149] 2025-12-04T10:27:05.2123763Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:27:05.2124191Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:27:05.212253] 2025-12-04T10:58:28.1903160Z 2025-12-04T10:58:28.1904279Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_2c02839777e739fe_.log) 2025-12-04T10:58:28.1905377Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6bb55f98b09515aa.xml 2025-12-04T10:58:28.1906790Z ============================= test session starts ============================== 2025-12-04T10:58:28.1907325Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.1907797Z cachedir: .pytest_cache 2025-12-04T10:58:28.1908378Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.1908977Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.1909273Z configfile: pytest.ini 2025-12-04T10:58:28.1909828Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.1910416Z collecting ... 
collected 58 items 2025-12-04T10:58:28.1910773Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T10:58:28.1931404Z Running 58 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, 
test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.1945249Z 2025-12-04T10:58:28.1945517Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5286s] [ 1%] 2025-12-04T10:58:28.1946095Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5089s] [ 1%] 2025-12-04T10:58:28.1946625Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5045s] [ 1%] 2025-12-04T10:58:28.1946905Z 2025-12-04T10:58:28.1946969Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.1947268Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.1947519Z Traceback (most recent call last): 2025-12-04T10:58:28.1947785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1948040Z method(*args, **kwargs) 2025-12-04T10:58:28.1948272Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1948523Z method(*args, **kwargs) 2025-12-04T10:58:28.1948795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.1949023Z with policy(): 2025-12-04T10:58:28.1949238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.1949478Z raise RuntimeError(msg) 2025-12-04T10:58:28.1950011Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.1950465Z 2025-12-04T10:58:28.1950570Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.1951048Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.1951433Z 2025-12-04T10:58:28.1951524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.1951796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1951972Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1952253Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1952568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1952719Z graph_break [] 
2025-12-04T10:58:28.1952937Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.1953182Z Traceback (most recent call last): 2025-12-04T10:58:28.1953468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1953704Z method(*args, **kwargs) 2025-12-04T10:58:28.1953938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1954173Z method(*args, **kwargs) 2025-12-04T10:58:28.1954412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.1954667Z with policy(): 2025-12-04T10:58:28.1954917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.1955152Z raise RuntimeError(msg) 2025-12-04T10:58:28.1955645Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 
2025-12-04T10:58:28.1956103Z 2025-12-04T10:58:28.1956183Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.1956598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.1956941Z 2025-12-04T10:58:28.1957036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.1957249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1957431Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1957785Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1958131Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1958299Z graph_break [] 2025-12-04T10:58:28.1958439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1958614Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1958782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1959069Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1959320Z graph_break [] 2025-12-04T10:58:28.1959428Z =================================== FAILURES =================================== 2025-12-04T10:58:28.1959686Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.1959934Z Traceback (most recent call last): 2025-12-04T10:58:28.1960204Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1960480Z method(*args, **kwargs) 2025-12-04T10:58:28.1960716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.1960949Z method(*args, **kwargs) 2025-12-04T10:58:28.1961175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.1961407Z with policy(): 2025-12-04T10:58:28.1961644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.1961883Z raise RuntimeError(msg) 2025-12-04T10:58:28.1962377Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.1962851Z 2025-12-04T10:58:28.1962931Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.1963410Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.1963776Z 2025-12-04T10:58:28.1963863Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.1964094Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1964303Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1964579Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1964868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1965021Z graph_break [] 2025-12-04T10:58:28.1965151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1965323Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1965487Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1965775Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1966033Z graph_break [] 2025-12-04T10:58:28.1966168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.1966344Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.1966536Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.1966870Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.1967144Z graph_break [] 2025-12-04T10:58:28.1967464Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6bb55f98b09515aa.xml - 2025-12-04T10:58:28.1967804Z =========================== short test summary info ============================ 2025-12-04T10:58:28.1968563Z FAILED [0.5045s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.1969249Z 2025-12-04T10:58:28.1969365Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.1969790Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.1970129Z 2025-12-04T10:58:28.1970219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.1970415Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.1970576Z ========================== 1 failed, 2 rerun in 3.70s ========================== 2025-12-04T10:58:28.1970723Z Got exit code 1 2025-12-04T10:58:28.1970828Z Retrying single test... 
2025-12-04T10:58:28.1971097Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b420f9f196448500.xml 2025-12-04T10:58:28.1971395Z ============================= test session starts ============================== 2025-12-04T10:58:28.1971617Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.1971811Z cachedir: .pytest_cache 2025-12-04T10:58:28.1972038Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.1972282Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.1972411Z configfile: pytest.ini 2025-12-04T10:58:28.1972648Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.1972952Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.1973428Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2004055Z Running 1 items in this shard 2025-12-04T10:58:28.2004211Z 2025-12-04T10:58:28.2004635Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:22.416688749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2005085Z 2025-12-04T10:58:28.2005282Z [W1204 10:27:22.677183898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2005480Z 2025-12-04T10:58:28.2005648Z [W1204 10:27:22.677374315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2005848Z 2025-12-04T10:58:28.2006126Z [W1204 10:27:22.677858888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2006323Z 2025-12-04T10:58:28.2006476Z [W1204 10:27:22.677961306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2006671Z 2025-12-04T10:58:28.2006823Z [W1204 10:27:22.679222807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2007054Z 2025-12-04T10:58:28.2007219Z [W1204 10:27:22.679287886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2007424Z 2025-12-04T10:58:28.2007582Z [W1204 10:27:22.679406864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2007802Z 2025-12-04T10:58:28.2007959Z [W1204 10:27:22.679460963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2008160Z 2025-12-04T10:58:28.2008415Z [W1204 10:27:22.683729916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2008617Z 2025-12-04T10:58:28.2008773Z [W1204 10:27:22.683813395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2008962Z 2025-12-04T10:58:28.2009119Z [W1204 10:27:22.683871074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2009311Z 2025-12-04T10:58:28.2009488Z [W1204 10:27:22.683962883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2009677Z 2025-12-04T10:58:28.2009861Z [W1204 10:27:22.684013052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2010059Z 2025-12-04T10:58:28.2010234Z [W1204 10:27:22.684098491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2010432Z 2025-12-04T10:58:28.2010585Z [W1204 10:27:22.684144620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2010779Z 2025-12-04T10:58:28.2010929Z [W1204 10:27:22.684215949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2011121Z 2025-12-04T10:58:28.2011269Z [W1204 10:27:22.684259358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2011458Z 2025-12-04T10:58:28.2011613Z [W1204 10:27:22.720177630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2011802Z 2025-12-04T10:58:28.2011953Z [W1204 10:27:22.720268938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2012151Z 2025-12-04T10:58:28.2012303Z [W1204 10:27:22.720327547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2012495Z 2025-12-04T10:58:28.2012650Z [W1204 10:27:22.720415216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2012842Z 2025-12-04T10:58:28.2012993Z [W1204 10:27:22.720462065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2013183Z 2025-12-04T10:58:28.2013425Z [W1204 10:27:22.720542974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2013616Z 2025-12-04T10:58:28.2013771Z [W1204 10:27:22.720586873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2013962Z 2025-12-04T10:58:28.2014117Z [W1204 10:27:22.720654712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2014307Z 2025-12-04T10:58:28.2014460Z [W1204 10:27:22.720697402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2014646Z 2025-12-04T10:58:28.2014703Z ('RERUN', {'yellow': True}) [2.8832s] [100%] 2025-12-04T10:58:28.2015172Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:23.932018295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2015586Z 2025-12-04T10:58:28.2015741Z [W1204 10:27:23.932179193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2015970Z 2025-12-04T10:58:28.2016123Z [W1204 10:27:23.932236012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2016316Z 2025-12-04T10:58:28.2016467Z [W1204 10:27:23.932328080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2016659Z 2025-12-04T10:58:28.2016809Z [W1204 10:27:23.932375830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2017001Z 2025-12-04T10:58:28.2017151Z [W1204 10:27:23.932460648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2017346Z 2025-12-04T10:58:28.2017497Z [W1204 10:27:23.932505457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2017692Z 2025-12-04T10:58:28.2017846Z [W1204 10:27:23.932575416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2018034Z 2025-12-04T10:58:28.2018185Z [W1204 10:27:23.932617146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2018373Z 2025-12-04T10:58:28.2018527Z [W1204 10:27:23.934923350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2018715Z 2025-12-04T10:58:28.2018868Z [W1204 10:27:23.935016098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2019056Z 2025-12-04T10:58:28.2019212Z [W1204 10:27:23.935072178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2019404Z 2025-12-04T10:58:28.2019560Z [W1204 10:27:23.935151416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2019747Z 2025-12-04T10:58:28.2019898Z [W1204 10:27:23.935194376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2020091Z 2025-12-04T10:58:28.2020242Z [W1204 10:27:23.935270985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2020433Z 2025-12-04T10:58:28.2020583Z [W1204 10:27:23.935313424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2020774Z 2025-12-04T10:58:28.2020944Z [W1204 10:27:23.935380713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2021134Z 2025-12-04T10:58:28.2021282Z [W1204 10:27:23.935421432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2021470Z 2025-12-04T10:58:28.2021619Z [W1204 10:27:23.968853342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2021806Z 2025-12-04T10:58:28.2021955Z [W1204 10:27:23.968940611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2022142Z 2025-12-04T10:58:28.2022291Z [W1204 10:27:23.968998020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2022477Z 2025-12-04T10:58:28.2022633Z [W1204 10:27:23.969118548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2022850Z 2025-12-04T10:58:28.2023005Z [W1204 10:27:23.969165447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2023193Z 2025-12-04T10:58:28.2023389Z [W1204 10:27:23.969245316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2023580Z 2025-12-04T10:58:28.2023732Z [W1204 10:27:23.969288966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2023920Z 2025-12-04T10:58:28.2024076Z [W1204 10:27:23.969357164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2024266Z 2025-12-04T10:58:28.2024425Z [W1204 10:27:23.969399014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2024618Z 2025-12-04T10:58:28.2024674Z ('RERUN', {'yellow': True}) [0.7378s] [100%] 2025-12-04T10:58:28.2025128Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:24.646816730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2025537Z 2025-12-04T10:58:28.2025689Z [W1204 10:27:24.646967978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2025876Z 2025-12-04T10:58:28.2026024Z [W1204 10:27:24.647028547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2026210Z 2025-12-04T10:58:28.2026359Z [W1204 10:27:24.647120485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2026547Z 2025-12-04T10:58:28.2026698Z [W1204 10:27:24.647165325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2026887Z 2025-12-04T10:58:28.2027037Z [W1204 10:27:24.647247353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2027224Z 2025-12-04T10:58:28.2027374Z [W1204 10:27:24.647290913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2027564Z 2025-12-04T10:58:28.2027713Z [W1204 10:27:24.647360042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2027899Z 2025-12-04T10:58:28.2028091Z [W1204 10:27:24.647401081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2028278Z 2025-12-04T10:58:28.2028429Z [W1204 10:27:24.649654366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2028616Z 2025-12-04T10:58:28.2028767Z [W1204 10:27:24.649733035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2028956Z 2025-12-04T10:58:28.2029106Z [W1204 10:27:24.649785564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2029291Z 2025-12-04T10:58:28.2029440Z [W1204 10:27:24.649862363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2029628Z 2025-12-04T10:58:28.2029778Z [W1204 10:27:24.649905172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2029965Z 2025-12-04T10:58:28.2030114Z [W1204 10:27:24.649979861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2030337Z 2025-12-04T10:58:28.2030488Z [W1204 10:27:24.650025630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2030676Z 2025-12-04T10:58:28.2030825Z [W1204 10:27:24.650095219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2031012Z 2025-12-04T10:58:28.2031161Z [W1204 10:27:24.650136358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2031348Z 2025-12-04T10:58:28.2031496Z [W1204 10:27:24.683468110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2031688Z 2025-12-04T10:58:28.2031838Z [W1204 10:27:24.683554219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2032029Z 2025-12-04T10:58:28.2032178Z [W1204 10:27:24.683611018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2032364Z 2025-12-04T10:58:28.2032513Z [W1204 10:27:24.683695837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2032700Z 2025-12-04T10:58:28.2032851Z [W1204 10:27:24.683740406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2033037Z 2025-12-04T10:58:28.2033186Z [W1204 10:27:24.683819005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2033409Z 2025-12-04T10:58:28.2033562Z [W1204 10:27:24.683862184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2033754Z 2025-12-04T10:58:28.2033906Z [W1204 10:27:24.683930333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2034093Z 2025-12-04T10:58:28.2034244Z [W1204 10:27:24.683972252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2034433Z 2025-12-04T10:58:28.2034475Z FAILED [0.7098s] [100%] 2025-12-04T10:58:28.2034542Z 2025-12-04T10:58:28.2034601Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2034863Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2035108Z Traceback (most recent call last): 2025-12-04T10:58:28.2035388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2035628Z method(*args, **kwargs) 2025-12-04T10:58:28.2035850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2036080Z method(*args, **kwargs) 2025-12-04T10:58:28.2036300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2036529Z with policy(): 2025-12-04T10:58:28.2036742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2036973Z raise RuntimeError(msg) 2025-12-04T10:58:28.2037460Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2037945Z 2025-12-04T10:58:28.2038023Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2038440Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2038782Z 2025-12-04T10:58:28.2038872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2039077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2039253Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2039536Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2039829Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2039994Z graph_break [] 2025-12-04T10:58:28.2040209Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2040449Z Traceback (most recent call last): 2025-12-04T10:58:28.2040683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2040913Z method(*args, **kwargs) 2025-12-04T10:58:28.2041131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2041358Z method(*args, **kwargs) 
2025-12-04T10:58:28.2041576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2041805Z with policy(): 2025-12-04T10:58:28.2042015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2042247Z raise RuntimeError(msg) 2025-12-04T10:58:28.2042738Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2043187Z 2025-12-04T10:58:28.2043319Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2043731Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2044097Z 2025-12-04T10:58:28.2044190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2044390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2044561Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2044834Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2045126Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2045276Z graph_break [] 2025-12-04T10:58:28.2045408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2045579Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2045745Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2046033Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2046320Z graph_break [] 2025-12-04T10:58:28.2046424Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2046678Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2046917Z Traceback (most recent call last): 2025-12-04T10:58:28.2047150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2047383Z method(*args, **kwargs) 2025-12-04T10:58:28.2047605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2047834Z method(*args, **kwargs) 2025-12-04T10:58:28.2048055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2048284Z with policy(): 2025-12-04T10:58:28.2048495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2048731Z raise RuntimeError(msg) 2025-12-04T10:58:28.2049215Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2049665Z 2025-12-04T10:58:28.2049742Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2050158Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2050494Z 2025-12-04T10:58:28.2050585Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2050787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2050959Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2051230Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2051517Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2051665Z graph_break [] 2025-12-04T10:58:28.2051793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2051964Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2052127Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2052445Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2052696Z graph_break [] 2025-12-04T10:58:28.2052823Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2052996Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2053162Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2053496Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2053745Z graph_break [] 2025-12-04T10:58:28.2054045Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b420f9f196448500.xml - 2025-12-04T10:58:28.2054383Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2055141Z FAILED [0.7098s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2055865Z 2025-12-04T10:58:28.2055942Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2056353Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2056689Z 2025-12-04T10:58:28.2056780Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2056967Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2057138Z ================== 1 failed, 57 deselected, 2 rerun in 4.49s =================== 2025-12-04T10:58:28.2057280Z Got exit code 1 2025-12-04T10:58:28.2057378Z Retrying single test... 
2025-12-04T10:58:28.2057644Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74e90107c378f493.xml 2025-12-04T10:58:28.2057936Z ============================= test session starts ============================== 2025-12-04T10:58:28.2058146Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2058332Z cachedir: .pytest_cache 2025-12-04T10:58:28.2058556Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2058801Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2058920Z configfile: pytest.ini 2025-12-04T10:58:28.2059155Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2059428Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2059830Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2060200Z Running 1 items in this shard 2025-12-04T10:58:28.2060274Z 2025-12-04T10:58:28.2060685Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:33.233242126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2061089Z 2025-12-04T10:58:28.2061246Z [W1204 10:27:34.497745745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2061442Z 2025-12-04T10:58:28.2061594Z [W1204 10:27:34.497862433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2061787Z 2025-12-04T10:58:28.2061937Z [W1204 10:27:34.498262867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2062127Z 2025-12-04T10:58:28.2062278Z [W1204 10:27:34.498360065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2062468Z 2025-12-04T10:58:28.2062620Z [W1204 10:27:34.499387099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2062811Z 2025-12-04T10:58:28.2062961Z [W1204 10:27:34.499436308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2063180Z 2025-12-04T10:58:28.2063372Z [W1204 10:27:34.499530017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2063563Z 2025-12-04T10:58:28.2063714Z [W1204 10:27:34.499575286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2063906Z 2025-12-04T10:58:28.2064060Z [W1204 10:27:34.503533595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2064247Z 2025-12-04T10:58:28.2064401Z [W1204 10:27:34.503615644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2064591Z 2025-12-04T10:58:28.2064743Z [W1204 10:27:34.503671743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2064936Z 2025-12-04T10:58:28.2065090Z [W1204 10:27:34.503753221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2065278Z 2025-12-04T10:58:28.2065431Z [W1204 10:27:34.503799701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2065619Z 2025-12-04T10:58:28.2065772Z [W1204 10:27:34.503883069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2065962Z 2025-12-04T10:58:28.2066114Z [W1204 10:27:34.503927779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2066304Z 2025-12-04T10:58:28.2066456Z [W1204 10:27:34.503999228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2066649Z 2025-12-04T10:58:28.2066800Z [W1204 10:27:34.504045427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2066990Z 2025-12-04T10:58:28.2067141Z [W1204 10:27:34.538976644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2067331Z 2025-12-04T10:58:28.2067481Z [W1204 10:27:34.539072662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2067673Z 2025-12-04T10:58:28.2067823Z [W1204 10:27:34.539131051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2068015Z 2025-12-04T10:58:28.2068204Z [W1204 10:27:34.539237720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2068397Z 2025-12-04T10:58:28.2068547Z [W1204 10:27:34.539294659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2068738Z 2025-12-04T10:58:28.2068892Z [W1204 10:27:34.539375968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2069080Z 2025-12-04T10:58:28.2069233Z [W1204 10:27:34.539420047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2069421Z 2025-12-04T10:58:28.2069573Z [W1204 10:27:34.539488436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2069761Z 2025-12-04T10:58:28.2069915Z [W1204 10:27:34.539530245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2070137Z 2025-12-04T10:58:28.2070191Z ('RERUN', {'yellow': True}) [2.9008s] [100%] 2025-12-04T10:58:28.2070649Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:35.641472396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2071056Z 2025-12-04T10:58:28.2071211Z [W1204 10:27:35.641622524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2071401Z 2025-12-04T10:58:28.2071554Z [W1204 10:27:35.641679713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2071748Z 2025-12-04T10:58:28.2071901Z [W1204 10:27:35.641771331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2072095Z 2025-12-04T10:58:28.2072248Z [W1204 10:27:35.641816651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2072439Z 2025-12-04T10:58:28.2072588Z [W1204 10:27:35.641897699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2072780Z 2025-12-04T10:58:28.2072930Z [W1204 10:27:35.641940769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2073120Z 2025-12-04T10:58:28.2073301Z [W1204 10:27:35.642014908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2073493Z 2025-12-04T10:58:28.2073644Z [W1204 10:27:35.642058827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2073837Z 2025-12-04T10:58:28.2073990Z [W1204 10:27:35.644276952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2074177Z 2025-12-04T10:58:28.2074330Z [W1204 10:27:35.644354911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2074518Z 2025-12-04T10:58:28.2074671Z [W1204 10:27:35.644409520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2074858Z 2025-12-04T10:58:28.2075012Z [W1204 10:27:35.644486959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2075199Z 2025-12-04T10:58:28.2075388Z [W1204 10:27:35.644530888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2075578Z 2025-12-04T10:58:28.2075733Z [W1204 10:27:35.644607207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2075922Z 2025-12-04T10:58:28.2076075Z [W1204 10:27:35.644649766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2076268Z 2025-12-04T10:58:28.2076420Z [W1204 10:27:35.644715905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2076611Z 2025-12-04T10:58:28.2076762Z [W1204 10:27:35.644757645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2076953Z 2025-12-04T10:58:28.2077105Z [W1204 10:27:35.677726312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2077294Z 2025-12-04T10:58:28.2077443Z [W1204 10:27:35.677809861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2077670Z 2025-12-04T10:58:28.2077820Z [W1204 10:27:35.677865870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2078011Z 2025-12-04T10:58:28.2078161Z [W1204 10:27:35.677949489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2078353Z 2025-12-04T10:58:28.2078504Z [W1204 10:27:35.678007278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2078694Z 2025-12-04T10:58:28.2078848Z [W1204 10:27:35.678089577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2079050Z 2025-12-04T10:58:28.2079204Z [W1204 10:27:35.678133096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2079397Z 2025-12-04T10:58:28.2079549Z [W1204 10:27:35.678200985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2079737Z 2025-12-04T10:58:28.2079890Z [W1204 10:27:35.678242214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2080077Z 2025-12-04T10:58:28.2080130Z ('RERUN', {'yellow': True}) [0.6453s] [100%] 2025-12-04T10:58:28.2080587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:36.281318890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2080995Z 2025-12-04T10:58:28.2081148Z [W1204 10:27:36.281467368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2081343Z 2025-12-04T10:58:28.2081494Z [W1204 10:27:36.281523417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2081685Z 2025-12-04T10:58:28.2081835Z [W1204 10:27:36.281612745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2082026Z 2025-12-04T10:58:28.2082175Z [W1204 10:27:36.281657545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2082366Z 2025-12-04T10:58:28.2082515Z [W1204 10:27:36.281738853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2082732Z 2025-12-04T10:58:28.2082884Z [W1204 10:27:36.281781493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2083076Z 2025-12-04T10:58:28.2083226Z [W1204 10:27:36.281849602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2083449Z 2025-12-04T10:58:28.2083601Z [W1204 10:27:36.281891041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2083789Z 2025-12-04T10:58:28.2083942Z [W1204 10:27:36.284113206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2084129Z 2025-12-04T10:58:28.2084282Z [W1204 10:27:36.284192315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2084468Z 2025-12-04T10:58:28.2084623Z [W1204 10:27:36.284245134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2084847Z 2025-12-04T10:58:28.2085000Z [W1204 10:27:36.284321953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2085188Z 2025-12-04T10:58:28.2085342Z [W1204 10:27:36.284364842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2085530Z 2025-12-04T10:58:28.2085683Z [W1204 10:27:36.284441601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2085873Z 2025-12-04T10:58:28.2086022Z [W1204 10:27:36.284483901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2086212Z 2025-12-04T10:58:28.2086363Z [W1204 10:27:36.284549890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2086558Z 2025-12-04T10:58:28.2086708Z [W1204 10:27:36.284590979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2086903Z 2025-12-04T10:58:28.2087054Z [W1204 10:27:36.317590766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2087245Z 2025-12-04T10:58:28.2087396Z [W1204 10:27:36.317676085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2087587Z 2025-12-04T10:58:28.2087738Z [W1204 10:27:36.317730874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2087928Z 2025-12-04T10:58:28.2088081Z [W1204 10:27:36.317813063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2088275Z 2025-12-04T10:58:28.2088429Z [W1204 10:27:36.317856592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2088617Z 2025-12-04T10:58:28.2088772Z [W1204 10:27:36.317946570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2088965Z 2025-12-04T10:58:28.2089115Z [W1204 10:27:36.317989290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2089300Z 2025-12-04T10:58:28.2089450Z [W1204 10:27:36.318061789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2089635Z 2025-12-04T10:58:28.2089818Z [W1204 10:27:36.318104078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2090008Z 2025-12-04T10:58:28.2090052Z FAILED [0.6717s] [100%] 2025-12-04T10:58:28.2090117Z 2025-12-04T10:58:28.2090177Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2090436Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2090684Z Traceback (most recent call last): 2025-12-04T10:58:28.2090928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2091167Z method(*args, **kwargs) 2025-12-04T10:58:28.2091394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2091628Z method(*args, **kwargs) 2025-12-04T10:58:28.2091854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2092084Z with policy(): 2025-12-04T10:58:28.2092321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2092552Z raise RuntimeError(msg) 2025-12-04T10:58:28.2093030Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2093501Z 2025-12-04T10:58:28.2093579Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2093991Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2094329Z 2025-12-04T10:58:28.2094422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2094621Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2094794Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2095070Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2095359Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2095505Z graph_break [] 2025-12-04T10:58:28.2095719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2095957Z Traceback (most recent call last): 2025-12-04T10:58:28.2096189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2096419Z method(*args, **kwargs) 2025-12-04T10:58:28.2096641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2096866Z method(*args, **kwargs) 
2025-12-04T10:58:28.2097083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2097307Z with policy(): 2025-12-04T10:58:28.2097516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2097746Z raise RuntimeError(msg) 2025-12-04T10:58:28.2098265Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2098713Z 2025-12-04T10:58:28.2098789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2099196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2099532Z 2025-12-04T10:58:28.2099620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2099816Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2099985Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2100258Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2100546Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2100727Z graph_break [] 2025-12-04T10:58:28.2100853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2101022Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2101184Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2101467Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2101716Z graph_break [] 2025-12-04T10:58:28.2101821Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2102070Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2102309Z Traceback (most recent call last): 2025-12-04T10:58:28.2102540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2102775Z method(*args, **kwargs) 2025-12-04T10:58:28.2102995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2103223Z method(*args, **kwargs) 2025-12-04T10:58:28.2103468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2103691Z with policy(): 2025-12-04T10:58:28.2103899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2104131Z raise RuntimeError(msg) 2025-12-04T10:58:28.2104617Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2105067Z 2025-12-04T10:58:28.2105144Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2105550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2105883Z 2025-12-04T10:58:28.2105970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2106167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2106337Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2106638Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2106924Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2107070Z graph_break [] 2025-12-04T10:58:28.2107197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2107365Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2107528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2107813Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2108061Z graph_break [] 2025-12-04T10:58:28.2108188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2108355Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2108520Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2108806Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2109087Z graph_break [] 2025-12-04T10:58:28.2109386Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-74e90107c378f493.xml - 2025-12-04T10:58:28.2109720Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2110474Z FAILED [0.6717s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2111156Z 2025-12-04T10:58:28.2111231Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2111637Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2111971Z 2025-12-04T10:58:28.2112057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2112240Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2112406Z ================== 1 failed, 57 deselected, 2 rerun in 4.37s =================== 2025-12-04T10:58:28.2112549Z Got exit code 1 2025-12-04T10:58:28.2112854Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2113308Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2113674Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-195fe8bf16d78bc4.xml 2025-12-04T10:58:28.2113964Z ============================= test session starts ============================== 2025-12-04T10:58:28.2114175Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2114362Z cachedir: .pytest_cache 2025-12-04T10:58:28.2114586Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2114824Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2114941Z configfile: pytest.ini 2025-12-04T10:58:28.2115203Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2115480Z collecting ... collected 58 items / 1 deselected / 57 selected 2025-12-04T10:58:28.2115643Z stepcurrent: skipping 1 already run items. 
2025-12-04T10:58:28.2115780Z Running 57 items in this shard 2025-12-04T10:58:28.2115853Z 2025-12-04T10:58:28.2116119Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5167s] [ 1%] 2025-12-04T10:58:28.2116742Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5168s] [ 1%] 2025-12-04T10:58:28.2117273Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5057s] [ 1%] 2025-12-04T10:58:28.2117552Z 2025-12-04T10:58:28.2117605Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2117894Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2118139Z Traceback (most recent call last): 2025-12-04T10:58:28.2118378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2118615Z method(*args, **kwargs) 2025-12-04T10:58:28.2118840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2119073Z method(*args, **kwargs) 2025-12-04T10:58:28.2119294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2119522Z with policy(): 2025-12-04T10:58:28.2119735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2119971Z raise RuntimeError(msg) 2025-12-04T10:58:28.2120449Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2120888Z 2025-12-04T10:58:28.2120967Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2121378Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2121714Z 2025-12-04T10:58:28.2121809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2122009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2122185Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2122461Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2122751Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2122899Z graph_break [] 2025-12-04T10:58:28.2123115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2123392Z Traceback (most recent call last): 2025-12-04T10:58:28.2123623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2123891Z method(*args, **kwargs) 2025-12-04T10:58:28.2124110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2124341Z 
method(*args, **kwargs) 2025-12-04T10:58:28.2124561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2124787Z with policy(): 2025-12-04T10:58:28.2124997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2125230Z raise RuntimeError(msg) 2025-12-04T10:58:28.2125715Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2126162Z 2025-12-04T10:58:28.2126239Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2126692Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2127025Z 2025-12-04T10:58:28.2127114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2127315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2127486Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2127760Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2128049Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2128198Z graph_break [] 2025-12-04T10:58:28.2128329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2128505Z stats 
[('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2128670Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2128956Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2129205Z graph_break [] 2025-12-04T10:58:28.2129316Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2129570Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2129812Z Traceback (most recent call last): 2025-12-04T10:58:28.2130048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2130282Z method(*args, **kwargs) 2025-12-04T10:58:28.2130505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2130735Z method(*args, **kwargs) 2025-12-04T10:58:28.2130953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2131177Z with policy(): 2025-12-04T10:58:28.2131390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2131622Z raise RuntimeError(msg) 2025-12-04T10:58:28.2132129Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2132579Z 2025-12-04T10:58:28.2132657Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2133064Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2133433Z 2025-12-04T10:58:28.2133525Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2133724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2133898Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2134178Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2134469Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2134615Z graph_break [] 2025-12-04T10:58:28.2134782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2134957Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2135123Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2135408Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2135661Z graph_break [] 2025-12-04T10:58:28.2135791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2135965Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2136129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2136415Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2136674Z graph_break [] 2025-12-04T10:58:28.2136974Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-195fe8bf16d78bc4.xml - 2025-12-04T10:58:28.2137317Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2138083Z FAILED [0.5057s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2138767Z 2025-12-04T10:58:28.2138847Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2139258Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2139597Z 2025-12-04T10:58:28.2139686Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2139875Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2140041Z =================== 1 failed, 1 deselected, 2 rerun in 3.71s =================== 2025-12-04T10:58:28.2140182Z Got exit code 1 2025-12-04T10:58:28.2140281Z Retrying single test... 
2025-12-04T10:58:28.2140547Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-19b4077e26fd76a9.xml 2025-12-04T10:58:28.2140877Z ============================= test session starts ============================== 2025-12-04T10:58:28.2141093Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2141286Z cachedir: .pytest_cache 2025-12-04T10:58:28.2141512Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2141754Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2141875Z configfile: pytest.ini 2025-12-04T10:58:28.2142102Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2142376Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2142783Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2143154Z Running 1 items in this shard 2025-12-04T10:58:28.2143297Z 2025-12-04T10:58:28.2143680Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:55.642429577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2144089Z 2025-12-04T10:58:28.2144246Z [W1204 10:27:55.926064331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2144440Z 2025-12-04T10:58:28.2144594Z [W1204 10:27:55.926259128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2144785Z 2025-12-04T10:58:28.2144937Z [W1204 10:27:55.926776910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2145130Z 2025-12-04T10:58:28.2145281Z [W1204 10:27:55.926883328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2145477Z 2025-12-04T10:58:28.2145627Z [W1204 10:27:55.927669426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2145818Z 2025-12-04T10:58:28.2145968Z [W1204 10:27:55.927734355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2146160Z 2025-12-04T10:58:28.2146311Z [W1204 10:27:55.927852203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2146505Z 2025-12-04T10:58:28.2146660Z [W1204 10:27:55.927912042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2146850Z 2025-12-04T10:58:28.2147004Z [W1204 10:27:55.932250155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2147199Z 2025-12-04T10:58:28.2147353Z [W1204 10:27:55.932331714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2147541Z 2025-12-04T10:58:28.2147694Z [W1204 10:27:55.932386943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2147883Z 2025-12-04T10:58:28.2148037Z [W1204 10:27:55.932471961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2148225Z 2025-12-04T10:58:28.2148380Z [W1204 10:27:55.932517181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2148602Z 2025-12-04T10:58:28.2148759Z [W1204 10:27:55.932594990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2148954Z 2025-12-04T10:58:28.2149106Z [W1204 10:27:55.932639569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2149299Z 2025-12-04T10:58:28.2149451Z [W1204 10:27:55.932710578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2149641Z 2025-12-04T10:58:28.2149792Z [W1204 10:27:55.932753097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2149988Z 2025-12-04T10:58:28.2150139Z [W1204 10:27:55.968513541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2150330Z 2025-12-04T10:58:28.2150482Z [W1204 10:27:55.968603740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2150710Z 2025-12-04T10:58:28.2150861Z [W1204 10:27:55.968661569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2151054Z 2025-12-04T10:58:28.2151204Z [W1204 10:27:55.968747588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2151398Z 2025-12-04T10:58:28.2151551Z [W1204 10:27:55.968793187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2151740Z 2025-12-04T10:58:28.2151894Z [W1204 10:27:55.968874306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2152083Z 2025-12-04T10:58:28.2152239Z [W1204 10:27:55.968919345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2152431Z 2025-12-04T10:58:28.2152586Z [W1204 10:27:55.968988234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2152779Z 2025-12-04T10:58:28.2152933Z [W1204 10:27:55.969035743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2153122Z 2025-12-04T10:58:28.2153177Z ('RERUN', {'yellow': True}) [2.9030s] [100%] 2025-12-04T10:58:28.2153685Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:56.130272384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2154094Z 2025-12-04T10:58:28.2154247Z [W1204 10:27:56.130428692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2154441Z 2025-12-04T10:58:28.2154591Z [W1204 10:27:56.130485701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2154783Z 2025-12-04T10:58:28.2154935Z [W1204 10:27:56.130582369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2155127Z 2025-12-04T10:58:28.2155277Z [W1204 10:27:56.130629048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2155470Z 2025-12-04T10:58:28.2155621Z [W1204 10:27:56.130711627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2155814Z 2025-12-04T10:58:28.2155998Z [W1204 10:27:56.130754707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2156193Z 2025-12-04T10:58:28.2156344Z [W1204 10:27:56.130823075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2156539Z 2025-12-04T10:58:28.2156693Z [W1204 10:27:56.130864825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2156883Z 2025-12-04T10:58:28.2157037Z [W1204 10:27:56.133119770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2157225Z 2025-12-04T10:58:28.2157380Z [W1204 10:27:56.133199629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2157568Z 2025-12-04T10:58:28.2157727Z [W1204 10:27:56.133254298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2157960Z 2025-12-04T10:58:28.2158115Z [W1204 10:27:56.133332607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2158302Z 2025-12-04T10:58:28.2158455Z [W1204 10:27:56.133376686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2158643Z 2025-12-04T10:58:28.2158796Z [W1204 10:27:56.133452595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2158988Z 2025-12-04T10:58:28.2159139Z [W1204 10:27:56.133495704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2159329Z 2025-12-04T10:58:28.2159481Z [W1204 10:27:56.133565063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2159673Z 2025-12-04T10:58:28.2159828Z [W1204 10:27:56.133606362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2160022Z 2025-12-04T10:58:28.2160171Z [W1204 10:27:56.167317508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2160361Z 2025-12-04T10:58:28.2160511Z [W1204 10:27:56.167406237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2160703Z 2025-12-04T10:58:28.2160854Z [W1204 10:27:56.167464256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2161047Z 2025-12-04T10:58:28.2161201Z [W1204 10:27:56.167550675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2161390Z 2025-12-04T10:58:28.2161543Z [W1204 10:27:56.167596834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2161733Z 2025-12-04T10:58:28.2161888Z [W1204 10:27:56.167677973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2162077Z 2025-12-04T10:58:28.2162232Z [W1204 10:27:56.167722022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2162420Z 2025-12-04T10:58:28.2162574Z [W1204 10:27:56.167789031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2162762Z 2025-12-04T10:58:28.2162938Z [W1204 10:27:56.167830961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2163126Z 2025-12-04T10:58:28.2163179Z ('RERUN', {'yellow': True}) [0.6968s] [100%] 2025-12-04T10:58:28.2163676Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:27:57.831653019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2164081Z 2025-12-04T10:58:28.2164233Z [W1204 10:27:57.831806506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2164424Z 2025-12-04T10:58:28.2164574Z [W1204 10:27:57.831864135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2164767Z 2025-12-04T10:58:28.2164920Z [W1204 10:27:57.831961374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2165112Z 2025-12-04T10:58:28.2165264Z [W1204 10:27:57.832012473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2165489Z 2025-12-04T10:58:28.2165640Z [W1204 10:27:57.832100642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2165833Z 2025-12-04T10:58:28.2165983Z [W1204 10:27:57.832144681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2166176Z 2025-12-04T10:58:28.2166329Z [W1204 10:27:57.832213210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2166519Z 2025-12-04T10:58:28.2166672Z [W1204 10:27:57.832255109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2166863Z 2025-12-04T10:58:28.2167018Z [W1204 10:27:57.834532464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2167211Z 2025-12-04T10:58:28.2167365Z [W1204 10:27:57.834611353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2167553Z 2025-12-04T10:58:28.2167708Z [W1204 10:27:57.834667152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2167898Z 2025-12-04T10:58:28.2168053Z [W1204 10:27:57.834744951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2168243Z 2025-12-04T10:58:28.2168401Z [W1204 10:27:57.834789000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2168595Z 2025-12-04T10:58:28.2168748Z [W1204 10:27:57.834866539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2168945Z 2025-12-04T10:58:28.2169097Z [W1204 10:27:57.834909548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2169291Z 2025-12-04T10:58:28.2169443Z [W1204 10:27:57.834976817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2169637Z 2025-12-04T10:58:28.2169788Z [W1204 10:27:57.835023156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2169981Z 2025-12-04T10:58:28.2170132Z [W1204 10:27:57.868981029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2170327Z 2025-12-04T10:58:28.2170507Z [W1204 10:27:57.869071757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2170704Z 2025-12-04T10:58:28.2170856Z [W1204 10:27:57.869130196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2171051Z 2025-12-04T10:58:28.2171207Z [W1204 10:27:57.869215275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2171398Z 2025-12-04T10:58:28.2171556Z [W1204 10:27:57.869259604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2171746Z 2025-12-04T10:58:28.2171903Z [W1204 10:27:57.869340603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2172091Z 2025-12-04T10:58:28.2172247Z [W1204 10:27:57.869382863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2172464Z 2025-12-04T10:58:28.2172620Z [W1204 10:27:57.869450712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2172812Z 2025-12-04T10:58:28.2172970Z [W1204 10:27:57.869491411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2173162Z 2025-12-04T10:58:28.2173208Z FAILED [0.7104s] [100%] 2025-12-04T10:58:28.2173312Z 2025-12-04T10:58:28.2173373Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2173633Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2173883Z Traceback (most recent call last): 2025-12-04T10:58:28.2174132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2174374Z method(*args, **kwargs) 2025-12-04T10:58:28.2174606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2174845Z method(*args, **kwargs) 2025-12-04T10:58:28.2175070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2175303Z with policy(): 2025-12-04T10:58:28.2175521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2175759Z raise RuntimeError(msg) 2025-12-04T10:58:28.2176246Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2176691Z 2025-12-04T10:58:28.2176774Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2177193Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2177536Z 2025-12-04T10:58:28.2177626Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2177834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2178015Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2178297Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2178627Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2178782Z graph_break [] 2025-12-04T10:58:28.2179004Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2179253Z Traceback (most recent call last): 2025-12-04T10:58:28.2179492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2179730Z method(*args, **kwargs) 2025-12-04T10:58:28.2179959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2180198Z method(*args, **kwargs) 
2025-12-04T10:58:28.2180424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2180659Z with policy(): 2025-12-04T10:58:28.2180881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2181153Z raise RuntimeError(msg) 2025-12-04T10:58:28.2181646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2182098Z 2025-12-04T10:58:28.2182174Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2182587Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2182925Z 2025-12-04T10:58:28.2183017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2183223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2183440Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2183721Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2184017Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2184169Z graph_break [] 2025-12-04T10:58:28.2184328Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2184551Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2184885Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2185230Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2185522Z graph_break [] 2025-12-04T10:58:28.2185701Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2185999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2186277Z Traceback (most recent call last): 2025-12-04T10:58:28.2186582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2186861Z method(*args, **kwargs) 2025-12-04T10:58:28.2187129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2187412Z method(*args, **kwargs) 2025-12-04T10:58:28.2187673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2187994Z with policy(): 2025-12-04T10:58:28.2188247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2188521Z raise RuntimeError(msg) 2025-12-04T10:58:28.2189070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2189530Z 2025-12-04T10:58:28.2189635Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2190105Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2190473Z 2025-12-04T10:58:28.2190577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2190855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2191083Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2191406Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2191729Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2191946Z graph_break [] 2025-12-04T10:58:28.2192122Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2192406Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2192623Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2192944Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2204250Z graph_break [] 2025-12-04T10:58:28.2204403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2204582Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2204750Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2205045Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2205299Z graph_break [] 2025-12-04T10:58:28.2205609Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-19b4077e26fd76a9.xml - 2025-12-04T10:58:28.2205952Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2206731Z FAILED [0.7104s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2207420Z 2025-12-04T10:58:28.2207499Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2207916Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2208257Z 2025-12-04T10:58:28.2208411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2208600Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2208772Z ================== 1 failed, 57 deselected, 2 rerun in 4.48s =================== 2025-12-04T10:58:28.2208917Z Got exit code 1 2025-12-04T10:58:28.2209016Z Retrying single test... 
2025-12-04T10:58:28.2209283Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-28baf2d01135a84f.xml 2025-12-04T10:58:28.2209575Z ============================= test session starts ============================== 2025-12-04T10:58:28.2209788Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2209977Z cachedir: .pytest_cache 2025-12-04T10:58:28.2210201Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2210442Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2210562Z configfile: pytest.ini 2025-12-04T10:58:28.2210793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2211101Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2211510Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2211881Z Running 1 items in this shard 2025-12-04T10:58:28.2211956Z 2025-12-04T10:58:28.2212331Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:06.185490756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2212743Z 2025-12-04T10:58:28.2212899Z [W1204 10:28:07.461039687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2213098Z 2025-12-04T10:58:28.2213280Z [W1204 10:28:07.461207894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2213471Z 2025-12-04T10:58:28.2213625Z [W1204 10:28:07.461701666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2213814Z 2025-12-04T10:58:28.2213965Z [W1204 10:28:07.461809105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2214153Z 2025-12-04T10:58:28.2214305Z [W1204 10:28:07.463094145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2214493Z 2025-12-04T10:58:28.2214646Z [W1204 10:28:07.463152224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2214836Z 2025-12-04T10:58:28.2214985Z [W1204 10:28:07.463250502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2215173Z 2025-12-04T10:58:28.2215328Z [W1204 10:28:07.463300452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2215518Z 2025-12-04T10:58:28.2215671Z [W1204 10:28:07.467099142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2215862Z 2025-12-04T10:58:28.2216014Z [W1204 10:28:07.467181011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2216202Z 2025-12-04T10:58:28.2216383Z [W1204 10:28:07.467238890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2216573Z 2025-12-04T10:58:28.2216721Z [W1204 10:28:07.467321889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2216909Z 2025-12-04T10:58:28.2217058Z [W1204 10:28:07.467366878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2217247Z 2025-12-04T10:58:28.2217396Z [W1204 10:28:07.467445897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2217587Z 2025-12-04T10:58:28.2217736Z [W1204 10:28:07.467489456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2217928Z 2025-12-04T10:58:28.2218082Z [W1204 10:28:07.467560965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2218308Z 2025-12-04T10:58:28.2218460Z [W1204 10:28:07.467603195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2218648Z 2025-12-04T10:58:28.2218801Z [W1204 10:28:07.501906522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2218988Z 2025-12-04T10:58:28.2219138Z [W1204 10:28:07.502004710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2219324Z 2025-12-04T10:58:28.2219476Z [W1204 10:28:07.502063100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2219665Z 2025-12-04T10:58:28.2219819Z [W1204 10:28:07.502149128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2220007Z 2025-12-04T10:58:28.2220164Z [W1204 10:28:07.502193698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2220354Z 2025-12-04T10:58:28.2220504Z [W1204 10:28:07.502275906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2220694Z 2025-12-04T10:58:28.2220844Z [W1204 10:28:07.502319006 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2221035Z 2025-12-04T10:58:28.2221185Z [W1204 10:28:07.502395594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2221374Z 2025-12-04T10:58:28.2221526Z [W1204 10:28:07.502436864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2221717Z 2025-12-04T10:58:28.2221767Z ('RERUN', {'yellow': True}) [2.7780s] [100%] 2025-12-04T10:58:28.2222225Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:08.463725166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2222629Z 2025-12-04T10:58:28.2222780Z [W1204 10:28:08.463880093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2222970Z 2025-12-04T10:58:28.2223123Z [W1204 10:28:08.463935852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2223328Z 2025-12-04T10:58:28.2223514Z [W1204 10:28:08.464031091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2223702Z 2025-12-04T10:58:28.2223854Z [W1204 10:28:08.464078880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2224046Z 2025-12-04T10:58:28.2224199Z [W1204 10:28:08.464161489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2224387Z 2025-12-04T10:58:28.2224541Z [W1204 10:28:08.464203808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2224731Z 2025-12-04T10:58:28.2224885Z [W1204 10:28:08.464271667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2225072Z 2025-12-04T10:58:28.2225227Z [W1204 10:28:08.464312856 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2225417Z 2025-12-04T10:58:28.2225568Z [W1204 10:28:08.466554351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2225790Z 2025-12-04T10:58:28.2225938Z [W1204 10:28:08.466631210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2226130Z 2025-12-04T10:58:28.2226280Z [W1204 10:28:08.466685649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2226469Z 2025-12-04T10:58:28.2226619Z [W1204 10:28:08.466762458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2226808Z 2025-12-04T10:58:28.2226958Z [W1204 10:28:08.466805198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2227150Z 2025-12-04T10:58:28.2227300Z [W1204 10:28:08.466881956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2227491Z 2025-12-04T10:58:28.2227642Z [W1204 10:28:08.466923686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2227830Z 2025-12-04T10:58:28.2227982Z [W1204 10:28:08.466990375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2228172Z 2025-12-04T10:58:28.2228325Z [W1204 10:28:08.467035194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2228511Z 2025-12-04T10:58:28.2228664Z [W1204 10:28:08.500232949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2228853Z 2025-12-04T10:58:28.2229009Z [W1204 10:28:08.500319817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2229199Z 2025-12-04T10:58:28.2229353Z [W1204 10:28:08.500376576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2229541Z 2025-12-04T10:58:28.2229693Z [W1204 10:28:08.500463815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2229882Z 2025-12-04T10:58:28.2230032Z [W1204 10:28:08.500507854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2230222Z 2025-12-04T10:58:28.2230370Z [W1204 10:28:08.500588213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2230559Z 2025-12-04T10:58:28.2230732Z [W1204 10:28:08.500631492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2230923Z 2025-12-04T10:58:28.2231073Z [W1204 10:28:08.500698541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2231259Z 2025-12-04T10:58:28.2231409Z [W1204 10:28:08.500744801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2231596Z 2025-12-04T10:58:28.2231646Z ('RERUN', {'yellow': True}) [0.4988s] [100%] 2025-12-04T10:58:28.2232096Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:08.968517037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2232502Z 2025-12-04T10:58:28.2232656Z [W1204 10:28:08.968668714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2232872Z 2025-12-04T10:58:28.2233024Z [W1204 10:28:08.968726233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2233213Z 2025-12-04T10:58:28.2233399Z [W1204 10:28:08.968816172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2233587Z 2025-12-04T10:58:28.2233739Z [W1204 10:28:08.968861291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2233926Z 2025-12-04T10:58:28.2234078Z [W1204 10:28:08.968942840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2234265Z 2025-12-04T10:58:28.2234419Z [W1204 10:28:08.968985099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2234609Z 2025-12-04T10:58:28.2234759Z [W1204 10:28:08.969058598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2234950Z 2025-12-04T10:58:28.2235099Z [W1204 10:28:08.969102777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2235289Z 2025-12-04T10:58:28.2235438Z [W1204 10:28:08.971334373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2235632Z 2025-12-04T10:58:28.2235780Z [W1204 10:28:08.971416252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2235971Z 2025-12-04T10:58:28.2236122Z [W1204 10:28:08.971468871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2236315Z 2025-12-04T10:58:28.2236464Z [W1204 10:28:08.971545540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2236656Z 2025-12-04T10:58:28.2236806Z [W1204 10:28:08.971588219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2237002Z 2025-12-04T10:58:28.2237152Z [W1204 10:28:08.971663578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2237345Z 2025-12-04T10:58:28.2237498Z [W1204 10:28:08.971705417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2237687Z 2025-12-04T10:58:28.2237871Z [W1204 10:28:08.971771596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2238059Z 2025-12-04T10:58:28.2238215Z [W1204 10:28:08.971812265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2238403Z 2025-12-04T10:58:28.2238556Z [W1204 10:28:08.004624056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2238745Z 2025-12-04T10:58:28.2238898Z [W1204 10:28:08.004712314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2239087Z 2025-12-04T10:58:28.2239240Z [W1204 10:28:08.004768973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2239429Z 2025-12-04T10:58:28.2239584Z [W1204 10:28:08.004855592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2239772Z 2025-12-04T10:58:28.2239922Z [W1204 10:28:08.004898791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2240146Z 2025-12-04T10:58:28.2240296Z [W1204 10:28:08.004979640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2240485Z 2025-12-04T10:58:28.2240634Z [W1204 10:28:08.005028269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2240828Z 2025-12-04T10:58:28.2240978Z [W1204 10:28:08.005100368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2241169Z 2025-12-04T10:58:28.2241319Z [W1204 10:28:08.005148258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2241507Z 2025-12-04T10:58:28.2241548Z FAILED [0.4963s] [100%] 2025-12-04T10:58:28.2241613Z 2025-12-04T10:58:28.2241669Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2241926Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2242170Z Traceback (most recent call last): 2025-12-04T10:58:28.2242416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2242652Z method(*args, **kwargs) 2025-12-04T10:58:28.2242876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2243104Z method(*args, **kwargs) 2025-12-04T10:58:28.2243376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2243604Z with policy(): 2025-12-04T10:58:28.2243816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2244051Z raise RuntimeError(msg) 2025-12-04T10:58:28.2244533Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2244979Z 2025-12-04T10:58:28.2245056Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2245468Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2245839Z 2025-12-04T10:58:28.2245932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2246135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2246310Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2246587Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2246881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2247028Z graph_break [] 2025-12-04T10:58:28.2247242Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2247480Z Traceback (most recent call last): 2025-12-04T10:58:28.2247717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2247949Z method(*args, **kwargs) 2025-12-04T10:58:28.2248171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2248441Z method(*args, **kwargs) 
2025-12-04T10:58:28.2248660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2248886Z with policy(): 2025-12-04T10:58:28.2249097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2249326Z raise RuntimeError(msg) 2025-12-04T10:58:28.2249808Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2250253Z 2025-12-04T10:58:28.2250330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2250742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2251074Z 2025-12-04T10:58:28.2251163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2251360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2251531Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2251804Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2252094Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2252239Z graph_break [] 2025-12-04T10:58:28.2252369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2252539Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2252702Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2252986Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2253236Z graph_break [] 2025-12-04T10:58:28.2253420Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2253679Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2253918Z Traceback (most recent call last): 2025-12-04T10:58:28.2254183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2254413Z method(*args, **kwargs) 2025-12-04T10:58:28.2254634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2254860Z method(*args, **kwargs) 2025-12-04T10:58:28.2255076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2255298Z with policy(): 2025-12-04T10:58:28.2255506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2255734Z raise RuntimeError(msg) 2025-12-04T10:58:28.2256214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2256704Z 2025-12-04T10:58:28.2256778Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2257193Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2257528Z 2025-12-04T10:58:28.2257616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2257811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2257980Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2258254Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2258539Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2258687Z graph_break [] 2025-12-04T10:58:28.2258812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2258980Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2259140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2259425Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2259674Z graph_break [] 2025-12-04T10:58:28.2259799Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2259967Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2260130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2260420Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2260672Z graph_break [] 2025-12-04T10:58:28.2260967Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-28baf2d01135a84f.xml - 2025-12-04T10:58:28.2261302Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2262084Z FAILED [0.4963s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2262769Z 2025-12-04T10:58:28.2262844Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2263293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2263626Z 2025-12-04T10:58:28.2263713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2263896Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2264060Z ================== 1 failed, 57 deselected, 2 rerun in 3.93s =================== 2025-12-04T10:58:28.2264199Z Got exit code 1 2025-12-04T10:58:28.2264504Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2264910Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2265311Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b52bcf7f9c0a10a.xml 2025-12-04T10:58:28.2265601Z ============================= test session starts ============================== 2025-12-04T10:58:28.2265809Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2265995Z cachedir: .pytest_cache 2025-12-04T10:58:28.2266216Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2266454Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2266570Z configfile: pytest.ini 2025-12-04T10:58:28.2266797Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2267066Z collecting ... collected 58 items / 2 deselected / 56 selected 2025-12-04T10:58:28.2267230Z stepcurrent: skipping 2 already run items. 
2025-12-04T10:58:28.2267358Z Running 56 items in this shard 2025-12-04T10:58:28.2267428Z 2025-12-04T10:58:28.2267691Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5366s] [ 1%] 2025-12-04T10:58:28.2268240Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5180s] [ 1%] 2025-12-04T10:58:28.2268763Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5329s] [ 1%] 2025-12-04T10:58:28.2269036Z 2025-12-04T10:58:28.2269090Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2269345Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2269584Z Traceback (most recent call last): 2025-12-04T10:58:28.2269822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2270054Z method(*args, **kwargs) 2025-12-04T10:58:28.2270276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2270504Z method(*args, **kwargs) 2025-12-04T10:58:28.2270719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2270943Z with policy(): 2025-12-04T10:58:28.2271183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2271414Z raise RuntimeError(msg) 2025-12-04T10:58:28.2271890Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2272333Z 2025-12-04T10:58:28.2272407Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2272819Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2273158Z 2025-12-04T10:58:28.2273281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2273479Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2273681Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2273954Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2274242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2274385Z graph_break [] 2025-12-04T10:58:28.2274599Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2274840Z Traceback (most recent call last): 2025-12-04T10:58:28.2275071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2275300Z method(*args, **kwargs) 2025-12-04T10:58:28.2275524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2275757Z method(*args, **kwargs) 2025-12-04T10:58:28.2275974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2276195Z with policy(): 2025-12-04T10:58:28.2276402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2276632Z raise RuntimeError(msg) 2025-12-04T10:58:28.2277117Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2277565Z 2025-12-04T10:58:28.2277642Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2278049Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2278387Z 2025-12-04T10:58:28.2278474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2278669Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2278839Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2279111Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2279398Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2279542Z graph_break [] 2025-12-04T10:58:28.2279701Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.2279872Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2280034Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2280319Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2280568Z graph_break [] 2025-12-04T10:58:28.2280673Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2280925Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2281165Z Traceback (most recent call last): 2025-12-04T10:58:28.2281399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2281631Z method(*args, **kwargs) 2025-12-04T10:58:28.2281851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2282117Z method(*args, **kwargs) 2025-12-04T10:58:28.2282336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2282559Z with policy(): 2025-12-04T10:58:28.2282770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2283003Z raise RuntimeError(msg) 2025-12-04T10:58:28.2283521Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2283971Z 2025-12-04T10:58:28.2284045Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2284461Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2284797Z 2025-12-04T10:58:28.2284884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2285080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2285251Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2285522Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2285813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2285957Z graph_break [] 2025-12-04T10:58:28.2286084Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2286254Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2286417Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2286701Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2286950Z graph_break [] 2025-12-04T10:58:28.2287077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2287244Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2287408Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2287727Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2287972Z graph_break [] 2025-12-04T10:58:28.2288271Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b52bcf7f9c0a10a.xml - 2025-12-04T10:58:28.2288611Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2289366Z FAILED [0.5329s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2290047Z 2025-12-04T10:58:28.2290124Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2290536Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2290903Z 2025-12-04T10:58:28.2290990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2291171Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2291338Z =================== 1 failed, 2 deselected, 2 rerun in 3.75s =================== 2025-12-04T10:58:28.2291477Z Got exit code 1 2025-12-04T10:58:28.2291570Z Retrying single test... 
2025-12-04T10:58:28.2291832Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b8ba0f7f29138d3a.xml 2025-12-04T10:58:28.2292123Z ============================= test session starts ============================== 2025-12-04T10:58:28.2292330Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2292520Z cachedir: .pytest_cache 2025-12-04T10:58:28.2292742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2292981Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2293099Z configfile: pytest.ini 2025-12-04T10:58:28.2293363Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2293632Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2294035Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2294408Z Running 1 items in this shard 2025-12-04T10:58:28.2294484Z 2025-12-04T10:58:28.2294868Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:27.921804521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2294872Z 2025-12-04T10:58:28.2295029Z [W1204 10:28:27.190442681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2295031Z 2025-12-04T10:58:28.2295185Z [W1204 10:28:27.190579109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295187Z 2025-12-04T10:58:28.2295371Z [W1204 10:28:27.191016272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295373Z 2025-12-04T10:58:28.2295524Z [W1204 10:28:27.191098741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295528Z 2025-12-04T10:58:28.2295677Z [W1204 10:28:27.191802950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295679Z 2025-12-04T10:58:28.2295830Z [W1204 10:28:27.191851339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295832Z 2025-12-04T10:58:28.2295982Z [W1204 10:28:27.191951638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2295984Z 2025-12-04T10:58:28.2296132Z [W1204 10:28:27.191998957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2296136Z 2025-12-04T10:58:28.2296286Z [W1204 10:28:27.196175332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2296323Z 2025-12-04T10:58:28.2296476Z [W1204 10:28:27.196259701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2296478Z 2025-12-04T10:58:28.2296630Z [W1204 10:28:27.196316960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2296632Z 2025-12-04T10:58:28.2296782Z [W1204 10:28:27.196401369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2296786Z 2025-12-04T10:58:28.2296935Z [W1204 10:28:27.196446538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2296937Z 2025-12-04T10:58:28.2297090Z [W1204 10:28:27.196527367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297093Z 2025-12-04T10:58:28.2297242Z [W1204 10:28:27.196570786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297244Z 2025-12-04T10:58:28.2297393Z [W1204 10:28:27.196639555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297395Z 2025-12-04T10:58:28.2297542Z [W1204 10:28:27.196682175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297544Z 2025-12-04T10:58:28.2297694Z [W1204 10:28:27.232208843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297696Z 2025-12-04T10:58:28.2297846Z [W1204 10:28:27.232303672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2297850Z 2025-12-04T10:58:28.2297999Z [W1204 10:28:27.232362451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2298000Z 2025-12-04T10:58:28.2298154Z [W1204 10:28:27.232449219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2300313Z 2025-12-04T10:58:28.2300479Z [W1204 10:28:27.232495519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2300482Z 2025-12-04T10:58:28.2300630Z [W1204 10:28:27.232575107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2300632Z 2025-12-04T10:58:28.2300816Z [W1204 10:28:27.232619877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2300817Z 2025-12-04T10:58:28.2300970Z [W1204 10:28:27.232688596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2300972Z 2025-12-04T10:58:28.2301124Z [W1204 10:28:27.232731115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2301126Z 2025-12-04T10:58:28.2301177Z ('RERUN', {'yellow': True}) [2.9411s] [100%] 2025-12-04T10:58:28.2301551Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:29.464004185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2301554Z 2025-12-04T10:58:28.2301704Z [W1204 10:28:29.464155003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2301733Z 2025-12-04T10:58:28.2301883Z [W1204 10:28:29.464211612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2301885Z 2025-12-04T10:58:28.2302035Z [W1204 10:28:29.464300201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302037Z 2025-12-04T10:58:28.2302185Z [W1204 10:28:29.464347040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302187Z 2025-12-04T10:58:28.2302338Z [W1204 10:28:29.464431119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302340Z 2025-12-04T10:58:28.2302490Z [W1204 10:28:29.464485748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302492Z 2025-12-04T10:58:28.2302646Z [W1204 10:28:29.464555227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302648Z 2025-12-04T10:58:28.2302798Z [W1204 10:28:29.464596696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302799Z 2025-12-04T10:58:28.2302950Z [W1204 10:28:29.466882821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2302952Z 2025-12-04T10:58:28.2303103Z [W1204 10:28:29.466968209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2303104Z 2025-12-04T10:58:28.2303295Z [W1204 10:28:29.467026889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2303298Z 2025-12-04T10:58:28.2303447Z [W1204 10:28:29.467105887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2303452Z 2025-12-04T10:58:28.2303602Z [W1204 10:28:29.467149567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2303606Z 2025-12-04T10:58:28.2303753Z [W1204 10:28:29.467225415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2303755Z 2025-12-04T10:58:28.2303903Z [W1204 10:28:29.467267675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2303905Z 2025-12-04T10:58:28.2304084Z [W1204 10:28:29.467334314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304085Z 2025-12-04T10:58:28.2304236Z [W1204 10:28:29.467375473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304240Z 2025-12-04T10:58:28.2304387Z [W1204 10:28:29.501582992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304389Z 2025-12-04T10:58:28.2304538Z [W1204 10:28:29.501672911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304540Z 2025-12-04T10:58:28.2304693Z [W1204 10:28:29.501729240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304694Z 2025-12-04T10:58:28.2304842Z [W1204 10:28:29.501813099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2304845Z 2025-12-04T10:58:28.2304996Z [W1204 10:28:29.501856908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2305025Z 2025-12-04T10:58:28.2305175Z [W1204 10:28:29.501936977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2305176Z 2025-12-04T10:58:28.2305326Z [W1204 10:28:29.501978746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2305328Z 2025-12-04T10:58:28.2305475Z [W1204 10:28:29.502052065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2305477Z 2025-12-04T10:58:28.2305626Z [W1204 10:28:29.502094154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2305628Z 2025-12-04T10:58:28.2305678Z ('RERUN', {'yellow': True}) [0.7213s] [100%] 2025-12-04T10:58:28.2306047Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:29.239797145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306051Z 2025-12-04T10:58:28.2306200Z [W1204 10:28:29.239950122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306201Z 2025-12-04T10:58:28.2306348Z [W1204 10:28:29.240008701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306350Z 2025-12-04T10:58:28.2306498Z [W1204 10:28:29.240095100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2306500Z 2025-12-04T10:58:28.2306648Z [W1204 10:28:29.240140319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306653Z 2025-12-04T10:58:28.2306800Z [W1204 10:28:29.240221868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306802Z 2025-12-04T10:58:28.2306950Z [W1204 10:28:29.240264127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2306952Z 2025-12-04T10:58:28.2307101Z [W1204 10:28:29.240336016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307103Z 2025-12-04T10:58:28.2307252Z [W1204 10:28:29.240377286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307254Z 2025-12-04T10:58:28.2307423Z [W1204 10:28:29.242638591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307427Z 2025-12-04T10:58:28.2307575Z [W1204 10:28:29.242719499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307577Z 2025-12-04T10:58:28.2307724Z [W1204 10:28:29.242772329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307726Z 2025-12-04T10:58:28.2307875Z [W1204 10:28:29.242848677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2307877Z 2025-12-04T10:58:28.2308026Z [W1204 10:28:29.242892107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2308028Z 2025-12-04T10:58:28.2308176Z [W1204 10:28:29.242968876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308204Z 2025-12-04T10:58:28.2308353Z [W1204 10:28:29.243014755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308355Z 2025-12-04T10:58:28.2308503Z [W1204 10:28:29.243083794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308505Z 2025-12-04T10:58:28.2308652Z [W1204 10:28:29.243124863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308654Z 2025-12-04T10:58:28.2308803Z [W1204 10:28:30.277033417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308804Z 2025-12-04T10:58:28.2308954Z [W1204 10:28:30.277118515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2308956Z 2025-12-04T10:58:28.2309106Z [W1204 10:28:30.277175095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2309108Z 2025-12-04T10:58:28.2309255Z [W1204 10:28:30.277258563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2309259Z 2025-12-04T10:58:28.2309407Z [W1204 10:28:30.277301833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2309409Z 2025-12-04T10:58:28.2309558Z [W1204 10:28:30.277380791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2309560Z 2025-12-04T10:58:28.2309709Z [W1204 10:28:30.277423051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2309711Z 2025-12-04T10:58:28.2309859Z [W1204 10:28:30.277489910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2309863Z 2025-12-04T10:58:28.2310010Z [W1204 10:28:30.277530739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2310011Z 2025-12-04T10:58:28.2310051Z FAILED [0.7714s] [100%] 2025-12-04T10:58:28.2310053Z 2025-12-04T10:58:28.2310106Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2310268Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2310315Z Traceback (most recent call last): 2025-12-04T10:58:28.2310497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2310540Z method(*args, **kwargs) 2025-12-04T10:58:28.2310693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2310737Z method(*args, **kwargs) 2025-12-04T10:58:28.2310887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2310925Z with policy(): 2025-12-04T10:58:28.2311078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2311119Z raise RuntimeError(msg) 2025-12-04T10:58:28.2311529Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2311531Z 2025-12-04T10:58:28.2311609Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2311935Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2311937Z 2025-12-04T10:58:28.2312026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2312100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2312161Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2312343Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2312419Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2312456Z graph_break [] 2025-12-04T10:58:28.2312618Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2312667Z Traceback (most recent call last): 2025-12-04T10:58:28.2312820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2312861Z method(*args, **kwargs) 2025-12-04T10:58:28.2313011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2313052Z method(*args, **kwargs) 
2025-12-04T10:58:28.2313201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2313237Z with policy(): 2025-12-04T10:58:28.2313429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2313472Z raise RuntimeError(msg) 2025-12-04T10:58:28.2313890Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2313893Z 2025-12-04T10:58:28.2313969Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2314267Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2314269Z 2025-12-04T10:58:28.2314357Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2314470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2314533Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2314711Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2314786Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2314822Z graph_break [] 2025-12-04T10:58:28.2314897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2314954Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2315027Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2315208Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2315245Z graph_break [] 2025-12-04T10:58:28.2315299Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2315487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2315535Z Traceback (most recent call last): 2025-12-04T10:58:28.2315688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2315729Z method(*args, **kwargs) 2025-12-04T10:58:28.2315878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2315919Z method(*args, **kwargs) 2025-12-04T10:58:28.2316068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2316105Z with policy(): 2025-12-04T10:58:28.2316257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2316301Z raise RuntimeError(msg) 2025-12-04T10:58:28.2316713Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2316716Z 2025-12-04T10:58:28.2316791Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2317088Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2317091Z 2025-12-04T10:58:28.2317179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2317254Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2317314Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2317492Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2317564Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2317601Z graph_break [] 2025-12-04T10:58:28.2317673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2317732Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2317806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2318008Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2318045Z graph_break [] 2025-12-04T10:58:28.2318121Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2318177Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2318249Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2318425Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2318463Z graph_break [] 2025-12-04T10:58:28.2318707Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b8ba0f7f29138d3a.xml - 2025-12-04T10:58:28.2318768Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2319426Z FAILED [0.7714s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2319455Z 2025-12-04T10:58:28.2319529Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2319832Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2319835Z 2025-12-04T10:58:28.2319920Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2319984Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2320053Z ================== 1 failed, 57 deselected, 2 rerun in 4.60s =================== 2025-12-04T10:58:28.2320091Z Got exit code 1 2025-12-04T10:58:28.2320131Z Retrying single test... 
2025-12-04T10:58:28.2320330Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0ba6ded0358f71dd.xml 2025-12-04T10:58:28.2320386Z ============================= test session starts ============================== 2025-12-04T10:58:28.2320499Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2320540Z cachedir: .pytest_cache 2025-12-04T10:58:28.2320700Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2320746Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2320789Z configfile: pytest.ini 2025-12-04T10:58:28.2320951Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2321028Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2321324Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2321370Z Running 1 items in this shard 2025-12-04T10:58:28.2321372Z 2025-12-04T10:58:28.2321750Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:39.658305502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2321754Z 2025-12-04T10:58:28.2321930Z [W1204 10:28:39.931486863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2321934Z 2025-12-04T10:58:28.2322086Z [W1204 10:28:39.931685450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322089Z 2025-12-04T10:58:28.2322237Z [W1204 10:28:39.932228542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322239Z 2025-12-04T10:58:28.2322389Z [W1204 10:28:39.932334300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322391Z 2025-12-04T10:58:28.2322538Z [W1204 10:28:39.933447563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322541Z 2025-12-04T10:58:28.2322689Z [W1204 10:28:39.933511442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322715Z 2025-12-04T10:58:28.2322865Z [W1204 10:28:39.933627240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2322867Z 2025-12-04T10:58:28.2323015Z [W1204 10:28:39.933683099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323016Z 2025-12-04T10:58:28.2323166Z [W1204 10:28:39.937901094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323168Z 2025-12-04T10:58:28.2323347Z [W1204 10:28:39.937984172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323349Z 2025-12-04T10:58:28.2323501Z [W1204 10:28:39.938047631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2323505Z 2025-12-04T10:58:28.2323654Z [W1204 10:28:39.938135050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323656Z 2025-12-04T10:58:28.2323805Z [W1204 10:28:39.938183189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323807Z 2025-12-04T10:58:28.2323957Z [W1204 10:28:39.938262088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2323959Z 2025-12-04T10:58:28.2324107Z [W1204 10:28:39.938307157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324109Z 2025-12-04T10:58:28.2324259Z [W1204 10:28:39.938376686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324261Z 2025-12-04T10:58:28.2324411Z [W1204 10:28:39.938420286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324414Z 2025-12-04T10:58:28.2324562Z [W1204 10:28:39.973861466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324563Z 2025-12-04T10:58:28.2324712Z [W1204 10:28:39.973953274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324714Z 2025-12-04T10:58:28.2324861Z [W1204 10:28:39.974015273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2324863Z 2025-12-04T10:58:28.2325043Z [W1204 10:28:39.974104312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2325045Z 2025-12-04T10:58:28.2325194Z [W1204 10:28:39.974150291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2325197Z 2025-12-04T10:58:28.2325347Z [W1204 10:28:39.974230670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2325349Z 2025-12-04T10:58:28.2325498Z [W1204 10:28:39.974275509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2325500Z 2025-12-04T10:58:28.2325647Z [W1204 10:28:39.974343078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2325649Z 2025-12-04T10:58:28.2325801Z [W1204 10:28:39.974385628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2325802Z 2025-12-04T10:58:28.2325851Z ('RERUN', {'yellow': True}) [2.8903s] [100%] 2025-12-04T10:58:28.2326261Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:40.157567370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2326263Z 2025-12-04T10:58:28.2326411Z [W1204 10:28:40.157726178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2326415Z 2025-12-04T10:58:28.2326564Z [W1204 10:28:40.157782127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2326566Z 2025-12-04T10:58:28.2326716Z [W1204 10:28:40.157872796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2326718Z 2025-12-04T10:58:28.2326867Z [W1204 10:28:40.157917805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2326870Z 2025-12-04T10:58:28.2327019Z [W1204 10:28:40.158006143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327021Z 2025-12-04T10:58:28.2327169Z [W1204 10:28:40.158051543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327171Z 2025-12-04T10:58:28.2327320Z [W1204 10:28:40.158121302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327322Z 2025-12-04T10:58:28.2327471Z [W1204 10:28:40.158163271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327475Z 2025-12-04T10:58:28.2327622Z [W1204 10:28:40.160427596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327626Z 2025-12-04T10:58:28.2327775Z [W1204 10:28:40.160506995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327777Z 2025-12-04T10:58:28.2327926Z [W1204 10:28:40.160561294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2327928Z 2025-12-04T10:58:28.2328078Z [W1204 10:28:40.160639353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2328080Z 2025-12-04T10:58:28.2328228Z [W1204 10:28:40.160682472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2328231Z 2025-12-04T10:58:28.2328401Z [W1204 10:28:40.160758101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2328405Z 2025-12-04T10:58:28.2328555Z [W1204 10:28:40.160800120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2328557Z 2025-12-04T10:58:28.2328705Z [W1204 10:28:40.160866439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2328707Z 2025-12-04T10:58:28.2328856Z [W1204 10:28:40.160906968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2328858Z 2025-12-04T10:58:28.2329005Z [W1204 10:28:40.194671155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329007Z 2025-12-04T10:58:28.2329158Z [W1204 10:28:40.194757283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329183Z 2025-12-04T10:58:28.2329332Z [W1204 10:28:40.194812832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329334Z 2025-12-04T10:58:28.2329482Z [W1204 10:28:40.194897041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329484Z 2025-12-04T10:58:28.2329633Z [W1204 10:28:40.194941380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2329635Z 2025-12-04T10:58:28.2329783Z [W1204 10:28:40.195027909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329785Z 2025-12-04T10:58:28.2329936Z [W1204 10:28:40.195072888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2329940Z 2025-12-04T10:58:28.2330090Z [W1204 10:28:40.195141857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2330093Z 2025-12-04T10:58:28.2330241Z [W1204 10:28:40.195183557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2330243Z 2025-12-04T10:58:28.2330291Z ('RERUN', {'yellow': True}) [0.7426s] [100%] 2025-12-04T10:58:28.2330661Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:28:41.937216024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2330663Z 2025-12-04T10:58:28.2330815Z [W1204 10:28:41.937368632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2330818Z 2025-12-04T10:58:28.2330966Z [W1204 10:28:41.937427011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2330968Z 2025-12-04T10:58:28.2331117Z [W1204 10:28:41.937518360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2331119Z 2025-12-04T10:58:28.2331268Z [W1204 10:28:41.937564839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2331270Z 2025-12-04T10:58:28.2331417Z [W1204 10:28:41.937647078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2331419Z 2025-12-04T10:58:28.2331590Z [W1204 10:28:41.937690267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2331593Z 2025-12-04T10:58:28.2331742Z [W1204 10:28:41.937759066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2331744Z 2025-12-04T10:58:28.2331895Z [W1204 10:28:41.937800415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2331897Z 2025-12-04T10:58:28.2332048Z [W1204 10:28:41.940068140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332049Z 2025-12-04T10:58:28.2332198Z [W1204 10:28:41.940153019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332200Z 2025-12-04T10:58:28.2332351Z [W1204 10:28:41.940207178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332353Z 2025-12-04T10:58:28.2332522Z [W1204 10:28:41.940283997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332524Z 2025-12-04T10:58:28.2332675Z [W1204 10:28:41.940327746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2332676Z 2025-12-04T10:58:28.2332824Z [W1204 10:28:41.940405445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332828Z 2025-12-04T10:58:28.2332976Z [W1204 10:28:41.940447514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2332978Z 2025-12-04T10:58:28.2333128Z [W1204 10:28:41.940515793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333130Z 2025-12-04T10:58:28.2333316Z [W1204 10:28:41.940557722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333319Z 2025-12-04T10:58:28.2333468Z [W1204 10:28:41.974124822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333470Z 2025-12-04T10:58:28.2333617Z [W1204 10:28:41.974216250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333618Z 2025-12-04T10:58:28.2333767Z [W1204 10:28:41.974272099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333768Z 2025-12-04T10:58:28.2333918Z [W1204 10:28:41.974356458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2333920Z 2025-12-04T10:58:28.2334069Z [W1204 10:28:41.974400357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2334072Z 2025-12-04T10:58:28.2334222Z [W1204 10:28:41.974482716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2334224Z 2025-12-04T10:58:28.2334371Z [W1204 10:28:41.974525695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2334373Z 2025-12-04T10:58:28.2334523Z [W1204 10:28:41.974595234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2334524Z 2025-12-04T10:58:28.2334671Z [W1204 10:28:41.974637124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2334701Z 2025-12-04T10:58:28.2334740Z FAILED [0.7523s] [100%] 2025-12-04T10:58:28.2334742Z 2025-12-04T10:58:28.2334795Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2334956Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2335004Z Traceback (most recent call last): 2025-12-04T10:58:28.2335161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2335204Z method(*args, **kwargs) 2025-12-04T10:58:28.2335356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2335397Z method(*args, **kwargs) 2025-12-04T10:58:28.2335548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2335587Z with policy(): 2025-12-04T10:58:28.2335740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2335810Z raise RuntimeError(msg) 2025-12-04T10:58:28.2336217Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2336220Z 2025-12-04T10:58:28.2336295Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2336593Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2336595Z 2025-12-04T10:58:28.2336685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2336762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2336821Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2337001Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2337075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2337113Z graph_break [] 2025-12-04T10:58:28.2337272Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2337319Z Traceback (most recent call last): 2025-12-04T10:58:28.2337472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2337514Z method(*args, **kwargs) 2025-12-04T10:58:28.2337664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2337706Z method(*args, **kwargs) 
2025-12-04T10:58:28.2337855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2337892Z with policy(): 2025-12-04T10:58:28.2338044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2338085Z raise RuntimeError(msg) 2025-12-04T10:58:28.2338523Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2338526Z 2025-12-04T10:58:28.2338602Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2338900Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2338904Z 2025-12-04T10:58:28.2338990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2339065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2339123Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2339302Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2339376Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2339413Z graph_break [] 2025-12-04T10:58:28.2339486Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2339576Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2339647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2339826Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2339862Z graph_break [] 2025-12-04T10:58:28.2339916Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2340074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2340122Z Traceback (most recent call last): 2025-12-04T10:58:28.2340277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2340318Z method(*args, **kwargs) 2025-12-04T10:58:28.2340470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2340510Z method(*args, **kwargs) 2025-12-04T10:58:28.2340661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2340699Z with policy(): 2025-12-04T10:58:28.2340851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2340893Z raise RuntimeError(msg) 2025-12-04T10:58:28.2341311Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2341313Z 2025-12-04T10:58:28.2341390Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2341687Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2341690Z 2025-12-04T10:58:28.2341777Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2341851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2341909Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2342086Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2342181Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2342219Z graph_break [] 2025-12-04T10:58:28.2342293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2342352Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2342423Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2342600Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2342636Z graph_break [] 2025-12-04T10:58:28.2342710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2342766Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2342837Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2343014Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2343074Z graph_break [] 2025-12-04T10:58:28.2343364Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0ba6ded0358f71dd.xml - 2025-12-04T10:58:28.2343425Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2344080Z FAILED [0.7523s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2344084Z 2025-12-04T10:58:28.2344157Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2344456Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2344458Z 2025-12-04T10:58:28.2344542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2344606Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2344672Z ================== 1 failed, 57 deselected, 2 rerun in 4.55s =================== 2025-12-04T10:58:28.2344709Z Got exit code 1 2025-12-04T10:58:28.2344958Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2345090Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2345289Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94592ed9b13d57e4.xml 2025-12-04T10:58:28.2345348Z ============================= test session starts ============================== 2025-12-04T10:58:28.2345460Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2345502Z cachedir: .pytest_cache 2025-12-04T10:58:28.2345661Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2345709Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2345750Z configfile: pytest.ini 2025-12-04T10:58:28.2345941Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2346016Z collecting ... collected 58 items / 3 deselected / 55 selected 2025-12-04T10:58:28.2346070Z stepcurrent: skipping 3 already run items. 
2025-12-04T10:58:28.2346114Z Running 55 items in this shard 2025-12-04T10:58:28.2346117Z 2025-12-04T10:58:28.2346378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4867s] [ 1%] 2025-12-04T10:58:28.2346634Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4830s] [ 1%] 2025-12-04T10:58:28.2346914Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5040s] [ 1%] 2025-12-04T10:58:28.2346917Z 2025-12-04T10:58:28.2346970Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2347157Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2347204Z Traceback (most recent call last): 2025-12-04T10:58:28.2347361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2347404Z method(*args, **kwargs) 2025-12-04T10:58:28.2347555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2347596Z method(*args, **kwargs) 2025-12-04T10:58:28.2347745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2347783Z with policy(): 2025-12-04T10:58:28.2347936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2347978Z raise RuntimeError(msg) 2025-12-04T10:58:28.2348389Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2348393Z 2025-12-04T10:58:28.2348466Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2348763Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2348765Z 2025-12-04T10:58:28.2348852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2348925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2348986Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2349166Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2349240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2349281Z graph_break [] 2025-12-04T10:58:28.2349444Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2349495Z Traceback (most recent call last): 2025-12-04T10:58:28.2349651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2349694Z method(*args, **kwargs) 2025-12-04T10:58:28.2349867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2349913Z method(*args, **kwargs) 2025-12-04T10:58:28.2350064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2350105Z with policy(): 2025-12-04T10:58:28.2350259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2350303Z raise RuntimeError(msg) 2025-12-04T10:58:28.2350720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2350722Z 2025-12-04T10:58:28.2350798Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2351100Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2351129Z 2025-12-04T10:58:28.2351217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2351294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2351355Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2351540Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2351615Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2351655Z graph_break [] 2025-12-04T10:58:28.2351730Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.2351792Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2351867Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2352049Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2352086Z graph_break [] 2025-12-04T10:58:28.2352143Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2352302Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2352352Z Traceback (most recent call last): 2025-12-04T10:58:28.2352506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2352553Z method(*args, **kwargs) 2025-12-04T10:58:28.2352705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2352750Z method(*args, **kwargs) 2025-12-04T10:58:28.2352918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2352956Z with policy(): 2025-12-04T10:58:28.2353113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2353155Z raise RuntimeError(msg) 2025-12-04T10:58:28.2353607Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2353669Z 2025-12-04T10:58:28.2353746Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2354057Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2354059Z 2025-12-04T10:58:28.2354145Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2354222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2354281Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2354463Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2354537Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2354578Z graph_break [] 2025-12-04T10:58:28.2354650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2354740Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2354811Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2354992Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2355030Z graph_break [] 2025-12-04T10:58:28.2355107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2355164Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2355238Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2355421Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2355458Z graph_break [] 2025-12-04T10:58:28.2355707Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-94592ed9b13d57e4.xml - 2025-12-04T10:58:28.2355769Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2356485Z FAILED [0.5040s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2356488Z 2025-12-04T10:58:28.2356565Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2356865Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2356869Z 2025-12-04T10:58:28.2356959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2357023Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2357092Z =================== 1 failed, 3 deselected, 2 rerun in 3.64s =================== 2025-12-04T10:58:28.2357130Z Got exit code 1 2025-12-04T10:58:28.2357175Z Retrying single test... 
2025-12-04T10:58:28.2357372Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-026f9bbf3281e9e7.xml 2025-12-04T10:58:28.2357457Z ============================= test session starts ============================== 2025-12-04T10:58:28.2357569Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2357614Z cachedir: .pytest_cache 2025-12-04T10:58:28.2357773Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2357824Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2357866Z configfile: pytest.ini 2025-12-04T10:58:28.2358031Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2358106Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2358407Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2358454Z Running 1 items in this shard 2025-12-04T10:58:28.2358456Z 2025-12-04T10:58:28.2358836Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:01.393970458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2358869Z 2025-12-04T10:58:28.2359025Z [W1204 10:29:01.668023848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2359030Z 2025-12-04T10:58:28.2359183Z [W1204 10:29:01.668246015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359185Z 2025-12-04T10:58:28.2359338Z [W1204 10:29:01.668802196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359341Z 2025-12-04T10:58:28.2359492Z [W1204 10:29:01.668909014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359496Z 2025-12-04T10:58:28.2359648Z [W1204 10:29:01.669992098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359650Z 2025-12-04T10:58:28.2359801Z [W1204 10:29:01.670061246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359803Z 2025-12-04T10:58:28.2359956Z [W1204 10:29:01.670179735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2359958Z 2025-12-04T10:58:28.2360111Z [W1204 10:29:01.670244194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2360113Z 2025-12-04T10:58:28.2360263Z [W1204 10:29:01.674243012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2360266Z 2025-12-04T10:58:28.2360420Z [W1204 10:29:01.674326091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2360422Z 2025-12-04T10:58:28.2360572Z [W1204 10:29:01.674384500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2360574Z 2025-12-04T10:58:28.2360724Z [W1204 10:29:01.674470238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2360725Z 2025-12-04T10:58:28.2360875Z [W1204 10:29:01.674516518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2360881Z 2025-12-04T10:58:28.2361052Z [W1204 10:29:01.674595426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361055Z 2025-12-04T10:58:28.2361209Z [W1204 10:29:01.674639356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361211Z 2025-12-04T10:58:28.2361360Z [W1204 10:29:01.674708125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361362Z 2025-12-04T10:58:28.2361514Z [W1204 10:29:01.674750404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361516Z 2025-12-04T10:58:28.2361665Z [W1204 10:29:01.709884189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361667Z 2025-12-04T10:58:28.2361820Z [W1204 10:29:01.709972598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361842Z 2025-12-04T10:58:28.2361995Z [W1204 10:29:01.710035727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2361997Z 2025-12-04T10:58:28.2362145Z [W1204 10:29:01.710124115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2362147Z 2025-12-04T10:58:28.2362302Z [W1204 10:29:01.710169965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2362303Z 2025-12-04T10:58:28.2362453Z [W1204 10:29:01.710250483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2362455Z 2025-12-04T10:58:28.2362610Z [W1204 10:29:01.710293653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2362612Z 2025-12-04T10:58:28.2362763Z [W1204 10:29:01.710360192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2362767Z 2025-12-04T10:58:28.2362916Z [W1204 10:29:01.710402241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2362918Z 2025-12-04T10:58:28.2362972Z ('RERUN', {'yellow': True}) [2.7916s] [100%] 2025-12-04T10:58:28.2363377Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:02.716901424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2363379Z 2025-12-04T10:58:28.2363534Z [W1204 10:29:02.717062061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2363536Z 2025-12-04T10:58:28.2363687Z [W1204 10:29:02.717119201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2363689Z 2025-12-04T10:58:28.2363841Z [W1204 10:29:02.717209869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2363843Z 2025-12-04T10:58:28.2363995Z [W1204 10:29:02.717255738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2363997Z 2025-12-04T10:58:28.2364147Z [W1204 10:29:02.717336887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364149Z 2025-12-04T10:58:28.2364329Z [W1204 10:29:02.717380107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364331Z 2025-12-04T10:58:28.2364481Z [W1204 10:29:02.717448665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364484Z 2025-12-04T10:58:28.2364638Z [W1204 10:29:02.717490025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364640Z 2025-12-04T10:58:28.2364793Z [W1204 10:29:02.719701741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364794Z 2025-12-04T10:58:28.2364943Z [W1204 10:29:02.719777589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2364945Z 2025-12-04T10:58:28.2365099Z [W1204 10:29:02.719829859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2365101Z 2025-12-04T10:58:28.2365251Z [W1204 10:29:02.719906387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2365284Z 2025-12-04T10:58:28.2365436Z [W1204 10:29:02.719949697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2365438Z 2025-12-04T10:58:28.2365589Z [W1204 10:29:02.720030455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2365593Z 2025-12-04T10:58:28.2365743Z [W1204 10:29:02.720073915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2365745Z 2025-12-04T10:58:28.2365897Z [W1204 10:29:02.720140824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2365901Z 2025-12-04T10:58:28.2366050Z [W1204 10:29:02.720181413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366053Z 2025-12-04T10:58:28.2366205Z [W1204 10:29:02.753321429 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366207Z 2025-12-04T10:58:28.2366356Z [W1204 10:29:02.753406418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366358Z 2025-12-04T10:58:28.2366511Z [W1204 10:29:02.753463557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366513Z 2025-12-04T10:58:28.2366664Z [W1204 10:29:02.753548156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366666Z 2025-12-04T10:58:28.2366818Z [W1204 10:29:02.753592425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2366821Z 2025-12-04T10:58:28.2366974Z [W1204 10:29:02.753672364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2366976Z 2025-12-04T10:58:28.2367125Z [W1204 10:29:02.753715533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2367127Z 2025-12-04T10:58:28.2367280Z [W1204 10:29:02.753783382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2367282Z 2025-12-04T10:58:28.2367432Z [W1204 10:29:02.753824381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2367436Z 2025-12-04T10:58:28.2367512Z ('RERUN', {'yellow': True}) [0.5296s] [100%] 2025-12-04T10:58:28.2367887Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:02.244813688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2367891Z 2025-12-04T10:58:28.2368040Z [W1204 10:29:02.244966246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368041Z 2025-12-04T10:58:28.2368195Z [W1204 10:29:02.245026985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368197Z 2025-12-04T10:58:28.2368346Z [W1204 10:29:02.245117003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2368348Z 2025-12-04T10:58:28.2368502Z [W1204 10:29:02.245163643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368526Z 2025-12-04T10:58:28.2368680Z [W1204 10:29:02.245245361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368682Z 2025-12-04T10:58:28.2368832Z [W1204 10:29:02.245287841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368834Z 2025-12-04T10:58:28.2368988Z [W1204 10:29:02.245355510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2368990Z 2025-12-04T10:58:28.2369140Z [W1204 10:29:02.245396919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2369142Z 2025-12-04T10:58:28.2369295Z [W1204 10:29:02.247629794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2369298Z 2025-12-04T10:58:28.2369446Z [W1204 10:29:02.247711743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2369451Z 2025-12-04T10:58:28.2369599Z [W1204 10:29:02.247766122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2369601Z 2025-12-04T10:58:28.2369753Z [W1204 10:29:02.247842991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2369755Z 2025-12-04T10:58:28.2369904Z [W1204 10:29:02.247886120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2369906Z 2025-12-04T10:58:28.2370061Z [W1204 10:29:02.247961699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370064Z 2025-12-04T10:58:28.2370213Z [W1204 10:29:02.248008309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370215Z 2025-12-04T10:58:28.2370368Z [W1204 10:29:02.248080238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370370Z 2025-12-04T10:58:28.2370523Z [W1204 10:29:02.248122287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370525Z 2025-12-04T10:58:28.2370674Z [W1204 10:29:03.281481029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370676Z 2025-12-04T10:58:28.2370849Z [W1204 10:29:03.281566698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2370851Z 2025-12-04T10:58:28.2371002Z [W1204 10:29:03.281621457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371003Z 2025-12-04T10:58:28.2371156Z [W1204 10:29:03.281705816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371158Z 2025-12-04T10:58:28.2371307Z [W1204 10:29:03.281749855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371311Z 2025-12-04T10:58:28.2371461Z [W1204 10:29:03.281829494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2371462Z 2025-12-04T10:58:28.2371613Z [W1204 10:29:03.281871363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371615Z 2025-12-04T10:58:28.2371764Z [W1204 10:29:03.281939562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371794Z 2025-12-04T10:58:28.2371944Z [W1204 10:29:03.281980072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2371946Z 2025-12-04T10:58:28.2371984Z FAILED [0.5223s] [100%] 2025-12-04T10:58:28.2371986Z 2025-12-04T10:58:28.2372039Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2372198Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2372245Z Traceback (most recent call last): 2025-12-04T10:58:28.2372405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2372449Z method(*args, **kwargs) 2025-12-04T10:58:28.2372601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2372646Z method(*args, **kwargs) 2025-12-04T10:58:28.2372796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2372835Z with policy(): 2025-12-04T10:58:28.2372990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2373031Z raise RuntimeError(msg) 2025-12-04T10:58:28.2373460Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2373462Z 2025-12-04T10:58:28.2373537Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2373842Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2373844Z 2025-12-04T10:58:28.2373931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2374006Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2374066Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2374247Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2374351Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2374390Z graph_break [] 2025-12-04T10:58:28.2374550Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2374600Z Traceback (most recent call last): 2025-12-04T10:58:28.2374753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2374796Z method(*args, **kwargs) 2025-12-04T10:58:28.2374945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2374987Z method(*args, **kwargs) 
2025-12-04T10:58:28.2375138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2375174Z with policy(): 2025-12-04T10:58:28.2375329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2375369Z raise RuntimeError(msg) 2025-12-04T10:58:28.2375817Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2375819Z 2025-12-04T10:58:28.2375893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2376191Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2376193Z 2025-12-04T10:58:28.2376279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2376354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2376417Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2376595Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2376669Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2376708Z graph_break [] 2025-12-04T10:58:28.2376781Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2376841Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2376911Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2377090Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2377128Z graph_break [] 2025-12-04T10:58:28.2377181Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2377342Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2377387Z Traceback (most recent call last): 2025-12-04T10:58:28.2377543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2377584Z method(*args, **kwargs) 2025-12-04T10:58:28.2377735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2377774Z method(*args, **kwargs) 2025-12-04T10:58:28.2377924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2377960Z with policy(): 2025-12-04T10:58:28.2378134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2378177Z raise RuntimeError(msg) 2025-12-04T10:58:28.2378588Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2378590Z 2025-12-04T10:58:28.2378663Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2378960Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2378962Z 2025-12-04T10:58:28.2379049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2379123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2379213Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2379391Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2379465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2379500Z graph_break [] 2025-12-04T10:58:28.2379574Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2379630Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2379702Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2379878Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2379916Z graph_break [] 2025-12-04T10:58:28.2379992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2380050Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2380120Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2380297Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2380333Z graph_break [] 2025-12-04T10:58:28.2380580Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-026f9bbf3281e9e7.xml - 2025-12-04T10:58:28.2380638Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2381292Z FAILED [0.5223s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2381297Z 2025-12-04T10:58:28.2381370Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2381671Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2381673Z 2025-12-04T10:58:28.2381758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2381844Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2381911Z ================== 1 failed, 57 deselected, 2 rerun in 4.01s =================== 2025-12-04T10:58:28.2381949Z Got exit code 1 2025-12-04T10:58:28.2381990Z Retrying single test... 
2025-12-04T10:58:28.2382185Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d1acc24202be3b8.xml 2025-12-04T10:58:28.2382243Z ============================= test session starts ============================== 2025-12-04T10:58:28.2382353Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2382394Z cachedir: .pytest_cache 2025-12-04T10:58:28.2382552Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2382600Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2382642Z configfile: pytest.ini 2025-12-04T10:58:28.2382805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2382901Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2383196Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2383240Z Running 1 items in this shard 2025-12-04T10:58:28.2383242Z 2025-12-04T10:58:28.2383645Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:11.977844800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2383647Z 2025-12-04T10:58:28.2383802Z [W1204 10:29:11.249171654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2383806Z 2025-12-04T10:58:28.2383958Z [W1204 10:29:11.249307022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2383960Z 2025-12-04T10:58:28.2384110Z [W1204 10:29:11.249718205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384112Z 2025-12-04T10:58:28.2384261Z [W1204 10:29:11.249795464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384262Z 2025-12-04T10:58:28.2384412Z [W1204 10:29:11.250495223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384413Z 2025-12-04T10:58:28.2384561Z [W1204 10:29:11.250550442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384566Z 2025-12-04T10:58:28.2384714Z [W1204 10:29:11.250648161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384716Z 2025-12-04T10:58:28.2384865Z [W1204 10:29:11.250693940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2384867Z 2025-12-04T10:58:28.2385014Z [W1204 10:29:11.254772037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385016Z 2025-12-04T10:58:28.2385164Z [W1204 10:29:11.254853636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385166Z 2025-12-04T10:58:28.2385346Z [W1204 10:29:11.254909855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2385348Z 2025-12-04T10:58:28.2385498Z [W1204 10:29:11.254993533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385500Z 2025-12-04T10:58:28.2385648Z [W1204 10:29:11.255044833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385650Z 2025-12-04T10:58:28.2385797Z [W1204 10:29:11.255124051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385799Z 2025-12-04T10:58:28.2385949Z [W1204 10:29:11.255167541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2385950Z 2025-12-04T10:58:28.2386099Z [W1204 10:29:11.255237040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2386101Z 2025-12-04T10:58:28.2386249Z [W1204 10:29:11.255279389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2386295Z 2025-12-04T10:58:28.2386443Z [W1204 10:29:12.291191472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2386446Z 2025-12-04T10:58:28.2386593Z [W1204 10:29:12.291284191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2386595Z 2025-12-04T10:58:28.2386743Z [W1204 10:29:12.291342700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2386745Z 2025-12-04T10:58:28.2386894Z [W1204 10:29:12.291430728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2386895Z 2025-12-04T10:58:28.2387045Z [W1204 10:29:12.291476398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2387048Z 2025-12-04T10:58:28.2387195Z [W1204 10:29:12.291554757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2387197Z 2025-12-04T10:58:28.2387345Z [W1204 10:29:12.291598336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2387347Z 2025-12-04T10:58:28.2387496Z [W1204 10:29:12.291667205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2387497Z 2025-12-04T10:58:28.2387644Z [W1204 10:29:12.291709284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2387648Z 2025-12-04T10:58:28.2387697Z ('RERUN', {'yellow': True}) [2.8680s] [100%] 2025-12-04T10:58:28.2388074Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:13.433931087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2388076Z 2025-12-04T10:58:28.2388226Z [W1204 10:29:13.434090835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2388227Z 2025-12-04T10:58:28.2388373Z [W1204 10:29:13.434156434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2388376Z 2025-12-04T10:58:28.2388545Z [W1204 10:29:13.434246963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2388547Z 2025-12-04T10:58:28.2388699Z [W1204 10:29:13.434294052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2388702Z 2025-12-04T10:58:28.2388851Z [W1204 10:29:13.434378311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2388853Z 2025-12-04T10:58:28.2389002Z [W1204 10:29:13.434422290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389004Z 2025-12-04T10:58:28.2389152Z [W1204 10:29:13.434492249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389154Z 2025-12-04T10:58:28.2389303Z [W1204 10:29:13.434533308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389306Z 2025-12-04T10:58:28.2389454Z [W1204 10:29:13.436807583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389483Z 2025-12-04T10:58:28.2389631Z [W1204 10:29:13.436892682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389633Z 2025-12-04T10:58:28.2389782Z [W1204 10:29:13.436948101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2389784Z 2025-12-04T10:58:28.2389931Z [W1204 10:29:13.437034019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2389933Z 2025-12-04T10:58:28.2390083Z [W1204 10:29:13.437079529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390085Z 2025-12-04T10:58:28.2390235Z [W1204 10:29:13.437157717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390238Z 2025-12-04T10:58:28.2390386Z [W1204 10:29:13.437199867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390388Z 2025-12-04T10:58:28.2390538Z [W1204 10:29:13.437266676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390539Z 2025-12-04T10:58:28.2390687Z [W1204 10:29:13.437307285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390689Z 2025-12-04T10:58:28.2390837Z [W1204 10:29:13.471970508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390839Z 2025-12-04T10:58:28.2390989Z [W1204 10:29:13.472063596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2390993Z 2025-12-04T10:58:28.2391142Z [W1204 10:29:13.472122245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2391144Z 2025-12-04T10:58:28.2391292Z [W1204 10:29:13.472210254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2391294Z 2025-12-04T10:58:28.2391442Z [W1204 10:29:13.472255303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2391443Z 2025-12-04T10:58:28.2391592Z [W1204 10:29:13.472336202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2391594Z 2025-12-04T10:58:28.2391761Z [W1204 10:29:13.472379781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2391763Z 2025-12-04T10:58:28.2391914Z [W1204 10:29:13.472448610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2391916Z 2025-12-04T10:58:28.2392069Z [W1204 10:29:13.472490570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2392071Z 2025-12-04T10:58:28.2392119Z ('RERUN', {'yellow': True}) [0.6937s] [100%] 2025-12-04T10:58:28.2392489Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:29:13.165754893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2392491Z 2025-12-04T10:58:28.2392641Z [W1204 10:29:13.165908770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2392682Z 2025-12-04T10:58:28.2392831Z [W1204 10:29:13.165965410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2392833Z 2025-12-04T10:58:28.2392981Z [W1204 10:29:13.166064938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2392984Z 2025-12-04T10:58:28.2393131Z [W1204 10:29:13.166114767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393133Z 2025-12-04T10:58:28.2393329Z [W1204 10:29:13.166199076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393331Z 2025-12-04T10:58:28.2393480Z [W1204 10:29:13.166241645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393482Z 2025-12-04T10:58:28.2393632Z [W1204 10:29:13.166310224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393634Z 2025-12-04T10:58:28.2393782Z [W1204 10:29:13.166351114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393784Z 2025-12-04T10:58:28.2393932Z [W1204 10:29:13.168619088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2393934Z 2025-12-04T10:58:28.2394083Z [W1204 10:29:13.168699847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2394085Z 2025-12-04T10:58:28.2394236Z [W1204 10:29:13.168752506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2394238Z 2025-12-04T10:58:28.2394387Z [W1204 10:29:13.168829645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2394390Z 2025-12-04T10:58:28.2394538Z [W1204 10:29:13.168872884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2394540Z 2025-12-04T10:58:28.2394688Z [W1204 10:29:13.168950453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2394690Z 2025-12-04T10:58:28.2394839Z [W1204 10:29:13.168992683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2394842Z 2025-12-04T10:58:28.2424136Z [W1204 10:29:13.169065391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424143Z 2025-12-04T10:58:28.2424351Z [W1204 10:29:13.169107771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424360Z 2025-12-04T10:58:28.2424515Z [W1204 10:29:13.203496538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424517Z 2025-12-04T10:58:28.2424674Z [W1204 10:29:13.203582097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424676Z 2025-12-04T10:58:28.2424824Z [W1204 10:29:13.203637646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424826Z 2025-12-04T10:58:28.2424978Z [W1204 10:29:13.203722844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2424984Z 2025-12-04T10:58:28.2425134Z [W1204 10:29:13.203766634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2425179Z 2025-12-04T10:58:28.2425327Z [W1204 10:29:13.203847392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2425329Z 2025-12-04T10:58:28.2425479Z [W1204 10:29:13.203889822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2425481Z 2025-12-04T10:58:28.2425629Z [W1204 10:29:13.203958121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2425631Z 2025-12-04T10:58:28.2425781Z [W1204 10:29:13.203999490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2425783Z 2025-12-04T10:58:28.2425828Z FAILED [0.7229s] [100%] 2025-12-04T10:58:28.2425831Z 2025-12-04T10:58:28.2425889Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2426059Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2426110Z Traceback (most recent call last): 2025-12-04T10:58:28.2426282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2426326Z method(*args, **kwargs) 2025-12-04T10:58:28.2426481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2426520Z method(*args, **kwargs) 2025-12-04T10:58:28.2426672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2426716Z with policy(): 2025-12-04T10:58:28.2426873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2426917Z raise RuntimeError(msg) 2025-12-04T10:58:28.2427337Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 24576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2427340Z 2025-12-04T10:58:28.2427417Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2427727Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2427730Z 2025-12-04T10:58:28.2427845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2427926Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2427989Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2428176Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2428251Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2428289Z graph_break [] 2025-12-04T10:58:28.2428451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2428501Z Traceback (most recent call last): 2025-12-04T10:58:28.2428658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2428699Z method(*args, **kwargs) 2025-12-04T10:58:28.2428851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2428914Z method(*args, **kwargs) 
2025-12-04T10:58:28.2429064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2429102Z with policy(): 2025-12-04T10:58:28.2429257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2429298Z raise RuntimeError(msg) 2025-12-04T10:58:28.2429720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 24576 and is now reported as 49152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2429723Z 2025-12-04T10:58:28.2429797Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2430101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2430103Z 2025-12-04T10:58:28.2430191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2430266Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2430325Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2430509Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2430581Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2430621Z graph_break [] 2025-12-04T10:58:28.2430694Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2430754Z stats [('calls_captured', 18), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2430824Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2431002Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2431038Z graph_break [] 2025-12-04T10:58:28.2431090Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2431254Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2431299Z Traceback (most recent call last): 2025-12-04T10:58:28.2431475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2431514Z method(*args, **kwargs) 2025-12-04T10:58:28.2431667Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2431706Z method(*args, **kwargs) 2025-12-04T10:58:28.2431858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2431894Z with policy(): 2025-12-04T10:58:28.2432046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2432086Z raise RuntimeError(msg) 2025-12-04T10:58:28.2432503Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2432505Z 2025-12-04T10:58:28.2432603Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2432904Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2432906Z 2025-12-04T10:58:28.2432991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2433065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2433123Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2433466Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2433541Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2433577Z graph_break [] 2025-12-04T10:58:28.2433652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2433709Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2433780Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2433956Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2433992Z graph_break [] 2025-12-04T10:58:28.2434064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2434122Z stats [('calls_captured', 18), ('unique_graphs', 1)] 2025-12-04T10:58:28.2434193Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2434371Z inductor [('pattern_matcher_nodes', 18), ('woq_matcher_nodes', 12), 
('pattern_matcher_count', 9), ('woq_matcher_count', 3), ('extern_calls', 3), ('fxgraph_cache_miss', 1)] 2025-12-04T10:58:28.2434408Z graph_break [] 2025-12-04T10:58:28.2434659Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4d1acc24202be3b8.xml - 2025-12-04T10:58:28.2434719Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2435382Z FAILED [0.7229s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 49152 and is now reported as 73728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2435426Z 2025-12-04T10:58:28.2435500Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2435799Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2435801Z 2025-12-04T10:58:28.2435887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2435949Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2436018Z ================== 1 failed, 57 deselected, 2 rerun in 4.45s =================== 2025-12-04T10:58:28.2436054Z Got exit code 1 2025-12-04T10:58:28.2436303Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2436434Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2436665Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce631f215be3e5e5.xml 2025-12-04T10:58:28.2436723Z ============================= test session starts ============================== 2025-12-04T10:58:28.2436838Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2436879Z cachedir: .pytest_cache 2025-12-04T10:58:28.2437040Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2437087Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2437128Z configfile: pytest.ini 2025-12-04T10:58:28.2437294Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2437369Z collecting ... collected 58 items / 4 deselected / 54 selected 2025-12-04T10:58:28.2437423Z stepcurrent: skipping 4 already run items. 
2025-12-04T10:58:28.2437468Z Running 54 items in this shard 2025-12-04T10:58:28.2437470Z 2025-12-04T10:58:28.2437726Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.7724s] [ 1%] 2025-12-04T10:58:28.2437975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6277s] [ 1%] 2025-12-04T10:58:28.2438201Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6276s] [ 1%] 2025-12-04T10:58:28.2438204Z 2025-12-04T10:58:28.2438258Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2438416Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2438462Z Traceback (most recent call last): 2025-12-04T10:58:28.2438620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2438659Z method(*args, **kwargs) 2025-12-04T10:58:28.2438811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2438850Z method(*args, **kwargs) 2025-12-04T10:58:28.2439000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2439036Z with policy(): 2025-12-04T10:58:28.2439209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2439249Z raise RuntimeError(msg) 2025-12-04T10:58:28.2439654Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2439656Z 2025-12-04T10:58:28.2439730Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2440025Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2440027Z 2025-12-04T10:58:28.2440112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2440187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2440245Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2440551Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2440624Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2440660Z graph_break [] 2025-12-04T10:58:28.2440815Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2440860Z Traceback (most recent call last): 2025-12-04T10:58:28.2441015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2441054Z method(*args, **kwargs) 2025-12-04T10:58:28.2441206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T10:58:28.2441247Z method(*args, **kwargs) 2025-12-04T10:58:28.2441397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2441432Z with policy(): 2025-12-04T10:58:28.2441584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2441623Z raise RuntimeError(msg) 2025-12-04T10:58:28.2442033Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2442035Z 2025-12-04T10:58:28.2442113Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2442407Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2442410Z 2025-12-04T10:58:28.2442496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2442569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2442628Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2442902Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2442997Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2443033Z graph_break [] 
2025-12-04T10:58:28.2443107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2443163Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2443234Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2443537Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2443573Z graph_break [] 2025-12-04T10:58:28.2443625Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2443780Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2443825Z Traceback (most recent call last): 2025-12-04T10:58:28.2443979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2444051Z method(*args, **kwargs) 2025-12-04T10:58:28.2444201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2444240Z method(*args, **kwargs) 2025-12-04T10:58:28.2444391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2444428Z with policy(): 2025-12-04T10:58:28.2444580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2444620Z raise RuntimeError(msg) 2025-12-04T10:58:28.2445031Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2445035Z 2025-12-04T10:58:28.2445109Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2445401Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2445403Z 2025-12-04T10:58:28.2445490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2445564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2445620Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2445893Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2445968Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2446003Z graph_break [] 2025-12-04T10:58:28.2446075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2446129Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2446201Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2446472Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2446507Z graph_break [] 2025-12-04T10:58:28.2446580Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T10:58:28.2446658Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2446730Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2447002Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2447038Z graph_break [] 2025-12-04T10:58:28.2447285Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce631f215be3e5e5.xml - 2025-12-04T10:58:28.2447348Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2448001Z FAILED [0.6276s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2448031Z 2025-12-04T10:58:28.2448103Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2448393Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2448395Z 2025-12-04T10:58:28.2448481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2448543Z !!!!!!!!!!!!!!!!!!!!!!!!!! 
stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2448610Z =================== 1 failed, 4 deselected, 2 rerun in 4.20s =================== 2025-12-04T10:58:28.2448649Z Got exit code 1 2025-12-04T10:58:28.2448688Z Retrying single test... 2025-12-04T10:58:28.2448890Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c7dcedba4bfe30a6.xml 2025-12-04T10:58:28.2448947Z ============================= test session starts ============================== 2025-12-04T10:58:28.2449058Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2449100Z cachedir: .pytest_cache 2025-12-04T10:58:28.2449260Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2449305Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2449345Z configfile: pytest.ini 2025-12-04T10:58:28.2449509Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2449584Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2449876Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2449922Z Running 1 items in this shard 2025-12-04T10:58:28.2449924Z 2025-12-04T10:58:28.2450293Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:34.041048889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2450296Z 2025-12-04T10:58:28.2450450Z [W1204 10:29:35.306257410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2450452Z 2025-12-04T10:58:28.2450633Z [W1204 10:29:35.306418858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2450637Z 2025-12-04T10:58:28.2450786Z [W1204 10:29:35.309761036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2450788Z 2025-12-04T10:58:28.2450938Z [W1204 10:29:35.310093121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2450940Z 2025-12-04T10:58:28.2451088Z [W1204 10:29:35.310159420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2451090Z 2025-12-04T10:58:28.2451238Z [W1204 10:29:35.312493993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2451240Z 2025-12-04T10:58:28.2451390Z [W1204 10:29:35.312776329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2451416Z 2025-12-04T10:58:28.2451564Z [W1204 10:29:35.312835798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2451565Z 2025-12-04T10:58:28.2451616Z ('RERUN', {'yellow': True}) [2.9269s] [100%] 2025-12-04T10:58:28.2451983Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:35.011123199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2451985Z 2025-12-04T10:58:28.2452134Z [W1204 10:29:35.011486254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452136Z 2025-12-04T10:58:28.2452285Z [W1204 10:29:35.011548253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452290Z 2025-12-04T10:58:28.2452437Z [W1204 10:29:35.012804293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452439Z 2025-12-04T10:58:28.2452587Z [W1204 10:29:35.013063269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452589Z 2025-12-04T10:58:28.2452736Z [W1204 10:29:35.013124738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452737Z 2025-12-04T10:58:28.2452884Z [W1204 10:29:35.015190296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2452886Z 2025-12-04T10:58:28.2453034Z [W1204 10:29:35.015454582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2453038Z 2025-12-04T10:58:28.2453186Z [W1204 10:29:35.015514451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2453188Z 2025-12-04T10:58:28.2453236Z ('RERUN', {'yellow': True}) [0.6558s] [100%] 2025-12-04T10:58:28.2453634Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:36.738570229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2453636Z 2025-12-04T10:58:28.2453786Z [W1204 10:29:36.738929523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2453788Z 2025-12-04T10:58:28.2453962Z [W1204 10:29:36.738993792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2453966Z 2025-12-04T10:58:28.2454115Z [W1204 10:29:36.740248483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2454117Z 2025-12-04T10:58:28.2454267Z [W1204 10:29:36.740506299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2454269Z 2025-12-04T10:58:28.2454416Z [W1204 10:29:36.740563918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2454418Z 2025-12-04T10:58:28.2454567Z [W1204 10:29:36.742605376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2454569Z 2025-12-04T10:58:28.2454717Z [W1204 10:29:36.742865962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2454719Z 2025-12-04T10:58:28.2454900Z [W1204 10:29:36.742924332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2454902Z 2025-12-04T10:58:28.2454940Z FAILED [0.7176s] [100%] 2025-12-04T10:58:28.2454942Z 2025-12-04T10:58:28.2454994Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2455146Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2455192Z Traceback (most recent call last): 2025-12-04T10:58:28.2455349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2455390Z method(*args, **kwargs) 2025-12-04T10:58:28.2455544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2455584Z method(*args, **kwargs) 2025-12-04T10:58:28.2455737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2455773Z with policy(): 2025-12-04T10:58:28.2455926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2455966Z raise RuntimeError(msg) 2025-12-04T10:58:28.2456369Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2456372Z 2025-12-04T10:58:28.2456447Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2456742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2456746Z 2025-12-04T10:58:28.2456833Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2456907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2456964Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2457238Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2457310Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2457347Z graph_break [] 2025-12-04T10:58:28.2457521Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2457568Z Traceback (most recent call last): 2025-12-04T10:58:28.2457721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2457761Z method(*args, **kwargs) 2025-12-04T10:58:28.2457913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2457952Z method(*args, **kwargs) 2025-12-04T10:58:28.2458103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2458138Z with policy(): 2025-12-04T10:58:28.2458291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2458332Z raise RuntimeError(msg) 2025-12-04T10:58:28.2458744Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2458771Z 2025-12-04T10:58:28.2458844Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2459136Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2459138Z 2025-12-04T10:58:28.2459223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2459297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2459354Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2459625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2459702Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2459737Z graph_break [] 2025-12-04T10:58:28.2459813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2459867Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2459938Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2460209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2460245Z graph_break [] 2025-12-04T10:58:28.2460296Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2460451Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2460495Z Traceback (most recent call last): 2025-12-04T10:58:28.2460648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2460688Z method(*args, **kwargs) 2025-12-04T10:58:28.2461043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2461084Z method(*args, **kwargs) 2025-12-04T10:58:28.2461235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2461299Z with policy(): 2025-12-04T10:58:28.2461454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2461496Z raise RuntimeError(msg) 2025-12-04T10:58:28.2461909Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2461912Z 2025-12-04T10:58:28.2461985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2462276Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2462279Z 2025-12-04T10:58:28.2462366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2462471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2462526Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2462796Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2462868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2462903Z graph_break [] 2025-12-04T10:58:28.2462976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2463029Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2463100Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2463397Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2463440Z graph_break [] 2025-12-04T10:58:28.2463513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2463569Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2463639Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2463910Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2463945Z graph_break [] 2025-12-04T10:58:28.2464195Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c7dcedba4bfe30a6.xml - 2025-12-04T10:58:28.2464254Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2464903Z FAILED [0.7176s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2464905Z 2025-12-04T10:58:28.2464978Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2465297Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2465301Z 2025-12-04T10:58:28.2465387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2465448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2465515Z ================== 1 failed, 57 deselected, 2 rerun in 4.46s =================== 2025-12-04T10:58:28.2465551Z Got exit code 1 2025-12-04T10:58:28.2465591Z Retrying single test... 2025-12-04T10:58:28.2465788Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64dfbac658476aca.xml 2025-12-04T10:58:28.2465845Z ============================= test session starts ============================== 2025-12-04T10:58:28.2465955Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2465995Z cachedir: .pytest_cache 2025-12-04T10:58:28.2466154Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2466232Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2466271Z configfile: pytest.ini 2025-12-04T10:58:28.2466433Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2466506Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2466792Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2466837Z Running 1 items in this shard 2025-12-04T10:58:28.2466839Z 2025-12-04T10:58:28.2467203Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:46.554778556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2467207Z 2025-12-04T10:58:28.2467364Z [W1204 10:29:46.809745057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2467367Z 2025-12-04T10:58:28.2467516Z [W1204 10:29:46.809898685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2467519Z 2025-12-04T10:58:28.2467667Z [W1204 10:29:46.813318992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2467669Z 2025-12-04T10:58:28.2467817Z [W1204 10:29:46.813632127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2467819Z 2025-12-04T10:58:28.2467969Z [W1204 10:29:46.813697476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2467972Z 2025-12-04T10:58:28.2468122Z [W1204 10:29:46.815853132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2468124Z 2025-12-04T10:58:28.2468271Z [W1204 10:29:46.816131668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2468273Z 2025-12-04T10:58:28.2468421Z [W1204 10:29:46.816194207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2468423Z 2025-12-04T10:58:28.2468471Z ('RERUN', {'yellow': True}) [2.9363s] [100%] 2025-12-04T10:58:28.2468858Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:47.515601864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2468861Z 2025-12-04T10:58:28.2469011Z [W1204 10:29:47.515970838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469013Z 2025-12-04T10:58:28.2469162Z [W1204 10:29:47.516039807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469164Z 2025-12-04T10:58:28.2469312Z [W1204 10:29:47.517303788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469314Z 2025-12-04T10:58:28.2469462Z [W1204 10:29:47.517556234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469464Z 2025-12-04T10:58:28.2469615Z [W1204 10:29:47.517614753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469640Z 2025-12-04T10:58:28.2469790Z [W1204 10:29:47.519656041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2469792Z 2025-12-04T10:58:28.2469939Z [W1204 10:29:47.519919997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2469941Z 2025-12-04T10:58:28.2470091Z [W1204 10:29:47.519980176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2470093Z 2025-12-04T10:58:28.2470141Z ('RERUN', {'yellow': True}) [0.5706s] [100%] 2025-12-04T10:58:28.2470504Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:29:47.067467666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2470508Z 2025-12-04T10:58:28.2470656Z [W1204 10:29:47.067823321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2470660Z 2025-12-04T10:58:28.2470808Z [W1204 10:29:47.067887770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2470810Z 2025-12-04T10:58:28.2470959Z [W1204 10:29:47.069154510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2470961Z 2025-12-04T10:58:28.2471109Z [W1204 10:29:47.069409986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2471111Z 2025-12-04T10:58:28.2471262Z [W1204 10:29:47.069469045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2471265Z 2025-12-04T10:58:28.2471413Z [W1204 10:29:47.071474364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2471414Z 2025-12-04T10:58:28.2471563Z [W1204 10:29:47.071736520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2471565Z 2025-12-04T10:58:28.2471715Z [W1204 10:29:47.071796639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2471716Z 2025-12-04T10:58:28.2471754Z FAILED [0.5336s] [100%] 2025-12-04T10:58:28.2471756Z 2025-12-04T10:58:28.2471807Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2471980Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2472027Z Traceback (most recent call last): 2025-12-04T10:58:28.2472184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2472225Z method(*args, **kwargs) 2025-12-04T10:58:28.2472376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2472416Z method(*args, **kwargs) 2025-12-04T10:58:28.2472566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2472604Z with policy(): 2025-12-04T10:58:28.2472755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2472797Z raise RuntimeError(msg) 2025-12-04T10:58:28.2473201Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2473228Z 2025-12-04T10:58:28.2473331Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2473623Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2473626Z 2025-12-04T10:58:28.2473711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2473784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2473841Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2474116Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2474191Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2474228Z graph_break [] 2025-12-04T10:58:28.2474380Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2474427Z Traceback (most recent call last): 2025-12-04T10:58:28.2474580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2474620Z method(*args, **kwargs) 2025-12-04T10:58:28.2474770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2474812Z method(*args, **kwargs) 2025-12-04T10:58:28.2474962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2475001Z with policy(): 2025-12-04T10:58:28.2475154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2475195Z raise RuntimeError(msg) 2025-12-04T10:58:28.2475608Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2475612Z 2025-12-04T10:58:28.2475685Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2476008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2476012Z 2025-12-04T10:58:28.2476098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2476172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2476227Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2476500Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2476572Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2476608Z graph_break [] 2025-12-04T10:58:28.2476681Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2476736Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2476843Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2477113Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2477148Z graph_break [] 2025-12-04T10:58:28.2477201Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2477354Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2477399Z Traceback (most recent call last): 2025-12-04T10:58:28.2477552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2477594Z method(*args, **kwargs) 2025-12-04T10:58:28.2477744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2477786Z method(*args, **kwargs) 2025-12-04T10:58:28.2477935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2477972Z with policy(): 2025-12-04T10:58:28.2478122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2478163Z raise RuntimeError(msg) 2025-12-04T10:58:28.2478574Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2478578Z 2025-12-04T10:58:28.2478651Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2478943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2478945Z 2025-12-04T10:58:28.2479030Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2479103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2479157Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2479433Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2479526Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2479563Z graph_break [] 2025-12-04T10:58:28.2479637Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2479692Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2479763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2480034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2480070Z graph_break [] 2025-12-04T10:58:28.2480142Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2480198Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2480269Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2480542Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2480604Z graph_break [] 2025-12-04T10:58:28.2480850Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-64dfbac658476aca.xml - 2025-12-04T10:58:28.2480909Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2481561Z FAILED [0.5336s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2481565Z 2025-12-04T10:58:28.2481636Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2481928Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2481930Z 2025-12-04T10:58:28.2482017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2482078Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2482145Z ================== 1 failed, 57 deselected, 2 rerun in 4.20s =================== 2025-12-04T10:58:28.2482182Z Got exit code 1 2025-12-04T10:58:28.2482431Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2482561Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2482760Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b9f5b15c9f842fa.xml 2025-12-04T10:58:28.2482816Z ============================= test session starts ============================== 2025-12-04T10:58:28.2482926Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2482966Z cachedir: .pytest_cache 2025-12-04T10:58:28.2483125Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2483171Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2483211Z configfile: pytest.ini 2025-12-04T10:58:28.2483436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2483512Z collecting ... collected 58 items / 5 deselected / 53 selected 2025-12-04T10:58:28.2483563Z stepcurrent: skipping 5 already run items. 
2025-12-04T10:58:28.2483608Z Running 53 items in this shard 2025-12-04T10:58:28.2483610Z 2025-12-04T10:58:28.2484234Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/167814 for platform(s) rocm. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 1%] 2025-12-04T10:58:28.2484486Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8705s] [ 3%] 2025-12-04T10:58:28.2484767Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4477s] [ 3%] 2025-12-04T10:58:28.2484989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4466s] [ 3%] 2025-12-04T10:58:28.2484992Z 2025-12-04T10:58:28.2485043Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2485191Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2485237Z Traceback (most recent call last): 2025-12-04T10:58:28.2485396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2485436Z method(*args, **kwargs) 2025-12-04T10:58:28.2485589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2485629Z method(*args, **kwargs) 2025-12-04T10:58:28.2485779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2485816Z with policy(): 2025-12-04T10:58:28.2485967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2486008Z raise RuntimeError(msg) 2025-12-04T10:58:28.2486411Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2486414Z 2025-12-04T10:58:28.2486490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2486783Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2486785Z 2025-12-04T10:58:28.2486872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2486947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2487002Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2487299Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2487372Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2487410Z graph_break [] 2025-12-04T10:58:28.2487560Z _ 
TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2487606Z Traceback (most recent call last): 2025-12-04T10:58:28.2487758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2487800Z method(*args, **kwargs) 2025-12-04T10:58:28.2487949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2487988Z method(*args, **kwargs) 2025-12-04T10:58:28.2488138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2488176Z with policy(): 2025-12-04T10:58:28.2488328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2488392Z raise RuntimeError(msg) 2025-12-04T10:58:28.2488799Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 
2025-12-04T10:58:28.2488802Z 2025-12-04T10:58:28.2488876Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2489168Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2489170Z 2025-12-04T10:58:28.2489257Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2489330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2489387Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2489659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2489731Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2489767Z graph_break [] 2025-12-04T10:58:28.2489840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2489896Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2489967Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2490241Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2490279Z graph_break [] 2025-12-04T10:58:28.2490330Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2490482Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 
2025-12-04T10:58:28.2490526Z Traceback (most recent call last): 2025-12-04T10:58:28.2490679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2490719Z method(*args, **kwargs) 2025-12-04T10:58:28.2490870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2490909Z method(*args, **kwargs) 2025-12-04T10:58:28.2491091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2491130Z with policy(): 2025-12-04T10:58:28.2491286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2491327Z raise RuntimeError(msg) 2025-12-04T10:58:28.2491734Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2491736Z 2025-12-04T10:58:28.2491809Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2492102Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2492134Z 2025-12-04T10:58:28.2492220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2492294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2492350Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2492622Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2492696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2492732Z graph_break [] 2025-12-04T10:58:28.2492804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2492862Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2492933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2493209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2493245Z graph_break [] 2025-12-04T10:58:28.2493346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2493402Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2493474Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2493751Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2493786Z graph_break [] 2025-12-04T10:58:28.2494031Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b9f5b15c9f842fa.xml - 2025-12-04T10:58:28.2494091Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2494735Z FAILED [0.4466s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2494737Z 2025-12-04T10:58:28.2494845Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2495135Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2495139Z 2025-12-04T10:58:28.2495225Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2495287Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2495361Z ============= 1 failed, 1 skipped, 5 deselected, 2 rerun in 3.93s ============== 2025-12-04T10:58:28.2495398Z Got exit code 1 2025-12-04T10:58:28.2495438Z Retrying single test... 2025-12-04T10:58:28.2495637Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ffc5a8a3942bba0d.xml 2025-12-04T10:58:28.2495695Z ============================= test session starts ============================== 2025-12-04T10:58:28.2495807Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2495882Z cachedir: .pytest_cache 2025-12-04T10:58:28.2496041Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2496088Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2496127Z configfile: pytest.ini 2025-12-04T10:58:28.2496290Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2496363Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2496654Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2496697Z Running 1 items in this shard 2025-12-04T10:58:28.2496702Z 2025-12-04T10:58:28.2497069Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:07.786421518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2497073Z 2025-12-04T10:58:28.2497229Z [W1204 10:30:07.054183613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2497231Z 2025-12-04T10:58:28.2497384Z [W1204 10:30:07.054358650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2497386Z 2025-12-04T10:58:28.2497541Z [W1204 10:30:07.058338928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2497543Z 2025-12-04T10:58:28.2497695Z [W1204 10:30:07.058652523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2497700Z 2025-12-04T10:58:28.2497849Z [W1204 10:30:07.058715282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2497851Z 2025-12-04T10:58:28.2498001Z [W1204 10:30:07.060965428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2498003Z 2025-12-04T10:58:28.2498151Z [W1204 10:30:07.061243443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2498153Z 2025-12-04T10:58:28.2498302Z [W1204 10:30:07.061304842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2498304Z 2025-12-04T10:58:28.2498378Z ('RERUN', {'yellow': True}) [3.2779s] [100%] 2025-12-04T10:58:28.2498745Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:08.882966722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2498749Z 2025-12-04T10:58:28.2498899Z [W1204 10:30:08.883382156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2498901Z 2025-12-04T10:58:28.2499050Z [W1204 10:30:08.883460414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499052Z 2025-12-04T10:58:28.2499202Z [W1204 10:30:08.884743174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499204Z 2025-12-04T10:58:28.2499354Z [W1204 10:30:08.885015420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499380Z 2025-12-04T10:58:28.2499530Z [W1204 10:30:08.885080299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499532Z 2025-12-04T10:58:28.2499682Z [W1204 10:30:08.887167027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499685Z 2025-12-04T10:58:28.2499835Z [W1204 10:30:08.887431853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2499836Z 2025-12-04T10:58:28.2499986Z [W1204 10:30:08.887491662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2499988Z 2025-12-04T10:58:28.2500037Z ('RERUN', {'yellow': True}) [0.6894s] [100%] 2025-12-04T10:58:28.2500404Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:09.532746043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2500407Z 2025-12-04T10:58:28.2500557Z [W1204 10:30:09.533198816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2500559Z 2025-12-04T10:58:28.2500709Z [W1204 10:30:09.533284775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2500710Z 2025-12-04T10:58:28.2500861Z [W1204 10:30:09.534615044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2500863Z 2025-12-04T10:58:28.2501014Z [W1204 10:30:09.534885420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2501016Z 2025-12-04T10:58:28.2501168Z [W1204 10:30:09.534949199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2501170Z 2025-12-04T10:58:28.2501318Z [W1204 10:30:09.537057506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2501320Z 2025-12-04T10:58:28.2501470Z [W1204 10:30:09.537321612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2501472Z 2025-12-04T10:58:28.2501622Z [W1204 10:30:09.537382961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2501626Z 2025-12-04T10:58:28.2501664Z FAILED [0.6370s] [100%] 2025-12-04T10:58:28.2501666Z 2025-12-04T10:58:28.2501741Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2501893Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2501941Z Traceback (most recent call last): 2025-12-04T10:58:28.2502100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2502141Z method(*args, **kwargs) 2025-12-04T10:58:28.2502296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2502337Z method(*args, **kwargs) 2025-12-04T10:58:28.2502489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2502526Z with policy(): 2025-12-04T10:58:28.2502681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2502723Z raise RuntimeError(msg) 2025-12-04T10:58:28.2503159Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2503162Z 2025-12-04T10:58:28.2503238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2503603Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2503605Z 2025-12-04T10:58:28.2503695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2503773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2503831Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2504117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2504192Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2504229Z graph_break [] 2025-12-04T10:58:28.2504383Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2504429Z Traceback (most recent call last): 2025-12-04T10:58:28.2504585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2504627Z method(*args, **kwargs) 2025-12-04T10:58:28.2504783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2504826Z method(*args, **kwargs) 2025-12-04T10:58:28.2504981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2505020Z 
with policy(): 2025-12-04T10:58:28.2505177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2505219Z raise RuntimeError(msg) 2025-12-04T10:58:28.2505635Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2505637Z 2025-12-04T10:58:28.2505752Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2506051Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2506056Z 2025-12-04T10:58:28.2506143Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2506220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2506277Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2506559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2506635Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2506672Z graph_break [] 2025-12-04T10:58:28.2506747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2506837Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2506909Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2507192Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2507229Z graph_break [] 2025-12-04T10:58:28.2507282Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2507438Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2507484Z Traceback (most recent call last): 2025-12-04T10:58:28.2507643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2507686Z method(*args, **kwargs) 2025-12-04T10:58:28.2507841Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2507882Z method(*args, **kwargs) 2025-12-04T10:58:28.2508039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2508078Z with policy(): 2025-12-04T10:58:28.2508234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2508276Z raise RuntimeError(msg) 2025-12-04T10:58:28.2508693Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2508698Z 2025-12-04T10:58:28.2508773Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2509072Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2509074Z 2025-12-04T10:58:28.2509162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2509237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2509294Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2509596Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2509672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2509709Z graph_break [] 2025-12-04T10:58:28.2509784Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2509843Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2509915Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2510196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2510232Z graph_break [] 2025-12-04T10:58:28.2510306Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2510363Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2510437Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2510741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2510778Z graph_break [] 2025-12-04T10:58:28.2511031Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ffc5a8a3942bba0d.xml - 2025-12-04T10:58:28.2511091Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2511752Z FAILED [0.6370s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2511756Z 2025-12-04T10:58:28.2511831Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2512133Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2512136Z 2025-12-04T10:58:28.2512222Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2512287Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2512354Z ================== 1 failed, 57 deselected, 2 rerun in 4.77s =================== 2025-12-04T10:58:28.2512395Z Got exit code 1 2025-12-04T10:58:28.2512435Z Retrying single test... 2025-12-04T10:58:28.2512640Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ba967d6f2b278a3f.xml 2025-12-04T10:58:28.2512699Z ============================= test session starts ============================== 2025-12-04T10:58:28.2512812Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2512857Z cachedir: .pytest_cache 2025-12-04T10:58:28.2513027Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2513077Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2513120Z configfile: pytest.ini 2025-12-04T10:58:28.2513319Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2513427Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2513740Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2513790Z Running 1 items in this shard 2025-12-04T10:58:28.2513792Z 2025-12-04T10:58:28.2514191Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:19.642905607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2514194Z 2025-12-04T10:58:28.2514357Z [W1204 10:30:19.936888697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2514359Z 2025-12-04T10:58:28.2514524Z [W1204 10:30:19.937057484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2514559Z 2025-12-04T10:58:28.2514720Z [W1204 10:30:19.940718958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2514722Z 2025-12-04T10:58:28.2514880Z [W1204 10:30:19.941029343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2514882Z 2025-12-04T10:58:28.2515041Z [W1204 10:30:19.941094682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2515043Z 2025-12-04T10:58:28.2515203Z [W1204 10:30:19.943285388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2515205Z 2025-12-04T10:58:28.2515368Z [W1204 10:30:19.943566524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2515372Z 2025-12-04T10:58:28.2515531Z [W1204 10:30:19.943633062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2515534Z 2025-12-04T10:58:28.2515587Z ('RERUN', {'yellow': True}) [3.2844s] [100%] 2025-12-04T10:58:28.2515975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:20.773804644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2515978Z 2025-12-04T10:58:28.2516138Z [W1204 10:30:20.774199618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516140Z 2025-12-04T10:58:28.2516302Z [W1204 10:30:20.774271827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516305Z 2025-12-04T10:58:28.2516464Z [W1204 10:30:20.775575967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516466Z 2025-12-04T10:58:28.2516629Z [W1204 10:30:20.775838832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516631Z 2025-12-04T10:58:28.2516791Z [W1204 10:30:20.775903121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516793Z 2025-12-04T10:58:28.2516952Z [W1204 10:30:20.777992169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2516954Z 2025-12-04T10:58:28.2517139Z [W1204 10:30:20.778262945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2517141Z 2025-12-04T10:58:28.2517303Z [W1204 10:30:20.778326514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2517305Z 2025-12-04T10:58:28.2517358Z ('RERUN', {'yellow': True}) [0.7213s] [100%] 2025-12-04T10:58:28.2517744Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:30:21.472967103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2517747Z 2025-12-04T10:58:28.2517906Z [W1204 10:30:21.473395396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2517908Z 2025-12-04T10:58:28.2518071Z [W1204 10:30:21.473483945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2518073Z 2025-12-04T10:58:28.2518266Z [W1204 10:30:21.474805325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2518268Z 2025-12-04T10:58:28.2518427Z [W1204 10:30:21.475091280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2518429Z 2025-12-04T10:58:28.2518587Z [W1204 10:30:21.475159849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2518589Z 2025-12-04T10:58:28.2518750Z [W1204 10:30:21.477264427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2518752Z 2025-12-04T10:58:28.2518915Z [W1204 10:30:21.477543252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2518917Z 2025-12-04T10:58:28.2519078Z [W1204 10:30:21.477605701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2519081Z 2025-12-04T10:58:28.2519123Z FAILED [0.6663s] [100%] 2025-12-04T10:58:28.2519125Z 2025-12-04T10:58:28.2519179Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2519342Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2519390Z Traceback (most recent call last): 2025-12-04T10:58:28.2519559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2519602Z method(*args, **kwargs) 2025-12-04T10:58:28.2519769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2519811Z method(*args, **kwargs) 2025-12-04T10:58:28.2519976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2520019Z with policy(): 2025-12-04T10:58:28.2520185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2520228Z raise RuntimeError(msg) 2025-12-04T10:58:28.2520659Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2520662Z 2025-12-04T10:58:28.2520743Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2521079Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2521083Z 2025-12-04T10:58:28.2521178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2521258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2521319Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2521614Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2521693Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2521731Z graph_break [] 2025-12-04T10:58:28.2521896Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2521970Z Traceback (most recent call last): 2025-12-04T10:58:28.2522135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2522177Z method(*args, **kwargs) 2025-12-04T10:58:28.2522340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2522382Z method(*args, **kwargs) 2025-12-04T10:58:28.2522544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2522583Z 
with policy(): 2025-12-04T10:58:28.2522747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2522790Z raise RuntimeError(msg) 2025-12-04T10:58:28.2523239Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2523243Z 2025-12-04T10:58:28.2523392Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2523713Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2523715Z 2025-12-04T10:58:28.2523810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2523891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2523955Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2524254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2524336Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2524374Z graph_break [] 2025-12-04T10:58:28.2524455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2524514Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2524593Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2524887Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2524957Z graph_break [] 2025-12-04T10:58:28.2525013Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2525182Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2525231Z Traceback (most recent call last): 2025-12-04T10:58:28.2525399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2525444Z method(*args, **kwargs) 2025-12-04T10:58:28.2525608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2525652Z method(*args, **kwargs) 2025-12-04T10:58:28.2525816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2525857Z with policy(): 2025-12-04T10:58:28.2526023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2526100Z raise RuntimeError(msg) 2025-12-04T10:58:28.2526543Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2526546Z 2025-12-04T10:58:28.2526626Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2526943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2526945Z 2025-12-04T10:58:28.2527041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2527121Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2527184Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2527485Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2527568Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2527607Z graph_break [] 2025-12-04T10:58:28.2527686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2527746Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2527823Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2528123Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2528163Z graph_break [] 2025-12-04T10:58:28.2528243Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2528302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2528379Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2528673Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2528712Z graph_break [] 2025-12-04T10:58:28.2529002Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ba967d6f2b278a3f.xml - 2025-12-04T10:58:28.2529068Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2529768Z FAILED [0.6663s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2529772Z 2025-12-04T10:58:28.2529852Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2530172Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2530174Z 2025-12-04T10:58:28.2530267Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2530363Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2530435Z ================== 1 failed, 57 deselected, 2 rerun in 4.83s =================== 2025-12-04T10:58:28.2530476Z Got exit code 1 2025-12-04T10:58:28.2530742Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2530882Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2531100Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c4b407b36b04708.xml 2025-12-04T10:58:28.2531165Z ============================= test session starts ============================== 2025-12-04T10:58:28.2531285Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2531333Z cachedir: .pytest_cache 2025-12-04T10:58:28.2531507Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2531557Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2531600Z configfile: pytest.ini 2025-12-04T10:58:28.2531776Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2531856Z collecting ... collected 58 items / 7 deselected / 51 selected 2025-12-04T10:58:28.2531914Z stepcurrent: skipping 7 already run items. 
2025-12-04T10:58:28.2531962Z Running 51 items in this shard 2025-12-04T10:58:28.2531964Z 2025-12-04T10:58:28.2532250Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5908s] [ 1%] 2025-12-04T10:58:28.2532528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4787s] [ 1%] 2025-12-04T10:58:28.2532773Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4726s] [ 1%] 2025-12-04T10:58:28.2532776Z 2025-12-04T10:58:28.2532833Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2532999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2533049Z Traceback (most recent call last): 2025-12-04T10:58:28.2533242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2533346Z method(*args, **kwargs) 2025-12-04T10:58:28.2533515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2533559Z method(*args, **kwargs) 2025-12-04T10:58:28.2533724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2533764Z with policy(): 2025-12-04T10:58:28.2533932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2533977Z raise RuntimeError(msg) 2025-12-04T10:58:28.2534416Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2534463Z 2025-12-04T10:58:28.2534543Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2534867Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2534869Z 2025-12-04T10:58:28.2534962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2535043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2535104Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2535407Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2535487Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2535530Z graph_break [] 2025-12-04T10:58:28.2535698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2535748Z Traceback (most recent call last): 2025-12-04T10:58:28.2535916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2535962Z method(*args, **kwargs) 2025-12-04T10:58:28.2536127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:58:28.2536171Z method(*args, **kwargs) 2025-12-04T10:58:28.2536336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2536379Z with policy(): 2025-12-04T10:58:28.2536545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2536593Z raise RuntimeError(msg) 2025-12-04T10:58:28.2537040Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2537045Z 2025-12-04T10:58:28.2537126Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2537454Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2537456Z 2025-12-04T10:58:28.2537581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2537667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2537731Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2538032Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2538112Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2538153Z graph_break [] 2025-12-04T10:58:28.2538232Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2538292Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2538371Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2538668Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2538735Z graph_break [] 2025-12-04T10:58:28.2538793Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2538962Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2539013Z Traceback (most recent call last): 2025-12-04T10:58:28.2539181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2539226Z method(*args, **kwargs) 2025-12-04T10:58:28.2539391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2539439Z method(*args, **kwargs) 2025-12-04T10:58:28.2539603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2539646Z with policy(): 2025-12-04T10:58:28.2539814Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2539858Z raise RuntimeError(msg) 2025-12-04T10:58:28.2540311Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2540314Z 2025-12-04T10:58:28.2540395Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2540718Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2540722Z 2025-12-04T10:58:28.2540815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2540896Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2540957Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2541256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2541336Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2541376Z graph_break [] 2025-12-04T10:58:28.2541455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2541542Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2541620Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2541919Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2541958Z graph_break [] 2025-12-04T10:58:28.2542038Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.2542098Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2542176Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2542477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2542515Z graph_break [] 2025-12-04T10:58:28.2542784Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c4b407b36b04708.xml - 2025-12-04T10:58:28.2542876Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2543613Z FAILED [0.4726s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2543616Z 2025-12-04T10:58:28.2543702Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2544024Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2544028Z 2025-12-04T10:58:28.2544124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2544195Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2544272Z =================== 1 failed, 7 deselected, 2 rerun in 3.71s =================== 2025-12-04T10:58:28.2544313Z Got exit code 1 2025-12-04T10:58:28.2544359Z Retrying single test... 2025-12-04T10:58:28.2544577Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b37b000601a97eda.xml 2025-12-04T10:58:28.2544644Z ============================= test session starts ============================== 2025-12-04T10:58:28.2544766Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2544812Z cachedir: .pytest_cache 2025-12-04T10:58:28.2544987Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2545038Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2545081Z configfile: pytest.ini 2025-12-04T10:58:28.2545260Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2545341Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2545660Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2545739Z Running 1 items in this shard 2025-12-04T10:58:28.2545741Z 2025-12-04T10:58:28.2546141Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:41.386169131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2546146Z 2025-12-04T10:58:28.2546316Z [W1204 10:30:41.655410036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2546318Z 2025-12-04T10:58:28.2546485Z [W1204 10:30:41.655589663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2546487Z 2025-12-04T10:58:28.2546653Z [W1204 10:30:41.659324225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2546655Z 2025-12-04T10:58:28.2546819Z [W1204 10:30:41.659658140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2546854Z 2025-12-04T10:58:28.2547018Z [W1204 10:30:41.659728819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2547020Z 2025-12-04T10:58:28.2547183Z [W1204 10:30:41.661986694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2547188Z 2025-12-04T10:58:28.2547350Z [W1204 10:30:41.662274679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2547352Z 2025-12-04T10:58:28.2547516Z [W1204 10:30:41.662343828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2547518Z 2025-12-04T10:58:28.2547574Z ('RERUN', {'yellow': True}) [2.9556s] [100%] 2025-12-04T10:58:28.2547974Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:42.469895316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2547978Z 2025-12-04T10:58:28.2548141Z [W1204 10:30:42.470295350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548143Z 2025-12-04T10:58:28.2548307Z [W1204 10:30:42.470377498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548309Z 2025-12-04T10:58:28.2548472Z [W1204 10:30:42.471665788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548474Z 2025-12-04T10:58:28.2548638Z [W1204 10:30:42.471934554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548642Z 2025-12-04T10:58:28.2548806Z [W1204 10:30:42.471997823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548808Z 2025-12-04T10:58:28.2548970Z [W1204 10:30:42.474100591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2548972Z 2025-12-04T10:58:28.2549135Z [W1204 10:30:42.474369547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2549136Z 2025-12-04T10:58:28.2549298Z [W1204 10:30:42.474431466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2549301Z 2025-12-04T10:58:28.2549382Z ('RERUN', {'yellow': True}) [0.6826s] [100%] 2025-12-04T10:58:28.2549780Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:42.147860878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2549784Z 2025-12-04T10:58:28.2549949Z [W1204 10:30:42.148320771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2549951Z 2025-12-04T10:58:28.2550115Z [W1204 10:30:42.148406690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2550117Z 2025-12-04T10:58:28.2550278Z [W1204 10:30:42.149714709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2550280Z 2025-12-04T10:58:28.2550445Z [W1204 10:30:42.149996585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2550474Z 2025-12-04T10:58:28.2550638Z [W1204 10:30:42.150068024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2550640Z 2025-12-04T10:58:28.2550802Z [W1204 10:30:42.152195991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2550804Z 2025-12-04T10:58:28.2550967Z [W1204 10:30:42.152465627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2550969Z 2025-12-04T10:58:28.2551131Z [W1204 10:30:42.152526506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2551133Z 2025-12-04T10:58:28.2551175Z FAILED [0.6695s] [100%] 2025-12-04T10:58:28.2551179Z 2025-12-04T10:58:28.2551237Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2551408Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2551459Z Traceback (most recent call last): 2025-12-04T10:58:28.2551631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2551675Z method(*args, **kwargs) 2025-12-04T10:58:28.2551844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2551888Z method(*args, **kwargs) 2025-12-04T10:58:28.2552056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2552099Z with policy(): 2025-12-04T10:58:28.2552268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2552313Z raise RuntimeError(msg) 2025-12-04T10:58:28.2552758Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2552761Z 2025-12-04T10:58:28.2552845Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2553165Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2553167Z 2025-12-04T10:58:28.2553329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2553411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2553478Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2553780Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2553863Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2553904Z graph_break [] 2025-12-04T10:58:28.2554074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2554124Z Traceback (most recent call last): 2025-12-04T10:58:28.2554297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2554345Z method(*args, **kwargs) 2025-12-04T10:58:28.2554509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2554590Z method(*args, **kwargs) 2025-12-04T10:58:28.2554755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2554798Z with policy(): 2025-12-04T10:58:28.2554966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2555012Z raise RuntimeError(msg) 2025-12-04T10:58:28.2555459Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2555463Z 2025-12-04T10:58:28.2555546Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2555868Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2555870Z 2025-12-04T10:58:28.2555969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2556050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2556116Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2556413Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2556497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2556539Z graph_break [] 2025-12-04T10:58:28.2556622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2556685Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2556763Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2557064Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2557105Z graph_break [] 2025-12-04T10:58:28.2557164Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2557332Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2557410Z Traceback (most recent call last): 2025-12-04T10:58:28.2557582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2557635Z method(*args, **kwargs) 2025-12-04T10:58:28.2557802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2557850Z method(*args, **kwargs) 2025-12-04T10:58:28.2558017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2558062Z with policy(): 2025-12-04T10:58:28.2558229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2558278Z raise RuntimeError(msg) 2025-12-04T10:58:28.2558726Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2558757Z 2025-12-04T10:58:28.2558842Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2559161Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2559167Z 2025-12-04T10:58:28.2559263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2559351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2559415Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2559722Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2559806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2559854Z graph_break [] 2025-12-04T10:58:28.2559937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2560005Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2560087Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2560396Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2560439Z graph_break [] 2025-12-04T10:58:28.2560527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2560592Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2560678Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2560977Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2561023Z graph_break [] 2025-12-04T10:58:28.2561294Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b37b000601a97eda.xml - 2025-12-04T10:58:28.2561369Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2562100Z FAILED [0.6695s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2562105Z 2025-12-04T10:58:28.2562188Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2562512Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2562514Z 2025-12-04T10:58:28.2562611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2562686Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2562765Z ================== 1 failed, 57 deselected, 2 rerun in 4.47s =================== 2025-12-04T10:58:28.2562815Z Got exit code 1 2025-12-04T10:58:28.2562862Z Retrying single test... 2025-12-04T10:58:28.2563120Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cc6f7f8c3cae832.xml 2025-12-04T10:58:28.2563188Z ============================= test session starts ============================== 2025-12-04T10:58:28.2563360Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2563409Z cachedir: .pytest_cache 2025-12-04T10:58:28.2563593Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2563648Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2563701Z configfile: pytest.ini 2025-12-04T10:58:28.2563883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2563968Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2564293Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2564343Z Running 1 items in this shard 2025-12-04T10:58:28.2564345Z 2025-12-04T10:58:28.2564749Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:52.050331949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2564752Z 2025-12-04T10:58:28.2564922Z [W1204 10:30:53.333268674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2564924Z 2025-12-04T10:58:28.2565095Z [W1204 10:30:53.333447931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565099Z 2025-12-04T10:58:28.2565266Z [W1204 10:30:53.336971496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565271Z 2025-12-04T10:58:28.2565438Z [W1204 10:30:53.337298401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565440Z 2025-12-04T10:58:28.2565608Z [W1204 10:30:53.337366360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565610Z 2025-12-04T10:58:28.2565775Z [W1204 10:30:53.339532237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565777Z 2025-12-04T10:58:28.2565993Z [W1204 10:30:53.339803843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2565997Z 2025-12-04T10:58:28.2566163Z [W1204 10:30:53.339864312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2566165Z 2025-12-04T10:58:28.2566224Z ('RERUN', {'yellow': True}) [2.9373s] [100%] 2025-12-04T10:58:28.2566626Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:53.966740256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2566631Z 2025-12-04T10:58:28.2566796Z [W1204 10:30:53.967106241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2566799Z 2025-12-04T10:58:28.2566970Z [W1204 10:30:53.967174880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2567008Z 2025-12-04T10:58:28.2567174Z [W1204 10:30:53.968417610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2567176Z 2025-12-04T10:58:28.2567343Z [W1204 10:30:53.968672377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2567345Z 2025-12-04T10:58:28.2567512Z [W1204 10:30:53.968733316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2567517Z 2025-12-04T10:58:28.2567682Z [W1204 10:30:53.970737924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2567684Z 2025-12-04T10:58:28.2567856Z [W1204 10:30:53.971007000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2567860Z 2025-12-04T10:58:28.2568026Z [W1204 10:30:53.971069889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2568028Z 2025-12-04T10:58:28.2568086Z ('RERUN', {'yellow': True}) [0.4854s] [100%] 2025-12-04T10:58:28.2568484Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:30:54.457213771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2568487Z 2025-12-04T10:58:28.2568859Z [W1204 10:30:54.457608075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2568861Z 2025-12-04T10:58:28.2569062Z [W1204 10:30:54.457685994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2569068Z 2025-12-04T10:58:28.2569269Z [W1204 10:30:54.458956714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2569271Z 2025-12-04T10:58:28.2569483Z [W1204 10:30:54.459222900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2569486Z 2025-12-04T10:58:28.2569668Z [W1204 10:30:54.459285019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2569670Z 2025-12-04T10:58:28.2569891Z [W1204 10:30:54.461290948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2569893Z 2025-12-04T10:58:28.2570114Z [W1204 10:30:54.461554114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2570118Z 2025-12-04T10:58:28.2570326Z [W1204 10:30:54.461613463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2570328Z 2025-12-04T10:58:28.2570388Z FAILED [0.4903s] [100%] 2025-12-04T10:58:28.2570409Z 2025-12-04T10:58:28.2570486Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2570698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2570774Z Traceback (most recent call last): 2025-12-04T10:58:28.2570989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2571048Z method(*args, **kwargs) 2025-12-04T10:58:28.2571253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2571311Z method(*args, **kwargs) 2025-12-04T10:58:28.2571579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2571639Z with policy(): 2025-12-04T10:58:28.2571844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2571915Z raise RuntimeError(msg) 2025-12-04T10:58:28.2572383Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2572385Z 2025-12-04T10:58:28.2572500Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2572871Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2572877Z 2025-12-04T10:58:28.2573008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2573110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2573203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2573586Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2573713Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2573773Z graph_break [] 2025-12-04T10:58:28.2573986Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2574055Z Traceback (most recent call last): 2025-12-04T10:58:28.2574276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2574348Z method(*args, **kwargs) 2025-12-04T10:58:28.2574554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2574615Z method(*args, **kwargs) 2025-12-04T10:58:28.2574816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2574873Z with policy(): 2025-12-04T10:58:28.2575303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2575397Z raise RuntimeError(msg) 2025-12-04T10:58:28.2575886Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2575891Z 2025-12-04T10:58:28.2576013Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2576353Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2576356Z 2025-12-04T10:58:28.2576507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2576605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2576701Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2577035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2577178Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2577248Z graph_break [] 2025-12-04T10:58:28.2577364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2577445Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2577554Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2577876Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2577957Z graph_break [] 2025-12-04T10:58:28.2578037Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2578249Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2578331Z Traceback (most recent call last): 2025-12-04T10:58:28.2578919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2579007Z method(*args, **kwargs) 2025-12-04T10:58:28.2579200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2579278Z method(*args, **kwargs) 2025-12-04T10:58:28.2579462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2579533Z with policy(): 2025-12-04T10:58:28.2579711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2579810Z raise RuntimeError(msg) 2025-12-04T10:58:28.2580291Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2580293Z 2025-12-04T10:58:28.2580405Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2580747Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2580764Z 2025-12-04T10:58:28.2580897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2581040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2581114Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2581453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2581547Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2581608Z graph_break [] 2025-12-04T10:58:28.2581729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2581828Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2581921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2582256Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2582340Z graph_break [] 2025-12-04T10:58:28.2582457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2582539Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2582648Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2582964Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2583045Z graph_break [] 2025-12-04T10:58:28.2583376Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6cc6f7f8c3cae832.xml - 2025-12-04T10:58:28.2583505Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2584273Z FAILED [0.4903s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2584276Z 2025-12-04T10:58:28.2584375Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2584742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2584747Z 2025-12-04T10:58:28.2584855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2584976Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2585066Z ================== 1 failed, 57 deselected, 2 rerun in 4.08s =================== 2025-12-04T10:58:28.2585139Z Got exit code 1 2025-12-04T10:58:28.2585439Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2585614Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2585935Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84973f9f86b97f21.xml 2025-12-04T10:58:28.2586848Z ============================= test session starts ============================== 2025-12-04T10:58:28.2587016Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2587076Z cachedir: .pytest_cache 2025-12-04T10:58:28.2587280Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2587363Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2587450Z configfile: pytest.ini 2025-12-04T10:58:28.2587651Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2587767Z collecting ... collected 58 items / 8 deselected / 50 selected 2025-12-04T10:58:28.2587842Z stepcurrent: skipping 8 already run items. 
2025-12-04T10:58:28.2587944Z Running 50 items in this shard 2025-12-04T10:58:28.2587947Z 2025-12-04T10:58:28.2588258Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6428s] [ 2%] 2025-12-04T10:58:28.2588620Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5620s] [ 2%] 2025-12-04T10:58:28.2588890Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.5606s] [ 2%] 2025-12-04T10:58:28.2588908Z 2025-12-04T10:58:28.2588980Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2589221Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2589298Z Traceback (most recent call last): 2025-12-04T10:58:28.2589511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2589572Z method(*args, **kwargs) 2025-12-04T10:58:28.2589777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2589829Z method(*args, **kwargs) 2025-12-04T10:58:28.2590057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2590113Z with policy(): 2025-12-04T10:58:28.2590315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2590374Z raise RuntimeError(msg) 2025-12-04T10:58:28.2590850Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2590855Z 2025-12-04T10:58:28.2590973Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2591338Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2591341Z 2025-12-04T10:58:28.2591474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2591577Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2591663Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2592041Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2592164Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2592219Z graph_break [] 2025-12-04T10:58:28.2592421Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2592494Z Traceback (most recent call last): 2025-12-04T10:58:28.2592710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2592779Z method(*args, **kwargs) 2025-12-04T10:58:28.2592986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2593048Z method(*args, **kwargs) 2025-12-04T10:58:28.2593293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2593345Z with policy(): 2025-12-04T10:58:28.2593576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2593683Z raise RuntimeError(msg) 2025-12-04T10:58:28.2594183Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2594186Z 2025-12-04T10:58:28.2594302Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2594648Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2594650Z 2025-12-04T10:58:28.2594807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2594913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2595017Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2595349Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2595456Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2595528Z graph_break [] 2025-12-04T10:58:28.2595656Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.2595742Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2595862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2596191Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2596281Z graph_break [] 2025-12-04T10:58:28.2596384Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2596583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2596669Z Traceback (most recent call last): 2025-12-04T10:58:28.2596869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2596955Z method(*args, **kwargs) 2025-12-04T10:58:28.2597196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2597273Z method(*args, **kwargs) 2025-12-04T10:58:28.2597466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2597545Z with policy(): 2025-12-04T10:58:28.2597730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2597840Z raise RuntimeError(msg) 2025-12-04T10:58:28.2598325Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. 
CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2598327Z 2025-12-04T10:58:28.2598447Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2598803Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2598856Z 2025-12-04T10:58:28.2598974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2599118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2599199Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2599547Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2599649Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2599721Z graph_break [] 2025-12-04T10:58:28.2599839Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2599946Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2601546Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2601887Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2601948Z graph_break [] 2025-12-04T10:58:28.2602073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2602153Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.2602265Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2602595Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2602676Z graph_break [] 2025-12-04T10:58:28.2602970Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-84973f9f86b97f21.xml - 2025-12-04T10:58:28.2603090Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2603912Z FAILED [0.5606s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2603915Z 2025-12-04T10:58:28.2604059Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2604419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2604424Z 2025-12-04T10:58:28.2604530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2604655Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2604745Z =================== 1 failed, 8 deselected, 2 rerun in 3.91s =================== 2025-12-04T10:58:28.2604822Z Got exit code 1 2025-12-04T10:58:28.2604881Z Retrying single test... 2025-12-04T10:58:28.2605137Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03ff54c60b3123de.xml 2025-12-04T10:58:28.2605259Z ============================= test session starts ============================== 2025-12-04T10:58:28.2605399Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2605530Z cachedir: .pytest_cache 2025-12-04T10:58:28.2605732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2605809Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2605889Z configfile: pytest.ini 2025-12-04T10:58:28.2606114Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2606212Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2606569Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2606635Z Running 1 items in this shard 2025-12-04T10:58:28.2606638Z 2025-12-04T10:58:28.2607106Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:14.347419320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2607113Z 2025-12-04T10:58:28.2607316Z [W1204 10:31:14.632352685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2607336Z 2025-12-04T10:58:28.2607524Z [W1204 10:31:14.632524652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2607526Z 2025-12-04T10:58:28.2607730Z [W1204 10:31:14.636361853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2607734Z 2025-12-04T10:58:28.2607920Z [W1204 10:31:14.636671198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2607925Z 2025-12-04T10:58:28.2608138Z [W1204 10:31:14.636732847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2608141Z 2025-12-04T10:58:28.2608349Z [W1204 10:31:14.638898214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2608369Z 2025-12-04T10:58:28.2608556Z [W1204 10:31:14.639186709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2608558Z 2025-12-04T10:58:28.2608756Z [W1204 10:31:14.639254738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2608758Z 2025-12-04T10:58:28.2608856Z ('RERUN', {'yellow': True}) [2.9332s] [100%] 2025-12-04T10:58:28.2609328Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:15.434078971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2609332Z 2025-12-04T10:58:28.2609526Z [W1204 10:31:15.434493814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2609528Z 2025-12-04T10:58:28.2609730Z [W1204 10:31:15.434570543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2609732Z 2025-12-04T10:58:28.2609930Z [W1204 10:31:15.435845823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2609932Z 2025-12-04T10:58:28.2610122Z [W1204 10:31:15.436123399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2610151Z 2025-12-04T10:58:28.2610363Z [W1204 10:31:15.436185678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2610365Z 2025-12-04T10:58:28.2610557Z [W1204 10:31:15.438209277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2610559Z 2025-12-04T10:58:28.2610759Z [W1204 10:31:15.438475633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2610761Z 2025-12-04T10:58:28.2610973Z [W1204 10:31:15.438534982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2610976Z 2025-12-04T10:58:28.2611048Z ('RERUN', {'yellow': True}) [0.6573s] [100%] 2025-12-04T10:58:28.2611501Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:15.106622634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2611505Z 2025-12-04T10:58:28.2611696Z [W1204 10:31:15.107028688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2611698Z 2025-12-04T10:58:28.2611901Z [W1204 10:31:15.107102517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2611903Z 2025-12-04T10:58:28.2612099Z [W1204 10:31:15.108376437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2612114Z 2025-12-04T10:58:28.2612301Z [W1204 10:31:15.108637893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2612305Z 2025-12-04T10:58:28.2612517Z [W1204 10:31:15.108699802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2612519Z 2025-12-04T10:58:28.2612715Z [W1204 10:31:15.110734391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2612717Z 2025-12-04T10:58:28.2612920Z [W1204 10:31:15.110998687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2612922Z 2025-12-04T10:58:28.2613114Z [W1204 10:31:15.111067015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2613116Z 2025-12-04T10:58:28.2613195Z FAILED [0.6581s] [100%] 2025-12-04T10:58:28.2613227Z 2025-12-04T10:58:28.2613401Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2613599Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2613689Z Traceback (most recent call last): 2025-12-04T10:58:28.2613883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2613959Z method(*args, **kwargs) 2025-12-04T10:58:28.2614146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2614247Z method(*args, **kwargs) 2025-12-04T10:58:28.2614444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2614521Z with policy(): 2025-12-04T10:58:28.2614716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2614824Z raise RuntimeError(msg) 2025-12-04T10:58:28.2615314Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2615316Z 2025-12-04T10:58:28.2615450Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2615802Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2615822Z 2025-12-04T10:58:28.2615941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2616053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2616159Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2616514Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2616615Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2616688Z graph_break [] 2025-12-04T10:58:28.2616879Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2616973Z Traceback (most recent call last): 2025-12-04T10:58:28.2617173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2617253Z method(*args, **kwargs) 2025-12-04T10:58:28.2617442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2617519Z method(*args, **kwargs) 2025-12-04T10:58:28.2617708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2617810Z 
with policy(): 2025-12-04T10:58:28.2618000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2618081Z raise RuntimeError(msg) 2025-12-04T10:58:28.2618559Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2618612Z 2025-12-04T10:58:28.2618705Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2619117Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2619119Z 2025-12-04T10:58:28.2619236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2619354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2619439Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2619781Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2619897Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2619976Z graph_break [] 2025-12-04T10:58:28.2620114Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2620208Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2620306Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2625678Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2625738Z graph_break [] 2025-12-04T10:58:28.2625804Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2625997Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2626060Z Traceback (most recent call last): 2025-12-04T10:58:28.2626248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2626311Z method(*args, **kwargs) 2025-12-04T10:58:28.2626487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2626543Z method(*args, **kwargs) 2025-12-04T10:58:28.2626717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2626769Z with policy(): 2025-12-04T10:58:28.2626946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2627002Z raise RuntimeError(msg) 2025-12-04T10:58:28.2627470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2627475Z 2025-12-04T10:58:28.2627568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2627911Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2627914Z 2025-12-04T10:58:28.2628021Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2628109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2628183Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2628571Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2628665Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2628710Z graph_break [] 2025-12-04T10:58:28.2628801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2628873Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2628958Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2629274Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2629319Z graph_break [] 2025-12-04T10:58:28.2629409Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2629479Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2629568Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2629913Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2629962Z graph_break [] 2025-12-04T10:58:28.2630246Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-03ff54c60b3123de.xml - 2025-12-04T10:58:28.2630320Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2631051Z FAILED [0.6581s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2631058Z 2025-12-04T10:58:28.2631142Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2631475Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2631478Z 2025-12-04T10:58:28.2631579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2631654Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2631733Z ================== 1 failed, 57 deselected, 2 rerun in 4.41s =================== 2025-12-04T10:58:28.2631780Z Got exit code 1 2025-12-04T10:58:28.2631828Z Retrying single test... 2025-12-04T10:58:28.2632063Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d359b1c15deaa081.xml 2025-12-04T10:58:28.2632131Z ============================= test session starts ============================== 2025-12-04T10:58:28.2632267Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2632316Z cachedir: .pytest_cache 2025-12-04T10:58:28.2632505Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2632560Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2632613Z configfile: pytest.ini 2025-12-04T10:58:28.2632829Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2632919Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2633248Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2633378Z Running 1 items in this shard 2025-12-04T10:58:28.2633380Z 2025-12-04T10:58:28.2633799Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:25.870673379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2633805Z 2025-12-04T10:58:28.2633985Z [W1204 10:31:25.145799727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2633987Z 2025-12-04T10:58:28.2634169Z [W1204 10:31:25.145951124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2634220Z 2025-12-04T10:58:28.2634392Z [W1204 10:31:25.149588138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2634394Z 2025-12-04T10:58:28.2634568Z [W1204 10:31:25.149909373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2634571Z 2025-12-04T10:58:28.2634740Z [W1204 10:31:25.149975952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2634746Z 2025-12-04T10:58:28.2634915Z [W1204 10:31:25.152148398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2634917Z 2025-12-04T10:58:28.2635094Z [W1204 10:31:25.152430364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2635099Z 2025-12-04T10:58:28.2635269Z [W1204 10:31:25.152494493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2635271Z 2025-12-04T10:58:28.2635332Z ('RERUN', {'yellow': True}) [2.9234s] [100%] 2025-12-04T10:58:28.2635746Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:26.954087764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2635748Z 2025-12-04T10:58:28.2635923Z [W1204 10:31:26.954469788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2635925Z 2025-12-04T10:58:28.2636100Z [W1204 10:31:26.954534547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2636104Z 2025-12-04T10:58:28.2636273Z [W1204 10:31:26.955774438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2636275Z 2025-12-04T10:58:28.2636447Z [W1204 10:31:26.956032564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2636449Z 2025-12-04T10:58:28.2636617Z [W1204 10:31:26.956092843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2636619Z 2025-12-04T10:58:28.2636792Z [W1204 10:31:26.958131811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2636794Z 2025-12-04T10:58:28.2636990Z [W1204 10:31:26.958391677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2636999Z 2025-12-04T10:58:28.2637167Z [W1204 10:31:26.958449257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2637169Z 2025-12-04T10:58:28.2637230Z ('RERUN', {'yellow': True}) [0.6461s] [100%] 2025-12-04T10:58:28.2637636Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:31:27.586757066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2637638Z 2025-12-04T10:58:28.2637811Z [W1204 10:31:27.587177649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2637813Z 2025-12-04T10:58:28.2637984Z [W1204 10:31:27.587252678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2638013Z 2025-12-04T10:58:28.2638190Z [W1204 10:31:27.588528749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2638192Z 2025-12-04T10:58:28.2638364Z [W1204 10:31:27.588785025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2638366Z 2025-12-04T10:58:28.2638536Z [W1204 10:31:27.588844354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2638538Z 2025-12-04T10:58:28.2638709Z [W1204 10:31:27.590877992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2638711Z 2025-12-04T10:58:28.2638881Z [W1204 10:31:27.591147768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2638883Z 2025-12-04T10:58:28.2639057Z [W1204 10:31:27.591208757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2639059Z 2025-12-04T10:58:28.2639110Z FAILED [0.6328s] [100%] 2025-12-04T10:58:28.2639112Z 2025-12-04T10:58:28.2639177Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2639350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2639407Z Traceback (most recent call last): 2025-12-04T10:58:28.2639592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2639641Z method(*args, **kwargs) 2025-12-04T10:58:28.2639821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2639868Z method(*args, **kwargs) 2025-12-04T10:58:28.2640047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2640090Z with policy(): 2025-12-04T10:58:28.2640266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2640314Z raise RuntimeError(msg) 2025-12-04T10:58:28.2640766Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2640768Z 2025-12-04T10:58:28.2640853Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2641214Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2641219Z 2025-12-04T10:58:28.2641319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2641410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2641475Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2641792Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2641881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2641924Z graph_break [] 2025-12-04T10:58:28.2642102Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2642185Z Traceback (most recent call last): 2025-12-04T10:58:28.2642364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2642411Z method(*args, **kwargs) 2025-12-04T10:58:28.2642586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2642632Z method(*args, **kwargs) 2025-12-04T10:58:28.2642807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2642849Z 
with policy(): 2025-12-04T10:58:28.2643024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2643071Z raise RuntimeError(msg) 2025-12-04T10:58:28.2643582Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2643586Z 2025-12-04T10:58:28.2643673Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2644006Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2644009Z 2025-12-04T10:58:28.2644107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2644194Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2644262Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2644574Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2644664Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2644707Z graph_break [] 2025-12-04T10:58:28.2644793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2644857Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2644944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2645315Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2645359Z graph_break [] 2025-12-04T10:58:28.2645420Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2645600Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2645654Z Traceback (most recent call last): 2025-12-04T10:58:28.2645832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2645878Z method(*args, **kwargs) 2025-12-04T10:58:28.2646054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2646100Z method(*args, **kwargs) 2025-12-04T10:58:28.2646272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2646314Z with policy(): 2025-12-04T10:58:28.2646493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2646573Z raise RuntimeError(msg) 2025-12-04T10:58:28.2647034Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2647036Z 2025-12-04T10:58:28.2647119Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2647447Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2647449Z 2025-12-04T10:58:28.2647551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2647635Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2647703Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2648010Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2648096Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2648137Z graph_break [] 2025-12-04T10:58:28.2648222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2648285Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2648368Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2648675Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2648726Z graph_break [] 2025-12-04T10:58:28.2648808Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2648873Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2648954Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2649266Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2649307Z graph_break [] 2025-12-04T10:58:28.2649621Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d359b1c15deaa081.xml - 2025-12-04T10:58:28.2649689Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2650411Z FAILED [0.6328s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2650414Z 2025-12-04T10:58:28.2650499Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2650828Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2650830Z 2025-12-04T10:58:28.2650930Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2651032Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2651111Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.2651153Z Got exit code 1 2025-12-04T10:58:28.2651425Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2651572Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2651800Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e89e71b3e67fe169.xml 2025-12-04T10:58:28.2651871Z ============================= test session starts ============================== 2025-12-04T10:58:28.2651999Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2652051Z cachedir: .pytest_cache 2025-12-04T10:58:28.2652231Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2652285Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2652334Z configfile: pytest.ini 2025-12-04T10:58:28.2652518Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2652601Z collecting ... collected 58 items / 9 deselected / 49 selected 2025-12-04T10:58:28.2652663Z stepcurrent: skipping 9 already run items. 
2025-12-04T10:58:28.2652713Z Running 49 items in this shard 2025-12-04T10:58:28.2652715Z 2025-12-04T10:58:28.2653000Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.9239s] [ 2%] 2025-12-04T10:58:28.2653334Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4715s] [ 2%] 2025-12-04T10:58:28.2653589Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4577s] [ 2%] 2025-12-04T10:58:28.2653592Z 2025-12-04T10:58:28.2653650Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2653821Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2653873Z Traceback (most recent call last): 2025-12-04T10:58:28.2654082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2654129Z method(*args, **kwargs) 2025-12-04T10:58:28.2654306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2654351Z method(*args, **kwargs) 2025-12-04T10:58:28.2654524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2654566Z with policy(): 2025-12-04T10:58:28.2654742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2654791Z raise RuntimeError(msg) 2025-12-04T10:58:28.2655239Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2655276Z 2025-12-04T10:58:28.2655361Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2655685Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2655687Z 2025-12-04T10:58:28.2655787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2655870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2655935Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2656246Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2656332Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2656377Z graph_break [] 2025-12-04T10:58:28.2656551Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2656603Z Traceback (most recent call last): 2025-12-04T10:58:28.2656779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2656824Z method(*args, **kwargs) 2025-12-04T10:58:28.2656995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2657043Z method(*args, **kwargs) 2025-12-04T10:58:28.2657213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2657259Z with policy(): 2025-12-04T10:58:28.2657432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2657483Z raise RuntimeError(msg) 2025-12-04T10:58:28.2657934Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2657936Z 2025-12-04T10:58:28.2658022Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2658346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2658348Z 2025-12-04T10:58:28.2658472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2658556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2658623Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2658930Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2659016Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2659057Z graph_break [] 2025-12-04T10:58:28.2659141Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.2659203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2659286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2659597Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2659667Z graph_break [] 2025-12-04T10:58:28.2659730Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2659907Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2659960Z Traceback (most recent call last): 2025-12-04T10:58:28.2660132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2660181Z method(*args, **kwargs) 2025-12-04T10:58:28.2660351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2660401Z method(*args, **kwargs) 2025-12-04T10:58:28.2660571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2660618Z with policy(): 2025-12-04T10:58:28.2660790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2660838Z raise RuntimeError(msg) 2025-12-04T10:58:28.2661291Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2661293Z 2025-12-04T10:58:28.2661379Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2661706Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2661713Z 2025-12-04T10:58:28.2661811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2661897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2661959Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2662268Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2662352Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2662395Z graph_break [] 2025-12-04T10:58:28.2662477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2662568Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2662649Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2662957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2662997Z graph_break [] 2025-12-04T10:58:28.2663081Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2663142Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.2663225Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2663576Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2663620Z graph_break [] 2025-12-04T10:58:28.2663897Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e89e71b3e67fe169.xml - 2025-12-04T10:58:28.2664005Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2664702Z FAILED [0.4577s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2664707Z 2025-12-04T10:58:28.2664789Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2665111Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2665115Z 2025-12-04T10:58:28.2665211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2665284Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2665358Z =================== 1 failed, 9 deselected, 2 rerun in 4.02s =================== 2025-12-04T10:58:28.2665403Z Got exit code 1 2025-12-04T10:58:28.2665448Z Retrying single test... 2025-12-04T10:58:28.2665668Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b591afb1f9a0ac3.xml 2025-12-04T10:58:28.2665732Z ============================= test session starts ============================== 2025-12-04T10:58:28.2665859Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2665906Z cachedir: .pytest_cache 2025-12-04T10:58:28.2666086Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2666137Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2666185Z configfile: pytest.ini 2025-12-04T10:58:28.2666364Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2666448Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2666764Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2666818Z Running 1 items in this shard 2025-12-04T10:58:28.2666865Z 2025-12-04T10:58:28.2667271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:31:47.195741213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2667275Z 2025-12-04T10:58:28.2667446Z [W1204 10:31:48.471239367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2667448Z 2025-12-04T10:58:28.2667620Z [W1204 10:31:48.471383935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2667622Z 2025-12-04T10:58:28.2667790Z [W1204 10:31:48.475235335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2667792Z 2025-12-04T10:58:28.2667962Z [W1204 10:31:48.475547730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2667993Z 2025-12-04T10:58:28.2668161Z [W1204 10:31:48.475609759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2668163Z 2025-12-04T10:58:28.2668327Z [W1204 10:31:48.477787786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2668329Z 2025-12-04T10:58:28.2668495Z [W1204 10:31:48.478072241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2668497Z 2025-12-04T10:58:28.2668662Z [W1204 10:31:48.478158160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2668664Z 2025-12-04T10:58:28.2668722Z ('RERUN', {'yellow': True}) [3.1861s] [100%] 2025-12-04T10:58:28.2669122Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:31:48.104688251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669129Z 2025-12-04T10:58:28.2669296Z [W1204 10:31:48.105094165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669298Z 2025-12-04T10:58:28.2669466Z [W1204 10:31:48.105173013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669468Z 2025-12-04T10:58:28.2669633Z [W1204 10:31:48.106432914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669635Z 2025-12-04T10:58:28.2669805Z [W1204 10:31:48.106688150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669809Z 2025-12-04T10:58:28.2669974Z [W1204 10:31:48.106746729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2669977Z 2025-12-04T10:58:28.2670146Z [W1204 10:31:48.108746608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2670149Z 2025-12-04T10:58:28.2670315Z [W1204 10:31:48.109011144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2670317Z 2025-12-04T10:58:28.2670481Z [W1204 10:31:48.109073793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2670483Z 2025-12-04T10:58:28.2670540Z ('RERUN', {'yellow': True}) [0.4910s] [100%] 2025-12-04T10:58:28.2670959Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:31:49.609833567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2670964Z 2025-12-04T10:58:28.2671134Z [W1204 10:31:49.610193941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2671136Z 2025-12-04T10:58:28.2671300Z [W1204 10:31:49.610261920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2671305Z 2025-12-04T10:58:28.2671469Z [W1204 10:31:49.611508141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2671471Z 2025-12-04T10:58:28.2671639Z [W1204 10:31:49.611761077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2671641Z 2025-12-04T10:58:28.2671830Z [W1204 10:31:49.611820796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2671832Z 2025-12-04T10:58:28.2671999Z [W1204 10:31:49.613811596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2672001Z 2025-12-04T10:58:28.2672166Z [W1204 10:31:49.614076881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2672168Z 2025-12-04T10:58:28.2672336Z [W1204 10:31:49.614138270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2672338Z 2025-12-04T10:58:28.2672385Z FAILED [0.4998s] [100%] 2025-12-04T10:58:28.2672387Z 2025-12-04T10:58:28.2672446Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2672619Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2672672Z Traceback (most recent call last): 2025-12-04T10:58:28.2672849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2672894Z method(*args, **kwargs) 2025-12-04T10:58:28.2673066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2673111Z method(*args, **kwargs) 2025-12-04T10:58:28.2673324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2673366Z with policy(): 2025-12-04T10:58:28.2673540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2673586Z raise RuntimeError(msg) 2025-12-04T10:58:28.2674028Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2674031Z 2025-12-04T10:58:28.2674113Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2674435Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2674437Z 2025-12-04T10:58:28.2674536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2674646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2674712Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2675018Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2675102Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2675142Z graph_break [] 2025-12-04T10:58:28.2675312Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2675362Z Traceback (most recent call last): 2025-12-04T10:58:28.2675534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2675578Z method(*args, **kwargs) 2025-12-04T10:58:28.2675754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2675831Z method(*args, **kwargs) 2025-12-04T10:58:28.2675998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2676039Z 
with policy(): 2025-12-04T10:58:28.2676208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2676253Z raise RuntimeError(msg) 2025-12-04T10:58:28.2676694Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2676697Z 2025-12-04T10:58:28.2676778Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2677102Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2677106Z 2025-12-04T10:58:28.2677204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2677285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2677349Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2677649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2677733Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2677774Z graph_break [] 2025-12-04T10:58:28.2677857Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2677919Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2678001Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2678299Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2678341Z graph_break [] 2025-12-04T10:58:28.2678398Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2678574Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2678623Z Traceback (most recent call last): 2025-12-04T10:58:28.2678820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2678866Z method(*args, **kwargs) 2025-12-04T10:58:28.2679035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2679079Z method(*args, **kwargs) 2025-12-04T10:58:28.2679245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2679285Z with policy(): 2025-12-04T10:58:28.2679456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2679501Z raise RuntimeError(msg) 2025-12-04T10:58:28.2679946Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2679984Z 2025-12-04T10:58:28.2680068Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2680387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2680390Z 2025-12-04T10:58:28.2680487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2680568Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2680633Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2680935Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2681017Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2681059Z graph_break [] 2025-12-04T10:58:28.2681143Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2681203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2681286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2681581Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2681625Z graph_break [] 2025-12-04T10:58:28.2681704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2681768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2681848Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2682146Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2682188Z graph_break [] 2025-12-04T10:58:28.2682456Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3b591afb1f9a0ac3.xml - 2025-12-04T10:58:28.2682523Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2683241Z FAILED [0.4998s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2683246Z 2025-12-04T10:58:28.2683365Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2683684Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2683687Z 2025-12-04T10:58:28.2683784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2683852Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2683928Z ================== 1 failed, 57 deselected, 2 rerun in 4.35s =================== 2025-12-04T10:58:28.2683974Z Got exit code 1 2025-12-04T10:58:28.2684018Z Retrying single test... 2025-12-04T10:58:28.2684239Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-215e0251bc147af8.xml 2025-12-04T10:58:28.2684358Z ============================= test session starts ============================== 2025-12-04T10:58:28.2684481Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2684526Z cachedir: .pytest_cache 2025-12-04T10:58:28.2684702Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2684752Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2684798Z configfile: pytest.ini 2025-12-04T10:58:28.2684974Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2685058Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2685369Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2685423Z Running 1 items in this shard 2025-12-04T10:58:28.2685426Z 2025-12-04T10:58:28.2685821Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:31:59.302947055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2685824Z 2025-12-04T10:58:28.2685994Z [W1204 10:31:59.578750695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2685996Z 2025-12-04T10:58:28.2686167Z [W1204 10:31:59.578918362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2686171Z 2025-12-04T10:58:28.2686336Z [W1204 10:31:59.582650295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2686339Z 2025-12-04T10:58:28.2686506Z [W1204 10:31:59.582948010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2686508Z 2025-12-04T10:58:28.2686672Z [W1204 10:31:59.583014929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2686674Z 2025-12-04T10:58:28.2686839Z [W1204 10:31:59.585140146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2686841Z 2025-12-04T10:58:28.2687034Z [W1204 10:31:59.585415302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2687038Z 2025-12-04T10:58:28.2687203Z [W1204 10:31:59.585474571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2687205Z 2025-12-04T10:58:28.2687262Z ('RERUN', {'yellow': True}) [3.2545s] [100%] 2025-12-04T10:58:28.2687654Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:32:00.290010909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2687656Z 2025-12-04T10:58:28.2687822Z [W1204 10:32:00.290380023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2687824Z 2025-12-04T10:58:28.2687988Z [W1204 10:32:00.290449872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2687990Z 2025-12-04T10:58:28.2688186Z [W1204 10:32:00.291719112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2688188Z 2025-12-04T10:58:28.2688353Z [W1204 10:32:00.291989328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2688355Z 2025-12-04T10:58:28.2688519Z [W1204 10:32:00.292056597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2688521Z 2025-12-04T10:58:28.2688687Z [W1204 10:32:00.294069076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2688689Z 2025-12-04T10:58:28.2688854Z [W1204 10:32:00.294338152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2688856Z 2025-12-04T10:58:28.2689021Z [W1204 10:32:00.294398671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2689025Z 2025-12-04T10:58:28.2689078Z ('RERUN', {'yellow': True}) [0.5640s] [100%] 2025-12-04T10:58:28.2689472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:32:00.852140417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2689474Z 2025-12-04T10:58:28.2689641Z [W1204 10:32:00.852529901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2689643Z 2025-12-04T10:58:28.2689807Z [W1204 10:32:00.852604179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2689809Z 2025-12-04T10:58:28.2689974Z [W1204 10:32:00.853871930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2689978Z 2025-12-04T10:58:28.2690139Z [W1204 10:32:00.854144656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2690141Z 2025-12-04T10:58:28.2690306Z [W1204 10:32:00.854210804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2690308Z 2025-12-04T10:58:28.2690471Z [W1204 10:32:00.856218414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2690473Z 2025-12-04T10:58:28.2690661Z [W1204 10:32:00.856489449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2690663Z 2025-12-04T10:58:28.2690829Z [W1204 10:32:00.856549958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2690833Z 2025-12-04T10:58:28.2690878Z FAILED [0.5706s] [100%] 2025-12-04T10:58:28.2690881Z 2025-12-04T10:58:28.2690939Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2691106Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2691158Z Traceback (most recent call last): 2025-12-04T10:58:28.2691331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2691378Z method(*args, **kwargs) 2025-12-04T10:58:28.2691546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2691594Z method(*args, **kwargs) 2025-12-04T10:58:28.2691761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2691832Z with policy(): 2025-12-04T10:58:28.2691999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2692047Z raise RuntimeError(msg) 2025-12-04T10:58:28.2692479Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2692483Z 2025-12-04T10:58:28.2692564Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2692888Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2692892Z 2025-12-04T10:58:28.2692987Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2693070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2693132Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2693478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2693560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2693602Z graph_break [] 2025-12-04T10:58:28.2693773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2693825Z Traceback (most recent call last): 2025-12-04T10:58:28.2693996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2694042Z method(*args, **kwargs) 2025-12-04T10:58:28.2694207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2694253Z method(*args, **kwargs) 2025-12-04T10:58:28.2694418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2694461Z 
with policy(): 2025-12-04T10:58:28.2694627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2694675Z raise RuntimeError(msg) 2025-12-04T10:58:28.2695144Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2695151Z 2025-12-04T10:58:28.2695233Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2695551Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2695553Z 2025-12-04T10:58:28.2695647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2695730Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2695791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2696091Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2696212Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2696254Z graph_break [] 2025-12-04T10:58:28.2696335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2696397Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2696476Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2696776Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2696816Z graph_break [] 2025-12-04T10:58:28.2696877Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2697046Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2697099Z Traceback (most recent call last): 2025-12-04T10:58:28.2697270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2697314Z method(*args, **kwargs) 2025-12-04T10:58:28.2697483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2697527Z method(*args, **kwargs) 2025-12-04T10:58:28.2697694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2697736Z with policy(): 2025-12-04T10:58:28.2697907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2697952Z raise RuntimeError(msg) 2025-12-04T10:58:28.2698400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2698402Z 2025-12-04T10:58:28.2698484Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2698804Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2698807Z 2025-12-04T10:58:28.2698901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2699008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2699071Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2699374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2699456Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2699496Z graph_break [] 2025-12-04T10:58:28.2699578Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2699639Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2699722Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2700020Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2700091Z graph_break [] 2025-12-04T10:58:28.2700171Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2700234Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2700314Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2700612Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2700653Z graph_break [] 2025-12-04T10:58:28.2700926Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-215e0251bc147af8.xml - 2025-12-04T10:58:28.2700994Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2701693Z FAILED [0.5706s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2701698Z 2025-12-04T10:58:28.2701782Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2702099Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2702103Z 2025-12-04T10:58:28.2702201Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2702272Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2702348Z ================== 1 failed, 57 deselected, 2 rerun in 4.53s =================== 2025-12-04T10:58:28.2702400Z Got exit code 1 2025-12-04T10:58:28.2702671Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2703160Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2703610Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a7ecbef1f75f01ab.xml 2025-12-04T10:58:28.2703941Z ============================= test session starts ============================== 2025-12-04T10:58:28.2704210Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2704426Z cachedir: .pytest_cache 2025-12-04T10:58:28.2704678Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2704947Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2705081Z configfile: pytest.ini 2025-12-04T10:58:28.2705334Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2705636Z collecting ... collected 58 items / 10 deselected / 48 selected 2025-12-04T10:58:28.2705821Z stepcurrent: skipping 10 already run items. 
2025-12-04T10:58:28.2705971Z Running 48 items in this shard 2025-12-04T10:58:28.2706053Z 2025-12-04T10:58:28.2706338Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6911s] [ 2%] 2025-12-04T10:58:28.2706947Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6308s] [ 2%] 2025-12-04T10:58:28.2707553Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.6329s] [ 2%] 2025-12-04T10:58:28.2707845Z 2025-12-04T10:58:28.2707911Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2708201Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2708531Z Traceback (most recent call last): 2025-12-04T10:58:28.2708877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2709174Z method(*args, **kwargs) 2025-12-04T10:58:28.2709425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2709686Z method(*args, **kwargs) 2025-12-04T10:58:28.2709927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2710180Z with policy(): 2025-12-04T10:58:28.2710414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2710670Z raise RuntimeError(msg) 2025-12-04T10:58:28.2711190Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2711692Z 2025-12-04T10:58:28.2711775Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2712229Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2712597Z 2025-12-04T10:58:28.2712695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2712919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2713111Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2713619Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2714051Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2714220Z graph_break [] 2025-12-04T10:58:28.2714454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2714717Z Traceback (most recent call last): 2025-12-04T10:58:28.2714975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2715234Z method(*args, **kwargs) 2025-12-04T10:58:28.2715478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T10:58:28.2715733Z method(*args, **kwargs) 2025-12-04T10:58:28.2715974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2716225Z with policy(): 2025-12-04T10:58:28.2716462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2716756Z raise RuntimeError(msg) 2025-12-04T10:58:28.2717285Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2717777Z 2025-12-04T10:58:28.2717860Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2718306Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2718672Z 2025-12-04T10:58:28.2718774Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2718997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2719191Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2719602Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2720026Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2720193Z graph_break [] 2025-12-04T10:58:28.2720339Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2720529Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2720709Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2721128Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2721509Z graph_break [] 2025-12-04T10:58:28.2721625Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2721903Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2722161Z Traceback (most recent call last): 2025-12-04T10:58:28.2722417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2722670Z method(*args, **kwargs) 2025-12-04T10:58:28.2722910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2723160Z method(*args, **kwargs) 2025-12-04T10:58:28.2723492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2723747Z with policy(): 2025-12-04T10:58:28.2723981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2724233Z raise RuntimeError(msg) 2025-12-04T10:58:28.2724755Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! 
Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2725242Z 2025-12-04T10:58:28.2725326Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2725771Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2726171Z 2025-12-04T10:58:28.2726269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2726484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2726668Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2727070Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2727488Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2727649Z graph_break [] 2025-12-04T10:58:28.2727790Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2727977Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2728157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2728580Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2728956Z graph_break [] 2025-12-04T10:58:28.2729095Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.2729278Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2729457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2729871Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2730251Z graph_break [] 2025-12-04T10:58:28.2730582Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a7ecbef1f75f01ab.xml - 2025-12-04T10:58:28.2730957Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2731778Z FAILED [0.6329s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2732517Z 2025-12-04T10:58:28.2732599Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2733066Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2733479Z 2025-12-04T10:58:28.2733574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2733780Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2733963Z ================== 1 failed, 10 deselected, 2 rerun in 4.11s =================== 2025-12-04T10:58:28.2734119Z Got exit code 1 2025-12-04T10:58:28.2734225Z Retrying single test... 2025-12-04T10:58:28.2734510Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1592c1c13c42eda.xml 2025-12-04T10:58:28.2734827Z ============================= test session starts ============================== 2025-12-04T10:58:28.2735057Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2735302Z cachedir: .pytest_cache 2025-12-04T10:58:28.2735545Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2735806Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2735936Z configfile: pytest.ini 2025-12-04T10:58:28.2736183Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2736479Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2736917Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2737314Z Running 1 items in this shard 2025-12-04T10:58:28.2737396Z 2025-12-04T10:58:28.2737796Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:21.624772875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2738240Z 2025-12-04T10:58:28.2738409Z [W1204 10:32:21.893349908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2738620Z 2025-12-04T10:58:28.2738786Z [W1204 10:32:21.893517166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2738994Z 2025-12-04T10:58:28.2739160Z [W1204 10:32:21.897101200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2739365Z 2025-12-04T10:58:28.2739533Z [W1204 10:32:21.897412225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2739740Z 2025-12-04T10:58:28.2739907Z [W1204 10:32:21.897475324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2740114Z 2025-12-04T10:58:28.2740280Z [W1204 10:32:21.899760239 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2740485Z 2025-12-04T10:58:28.2740651Z [W1204 10:32:21.900047835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2740855Z 2025-12-04T10:58:28.2741021Z [W1204 10:32:21.900118094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2741224Z 2025-12-04T10:58:28.2741280Z ('RERUN', {'yellow': True}) [3.0065s] [100%] 2025-12-04T10:58:28.2741802Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:22.717908297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2742240Z 2025-12-04T10:58:28.2742405Z [W1204 10:32:22.718346221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2742611Z 2025-12-04T10:58:28.2742775Z [W1204 10:32:22.718425769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2742980Z 2025-12-04T10:58:28.2743142Z [W1204 10:32:22.719705799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2743387Z 2025-12-04T10:58:28.2743550Z [W1204 10:32:22.719973195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2743792Z 2025-12-04T10:58:28.2743955Z [W1204 10:32:22.720038474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2744162Z 2025-12-04T10:58:28.2744325Z [W1204 10:32:22.722166672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2744532Z 2025-12-04T10:58:28.2744698Z [W1204 10:32:22.722437987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2744901Z 2025-12-04T10:58:28.2745066Z [W1204 10:32:22.722498226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2745270Z 2025-12-04T10:58:28.2745325Z ('RERUN', {'yellow': True}) [0.7086s] [100%] 2025-12-04T10:58:28.2745811Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:23.452446117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2746245Z 2025-12-04T10:58:28.2746411Z [W1204 10:32:23.452866420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2746619Z 2025-12-04T10:58:28.2746784Z [W1204 10:32:23.452937699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2746988Z 2025-12-04T10:58:28.2747153Z [W1204 10:32:23.454260359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2747360Z 2025-12-04T10:58:28.2747524Z [W1204 10:32:23.454524364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2747732Z 2025-12-04T10:58:28.2747898Z [W1204 10:32:23.454585343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2748104Z 2025-12-04T10:58:28.2748267Z [W1204 10:32:23.456736510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2748472Z 2025-12-04T10:58:28.2748634Z [W1204 10:32:23.457011646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2748839Z 2025-12-04T10:58:28.2749001Z [W1204 10:32:23.457077385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2749206Z 2025-12-04T10:58:28.2749249Z FAILED [0.7065s] [100%] 2025-12-04T10:58:28.2749322Z 2025-12-04T10:58:28.2749407Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2749678Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2749935Z Traceback (most recent call last): 2025-12-04T10:58:28.2750194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2750448Z method(*args, **kwargs) 2025-12-04T10:58:28.2750689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2750942Z method(*args, **kwargs) 2025-12-04T10:58:28.2751182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2751428Z with policy(): 2025-12-04T10:58:28.2751658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2751911Z raise RuntimeError(msg) 2025-12-04T10:58:28.2752453Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2752931Z 2025-12-04T10:58:28.2753012Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2753496Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2753855Z 2025-12-04T10:58:28.2753954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2754172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2754356Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2754763Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2755180Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2755341Z graph_break [] 2025-12-04T10:58:28.2755567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2755822Z Traceback (most recent call last): 2025-12-04T10:58:28.2756073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2756324Z method(*args, **kwargs) 2025-12-04T10:58:28.2756565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2756819Z method(*args, **kwargs) 2025-12-04T10:58:28.2757057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2757302Z with policy(): 2025-12-04T10:58:28.2757534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2757786Z raise RuntimeError(msg) 2025-12-04T10:58:28.2758307Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2758791Z 2025-12-04T10:58:28.2758918Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2759359Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2759719Z 2025-12-04T10:58:28.2759818Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2760035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2760219Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2760617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2761034Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2761195Z graph_break [] 2025-12-04T10:58:28.2761335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2761556Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2761735Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2762147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2762524Z graph_break [] 2025-12-04T10:58:28.2762640Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2762918Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2763173Z Traceback (most recent call last): 2025-12-04T10:58:28.2763475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2763735Z method(*args, **kwargs) 2025-12-04T10:58:28.2763976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2764228Z method(*args, **kwargs) 2025-12-04T10:58:28.2764466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2764713Z with policy(): 2025-12-04T10:58:28.2764938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2765190Z raise RuntimeError(msg) 2025-12-04T10:58:28.2765711Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2766196Z 2025-12-04T10:58:28.2766280Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2766720Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2767076Z 2025-12-04T10:58:28.2767174Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2767391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2767577Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2768010Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2768430Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2768588Z graph_break [] 2025-12-04T10:58:28.2768729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2768912Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2769091Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2769509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2769888Z graph_break [] 2025-12-04T10:58:28.2770026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2770211Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2770390Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2770842Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2771219Z graph_break [] 2025-12-04T10:58:28.2771546Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f1592c1c13c42eda.xml - 2025-12-04T10:58:28.2771912Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2772727Z FAILED [0.7065s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2773496Z 2025-12-04T10:58:28.2773577Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2774012Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2774369Z 2025-12-04T10:58:28.2774464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2774668Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2774850Z ================== 1 failed, 57 deselected, 2 rerun in 4.59s =================== 2025-12-04T10:58:28.2775005Z Got exit code 1 2025-12-04T10:58:28.2775110Z Retrying single test... 2025-12-04T10:58:28.2775399Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71e9fdc62e87ecb4.xml 2025-12-04T10:58:28.2775715Z ============================= test session starts ============================== 2025-12-04T10:58:28.2775939Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2776144Z cachedir: .pytest_cache 2025-12-04T10:58:28.2776389Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2776649Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2776777Z configfile: pytest.ini 2025-12-04T10:58:28.2777023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2777353Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2777785Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2778180Z Running 1 items in this shard 2025-12-04T10:58:28.2778261Z 2025-12-04T10:58:28.2778658Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:33.360630376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2779094Z 2025-12-04T10:58:28.2779263Z [W1204 10:32:33.616076673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2779473Z 2025-12-04T10:58:28.2779639Z [W1204 10:32:33.616237900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2779886Z 2025-12-04T10:58:28.2780048Z [W1204 10:32:33.619561079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2780253Z 2025-12-04T10:58:28.2780416Z [W1204 10:32:33.619893674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2780623Z 2025-12-04T10:58:28.2780787Z [W1204 10:32:33.619962103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2780991Z 2025-12-04T10:58:28.2781155Z [W1204 10:32:33.622243958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2781359Z 2025-12-04T10:58:28.2781527Z [W1204 10:32:33.622527663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2781736Z 2025-12-04T10:58:28.2781903Z [W1204 10:32:33.622589952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2782108Z 2025-12-04T10:58:28.2782163Z ('RERUN', {'yellow': True}) [2.9988s] [100%] 2025-12-04T10:58:28.2782651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:34.459614142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2783081Z 2025-12-04T10:58:28.2783322Z [W1204 10:32:34.460048475 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2783530Z 2025-12-04T10:58:28.2783695Z [W1204 10:32:34.460127744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2783907Z 2025-12-04T10:58:28.2784070Z [W1204 10:32:34.461395294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2784275Z 2025-12-04T10:58:28.2784438Z [W1204 10:32:34.461661570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2784643Z 2025-12-04T10:58:28.2784806Z [W1204 10:32:34.461724409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2785012Z 2025-12-04T10:58:28.2785175Z [W1204 10:32:34.463806627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2785382Z 2025-12-04T10:58:28.2785575Z [W1204 10:32:34.464076193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2785783Z 2025-12-04T10:58:28.2785948Z [W1204 10:32:34.464138362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2786154Z 2025-12-04T10:58:28.2786210Z ('RERUN', {'yellow': True}) [0.6697s] [100%] 2025-12-04T10:58:28.2786697Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:32:34.117010314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2787134Z 2025-12-04T10:58:28.2787301Z [W1204 10:32:34.117416698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2787504Z 2025-12-04T10:58:28.2787670Z [W1204 10:32:34.117492007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2787874Z 2025-12-04T10:58:28.2788084Z [W1204 10:32:34.118761257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2788289Z 2025-12-04T10:58:28.2788454Z [W1204 10:32:34.119023613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2788659Z 2025-12-04T10:58:28.2788824Z [W1204 10:32:34.119085682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2789033Z 2025-12-04T10:58:28.2789195Z [W1204 10:32:34.121145310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2789401Z 2025-12-04T10:58:28.2789566Z [W1204 10:32:34.121407896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2789773Z 2025-12-04T10:58:28.2789936Z [W1204 10:32:34.121466625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2790147Z 2025-12-04T10:58:28.2790191Z FAILED [0.6472s] [100%] 2025-12-04T10:58:28.2790259Z 2025-12-04T10:58:28.2790318Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2790586Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2790842Z Traceback (most recent call last): 2025-12-04T10:58:28.2791100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2791354Z method(*args, **kwargs) 2025-12-04T10:58:28.2791595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2791846Z method(*args, **kwargs) 2025-12-04T10:58:28.2792086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2792332Z with policy(): 2025-12-04T10:58:28.2792560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2792811Z raise RuntimeError(msg) 2025-12-04T10:58:28.2793374Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2793847Z 2025-12-04T10:58:28.2793929Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2794410Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2794777Z 2025-12-04T10:58:28.2794873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2795092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2795277Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2795677Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2796096Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2796258Z graph_break [] 2025-12-04T10:58:28.2796487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2796778Z Traceback (most recent call last): 2025-12-04T10:58:28.2797030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2797283Z method(*args, **kwargs) 2025-12-04T10:58:28.2797522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2797771Z method(*args, **kwargs) 2025-12-04T10:58:28.2798007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T10:58:28.2798252Z with policy(): 2025-12-04T10:58:28.2798479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2798732Z raise RuntimeError(msg) 2025-12-04T10:58:28.2799255Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2799745Z 2025-12-04T10:58:28.2799825Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2800265Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2800622Z 2025-12-04T10:58:28.2800719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2800936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2801120Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2801526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2801947Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2802108Z graph_break [] 2025-12-04T10:58:28.2802245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2802430Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2802609Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2803022Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2804008Z graph_break [] 2025-12-04T10:58:28.2804124Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2804394Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2804648Z Traceback (most recent call last): 2025-12-04T10:58:28.2804903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2805155Z method(*args, **kwargs) 2025-12-04T10:58:28.2805394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2805648Z method(*args, **kwargs) 2025-12-04T10:58:28.2805887Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2806134Z with policy(): 2025-12-04T10:58:28.2806368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2806656Z raise RuntimeError(msg) 2025-12-04T10:58:28.2807176Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2807656Z 2025-12-04T10:58:28.2807739Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2808176Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2808533Z 2025-12-04T10:58:28.2808634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2808848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2809037Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2809434Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2809848Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2810006Z graph_break [] 2025-12-04T10:58:28.2810145Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2810327Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2810506Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2810924Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2811302Z graph_break [] 2025-12-04T10:58:28.2811441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2811626Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2811806Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2812219Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2812595Z graph_break [] 2025-12-04T10:58:28.2812953Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-71e9fdc62e87ecb4.xml - 2025-12-04T10:58:28.2813356Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2814167Z FAILED [0.6472s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2814896Z 2025-12-04T10:58:28.2814979Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2815417Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2815780Z 2025-12-04T10:58:28.2815875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2816119Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2816302Z ================== 1 failed, 57 deselected, 2 rerun in 4.49s =================== 2025-12-04T10:58:28.2816457Z Got exit code 1 2025-12-04T10:58:28.2816785Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2817224Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2817620Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1b22abd68e0c7c6.xml 2025-12-04T10:58:28.2817939Z ============================= test session starts ============================== 2025-12-04T10:58:28.2818164Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2818375Z cachedir: .pytest_cache 2025-12-04T10:58:28.2818618Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2818876Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2819004Z configfile: pytest.ini 2025-12-04T10:58:28.2819249Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2819544Z collecting ... collected 58 items / 11 deselected / 47 selected 2025-12-04T10:58:28.2819722Z stepcurrent: skipping 11 already run items. 
2025-12-04T10:58:28.2819865Z Running 47 items in this shard 2025-12-04T10:58:28.2819943Z 2025-12-04T10:58:28.2820222Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5730s] [ 2%] 2025-12-04T10:58:28.2820804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4468s] [ 2%] 2025-12-04T10:58:28.2821353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4442s] [ 2%] 2025-12-04T10:58:28.2821640Z 2025-12-04T10:58:28.2821697Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2821962Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2822216Z Traceback (most recent call last): 2025-12-04T10:58:28.2822517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2822776Z method(*args, **kwargs) 2025-12-04T10:58:28.2823018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2823320Z method(*args, **kwargs) 2025-12-04T10:58:28.2823559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2823805Z with policy(): 2025-12-04T10:58:28.2824034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2824287Z raise RuntimeError(msg) 2025-12-04T10:58:28.2824795Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2825309Z 2025-12-04T10:58:28.2825390Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2825828Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2826183Z 2025-12-04T10:58:28.2826278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2826494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2826681Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2827082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2827500Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2827665Z graph_break [] 2025-12-04T10:58:28.2827892Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2828148Z Traceback (most recent call last): 2025-12-04T10:58:28.2828403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2828655Z method(*args, **kwargs) 2025-12-04T10:58:28.2828899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2829148Z method(*args, **kwargs) 2025-12-04T10:58:28.2829385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2829631Z with policy(): 2025-12-04T10:58:28.2829861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2836871Z raise RuntimeError(msg) 2025-12-04T10:58:28.2837404Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2837880Z 2025-12-04T10:58:28.2837961Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2838397Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2838756Z 2025-12-04T10:58:28.2838925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2839145Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2839330Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2839733Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2840153Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2840312Z graph_break [] 2025-12-04T10:58:28.2840452Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.2840633Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2840810Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2841227Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2841652Z graph_break [] 2025-12-04T10:58:28.2841765Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2842040Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2842294Z Traceback (most recent call last): 2025-12-04T10:58:28.2842552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2842804Z method(*args, **kwargs) 2025-12-04T10:58:28.2843042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2843334Z method(*args, **kwargs) 2025-12-04T10:58:28.2843569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2843819Z with policy(): 2025-12-04T10:58:28.2844045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2844296Z raise RuntimeError(msg) 2025-12-04T10:58:28.2844809Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. 
CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2845289Z 2025-12-04T10:58:28.2845369Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2845810Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2846170Z 2025-12-04T10:58:28.2846264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2846478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2846658Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2847053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2847465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2847620Z graph_break [] 2025-12-04T10:58:28.2847759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2847973Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2848150Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2848562Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2848934Z graph_break [] 2025-12-04T10:58:28.2849070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2849249Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.2849424Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2849834Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2850204Z graph_break [] 2025-12-04T10:58:28.2850532Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a1b22abd68e0c7c6.xml - 2025-12-04T10:58:28.2850933Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2851730Z FAILED [0.4442s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2852449Z 2025-12-04T10:58:28.2852533Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2852970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2853366Z 2025-12-04T10:58:28.2853462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2853665Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2853846Z ================== 1 failed, 11 deselected, 2 rerun in 3.63s =================== 2025-12-04T10:58:28.2853999Z Got exit code 1 2025-12-04T10:58:28.2854101Z Retrying single test... 2025-12-04T10:58:28.2854385Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-07dce45b5432cfe5.xml 2025-12-04T10:58:28.2854701Z ============================= test session starts ============================== 2025-12-04T10:58:28.2854926Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2855135Z cachedir: .pytest_cache 2025-12-04T10:58:28.2855379Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2855637Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2855763Z configfile: pytest.ini 2025-12-04T10:58:28.2856009Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2856305Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2856733Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2857168Z Running 1 items in this shard 2025-12-04T10:58:28.2857247Z 2025-12-04T10:58:28.2857647Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:32:54.873959677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2858083Z 2025-12-04T10:58:28.2858254Z [W1204 10:32:54.138659533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2858461Z 2025-12-04T10:58:28.2858626Z [W1204 10:32:54.138836740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2858830Z 2025-12-04T10:58:28.2858992Z [W1204 10:32:54.142611642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2859196Z 2025-12-04T10:58:28.2859361Z [W1204 10:32:54.142940537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2859601Z 2025-12-04T10:58:28.2859763Z [W1204 10:32:54.143009236 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2859968Z 2025-12-04T10:58:28.2860130Z [W1204 10:32:54.145329850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2860335Z 2025-12-04T10:58:28.2860497Z [W1204 10:32:54.145609715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2860702Z 2025-12-04T10:58:28.2860864Z [W1204 10:32:54.145669875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2861069Z 2025-12-04T10:58:28.2861128Z ('RERUN', {'yellow': True}) [2.9558s] [100%] 2025-12-04T10:58:28.2861608Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:32:55.957293161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2862044Z 2025-12-04T10:58:28.2862209Z [W1204 10:32:55.957686055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2862414Z 2025-12-04T10:58:28.2862577Z [W1204 10:32:55.957753894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2862780Z 2025-12-04T10:58:28.2862943Z [W1204 10:32:55.959042574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2863146Z 2025-12-04T10:58:28.2863362Z [W1204 10:32:55.959296580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2863568Z 2025-12-04T10:58:28.2863731Z [W1204 10:32:55.959359849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2863935Z 2025-12-04T10:58:28.2864098Z [W1204 10:32:55.961505346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2864306Z 2025-12-04T10:58:28.2864468Z [W1204 10:32:55.961773052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2864672Z 2025-12-04T10:58:28.2864833Z [W1204 10:32:55.961833841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2865037Z 2025-12-04T10:58:28.2865090Z ('RERUN', {'yellow': True}) [0.6677s] [100%] 2025-12-04T10:58:28.2865612Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:32:56.618717015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2866043Z 2025-12-04T10:58:28.2866205Z [W1204 10:32:56.619118718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2866409Z 2025-12-04T10:58:28.2866570Z [W1204 10:32:56.619192867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2866777Z 2025-12-04T10:58:28.2866939Z [W1204 10:32:56.620491637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2867143Z 2025-12-04T10:58:28.2867308Z [W1204 10:32:56.620749453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2867547Z 2025-12-04T10:58:28.2867710Z [W1204 10:32:56.620808822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2867913Z 2025-12-04T10:58:28.2868076Z [W1204 10:32:56.622930340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2868278Z 2025-12-04T10:58:28.2868442Z [W1204 10:32:56.623208365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2868644Z 2025-12-04T10:58:28.2868806Z [W1204 10:32:56.623271034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2869009Z 2025-12-04T10:58:28.2869052Z FAILED [0.6507s] [100%] 2025-12-04T10:58:28.2869118Z 2025-12-04T10:58:28.2869180Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2869444Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2869700Z Traceback (most recent call last): 2025-12-04T10:58:28.2869956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2870210Z method(*args, **kwargs) 2025-12-04T10:58:28.2870448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2870696Z method(*args, **kwargs) 2025-12-04T10:58:28.2870930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2871172Z with policy(): 2025-12-04T10:58:28.2871400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2871649Z raise RuntimeError(msg) 2025-12-04T10:58:28.2872153Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2872156Z 2025-12-04T10:58:28.2872236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2872553Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2872556Z 2025-12-04T10:58:28.2872652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2872761Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2872824Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2873126Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2873207Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2873247Z graph_break [] 2025-12-04T10:58:28.2873456Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2873505Z Traceback (most recent call last): 2025-12-04T10:58:28.2873673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2873716Z method(*args, **kwargs) 2025-12-04T10:58:28.2873881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2873966Z method(*args, **kwargs) 2025-12-04T10:58:28.2874129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2874169Z 
with policy(): 2025-12-04T10:58:28.2874334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2874377Z raise RuntimeError(msg) 2025-12-04T10:58:28.2874812Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2874814Z 2025-12-04T10:58:28.2874896Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2875215Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2875221Z 2025-12-04T10:58:28.2875317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2875397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2875459Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2875757Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2875839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2875878Z graph_break [] 2025-12-04T10:58:28.2875958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2876019Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2876097Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2876388Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2876428Z graph_break [] 2025-12-04T10:58:28.2876484Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2876648Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2876696Z Traceback (most recent call last): 2025-12-04T10:58:28.2876892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2876937Z method(*args, **kwargs) 2025-12-04T10:58:28.2877105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2877149Z method(*args, **kwargs) 2025-12-04T10:58:28.2877314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2877354Z with policy(): 2025-12-04T10:58:28.2877519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2877563Z raise RuntimeError(msg) 2025-12-04T10:58:28.2878001Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2878029Z 2025-12-04T10:58:28.2878110Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2878422Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2878425Z 2025-12-04T10:58:28.2878519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2878598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2878659Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2878957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2879038Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2879077Z graph_break [] 2025-12-04T10:58:28.2879157Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2879215Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2879293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2879594Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2879634Z graph_break [] 2025-12-04T10:58:28.2879713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2879774Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2879851Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2880143Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2880185Z graph_break [] 2025-12-04T10:58:28.2880450Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-07dce45b5432cfe5.xml - 2025-12-04T10:58:28.2880516Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2881228Z FAILED [0.6507s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2881233Z 2025-12-04T10:58:28.2881312Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2881624Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2881627Z 2025-12-04T10:58:28.2881719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2881787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2881859Z ================== 1 failed, 57 deselected, 2 rerun in 4.44s =================== 2025-12-04T10:58:28.2881901Z Got exit code 1 2025-12-04T10:58:28.2881944Z Retrying single test... 2025-12-04T10:58:28.2882158Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-522355946182c2ee.xml 2025-12-04T10:58:28.2882251Z ============================= test session starts ============================== 2025-12-04T10:58:28.2882372Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2882416Z cachedir: .pytest_cache 2025-12-04T10:58:28.2882589Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2882638Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2882682Z configfile: pytest.ini 2025-12-04T10:58:28.2882855Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2882936Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2883302Z stepcurrent: skipping 11 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2883353Z Running 1 items in this shard 2025-12-04T10:58:28.2883355Z 2025-12-04T10:58:28.2883748Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:33:06.344960797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2883751Z 2025-12-04T10:58:28.2883918Z [W1204 10:33:06.613414515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2883920Z 2025-12-04T10:58:28.2884086Z [W1204 10:33:06.613597432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884090Z 2025-12-04T10:58:28.2884254Z [W1204 10:33:06.617369164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884256Z 2025-12-04T10:58:28.2884419Z [W1204 10:33:06.617695899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884421Z 2025-12-04T10:58:28.2884582Z [W1204 10:33:06.617762748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884584Z 2025-12-04T10:58:28.2884747Z [W1204 10:33:06.620087242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884749Z 2025-12-04T10:58:28.2884938Z [W1204 10:33:06.620375978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2884941Z 2025-12-04T10:58:28.2885104Z [W1204 10:33:06.620440707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2885107Z 2025-12-04T10:58:28.2885160Z ('RERUN', {'yellow': True}) [2.9822s] [100%] 2025-12-04T10:58:28.2885549Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:33:07.463059507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2885552Z 2025-12-04T10:58:28.2885715Z [W1204 10:33:07.463447791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2885717Z 2025-12-04T10:58:28.2885879Z [W1204 10:33:07.463517880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2885882Z 2025-12-04T10:58:28.2886077Z [W1204 10:33:07.464789900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2886079Z 2025-12-04T10:58:28.2886241Z [W1204 10:33:07.465057326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2886243Z 2025-12-04T10:58:28.2886404Z [W1204 10:33:07.465122305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2886406Z 2025-12-04T10:58:28.2886567Z [W1204 10:33:07.467202323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2886569Z 2025-12-04T10:58:28.2886732Z [W1204 10:33:07.467468349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2886734Z 2025-12-04T10:58:28.2886897Z [W1204 10:33:07.467528178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2886901Z 2025-12-04T10:58:28.2886952Z ('RERUN', {'yellow': True}) [0.6831s] [100%] 2025-12-04T10:58:28.2887341Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:33:07.128374653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2887343Z 2025-12-04T10:58:28.2887506Z [W1204 10:33:07.128765417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2887508Z 2025-12-04T10:58:28.2887671Z [W1204 10:33:07.128840176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2887673Z 2025-12-04T10:58:28.2887836Z [W1204 10:33:07.130110036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2887838Z 2025-12-04T10:58:28.2887998Z [W1204 10:33:07.130360202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2887999Z 2025-12-04T10:58:28.2888160Z [W1204 10:33:07.130418321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2888162Z 2025-12-04T10:58:28.2888323Z [W1204 10:33:07.132505419 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2888325Z 2025-12-04T10:58:28.2888507Z [W1204 10:33:07.132772175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2888509Z 2025-12-04T10:58:28.2888672Z [W1204 10:33:07.132831674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2888676Z 2025-12-04T10:58:28.2888721Z FAILED [0.6558s] [100%] 2025-12-04T10:58:28.2888723Z 2025-12-04T10:58:28.2888780Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2888945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2888996Z Traceback (most recent call last): 2025-12-04T10:58:28.2889169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2889215Z method(*args, **kwargs) 2025-12-04T10:58:28.2889383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2889428Z method(*args, **kwargs) 2025-12-04T10:58:28.2889595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2889663Z with policy(): 2025-12-04T10:58:28.2889829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2889876Z raise RuntimeError(msg) 2025-12-04T10:58:28.2890306Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2890310Z 2025-12-04T10:58:28.2890389Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2890709Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2890713Z 2025-12-04T10:58:28.2890807Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2890889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2890951Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2891252Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2891333Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2891373Z graph_break [] 2025-12-04T10:58:28.2891539Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2891590Z Traceback (most recent call last): 2025-12-04T10:58:28.2891758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2891802Z method(*args, **kwargs) 2025-12-04T10:58:28.2891967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2892011Z method(*args, **kwargs) 2025-12-04T10:58:28.2892176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2892217Z 
with policy(): 2025-12-04T10:58:28.2892382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2892428Z raise RuntimeError(msg) 2025-12-04T10:58:28.2892891Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2892895Z 2025-12-04T10:58:28.2892975Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2893318Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2893320Z 2025-12-04T10:58:28.2893413Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2893494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2893557Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2893854Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2893969Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2894009Z graph_break [] 2025-12-04T10:58:28.2894089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2894149Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2894227Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2894525Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2894565Z graph_break [] 2025-12-04T10:58:28.2894622Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2894796Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.2894845Z Traceback (most recent call last): 2025-12-04T10:58:28.2895014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2895057Z method(*args, **kwargs) 2025-12-04T10:58:28.2895222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2895265Z method(*args, **kwargs) 2025-12-04T10:58:28.2895428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2895468Z with policy(): 2025-12-04T10:58:28.2895637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2895681Z raise RuntimeError(msg) 2025-12-04T10:58:28.2896123Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2896126Z 2025-12-04T10:58:28.2896204Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2896519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2896522Z 2025-12-04T10:58:28.2896642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2896723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2896785Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2897083Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2897162Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2897202Z graph_break [] 2025-12-04T10:58:28.2897283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2897341Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2897420Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2897717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2897790Z graph_break [] 2025-12-04T10:58:28.2897869Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2897929Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2898006Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2898300Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2898338Z graph_break [] 2025-12-04T10:58:28.2898603Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-522355946182c2ee.xml - 2025-12-04T10:58:28.2898669Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2899359Z FAILED [0.6558s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2899361Z 2025-12-04T10:58:28.2899440Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2899753Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2899757Z 2025-12-04T10:58:28.2899852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2899920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2899993Z ================== 1 failed, 57 deselected, 2 rerun in 4.48s =================== 2025-12-04T10:58:28.2900033Z Got exit code 1 2025-12-04T10:58:28.2900295Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.2900434Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2900649Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a161de8f60176c0.xml 2025-12-04T10:58:28.2900733Z ============================= test session starts ============================== 2025-12-04T10:58:28.2900854Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2900900Z cachedir: .pytest_cache 2025-12-04T10:58:28.2901073Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2901123Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2901168Z configfile: pytest.ini 2025-12-04T10:58:28.2901343Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2901425Z collecting ... collected 58 items / 12 deselected / 46 selected 2025-12-04T10:58:28.2901482Z stepcurrent: skipping 12 already run items. 
2025-12-04T10:58:28.2901530Z Running 46 items in this shard 2025-12-04T10:58:28.2901532Z 2025-12-04T10:58:28.2901809Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8872s] [ 2%] 2025-12-04T10:58:28.2907740Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4544s] [ 2%] 2025-12-04T10:58:28.2907988Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4302s] [ 2%] 2025-12-04T10:58:28.2907991Z 2025-12-04T10:58:28.2908046Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2908210Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2908259Z Traceback (most recent call last): 2025-12-04T10:58:28.2908431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2908475Z method(*args, **kwargs) 2025-12-04T10:58:28.2908645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2908687Z method(*args, **kwargs) 2025-12-04T10:58:28.2908851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2908891Z with policy(): 2025-12-04T10:58:28.2909058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2909102Z raise RuntimeError(msg) 2025-12-04T10:58:28.2909537Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2909540Z 2025-12-04T10:58:28.2909620Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2909936Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2909939Z 2025-12-04T10:58:28.2910033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2910113Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2910174Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2910505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2910586Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2910627Z graph_break [] 2025-12-04T10:58:28.2910791Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2910839Z Traceback (most recent call last): 2025-12-04T10:58:28.2911006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2911049Z method(*args, **kwargs) 2025-12-04T10:58:28.2911214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.2911256Z method(*args, **kwargs) 2025-12-04T10:58:28.2911420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2911459Z with policy(): 2025-12-04T10:58:28.2911626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2911697Z raise RuntimeError(msg) 2025-12-04T10:58:28.2912132Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2912134Z 2025-12-04T10:58:28.2912213Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2912527Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2912529Z 2025-12-04T10:58:28.2912625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2912705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2912768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2913065Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2913144Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2913183Z graph_break [] 2025-12-04T10:58:28.2913284Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.2913344Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2913423Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2913717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2913760Z graph_break [] 2025-12-04T10:58:28.2913817Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2913983Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2914031Z Traceback (most recent call last): 2025-12-04T10:58:28.2914201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2914244Z method(*args, **kwargs) 2025-12-04T10:58:28.2914408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2914451Z method(*args, **kwargs) 2025-12-04T10:58:28.2914644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2914687Z with policy(): 2025-12-04T10:58:28.2914854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2914899Z raise RuntimeError(msg) 2025-12-04T10:58:28.2915336Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2915338Z 2025-12-04T10:58:28.2915418Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2915735Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2915765Z 2025-12-04T10:58:28.2915861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2915940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2916001Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2916295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2916374Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2916412Z graph_break [] 2025-12-04T10:58:28.2916491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2916551Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2916629Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2916923Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2916962Z graph_break [] 2025-12-04T10:58:28.2917041Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2917100Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.2917177Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2917467Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2917509Z graph_break [] 2025-12-04T10:58:28.2917774Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1a161de8f60176c0.xml - 2025-12-04T10:58:28.2917842Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2918525Z FAILED [0.4302s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2918528Z 2025-12-04T10:58:28.2918606Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2918957Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2918962Z 2025-12-04T10:58:28.2919056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2919123Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2919194Z ================== 1 failed, 12 deselected, 2 rerun in 3.94s =================== 2025-12-04T10:58:28.2919235Z Got exit code 1 2025-12-04T10:58:28.2919277Z Retrying single test... 2025-12-04T10:58:28.2919493Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9bd87e7f3a5ac3.xml 2025-12-04T10:58:28.2919555Z ============================= test session starts ============================== 2025-12-04T10:58:28.2919678Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2919722Z cachedir: .pytest_cache 2025-12-04T10:58:28.2919919Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2919969Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2920013Z configfile: pytest.ini 2025-12-04T10:58:28.2920188Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2920269Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2920580Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2920629Z Running 1 items in this shard 2025-12-04T10:58:28.2920631Z 2025-12-04T10:58:28.2921025Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:28.627547234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2921030Z 2025-12-04T10:58:28.2921197Z [W1204 10:33:28.892568797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2921199Z 2025-12-04T10:58:28.2921364Z [W1204 10:33:28.892710924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2921366Z 2025-12-04T10:58:28.2921527Z [W1204 10:33:28.896154801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2921529Z 2025-12-04T10:58:28.2921693Z [W1204 10:33:28.896484356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2921696Z 2025-12-04T10:58:28.2921858Z [W1204 10:33:28.896547705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2921860Z 2025-12-04T10:58:28.2922022Z [W1204 10:33:28.898828720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2922024Z 2025-12-04T10:58:28.2922186Z [W1204 10:33:28.899112926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2922188Z 2025-12-04T10:58:28.2922348Z [W1204 10:33:28.899175555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2922350Z 2025-12-04T10:58:28.2922404Z ('RERUN', {'yellow': True}) [3.2872s] [100%] 2025-12-04T10:58:28.2922818Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:29.684200788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2922822Z 2025-12-04T10:58:28.2922988Z [W1204 10:33:29.684618391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2922990Z 2025-12-04T10:58:28.2923152Z [W1204 10:33:29.684698850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2923155Z 2025-12-04T10:58:28.2923361Z [W1204 10:33:29.685984150 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2923363Z 2025-12-04T10:58:28.2923526Z [W1204 10:33:29.686265636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2923528Z 2025-12-04T10:58:28.2923716Z [W1204 10:33:29.686334915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2923718Z 2025-12-04T10:58:28.2923881Z [W1204 10:33:29.688469872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2923883Z 2025-12-04T10:58:28.2924045Z [W1204 10:33:29.688747238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2924047Z 2025-12-04T10:58:28.2924208Z [W1204 10:33:29.688813077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2924210Z 2025-12-04T10:58:28.2924262Z ('RERUN', {'yellow': True}) [0.6521s] [100%] 2025-12-04T10:58:28.2924651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:30.374082438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2924656Z 2025-12-04T10:58:28.2924818Z [W1204 10:33:30.374482042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2924820Z 2025-12-04T10:58:28.2924980Z [W1204 10:33:30.374566751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2924982Z 2025-12-04T10:58:28.2925146Z [W1204 10:33:30.375857111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2925148Z 2025-12-04T10:58:28.2925314Z [W1204 10:33:30.376147417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2925317Z 2025-12-04T10:58:28.2925478Z [W1204 10:33:30.376215625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2925482Z 2025-12-04T10:58:28.2925643Z [W1204 10:33:30.378318953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2925645Z 2025-12-04T10:58:28.2925805Z [W1204 10:33:30.378588089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2925807Z 2025-12-04T10:58:28.2925969Z [W1204 10:33:30.378648468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2925971Z 2025-12-04T10:58:28.2926014Z FAILED [0.6736s] [100%] 2025-12-04T10:58:28.2926016Z 2025-12-04T10:58:28.2926102Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2926267Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2926320Z Traceback (most recent call last): 2025-12-04T10:58:28.2926492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2926537Z method(*args, **kwargs) 2025-12-04T10:58:28.2926703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2926747Z method(*args, **kwargs) 2025-12-04T10:58:28.2926914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2926953Z with policy(): 2025-12-04T10:58:28.2927123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2927167Z raise RuntimeError(msg) 2025-12-04T10:58:28.2927596Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2927623Z 2025-12-04T10:58:28.2927703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2928024Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2928026Z 2025-12-04T10:58:28.2928121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2928203Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2928265Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2928566Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2928646Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2928686Z graph_break [] 2025-12-04T10:58:28.2928851Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2928901Z Traceback (most recent call last): 2025-12-04T10:58:28.2929068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2929111Z method(*args, **kwargs) 2025-12-04T10:58:28.2929278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2929320Z method(*args, **kwargs) 2025-12-04T10:58:28.2929486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2929525Z 
with policy(): 2025-12-04T10:58:28.2929692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2929736Z raise RuntimeError(msg) 2025-12-04T10:58:28.2930172Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2930174Z 2025-12-04T10:58:28.2930273Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2930593Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2930597Z 2025-12-04T10:58:28.2930691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2930772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2930832Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2931130Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2931209Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2931248Z graph_break [] 2025-12-04T10:58:28.2931330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2931413Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2931492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2931785Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2931825Z graph_break [] 2025-12-04T10:58:28.2931882Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2932054Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2932102Z Traceback (most recent call last): 2025-12-04T10:58:28.2932272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2932314Z method(*args, **kwargs) 2025-12-04T10:58:28.2932481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2932523Z method(*args, **kwargs) 2025-12-04T10:58:28.2932687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2932726Z with policy(): 2025-12-04T10:58:28.2932892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2932935Z raise RuntimeError(msg) 2025-12-04T10:58:28.2933404Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2933406Z 2025-12-04T10:58:28.2933486Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2933803Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2933805Z 2025-12-04T10:58:28.2933899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2933978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2934039Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2934375Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2934455Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2934496Z graph_break [] 2025-12-04T10:58:28.2934576Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2934634Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2934712Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2935003Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2935042Z graph_break [] 2025-12-04T10:58:28.2935120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2935179Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2935258Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2935548Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2935618Z graph_break [] 2025-12-04T10:58:28.2935887Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6b9bd87e7f3a5ac3.xml - 2025-12-04T10:58:28.2935951Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2936639Z FAILED [0.6736s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2936643Z 2025-12-04T10:58:28.2936722Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2937033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2937035Z 2025-12-04T10:58:28.2937129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2937196Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2937268Z ================== 1 failed, 57 deselected, 2 rerun in 4.78s =================== 2025-12-04T10:58:28.2937307Z Got exit code 1 2025-12-04T10:58:28.2937353Z Retrying single test... 2025-12-04T10:58:28.2937569Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f35d0edbd27d0e1a.xml 2025-12-04T10:58:28.2937633Z ============================= test session starts ============================== 2025-12-04T10:58:28.2937752Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2937797Z cachedir: .pytest_cache 2025-12-04T10:58:28.2937968Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2938019Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2938062Z configfile: pytest.ini 2025-12-04T10:58:28.2938238Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2938342Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2938655Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2938705Z Running 1 items in this shard 2025-12-04T10:58:28.2938708Z 2025-12-04T10:58:28.2939098Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:40.474931245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2939101Z 2025-12-04T10:58:28.2939268Z [W1204 10:33:40.744994730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2939270Z 2025-12-04T10:58:28.2939436Z [W1204 10:33:40.745158768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2939438Z 2025-12-04T10:58:28.2939602Z [W1204 10:33:40.749093917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2939624Z 2025-12-04T10:58:28.2939787Z [W1204 10:33:40.749452142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2939789Z 2025-12-04T10:58:28.2939949Z [W1204 10:33:40.749515311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2939951Z 2025-12-04T10:58:28.2940112Z [W1204 10:33:40.751846795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2940114Z 2025-12-04T10:58:28.2940276Z [W1204 10:33:40.752132000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2940278Z 2025-12-04T10:58:28.2940440Z [W1204 10:33:40.752195679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2940443Z 2025-12-04T10:58:28.2940496Z ('RERUN', {'yellow': True}) [3.2854s] [100%] 2025-12-04T10:58:28.2940888Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:41.550478790 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2940890Z 2025-12-04T10:58:28.2941052Z [W1204 10:33:41.550861094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941054Z 2025-12-04T10:58:28.2941216Z [W1204 10:33:41.550927403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941218Z 2025-12-04T10:58:28.2941378Z [W1204 10:33:41.552188044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941382Z 2025-12-04T10:58:28.2941542Z [W1204 10:33:41.552469770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941544Z 2025-12-04T10:58:28.2941704Z [W1204 10:33:41.552531229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941706Z 2025-12-04T10:58:28.2941867Z [W1204 10:33:41.554635346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2941869Z 2025-12-04T10:58:28.2942050Z [W1204 10:33:41.554900052 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2942053Z 2025-12-04T10:58:28.2942214Z [W1204 10:33:41.554961121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2942219Z 2025-12-04T10:58:28.2942270Z ('RERUN', {'yellow': True}) [0.6525s] [100%] 2025-12-04T10:58:28.2942659Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:33:41.219432695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2942661Z 2025-12-04T10:58:28.2942823Z [W1204 10:33:41.219819279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2942827Z 2025-12-04T10:58:28.2942990Z [W1204 10:33:41.219898308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2942992Z 2025-12-04T10:58:28.2943155Z [W1204 10:33:41.221180679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2943184Z 2025-12-04T10:58:28.2943388Z [W1204 10:33:41.221442964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2943390Z 2025-12-04T10:58:28.2943553Z [W1204 10:33:41.221502484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2943555Z 2025-12-04T10:58:28.2943716Z [W1204 10:33:41.223546212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2943718Z 2025-12-04T10:58:28.2943879Z [W1204 10:33:41.223809758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2943883Z 2025-12-04T10:58:28.2944046Z [W1204 10:33:41.223869517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2944050Z 2025-12-04T10:58:28.2944094Z FAILED [0.6590s] [100%] 2025-12-04T10:58:28.2944096Z 2025-12-04T10:58:28.2944154Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2944319Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2944370Z Traceback (most recent call last): 2025-12-04T10:58:28.2944542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2944588Z method(*args, **kwargs) 2025-12-04T10:58:28.2944754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2944801Z method(*args, **kwargs) 2025-12-04T10:58:28.2944967Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2945012Z with policy(): 2025-12-04T10:58:28.2945179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2945225Z raise RuntimeError(msg) 2025-12-04T10:58:28.2945657Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.2945659Z 2025-12-04T10:58:28.2945741Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2946087Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2946093Z 2025-12-04T10:58:28.2946187Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2946268Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2946329Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2946625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2946704Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2946745Z graph_break [] 2025-12-04T10:58:28.2946910Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2946960Z Traceback (most recent call last): 2025-12-04T10:58:28.2947128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2947207Z method(*args, **kwargs) 2025-12-04T10:58:28.2947371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2947415Z method(*args, **kwargs) 2025-12-04T10:58:28.2947579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2947623Z 
with policy(): 2025-12-04T10:58:28.2947788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2947834Z raise RuntimeError(msg) 2025-12-04T10:58:28.2948271Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.2948275Z 2025-12-04T10:58:28.2948356Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2948671Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2948674Z 2025-12-04T10:58:28.2948768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2948849Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2948910Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2949206Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2949288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2949328Z graph_break [] 2025-12-04T10:58:28.2949407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2949468Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2949547Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.2949842Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2949882Z graph_break [] 2025-12-04T10:58:28.2949961Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2950130Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.2950182Z Traceback (most recent call last): 2025-12-04T10:58:28.2950349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2950395Z method(*args, **kwargs) 2025-12-04T10:58:28.2950559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2950603Z method(*args, **kwargs) 2025-12-04T10:58:28.2950766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2950808Z with policy(): 2025-12-04T10:58:28.2950974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2951019Z raise RuntimeError(msg) 2025-12-04T10:58:28.2951457Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.2951480Z 2025-12-04T10:58:28.2951560Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2951873Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2951876Z 2025-12-04T10:58:28.2951969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2952050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2952110Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2952406Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2952484Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2952525Z graph_break [] 2025-12-04T10:58:28.2952604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2952665Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2952742Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2953037Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2953078Z graph_break [] 2025-12-04T10:58:28.2953158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2953217Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2953333Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2953628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2953668Z graph_break [] 2025-12-04T10:58:28.2953936Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f35d0edbd27d0e1a.xml - 2025-12-04T10:58:28.2954029Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2954712Z FAILED [0.6590s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.2954716Z 2025-12-04T10:58:28.2954794Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2955107Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2955110Z 2025-12-04T10:58:28.2955205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2955272Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.2955373Z ================== 1 failed, 57 deselected, 2 rerun in 4.76s =================== 2025-12-04T10:58:28.2955415Z Got exit code 1 2025-12-04T10:58:28.2955673Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.2955813Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.2956027Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-66e1b8dde70263ef.xml 2025-12-04T10:58:28.2956089Z ============================= test session starts ============================== 2025-12-04T10:58:28.2956211Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2956256Z cachedir: .pytest_cache 2025-12-04T10:58:28.2956432Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2956481Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2956526Z configfile: pytest.ini 2025-12-04T10:58:28.2956698Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2956780Z collecting ... collected 58 items / 13 deselected / 45 selected 2025-12-04T10:58:28.2956837Z stepcurrent: skipping 13 already run items. 
2025-12-04T10:58:28.2956886Z Running 45 items in this shard 2025-12-04T10:58:28.2956888Z 2025-12-04T10:58:28.2957166Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6961s] [ 2%] 2025-12-04T10:58:28.2957441Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.7044s] [ 2%] 2025-12-04T10:58:28.2957688Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.6818s] [ 2%] 2025-12-04T10:58:28.2957692Z 2025-12-04T10:58:28.2957748Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2957916Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2957964Z Traceback (most recent call last): 2025-12-04T10:58:28.2958137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2958200Z method(*args, **kwargs) 2025-12-04T10:58:28.2958369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2958414Z method(*args, **kwargs) 2025-12-04T10:58:28.2958582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2958621Z with policy(): 2025-12-04T10:58:28.2958789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2958833Z raise RuntimeError(msg) 2025-12-04T10:58:28.2959271Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.2959274Z 2025-12-04T10:58:28.2959352Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2959698Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2959701Z 2025-12-04T10:58:28.2959794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2959876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2959938Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2960131Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2960213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2960252Z graph_break [] 2025-12-04T10:58:28.2960420Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2960470Z Traceback (most recent call last): 2025-12-04T10:58:28.2960637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2960680Z method(*args, **kwargs) 2025-12-04T10:58:28.2960845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2960888Z method(*args, **kwargs) 
2025-12-04T10:58:28.2961053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2961092Z with policy(): 2025-12-04T10:58:28.2961259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2961304Z raise RuntimeError(msg) 2025-12-04T10:58:28.2961752Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2961757Z 2025-12-04T10:58:28.2961835Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2962154Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2962156Z 2025-12-04T10:58:28.2962250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2962350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2962412Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2962605Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2962685Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2962724Z graph_break [] 2025-12-04T10:58:28.2962805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2962864Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.2962943Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2963134Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2963174Z graph_break [] 2025-12-04T10:58:28.2963232Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2963492Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2963541Z Traceback (most recent call last): 2025-12-04T10:58:28.2963709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2963753Z method(*args, **kwargs) 2025-12-04T10:58:28.2963918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2963961Z method(*args, **kwargs) 2025-12-04T10:58:28.2964126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2964166Z with policy(): 2025-12-04T10:58:28.2964334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2964378Z raise RuntimeError(msg) 2025-12-04T10:58:28.2964827Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2964830Z 2025-12-04T10:58:28.2964909Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2965227Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2965230Z 2025-12-04T10:58:28.2965328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2965409Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2965474Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2965665Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2965746Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2965786Z graph_break [] 2025-12-04T10:58:28.2965868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2965926Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2966007Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2966198Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2966269Z graph_break [] 2025-12-04T10:58:28.2966348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2966413Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2966492Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2966684Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2966724Z graph_break [] 2025-12-04T10:58:28.2966988Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-66e1b8dde70263ef.xml - 2025-12-04T10:58:28.2967055Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2967766Z FAILED [0.6818s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2967793Z 2025-12-04T10:58:28.2967876Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2968193Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2968195Z 2025-12-04T10:58:28.2968298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2968370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2968450Z ================== 1 failed, 13 deselected, 2 rerun in 4.24s =================== 2025-12-04T10:58:28.2968493Z Got exit code 1 2025-12-04T10:58:28.2968543Z Retrying single test... 
2025-12-04T10:58:28.2968758Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca7613c09c5b8833.xml 2025-12-04T10:58:28.2968825Z ============================= test session starts ============================== 2025-12-04T10:58:28.2968947Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2968995Z cachedir: .pytest_cache 2025-12-04T10:58:28.2969168Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2969225Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2969274Z configfile: pytest.ini 2025-12-04T10:58:28.2969455Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2969537Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2969853Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2969901Z Running 1 items in this shard 2025-12-04T10:58:28.2969906Z 2025-12-04T10:58:28.2970302Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:02.953028879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2970305Z 2025-12-04T10:58:28.2970497Z [W1204 10:34:02.240004406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2970499Z 2025-12-04T10:58:28.2970668Z [W1204 10:34:02.240141923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2970670Z 2025-12-04T10:58:28.2970835Z [W1204 10:34:02.243474192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2970837Z 2025-12-04T10:58:28.2971000Z [W1204 10:34:02.243776887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2971004Z 2025-12-04T10:58:28.2971166Z [W1204 10:34:02.243838367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2971168Z 2025-12-04T10:58:28.2971335Z [W1204 10:34:02.245998243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2971337Z 2025-12-04T10:58:28.2971500Z [W1204 10:34:02.246281409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2971528Z 2025-12-04T10:58:28.2971694Z [W1204 10:34:02.246343088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2971696Z 2025-12-04T10:58:28.2971749Z ('RERUN', {'yellow': True}) [3.0745s] [100%] 2025-12-04T10:58:28.2972144Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:04.465984858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2972146Z 2025-12-04T10:58:28.2972311Z [W1204 10:34:04.466356982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2972313Z 2025-12-04T10:58:28.2972475Z [W1204 10:34:04.466422861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2972479Z 2025-12-04T10:58:28.2972642Z [W1204 10:34:04.467723611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2972644Z 2025-12-04T10:58:28.2972806Z [W1204 10:34:04.467981547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2972808Z 2025-12-04T10:58:28.2972974Z [W1204 10:34:04.468046896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2972976Z 2025-12-04T10:58:28.2973141Z [W1204 10:34:04.470023456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2973145Z 2025-12-04T10:58:28.2973340Z [W1204 10:34:04.470360220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2973344Z 2025-12-04T10:58:28.2973512Z [W1204 10:34:04.470423470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2973514Z 2025-12-04T10:58:28.2973571Z ('RERUN', {'yellow': True}) [0.7015s] [100%] 2025-12-04T10:58:28.2973968Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:04.125335555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2973971Z 2025-12-04T10:58:28.2974163Z [W1204 10:34:04.125733458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2974165Z 2025-12-04T10:58:28.2974336Z [W1204 10:34:04.125807837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2974339Z 2025-12-04T10:58:28.2974505Z [W1204 10:34:04.127095717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2974507Z 2025-12-04T10:58:28.2974672Z [W1204 10:34:04.127360243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2974674Z 2025-12-04T10:58:28.2974840Z [W1204 10:34:04.127421062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2974841Z 2025-12-04T10:58:28.2975004Z [W1204 10:34:04.129386902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2975008Z 2025-12-04T10:58:28.2975175Z [W1204 10:34:04.129728907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2975207Z 2025-12-04T10:58:28.2975371Z [W1204 10:34:04.129791616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2975378Z 2025-12-04T10:58:28.2975424Z FAILED [0.6485s] [100%] 2025-12-04T10:58:28.2975426Z 2025-12-04T10:58:28.2975488Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2975660Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2975715Z Traceback (most recent call last): 2025-12-04T10:58:28.2975888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2975941Z method(*args, **kwargs) 2025-12-04T10:58:28.2976111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2976165Z method(*args, **kwargs) 2025-12-04T10:58:28.2976331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2976379Z with policy(): 2025-12-04T10:58:28.2976545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2976596Z raise RuntimeError(msg) 2025-12-04T10:58:28.2977033Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.2977038Z 2025-12-04T10:58:28.2977123Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2977442Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2977448Z 2025-12-04T10:58:28.2977546Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2977633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2977697Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2977896Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2977977Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2978046Z graph_break [] 2025-12-04T10:58:28.2978211Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2978262Z Traceback (most recent call last): 2025-12-04T10:58:28.2978428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2978476Z method(*args, **kwargs) 2025-12-04T10:58:28.2978644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2978689Z method(*args, **kwargs) 2025-12-04T10:58:28.2978853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2978894Z with policy(): 2025-12-04T10:58:28.2979062Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2979109Z raise RuntimeError(msg) 2025-12-04T10:58:28.2979557Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2979586Z 2025-12-04T10:58:28.2979669Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2979985Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2979990Z 2025-12-04T10:58:28.2980084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2980168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2980228Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2980421Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2980499Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2980541Z graph_break [] 2025-12-04T10:58:28.2980619Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2980681Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2980759Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2980952Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2980991Z graph_break [] 2025-12-04T10:58:28.2981051Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2981217Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2981271Z Traceback (most recent call last): 2025-12-04T10:58:28.2981438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2981485Z method(*args, **kwargs) 2025-12-04T10:58:28.2981649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2981695Z method(*args, **kwargs) 2025-12-04T10:58:28.2981856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2981897Z with policy(): 2025-12-04T10:58:28.2982086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2982134Z raise RuntimeError(msg) 2025-12-04T10:58:28.2982585Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.2982588Z 2025-12-04T10:58:28.2982669Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2982984Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2982988Z 2025-12-04T10:58:28.2983083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2983168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2983229Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2983491Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2983571Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2983613Z graph_break [] 2025-12-04T10:58:28.2983693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2983755Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2983832Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2984024Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2984064Z graph_break [] 2025-12-04T10:58:28.2984145Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2984208Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2984287Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2984476Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2984518Z graph_break [] 2025-12-04T10:58:28.2984781Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca7613c09c5b8833.xml - 2025-12-04T10:58:28.2984846Z =========================== short test summary info ============================ 2025-12-04T10:58:28.2985552Z FAILED [0.6485s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.2985557Z 2025-12-04T10:58:28.2985637Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2985954Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2985957Z 2025-12-04T10:58:28.2986051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2986144Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.2986216Z ================== 1 failed, 57 deselected, 2 rerun in 4.57s =================== 2025-12-04T10:58:28.2986259Z Got exit code 1 2025-12-04T10:58:28.2986303Z Retrying single test... 
2025-12-04T10:58:28.2986518Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-097f51db82a53838.xml 2025-12-04T10:58:28.2986578Z ============================= test session starts ============================== 2025-12-04T10:58:28.2986701Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.2986744Z cachedir: .pytest_cache 2025-12-04T10:58:28.2986918Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.2986968Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.2987013Z configfile: pytest.ini 2025-12-04T10:58:28.2987189Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.2987298Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.2987613Z stepcurrent: skipping 13 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2987663Z Running 1 items in this shard 2025-12-04T10:58:28.2987666Z 2025-12-04T10:58:28.2988065Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:13.123852390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2988067Z 2025-12-04T10:58:28.2988236Z [W1204 10:34:14.387674344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2988238Z 2025-12-04T10:58:28.2988404Z [W1204 10:34:14.387812002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2988408Z 2025-12-04T10:58:28.2988576Z [W1204 10:34:14.391207160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2988578Z 2025-12-04T10:58:28.2988746Z [W1204 10:34:14.391523825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2988748Z 2025-12-04T10:58:28.2988914Z [W1204 10:34:14.391585184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2988916Z 2025-12-04T10:58:28.2989083Z [W1204 10:34:14.393927158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2989085Z 2025-12-04T10:58:28.2989255Z [W1204 10:34:14.394204414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2989259Z 2025-12-04T10:58:28.2989422Z [W1204 10:34:14.394267373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2989424Z 2025-12-04T10:58:28.2989483Z ('RERUN', {'yellow': True}) [2.9472s] [100%] 2025-12-04T10:58:28.2989877Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:15.610569107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2989883Z 2025-12-04T10:58:28.2990070Z [W1204 10:34:15.610959801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990072Z 2025-12-04T10:58:28.2990239Z [W1204 10:34:15.611028550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990243Z 2025-12-04T10:58:28.2990407Z [W1204 10:34:15.612313880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990409Z 2025-12-04T10:58:28.2990579Z [W1204 10:34:15.612570866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990581Z 2025-12-04T10:58:28.2990745Z [W1204 10:34:15.612630555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990747Z 2025-12-04T10:58:28.2990915Z [W1204 10:34:15.614693574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2990918Z 2025-12-04T10:58:28.2991083Z [W1204 10:34:15.615042308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2991110Z 2025-12-04T10:58:28.2991274Z [W1204 10:34:15.615108257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2991276Z 2025-12-04T10:58:28.2991335Z ('RERUN', {'yellow': True}) [0.7218s] [100%] 2025-12-04T10:58:28.2991728Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:34:16.328805918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2991730Z 2025-12-04T10:58:28.2991900Z [W1204 10:34:16.329204772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2991902Z 2025-12-04T10:58:28.2992066Z [W1204 10:34:16.329286591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2992074Z 2025-12-04T10:58:28.2992239Z [W1204 10:34:16.330632750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2992241Z 2025-12-04T10:58:28.2992409Z [W1204 10:34:16.330906756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2992411Z 2025-12-04T10:58:28.2992575Z [W1204 10:34:16.330967035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2992577Z 2025-12-04T10:58:28.2992744Z [W1204 10:34:16.332989103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2992748Z 2025-12-04T10:58:28.2992912Z [W1204 10:34:16.333337808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.2992916Z 2025-12-04T10:58:28.2993085Z [W1204 10:34:16.333402047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.2993087Z 2025-12-04T10:58:28.2993134Z FAILED [0.7111s] [100%] 2025-12-04T10:58:28.2993136Z 2025-12-04T10:58:28.2993196Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.2993415Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2993468Z Traceback (most recent call last): 2025-12-04T10:58:28.2993645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2993723Z method(*args, **kwargs) 2025-12-04T10:58:28.2993896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2993942Z method(*args, **kwargs) 2025-12-04T10:58:28.2994111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2994151Z with policy(): 2025-12-04T10:58:28.2994323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2994368Z raise RuntimeError(msg) 2025-12-04T10:58:28.2994807Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.2994810Z 2025-12-04T10:58:28.2994892Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2995248Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2995250Z 2025-12-04T10:58:28.2995349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2995429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2995493Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2995687Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2995769Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2995809Z graph_break [] 2025-12-04T10:58:28.2995981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2996034Z Traceback (most recent call last): 2025-12-04T10:58:28.2996203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2996247Z method(*args, **kwargs) 2025-12-04T10:58:28.2996414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2996457Z method(*args, **kwargs) 2025-12-04T10:58:28.2996623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2996663Z with policy(): 2025-12-04T10:58:28.2996831Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2996878Z raise RuntimeError(msg) 2025-12-04T10:58:28.2997332Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.2997338Z 2025-12-04T10:58:28.2997419Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.2997739Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.2997742Z 2025-12-04T10:58:28.2997838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.2997942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2998006Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2998199Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2998280Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2998320Z graph_break [] 2025-12-04T10:58:28.2998402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.2998461Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.2998540Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.2998731Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.2998772Z graph_break [] 2025-12-04T10:58:28.2998831Z =================================== FAILURES =================================== 2025-12-04T10:58:28.2998999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.2999074Z Traceback (most recent call last): 2025-12-04T10:58:28.2999241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2999285Z method(*args, **kwargs) 2025-12-04T10:58:28.2999451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.2999493Z method(*args, **kwargs) 2025-12-04T10:58:28.2999659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.2999698Z with policy(): 2025-12-04T10:58:28.2999867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.2999912Z raise RuntimeError(msg) 2025-12-04T10:58:28.3000362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3000366Z 2025-12-04T10:58:28.3000446Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3000770Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3000772Z 2025-12-04T10:58:28.3000869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3000950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3001012Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3001205Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3001286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3001326Z graph_break [] 2025-12-04T10:58:28.3001406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3001466Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3001546Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3001735Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3001775Z graph_break [] 2025-12-04T10:58:28.3001876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3001939Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3002017Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3002209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3002248Z graph_break [] 2025-12-04T10:58:28.3002511Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-097f51db82a53838.xml - 2025-12-04T10:58:28.3002576Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3003335Z FAILED [0.7111s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3003373Z 2025-12-04T10:58:28.3003454Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3003772Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3003774Z 2025-12-04T10:58:28.3003870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3003938Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3004014Z ================== 1 failed, 57 deselected, 2 rerun in 4.54s =================== 2025-12-04T10:58:28.3004054Z Got exit code 1 2025-12-04T10:58:28.3004321Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3004460Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3004678Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-da16622bca217d76.xml 2025-12-04T10:58:28.3004740Z ============================= test session starts ============================== 2025-12-04T10:58:28.3004863Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3004908Z cachedir: .pytest_cache 2025-12-04T10:58:28.3005088Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3005138Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3005185Z configfile: pytest.ini 2025-12-04T10:58:28.3005360Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3005443Z collecting ... collected 58 items / 14 deselected / 44 selected 2025-12-04T10:58:28.3005503Z stepcurrent: skipping 14 already run items. 
2025-12-04T10:58:28.3005553Z Running 44 items in this shard 2025-12-04T10:58:28.3005555Z 2025-12-04T10:58:28.3005835Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6431s] [ 2%] 2025-12-04T10:58:28.3006207Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6064s] [ 2%] 2025-12-04T10:58:28.3006453Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.5956s] [ 2%] 2025-12-04T10:58:28.3006458Z 2025-12-04T10:58:28.3006514Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3006681Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3006730Z Traceback (most recent call last): 2025-12-04T10:58:28.3006903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3006947Z method(*args, **kwargs) 2025-12-04T10:58:28.3007115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3007160Z method(*args, **kwargs) 2025-12-04T10:58:28.3007327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3007400Z with policy(): 2025-12-04T10:58:28.3007570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3007615Z raise RuntimeError(msg) 2025-12-04T10:58:28.3008053Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3008056Z 2025-12-04T10:58:28.3008137Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3008457Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3008461Z 2025-12-04T10:58:28.3008560Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3008643Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3008707Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3008900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3008981Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3009022Z graph_break [] 2025-12-04T10:58:28.3009188Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3009239Z Traceback (most recent call last): 2025-12-04T10:58:28.3009409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3009454Z method(*args, **kwargs) 2025-12-04T10:58:28.3009621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3009664Z method(*args, **kwargs) 
2025-12-04T10:58:28.3009830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3009870Z with policy(): 2025-12-04T10:58:28.3010038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3010082Z raise RuntimeError(msg) 2025-12-04T10:58:28.3010551Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3010555Z 2025-12-04T10:58:28.3010637Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3010955Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3010958Z 2025-12-04T10:58:28.3011054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3011134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3011196Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3011390Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3011497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3011537Z graph_break [] 2025-12-04T10:58:28.3011618Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3011678Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3011758Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3011950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3011992Z graph_break [] 2025-12-04T10:58:28.3012048Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3012215Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3012264Z Traceback (most recent call last): 2025-12-04T10:58:28.3012434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3012483Z method(*args, **kwargs) 2025-12-04T10:58:28.3012650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3012693Z method(*args, **kwargs) 2025-12-04T10:58:28.3012860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3012900Z with policy(): 2025-12-04T10:58:28.3013069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3013115Z raise RuntimeError(msg) 2025-12-04T10:58:28.3013611Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3013615Z 2025-12-04T10:58:28.3013697Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3014014Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3014016Z 2025-12-04T10:58:28.3014113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3014193Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3014256Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3014478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3014562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3014602Z graph_break [] 2025-12-04T10:58:28.3014683Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3014742Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3014822Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3015014Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3015055Z graph_break [] 2025-12-04T10:58:28.3015135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3015197Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3015277Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3015503Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3015543Z graph_break [] 2025-12-04T10:58:28.3015808Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-da16622bca217d76.xml - 2025-12-04T10:58:28.3015874Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3016569Z FAILED [0.5956s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3016573Z 2025-12-04T10:58:28.3016654Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3016969Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3016974Z 2025-12-04T10:58:28.3017068Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3017138Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3017210Z ================== 1 failed, 14 deselected, 2 rerun in 3.99s =================== 2025-12-04T10:58:28.3017253Z Got exit code 1 2025-12-04T10:58:28.3017297Z Retrying single test... 
2025-12-04T10:58:28.3017516Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-53ec448439c3709b.xml 2025-12-04T10:58:28.3017582Z ============================= test session starts ============================== 2025-12-04T10:58:28.3017704Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3017749Z cachedir: .pytest_cache 2025-12-04T10:58:28.3017922Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3017972Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3018019Z configfile: pytest.ini 2025-12-04T10:58:28.3018193Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3018275Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3018608Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3018661Z Running 1 items in this shard 2025-12-04T10:58:28.3018663Z 2025-12-04T10:58:28.3019057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:36.727379879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3019059Z 2025-12-04T10:58:28.3019229Z [W1204 10:34:36.999088023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3019231Z 2025-12-04T10:58:28.3019398Z [W1204 10:34:36.999233991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3019400Z 2025-12-04T10:58:28.3019564Z [W1204 10:34:36.002903374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3019591Z 2025-12-04T10:58:28.3019756Z [W1204 10:34:36.003219159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3019758Z 2025-12-04T10:58:28.3019921Z [W1204 10:34:36.003283778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3019923Z 2025-12-04T10:58:28.3020089Z [W1204 10:34:36.005598733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3020091Z 2025-12-04T10:58:28.3020254Z [W1204 10:34:36.005879399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3020259Z 2025-12-04T10:58:28.3020421Z [W1204 10:34:36.005941168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3020424Z 2025-12-04T10:58:28.3020480Z ('RERUN', {'yellow': True}) [2.8796s] [100%] 2025-12-04T10:58:28.3020873Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:37.184841954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3020876Z 2025-12-04T10:58:28.3021040Z [W1204 10:34:37.185234198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021042Z 2025-12-04T10:58:28.3021204Z [W1204 10:34:37.185305557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021207Z 2025-12-04T10:58:28.3021371Z [W1204 10:34:37.186596137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021375Z 2025-12-04T10:58:28.3021539Z [W1204 10:34:37.186859843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021541Z 2025-12-04T10:58:28.3021703Z [W1204 10:34:37.186921172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021705Z 2025-12-04T10:58:28.3021868Z [W1204 10:34:37.188934001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3021870Z 2025-12-04T10:58:28.3022031Z [W1204 10:34:37.189300035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3022033Z 2025-12-04T10:58:28.3022220Z [W1204 10:34:37.189365144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3022224Z 2025-12-04T10:58:28.3022276Z ('RERUN', {'yellow': True}) [0.6832s] [100%] 2025-12-04T10:58:28.3022670Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:38.815168163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3022673Z 2025-12-04T10:58:28.3022836Z [W1204 10:34:38.815557027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3022838Z 2025-12-04T10:58:28.3022999Z [W1204 10:34:38.815621336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3023003Z 2025-12-04T10:58:28.3023167Z [W1204 10:34:38.816890876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3023200Z 2025-12-04T10:58:28.3023404Z [W1204 10:34:38.817150902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3023406Z 2025-12-04T10:58:28.3023571Z [W1204 10:34:38.817215011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3023573Z 2025-12-04T10:58:28.3023736Z [W1204 10:34:38.819205231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3023738Z 2025-12-04T10:58:28.3023900Z [W1204 10:34:38.819546285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3023902Z 2025-12-04T10:58:28.3024067Z [W1204 10:34:38.819609544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3024071Z 2025-12-04T10:58:28.3024113Z FAILED [0.6698s] [100%] 2025-12-04T10:58:28.3024115Z 2025-12-04T10:58:28.3024174Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3024342Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3024393Z Traceback (most recent call last): 2025-12-04T10:58:28.3024563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3024609Z method(*args, **kwargs) 2025-12-04T10:58:28.3024776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3024822Z method(*args, **kwargs) 2025-12-04T10:58:28.3024989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3025033Z with policy(): 2025-12-04T10:58:28.3025199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3025245Z raise RuntimeError(msg) 2025-12-04T10:58:28.3025677Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3025681Z 2025-12-04T10:58:28.3025762Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3026109Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3026113Z 2025-12-04T10:58:28.3026208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3026290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3026351Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3026545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3026626Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3026667Z graph_break [] 2025-12-04T10:58:28.3026832Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3026882Z Traceback (most recent call last): 2025-12-04T10:58:28.3027049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3027131Z method(*args, **kwargs) 2025-12-04T10:58:28.3027296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3027341Z method(*args, **kwargs) 2025-12-04T10:58:28.3027505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3027547Z with policy(): 2025-12-04T10:58:28.3027712Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3027758Z raise RuntimeError(msg) 2025-12-04T10:58:28.3028198Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3028203Z 2025-12-04T10:58:28.3028283Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3028599Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3028602Z 2025-12-04T10:58:28.3028696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3028777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3028837Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3029031Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3029110Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3029154Z graph_break [] 2025-12-04T10:58:28.3029234Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3029296Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3029374Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3029567Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3029606Z graph_break [] 2025-12-04T10:58:28.3029666Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3029829Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3029902Z Traceback (most recent call last): 2025-12-04T10:58:28.3030072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3030120Z method(*args, **kwargs) 2025-12-04T10:58:28.3030285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3030331Z method(*args, **kwargs) 2025-12-04T10:58:28.3030495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3030538Z with policy(): 2025-12-04T10:58:28.3030703Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3030750Z raise RuntimeError(msg) 2025-12-04T10:58:28.3031195Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3031221Z 2025-12-04T10:58:28.3031301Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3031618Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3031620Z 2025-12-04T10:58:28.3031715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3031796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3031856Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3032051Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3032130Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3032173Z graph_break [] 2025-12-04T10:58:28.3032252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3032313Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3032392Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3032584Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3032623Z graph_break [] 2025-12-04T10:58:28.3032704Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3032763Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3032845Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3033035Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3033078Z graph_break [] 2025-12-04T10:58:28.3033390Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-53ec448439c3709b.xml - 2025-12-04T10:58:28.3033454Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3034181Z FAILED [0.6698s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3034186Z 2025-12-04T10:58:28.3034265Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3034584Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3034586Z 2025-12-04T10:58:28.3034680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3034748Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3037465Z ================== 1 failed, 57 deselected, 2 rerun in 4.40s =================== 2025-12-04T10:58:28.3037511Z Got exit code 1 2025-12-04T10:58:28.3037555Z Retrying single test... 
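Note on reading the leak messages above: each "CUDA driver API confirmed a leak" line reports a before/after pair for the caching allocator and for the driver. The following is a minimal sketch, not part of the CI tooling, of how those pairs could be pulled out of the log text to compare growth per attempt. The names `LEAK_RE`, `LeakReport`, and `parse_leaks` are hypothetical, and the regular expression assumes the message wording stays exactly as it appears in this log.

```python
import re
from typing import List, NamedTuple

# Assumption: the message format matches the RuntimeError text seen in this log.
LEAK_RE = re.compile(
    r"Caching allocator allocated memory was (\d+) and is now reported as (\d+) "
    r"on device (\d+)\. CUDA driver allocated memory was (\d+) and is now (\d+)\."
)


class LeakReport(NamedTuple):
    """Hypothetical container for one leak message."""
    device: int
    allocator_delta: int  # bytes gained according to the caching allocator
    driver_delta: int     # bytes gained according to the driver


def parse_leaks(log_text: str) -> List[LeakReport]:
    """Return one LeakReport per leak message found in log_text."""
    reports = []
    for match in LEAK_RE.finditer(log_text):
        alloc_before, alloc_after, device, drv_before, drv_after = map(
            int, match.groups()
        )
        reports.append(
            LeakReport(device, alloc_after - alloc_before, drv_after - drv_before)
        )
    return reports


# The three attempts of the out_features_64 test, copied from the messages above.
sample = (
    "Caching allocator allocated memory was 0 and is now reported as 65536 "
    "on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552.\n"
    "Caching allocator allocated memory was 65536 and is now reported as 131072 "
    "on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616.\n"
    "Caching allocator allocated memory was 131072 and is now reported as 196608 "
    "on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680.\n"
)

if __name__ == "__main__":
    for attempt, report in enumerate(parse_leaks(sample), start=1):
        print(attempt, report)
```

Read this way, the allocator figure grows by the same 65536 bytes on every attempt, while the driver figure jumps far more on the first attempt than on the two reruns. The log itself does not state the cause of either pattern.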
2025-12-04T10:58:28.3037777Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-19e1e77dc9f56de8.xml 2025-12-04T10:58:28.3037887Z ============================= test session starts ============================== 2025-12-04T10:58:28.3038011Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3038057Z cachedir: .pytest_cache 2025-12-04T10:58:28.3038231Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3038283Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3038328Z configfile: pytest.ini 2025-12-04T10:58:28.3038507Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3038587Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3038903Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3038956Z Running 1 items in this shard 2025-12-04T10:58:28.3038959Z 2025-12-04T10:58:28.3039358Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:48.327988021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3039361Z 2025-12-04T10:58:28.3039532Z [W1204 10:34:48.603236191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3039534Z 2025-12-04T10:58:28.3039703Z [W1204 10:34:48.603386188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3039705Z 2025-12-04T10:58:28.3039872Z [W1204 10:34:48.606604079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3039879Z 2025-12-04T10:58:28.3040042Z [W1204 10:34:48.606917794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3040044Z 2025-12-04T10:58:28.3040206Z [W1204 10:34:48.606979453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3040208Z 2025-12-04T10:58:28.3040370Z [W1204 10:34:48.609146779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3040372Z 2025-12-04T10:58:28.3040535Z [W1204 10:34:48.609424255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3040537Z 2025-12-04T10:58:28.3040735Z [W1204 10:34:48.609485354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3040744Z 2025-12-04T10:58:28.3040798Z ('RERUN', {'yellow': True}) [2.9329s] [100%] 2025-12-04T10:58:28.3041195Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:49.730874700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3041198Z 2025-12-04T10:58:28.3041360Z [W1204 10:34:49.731291003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3041362Z 2025-12-04T10:58:28.3041526Z [W1204 10:34:49.731371812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3041528Z 2025-12-04T10:58:28.3041691Z [W1204 10:34:49.732686832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3041723Z 2025-12-04T10:58:28.3041886Z [W1204 10:34:49.732968178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3041888Z 2025-12-04T10:58:28.3042052Z [W1204 10:34:49.733037616 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3042054Z 2025-12-04T10:58:28.3042214Z [W1204 10:34:49.734999086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3042217Z 2025-12-04T10:58:28.3042378Z [W1204 10:34:49.735347741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3042380Z 2025-12-04T10:58:28.3042542Z [W1204 10:34:49.735411340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3042545Z 2025-12-04T10:58:28.3042599Z ('RERUN', {'yellow': True}) [0.6219s] [100%] 2025-12-04T10:58:28.3042987Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:34:50.325060387 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3042990Z 2025-12-04T10:58:28.3043152Z [W1204 10:34:50.325438941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3043154Z 2025-12-04T10:58:28.3043359Z [W1204 10:34:50.325503080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3043361Z 2025-12-04T10:58:28.3043525Z [W1204 10:34:50.326758001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3043531Z 2025-12-04T10:58:28.3043693Z [W1204 10:34:50.327014027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3043695Z 2025-12-04T10:58:28.3043856Z [W1204 10:34:50.327076646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3043858Z 2025-12-04T10:58:28.3044020Z [W1204 10:34:50.329026526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3044022Z 2025-12-04T10:58:28.3044185Z [W1204 10:34:50.329366760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3044187Z 2025-12-04T10:58:28.3044379Z [W1204 10:34:50.329428299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3044381Z 2025-12-04T10:58:28.3044428Z FAILED [0.5878s] [100%] 2025-12-04T10:58:28.3044431Z 2025-12-04T10:58:28.3044487Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3044653Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3044702Z Traceback (most recent call last): 2025-12-04T10:58:28.3044875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3044919Z method(*args, **kwargs) 2025-12-04T10:58:28.3045087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3045131Z method(*args, **kwargs) 2025-12-04T10:58:28.3045297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3045337Z with policy(): 2025-12-04T10:58:28.3045558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3045602Z raise RuntimeError(msg) 2025-12-04T10:58:28.3046037Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3046040Z 2025-12-04T10:58:28.3046121Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3046439Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3046442Z 2025-12-04T10:58:28.3046538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3046623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3046686Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3046883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3046962Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3047001Z graph_break [] 2025-12-04T10:58:28.3047165Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3047214Z Traceback (most recent call last): 2025-12-04T10:58:28.3047384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3047427Z method(*args, **kwargs) 2025-12-04T10:58:28.3047593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3047636Z method(*args, **kwargs) 2025-12-04T10:58:28.3047800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3047839Z with policy(): 2025-12-04T10:58:28.3048005Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3048049Z raise RuntimeError(msg) 2025-12-04T10:58:28.3048511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3048516Z 2025-12-04T10:58:28.3048596Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3048910Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3048912Z 2025-12-04T10:58:28.3049008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3049089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3049151Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3049342Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3049423Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3049463Z graph_break [] 2025-12-04T10:58:28.3049569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3049628Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3049706Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3049897Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3049936Z graph_break [] 2025-12-04T10:58:28.3049993Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3050156Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3050205Z Traceback (most recent call last): 2025-12-04T10:58:28.3050375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3050419Z method(*args, **kwargs) 2025-12-04T10:58:28.3050583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3050626Z method(*args, **kwargs) 2025-12-04T10:58:28.3050790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3050829Z with policy(): 2025-12-04T10:58:28.3050995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3051039Z raise RuntimeError(msg) 2025-12-04T10:58:28.3051484Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3051487Z 2025-12-04T10:58:28.3051568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3051882Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3051884Z 2025-12-04T10:58:28.3051979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3052058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3052118Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3052329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3052408Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3052449Z graph_break [] 2025-12-04T10:58:28.3052528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3052586Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3052665Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3052853Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3052893Z graph_break [] 2025-12-04T10:58:28.3052972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3053032Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3053108Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3053348Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3053419Z graph_break [] 2025-12-04T10:58:28.3053688Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-19e1e77dc9f56de8.xml - 2025-12-04T10:58:28.3053753Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3054454Z FAILED [0.5878s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3054457Z 2025-12-04T10:58:28.3054538Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3054857Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3054860Z 2025-12-04T10:58:28.3054955Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3055022Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3055094Z ================== 1 failed, 57 deselected, 2 rerun in 4.29s =================== 2025-12-04T10:58:28.3055134Z Got exit code 1 2025-12-04T10:58:28.3055399Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3055540Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3055758Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f94c0c277428e00.xml 2025-12-04T10:58:28.3055821Z ============================= test session starts ============================== 2025-12-04T10:58:28.3055943Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3055988Z cachedir: .pytest_cache 2025-12-04T10:58:28.3056161Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3056211Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3056254Z configfile: pytest.ini 2025-12-04T10:58:28.3056466Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3056548Z collecting ... collected 58 items / 15 deselected / 43 selected 2025-12-04T10:58:28.3056607Z stepcurrent: skipping 15 already run items. 
2025-12-04T10:58:28.3056654Z Running 43 items in this shard 2025-12-04T10:58:28.3056656Z 2025-12-04T10:58:28.3056930Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.9803s] [ 2%] 2025-12-04T10:58:28.3057198Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6689s] [ 2%] 2025-12-04T10:58:28.3057442Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.7168s] [ 2%] 2025-12-04T10:58:28.3057444Z 2025-12-04T10:58:28.3057502Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3057694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3057744Z Traceback (most recent call last): 2025-12-04T10:58:28.3057915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3057958Z method(*args, **kwargs) 2025-12-04T10:58:28.3058123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3058167Z method(*args, **kwargs) 2025-12-04T10:58:28.3058330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3058371Z with policy(): 2025-12-04T10:58:28.3060961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3061008Z raise RuntimeError(msg) 2025-12-04T10:58:28.3061447Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3061450Z 2025-12-04T10:58:28.3061530Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3061847Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3061849Z 2025-12-04T10:58:28.3061945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3062045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3062108Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3062410Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3062491Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3062530Z graph_break [] 2025-12-04T10:58:28.3062696Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3062745Z Traceback (most recent call last): 2025-12-04T10:58:28.3062914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3062993Z method(*args, **kwargs) 2025-12-04T10:58:28.3063160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T10:58:28.3063206Z method(*args, **kwargs) 2025-12-04T10:58:28.3063410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3063449Z with policy(): 2025-12-04T10:58:28.3063617Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3063661Z raise RuntimeError(msg) 2025-12-04T10:58:28.3064103Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3064107Z 2025-12-04T10:58:28.3064190Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3064529Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3064531Z 2025-12-04T10:58:28.3064626Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3064705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3064767Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3065064Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3065213Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3065252Z graph_break [] 2025-12-04T10:58:28.3065332Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3065394Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3065472Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3065765Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3065805Z graph_break [] 2025-12-04T10:58:28.3065861Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3066027Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3066077Z Traceback (most recent call last): 2025-12-04T10:58:28.3066249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3066296Z method(*args, **kwargs) 2025-12-04T10:58:28.3066461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3066504Z method(*args, **kwargs) 2025-12-04T10:58:28.3066668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3066709Z with policy(): 2025-12-04T10:58:28.3066876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3066921Z raise RuntimeError(msg) 2025-12-04T10:58:28.3067394Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3067400Z 2025-12-04T10:58:28.3067481Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3067797Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3067799Z 2025-12-04T10:58:28.3067893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3067973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3068033Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3068333Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3068430Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3068470Z graph_break [] 2025-12-04T10:58:28.3068549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3068608Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3068686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3068978Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3069018Z graph_break [] 2025-12-04T10:58:28.3069097Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.3069175Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3069254Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3069547Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3069587Z graph_break [] 2025-12-04T10:58:28.3069853Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2f94c0c277428e00.xml - 2025-12-04T10:58:28.3069919Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3070612Z FAILED [0.7168s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3070617Z 2025-12-04T10:58:28.3070696Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3071015Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3071017Z 2025-12-04T10:58:28.3071110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3071178Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3071274Z ================== 1 failed, 15 deselected, 2 rerun in 4.53s =================== 2025-12-04T10:58:28.3071317Z Got exit code 1 2025-12-04T10:58:28.3071360Z Retrying single test... 2025-12-04T10:58:28.3071578Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-402efefb310b0cbd.xml 2025-12-04T10:58:28.3071639Z ============================= test session starts ============================== 2025-12-04T10:58:28.3071759Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3071803Z cachedir: .pytest_cache 2025-12-04T10:58:28.3071975Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3072025Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3072069Z configfile: pytest.ini 2025-12-04T10:58:28.3072245Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3072326Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3072665Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3072714Z Running 1 items in this shard 2025-12-04T10:58:28.3072716Z 2025-12-04T10:58:28.3073112Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:11.867700603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3073116Z 2025-12-04T10:58:28.3073317Z [W1204 10:35:11.149473993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3073341Z 2025-12-04T10:58:28.3073511Z [W1204 10:35:11.149649581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3073515Z 2025-12-04T10:58:28.3073678Z [W1204 10:35:11.153290585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3073680Z 2025-12-04T10:58:28.3073842Z [W1204 10:35:11.153610040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3073845Z 2025-12-04T10:58:28.3074006Z [W1204 10:35:11.153671099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3074010Z 2025-12-04T10:58:28.3074171Z [W1204 10:35:11.156047522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3074173Z 2025-12-04T10:58:28.3074340Z [W1204 10:35:11.156331368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3074343Z 2025-12-04T10:58:28.3074505Z [W1204 10:35:11.156391957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3074507Z 2025-12-04T10:58:28.3074561Z ('RERUN', {'yellow': True}) [3.2972s] [100%] 2025-12-04T10:58:28.3074955Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:12.983054677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3074958Z 2025-12-04T10:58:28.3075121Z [W1204 10:35:12.983449190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075124Z 2025-12-04T10:58:28.3075314Z [W1204 10:35:12.983519949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075317Z 2025-12-04T10:58:28.3075479Z [W1204 10:35:12.984808999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075481Z 2025-12-04T10:58:28.3075644Z [W1204 10:35:12.985075585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075645Z 2025-12-04T10:58:28.3075808Z [W1204 10:35:12.985140964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075810Z 2025-12-04T10:58:28.3075972Z [W1204 10:35:12.987275051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3075974Z 2025-12-04T10:58:28.3076137Z [W1204 10:35:12.987545767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3076159Z 2025-12-04T10:58:28.3076320Z [W1204 10:35:12.987612056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3076322Z 2025-12-04T10:58:28.3076375Z ('RERUN', {'yellow': True}) [0.6892s] [100%] 2025-12-04T10:58:28.3076765Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:13.663054765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3076768Z 2025-12-04T10:58:28.3076930Z [W1204 10:35:13.663452349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3076932Z 2025-12-04T10:58:28.3077109Z [W1204 10:35:13.663523377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3077112Z 2025-12-04T10:58:28.3077273Z [W1204 10:35:13.664802708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3077275Z 2025-12-04T10:58:28.3077440Z [W1204 10:35:13.665068784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3077442Z 2025-12-04T10:58:28.3077604Z [W1204 10:35:13.665129653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3077605Z 2025-12-04T10:58:28.3077767Z [W1204 10:35:13.667224910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3077768Z 2025-12-04T10:58:28.3077930Z [W1204 10:35:13.667495756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3077934Z 2025-12-04T10:58:28.3078099Z [W1204 10:35:13.667555725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3078101Z 2025-12-04T10:58:28.3078142Z FAILED [0.6683s] [100%] 2025-12-04T10:58:28.3078144Z 2025-12-04T10:58:28.3078200Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3078363Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3078414Z Traceback (most recent call last): 2025-12-04T10:58:28.3078584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3078628Z method(*args, **kwargs) 2025-12-04T10:58:28.3078816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3078862Z method(*args, **kwargs) 2025-12-04T10:58:28.3079029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3079069Z with policy(): 2025-12-04T10:58:28.3079237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3079281Z raise RuntimeError(msg) 2025-12-04T10:58:28.3079715Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3079718Z 2025-12-04T10:58:28.3079798Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3080118Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3080135Z 2025-12-04T10:58:28.3080230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3080312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3080373Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3080674Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3080754Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3080793Z graph_break [] 2025-12-04T10:58:28.3080973Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3081023Z Traceback (most recent call last): 2025-12-04T10:58:28.3081191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3081234Z method(*args, **kwargs) 2025-12-04T10:58:28.3081399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3081442Z method(*args, **kwargs) 2025-12-04T10:58:28.3081606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3081645Z 
with policy(): 2025-12-04T10:58:28.3081811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3081856Z raise RuntimeError(msg) 2025-12-04T10:58:28.3082298Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3082301Z 2025-12-04T10:58:28.3082380Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3082697Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3082699Z 2025-12-04T10:58:28.3082794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3082874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3082958Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3083316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3083397Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3083436Z graph_break [] 2025-12-04T10:58:28.3083516Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3083576Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3083654Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3083948Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3083989Z graph_break [] 2025-12-04T10:58:28.3084045Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3084227Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3084275Z Traceback (most recent call last): 2025-12-04T10:58:28.3084443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3084486Z method(*args, **kwargs) 2025-12-04T10:58:28.3084651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3084693Z method(*args, **kwargs) 2025-12-04T10:58:28.3084859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3084918Z with policy(): 2025-12-04T10:58:28.3085086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3085131Z raise RuntimeError(msg) 2025-12-04T10:58:28.3085576Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3085579Z 2025-12-04T10:58:28.3085658Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3085973Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3085975Z 2025-12-04T10:58:28.3086072Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3086151Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3086213Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3086510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3086592Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3086631Z graph_break [] 2025-12-04T10:58:28.3086711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3086770Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3086848Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3087178Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3087219Z graph_break [] 2025-12-04T10:58:28.3087298Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3087358Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3087435Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3087734Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3087773Z graph_break [] 2025-12-04T10:58:28.3088041Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-402efefb310b0cbd.xml - 2025-12-04T10:58:28.3088106Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3088818Z FAILED [0.6683s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3088821Z 2025-12-04T10:58:28.3088900Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3089217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3089233Z 2025-12-04T10:58:28.3089328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3089396Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3089468Z ================== 1 failed, 57 deselected, 2 rerun in 4.82s =================== 2025-12-04T10:58:28.3089508Z Got exit code 1 2025-12-04T10:58:28.3089552Z Retrying single test... 2025-12-04T10:58:28.3089768Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dcadfc54b2f62fba.xml 2025-12-04T10:58:28.3089831Z ============================= test session starts ============================== 2025-12-04T10:58:28.3089951Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3089996Z cachedir: .pytest_cache 2025-12-04T10:58:28.3090173Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3090224Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3090269Z configfile: pytest.ini 2025-12-04T10:58:28.3090444Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3090524Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3090838Z stepcurrent: skipping 15 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3090886Z Running 1 items in this shard 2025-12-04T10:58:28.3090889Z 2025-12-04T10:58:28.3091304Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:23.752497240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3091308Z 2025-12-04T10:58:28.3091477Z [W1204 10:35:23.044296167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3091479Z 2025-12-04T10:58:28.3091644Z [W1204 10:35:23.044434615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3091646Z 2025-12-04T10:58:28.3091809Z [W1204 10:35:23.047915761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3091811Z 2025-12-04T10:58:28.3091973Z [W1204 10:35:23.048222837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3091975Z 2025-12-04T10:58:28.3092140Z [W1204 10:35:23.048287356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3092154Z 2025-12-04T10:58:28.3092317Z [W1204 10:35:23.050493862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3092319Z 2025-12-04T10:58:28.3092481Z [W1204 10:35:23.050764758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3092483Z 2025-12-04T10:58:28.3092646Z [W1204 10:35:23.050825317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3092647Z 2025-12-04T10:58:28.3092700Z ('RERUN', {'yellow': True}) [3.2046s] [100%] 2025-12-04T10:58:28.3093094Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:24.679766462 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093108Z 2025-12-04T10:58:28.3093299Z [W1204 10:35:24.680170386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093301Z 2025-12-04T10:58:28.3093464Z [W1204 10:35:24.680245285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093466Z 2025-12-04T10:58:28.3093628Z [W1204 10:35:24.681514376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093630Z 2025-12-04T10:58:28.3093790Z [W1204 10:35:24.681774391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093792Z 2025-12-04T10:58:28.3093955Z [W1204 10:35:24.681834211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3093958Z 2025-12-04T10:58:28.3094121Z [W1204 10:35:24.683891009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3094123Z 2025-12-04T10:58:28.3094284Z [W1204 10:35:24.684162655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3094286Z 2025-12-04T10:58:28.3094447Z [W1204 10:35:24.684225644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3094449Z 2025-12-04T10:58:28.3094500Z ('RERUN', {'yellow': True}) [0.4775s] [100%] 2025-12-04T10:58:28.3094920Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:35:24.146964119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3094925Z 2025-12-04T10:58:28.3095088Z [W1204 10:35:24.147358763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3095091Z 2025-12-04T10:58:28.3095252Z [W1204 10:35:24.147432482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3095255Z 2025-12-04T10:58:28.3095416Z [W1204 10:35:24.148686202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3095418Z 2025-12-04T10:58:28.3095580Z [W1204 10:35:24.148946878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3095582Z 2025-12-04T10:58:28.3095746Z [W1204 10:35:24.149012087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3095749Z 2025-12-04T10:58:28.3095926Z [W1204 10:35:24.151058396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3095928Z 2025-12-04T10:58:28.3096090Z [W1204 10:35:24.151320492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3096092Z 2025-12-04T10:58:28.3096254Z [W1204 10:35:24.151383221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3096256Z 2025-12-04T10:58:28.3096298Z FAILED [0.4672s] [100%] 2025-12-04T10:58:28.3096300Z 2025-12-04T10:58:28.3096357Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3096523Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3096587Z Traceback (most recent call last): 2025-12-04T10:58:28.3096757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3096804Z method(*args, **kwargs) 2025-12-04T10:58:28.3096969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3097013Z method(*args, **kwargs) 2025-12-04T10:58:28.3097178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3097219Z with policy(): 2025-12-04T10:58:28.3097385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3097430Z raise RuntimeError(msg) 2025-12-04T10:58:28.3097864Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3097868Z 2025-12-04T10:58:28.3097949Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3098267Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3098269Z 2025-12-04T10:58:28.3098364Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3098444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3098505Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3098829Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3098910Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3098950Z graph_break [] 2025-12-04T10:58:28.3099115Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3099164Z Traceback (most recent call last): 2025-12-04T10:58:28.3099330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3099374Z method(*args, **kwargs) 2025-12-04T10:58:28.3099538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3099582Z method(*args, **kwargs) 2025-12-04T10:58:28.3099748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3099803Z 
with policy(): 2025-12-04T10:58:28.3099968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3100014Z raise RuntimeError(msg) 2025-12-04T10:58:28.3100453Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3100456Z 2025-12-04T10:58:28.3100535Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3100853Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3100869Z 2025-12-04T10:58:28.3100963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3101044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3101104Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3101402Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3101481Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3101521Z graph_break [] 2025-12-04T10:58:28.3101600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3101663Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3101740Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3102037Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3102076Z graph_break [] 2025-12-04T10:58:28.3102133Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3102297Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3102346Z Traceback (most recent call last): 2025-12-04T10:58:28.3102513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3102557Z method(*args, **kwargs) 2025-12-04T10:58:28.3102751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3102796Z method(*args, **kwargs) 2025-12-04T10:58:28.3102959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3102999Z with policy(): 2025-12-04T10:58:28.3103165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3103210Z raise RuntimeError(msg) 2025-12-04T10:58:28.3103698Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3103702Z 2025-12-04T10:58:28.3103782Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3104100Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3104121Z 2025-12-04T10:58:28.3104216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3104297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3104357Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3104654Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3104733Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3104793Z graph_break [] 2025-12-04T10:58:28.3104872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3104933Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3105010Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3105472Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3105513Z graph_break [] 2025-12-04T10:58:28.3105592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3105651Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3105729Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3106027Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3106068Z graph_break [] 2025-12-04T10:58:28.3106336Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dcadfc54b2f62fba.xml - 2025-12-04T10:58:28.3106399Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3107138Z FAILED [0.4672s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3107142Z 2025-12-04T10:58:28.3107221Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3107537Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3107539Z 2025-12-04T10:58:28.3107633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3107700Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3107772Z ================== 1 failed, 57 deselected, 2 rerun in 4.32s =================== 2025-12-04T10:58:28.3107812Z Got exit code 1 2025-12-04T10:58:28.3108077Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3108217Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3108448Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9e57a8eb63801910.xml 2025-12-04T10:58:28.3108510Z ============================= test session starts ============================== 2025-12-04T10:58:28.3108631Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3108675Z cachedir: .pytest_cache 2025-12-04T10:58:28.3108849Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3108899Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3108943Z configfile: pytest.ini 2025-12-04T10:58:28.3109118Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3109214Z collecting ... collected 58 items / 16 deselected / 42 selected 2025-12-04T10:58:28.3109273Z stepcurrent: skipping 16 already run items. 
2025-12-04T10:58:28.3109321Z Running 42 items in this shard 2025-12-04T10:58:28.3109323Z 2025-12-04T10:58:28.3109601Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5305s] [ 2%] 2025-12-04T10:58:28.3109872Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4840s] [ 2%] 2025-12-04T10:58:28.3110118Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4872s] [ 2%] 2025-12-04T10:58:28.3110123Z 2025-12-04T10:58:28.3110178Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3110345Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3110394Z Traceback (most recent call last): 2025-12-04T10:58:28.3110567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3110612Z method(*args, **kwargs) 2025-12-04T10:58:28.3110780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3110822Z method(*args, **kwargs) 2025-12-04T10:58:28.3110987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3111027Z with policy(): 2025-12-04T10:58:28.3111217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3111263Z raise RuntimeError(msg) 2025-12-04T10:58:28.3111699Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3111701Z 2025-12-04T10:58:28.3111780Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3112101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3112103Z 2025-12-04T10:58:28.3112202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3112283Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3112360Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3112552Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3112632Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3112671Z graph_break [] 2025-12-04T10:58:28.3112838Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3112887Z Traceback (most recent call last): 2025-12-04T10:58:28.3113054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3113097Z method(*args, **kwargs) 2025-12-04T10:58:28.3113323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3113367Z method(*args, **kwargs) 
2025-12-04T10:58:28.3113533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3113572Z with policy(): 2025-12-04T10:58:28.3113739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3113783Z raise RuntimeError(msg) 2025-12-04T10:58:28.3114230Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3114232Z 2025-12-04T10:58:28.3114314Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3114634Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3114638Z 2025-12-04T10:58:28.3114733Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3114812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3114875Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3115069Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3115148Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3115188Z graph_break [] 2025-12-04T10:58:28.3115299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3115358Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3115438Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3115628Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3115669Z graph_break [] 2025-12-04T10:58:28.3115726Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3115892Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3115942Z Traceback (most recent call last): 2025-12-04T10:58:28.3116111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3116155Z method(*args, **kwargs) 2025-12-04T10:58:28.3116325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3116386Z method(*args, **kwargs) 2025-12-04T10:58:28.3116551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3116590Z with policy(): 2025-12-04T10:58:28.3116758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3116803Z raise RuntimeError(msg) 2025-12-04T10:58:28.3117249Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3117274Z 2025-12-04T10:58:28.3117356Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3117676Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3117680Z 2025-12-04T10:58:28.3117775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3117856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3117917Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3118108Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3118188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3118228Z graph_break [] 2025-12-04T10:58:28.3118309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3118369Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3118448Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3118639Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3118678Z graph_break [] 2025-12-04T10:58:28.3118757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3118818Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3118896Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3119089Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3119153Z graph_break [] 2025-12-04T10:58:28.3119419Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9e57a8eb63801910.xml - 2025-12-04T10:58:28.3119485Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3120186Z FAILED [0.4872s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3120188Z 2025-12-04T10:58:28.3120270Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3120591Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3120608Z 2025-12-04T10:58:28.3120703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3120771Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3120843Z ================== 1 failed, 16 deselected, 2 rerun in 3.67s =================== 2025-12-04T10:58:28.3120883Z Got exit code 1 2025-12-04T10:58:28.3120929Z Retrying single test... 
2025-12-04T10:58:28.3121145Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-07bc3678361ab51d.xml 2025-12-04T10:58:28.3121208Z ============================= test session starts ============================== 2025-12-04T10:58:28.3121343Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3121390Z cachedir: .pytest_cache 2025-12-04T10:58:28.3121563Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3121614Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3121657Z configfile: pytest.ini 2025-12-04T10:58:28.3121833Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3121913Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3122228Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3122278Z Running 1 items in this shard 2025-12-04T10:58:28.3122282Z 2025-12-04T10:58:28.3122680Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:43.957576026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3122684Z 2025-12-04T10:58:28.3122853Z [W1204 10:35:43.232756590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3122855Z 2025-12-04T10:58:28.3123021Z [W1204 10:35:43.232911998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123024Z 2025-12-04T10:58:28.3123188Z [W1204 10:35:43.235929031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123190Z 2025-12-04T10:58:28.3123426Z [W1204 10:35:43.236240476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123429Z 2025-12-04T10:58:28.3123593Z [W1204 10:35:43.236303185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123595Z 2025-12-04T10:58:28.3123759Z [W1204 10:35:43.238387323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123761Z 2025-12-04T10:58:28.3123923Z [W1204 10:35:43.238653449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3123925Z 2025-12-04T10:58:28.3124088Z [W1204 10:35:43.238714218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3124090Z 2025-12-04T10:58:28.3124143Z ('RERUN', {'yellow': True}) [2.7815s] [100%] 2025-12-04T10:58:28.3124545Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:44.200345995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3124564Z 2025-12-04T10:58:28.3124730Z [W1204 10:35:44.200719009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3124732Z 2025-12-04T10:58:28.3124894Z [W1204 10:35:44.200796478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3124896Z 2025-12-04T10:58:28.3125059Z [W1204 10:35:44.202060099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3125061Z 2025-12-04T10:58:28.3125241Z [W1204 10:35:44.202329325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3125244Z 2025-12-04T10:58:28.3125407Z [W1204 10:35:44.202390234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3125409Z 2025-12-04T10:58:28.3125572Z [W1204 10:35:44.204360453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3125574Z 2025-12-04T10:58:28.3125735Z [W1204 10:35:44.204703828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3125737Z 2025-12-04T10:58:28.3125899Z [W1204 10:35:44.204767597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3125901Z 2025-12-04T10:58:28.3125953Z ('RERUN', {'yellow': True}) [0.4805s] [100%] 2025-12-04T10:58:28.3126350Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:45.692399611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3126354Z 2025-12-04T10:58:28.3126517Z [W1204 10:35:45.692757485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3126519Z 2025-12-04T10:58:28.3126687Z [W1204 10:35:45.692821964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3126689Z 2025-12-04T10:58:28.3126854Z [W1204 10:35:45.694059535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3126856Z 2025-12-04T10:58:28.3127041Z [W1204 10:35:45.694311591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3127044Z 2025-12-04T10:58:28.3127208Z [W1204 10:35:45.694370450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3127210Z 2025-12-04T10:58:28.3127371Z [W1204 10:35:45.696325550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3127373Z 2025-12-04T10:58:28.3127536Z [W1204 10:35:45.696661235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3127537Z 2025-12-04T10:58:28.3127699Z [W1204 10:35:45.696724044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3127701Z 2025-12-04T10:58:28.3127743Z FAILED [0.4643s] [100%] 2025-12-04T10:58:28.3127746Z 2025-12-04T10:58:28.3127805Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3127972Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3128035Z Traceback (most recent call last): 2025-12-04T10:58:28.3128204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3128249Z method(*args, **kwargs) 2025-12-04T10:58:28.3128415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3128459Z method(*args, **kwargs) 2025-12-04T10:58:28.3128623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3128664Z with policy(): 2025-12-04T10:58:28.3128833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3128900Z raise RuntimeError(msg) 2025-12-04T10:58:28.3129341Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3129343Z 2025-12-04T10:58:28.3129423Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3129742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3129746Z 2025-12-04T10:58:28.3129840Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3129924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3129985Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3130181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3130260Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3130300Z graph_break [] 2025-12-04T10:58:28.3130465Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3130515Z Traceback (most recent call last): 2025-12-04T10:58:28.3130681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3130725Z method(*args, **kwargs) 2025-12-04T10:58:28.3130911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3130957Z method(*args, **kwargs) 2025-12-04T10:58:28.3131122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3131162Z with policy(): 2025-12-04T10:58:28.3131328Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3131372Z raise RuntimeError(msg) 2025-12-04T10:58:28.3131818Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3131821Z 2025-12-04T10:58:28.3131901Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3132220Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3132244Z 2025-12-04T10:58:28.3132339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3132421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3132481Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3132674Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3132753Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3132793Z graph_break [] 2025-12-04T10:58:28.3132875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3132950Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3133030Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3133221Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3133296Z graph_break [] 2025-12-04T10:58:28.3133355Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3133522Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3133571Z Traceback (most recent call last): 2025-12-04T10:58:28.3133739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3133783Z method(*args, **kwargs) 2025-12-04T10:58:28.3133951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3133995Z method(*args, **kwargs) 2025-12-04T10:58:28.3134158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3134198Z with policy(): 2025-12-04T10:58:28.3134363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3134408Z raise RuntimeError(msg) 2025-12-04T10:58:28.3134850Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3134854Z 2025-12-04T10:58:28.3134963Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3135280Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3135284Z 2025-12-04T10:58:28.3135378Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3135459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3135519Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3135712Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3135791Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3135831Z graph_break [] 2025-12-04T10:58:28.3135912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3135972Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3136067Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3136259Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3136298Z graph_break [] 2025-12-04T10:58:28.3136377Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3136436Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3136514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3136702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3136758Z graph_break [] 2025-12-04T10:58:28.3137025Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-07bc3678361ab51d.xml - 2025-12-04T10:58:28.3137091Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3137796Z FAILED [0.4643s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3137798Z 2025-12-04T10:58:28.3137877Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3138196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3138199Z 2025-12-04T10:58:28.3138293Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3138361Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3138432Z ================== 1 failed, 57 deselected, 2 rerun in 3.89s =================== 2025-12-04T10:58:28.3138473Z Got exit code 1 2025-12-04T10:58:28.3138516Z Retrying single test... 
2025-12-04T10:58:28.3138735Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ccbc31c1b1905b1.xml 2025-12-04T10:58:28.3138796Z ============================= test session starts ============================== 2025-12-04T10:58:28.3138940Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3138985Z cachedir: .pytest_cache 2025-12-04T10:58:28.3139158Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3139207Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3139252Z configfile: pytest.ini 2025-12-04T10:58:28.3139427Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3139507Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3139823Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3139872Z Running 1 items in this shard 2025-12-04T10:58:28.3139875Z 2025-12-04T10:58:28.3140279Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:53.249415474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3140297Z 2025-12-04T10:58:28.3140464Z [W1204 10:35:54.513095075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3140466Z 2025-12-04T10:58:28.3140634Z [W1204 10:35:54.513217573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3140636Z 2025-12-04T10:58:28.3140801Z [W1204 10:35:54.516109649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3140803Z 2025-12-04T10:58:28.3140970Z [W1204 10:35:54.516409584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3140987Z 2025-12-04T10:58:28.3141151Z [W1204 10:35:54.516470733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3141153Z 2025-12-04T10:58:28.3141315Z [W1204 10:35:54.518564221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3141317Z 2025-12-04T10:58:28.3141480Z [W1204 10:35:54.518832987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3141482Z 2025-12-04T10:58:28.3141643Z [W1204 10:35:54.518893296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3141645Z 2025-12-04T10:58:28.3141698Z ('RERUN', {'yellow': True}) [2.7791s] [100%] 2025-12-04T10:58:28.3142097Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:55.495414055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3142101Z 2025-12-04T10:58:28.3142264Z [W1204 10:35:55.495780420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3142266Z 2025-12-04T10:58:28.3142428Z [W1204 10:35:55.495844049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3142430Z 2025-12-04T10:58:28.3142592Z [W1204 10:35:55.497110029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3142594Z 2025-12-04T10:58:28.3142778Z [W1204 10:35:55.497370525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3142781Z 2025-12-04T10:58:28.3142943Z [W1204 10:35:55.497431924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3142946Z 2025-12-04T10:58:28.3143110Z [W1204 10:35:55.499450053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3143112Z 2025-12-04T10:58:28.3143323Z [W1204 10:35:55.499787648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3143325Z 2025-12-04T10:58:28.3143486Z [W1204 10:35:55.499850107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3143488Z 2025-12-04T10:58:28.3143541Z ('RERUN', {'yellow': True}) [0.4827s] [100%] 2025-12-04T10:58:28.3143938Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:35:55.959614910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3143958Z 2025-12-04T10:58:28.3144122Z [W1204 10:35:55.959981625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144124Z 2025-12-04T10:58:28.3144286Z [W1204 10:35:55.960056064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144288Z 2025-12-04T10:58:28.3144450Z [W1204 10:35:55.961309744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144452Z 2025-12-04T10:58:28.3144615Z [W1204 10:35:55.961566511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144635Z 2025-12-04T10:58:28.3144797Z [W1204 10:35:55.961625870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144801Z 2025-12-04T10:58:28.3144964Z [W1204 10:35:55.963543710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3144966Z 2025-12-04T10:58:28.3145127Z [W1204 10:35:55.963876635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3145130Z 2025-12-04T10:58:28.3145293Z [W1204 10:35:55.963939234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3145295Z 2025-12-04T10:58:28.3145338Z FAILED [0.4508s] [100%] 2025-12-04T10:58:28.3145340Z 2025-12-04T10:58:28.3145397Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3145568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3145618Z Traceback (most recent call last): 2025-12-04T10:58:28.3145790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3145834Z method(*args, **kwargs) 2025-12-04T10:58:28.3146002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3146045Z method(*args, **kwargs) 2025-12-04T10:58:28.3146212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3146252Z with policy(): 2025-12-04T10:58:28.3146454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3146499Z raise RuntimeError(msg) 2025-12-04T10:58:28.3146937Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3146941Z 2025-12-04T10:58:28.3147020Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3147339Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3147342Z 2025-12-04T10:58:28.3147437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3147517Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3147581Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3147787Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3147868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3147908Z graph_break [] 2025-12-04T10:58:28.3148076Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3148126Z Traceback (most recent call last): 2025-12-04T10:58:28.3148295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3148338Z method(*args, **kwargs) 2025-12-04T10:58:28.3148504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3148561Z method(*args, **kwargs) 2025-12-04T10:58:28.3148725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3148766Z with policy(): 2025-12-04T10:58:28.3148932Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3148975Z raise RuntimeError(msg) 2025-12-04T10:58:28.3149420Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3149423Z 2025-12-04T10:58:28.3149502Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3149826Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3149828Z 2025-12-04T10:58:28.3149924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3150003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3150066Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3150257Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3150337Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3150376Z graph_break [] 2025-12-04T10:58:28.3150457Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3150540Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3150619Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3150811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3150850Z graph_break [] 2025-12-04T10:58:28.3150907Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3151074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3151122Z Traceback (most recent call last): 2025-12-04T10:58:28.3151290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3151333Z method(*args, **kwargs) 2025-12-04T10:58:28.3151500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3151544Z method(*args, **kwargs) 2025-12-04T10:58:28.3151720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3151760Z with policy(): 2025-12-04T10:58:28.3151927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3151971Z raise RuntimeError(msg) 2025-12-04T10:58:28.3152421Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3152423Z 2025-12-04T10:58:28.3152503Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3152840Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3152843Z 2025-12-04T10:58:28.3152939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3153018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3153080Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3153308Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3153389Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3153428Z graph_break [] 2025-12-04T10:58:28.3153510Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3153570Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3153650Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3153840Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3153879Z graph_break [] 2025-12-04T10:58:28.3153958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3154017Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3154095Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3154285Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3154325Z graph_break [] 2025-12-04T10:58:28.3154624Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4ccbc31c1b1905b1.xml - 2025-12-04T10:58:28.3154690Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3155397Z FAILED [0.4508s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3155399Z 2025-12-04T10:58:28.3155478Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3155798Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3155816Z 2025-12-04T10:58:28.3155911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3155977Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3156050Z ================== 1 failed, 57 deselected, 2 rerun in 3.87s =================== 2025-12-04T10:58:28.3156089Z Got exit code 1 2025-12-04T10:58:28.3156355Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3156494Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3156712Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2cee415d280e1a84.xml 2025-12-04T10:58:28.3156791Z ============================= test session starts ============================== 2025-12-04T10:58:28.3156914Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3156959Z cachedir: .pytest_cache 2025-12-04T10:58:28.3157136Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3157186Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3157229Z configfile: pytest.ini 2025-12-04T10:58:28.3157406Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3157487Z collecting ... collected 58 items / 17 deselected / 41 selected 2025-12-04T10:58:28.3157546Z stepcurrent: skipping 17 already run items. 
2025-12-04T10:58:28.3157596Z Running 41 items in this shard 2025-12-04T10:58:28.3157598Z 2025-12-04T10:58:28.3157873Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.3990s] [ 2%] 2025-12-04T10:58:28.3158144Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4301s] [ 2%] 2025-12-04T10:58:28.3158387Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4456s] [ 2%] 2025-12-04T10:58:28.3158390Z 2025-12-04T10:58:28.3158446Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3158611Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3158682Z Traceback (most recent call last): 2025-12-04T10:58:28.3158857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3158902Z method(*args, **kwargs) 2025-12-04T10:58:28.3159069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3159112Z method(*args, **kwargs) 2025-12-04T10:58:28.3159278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3159317Z with policy(): 2025-12-04T10:58:28.3159486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3159530Z raise RuntimeError(msg) 2025-12-04T10:58:28.3159967Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3159983Z 2025-12-04T10:58:28.3160063Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3160379Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3160382Z 2025-12-04T10:58:28.3160477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3160557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3160618Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3160812Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3160905Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3160944Z graph_break [] 2025-12-04T10:58:28.3161108Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3161157Z Traceback (most recent call last): 2025-12-04T10:58:28.3161327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3161370Z method(*args, **kwargs) 2025-12-04T10:58:28.3161541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3161583Z method(*args, **kwargs) 
2025-12-04T10:58:28.3161752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3161792Z with policy(): 2025-12-04T10:58:28.3161961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3162006Z raise RuntimeError(msg) 2025-12-04T10:58:28.3162445Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3162447Z 2025-12-04T10:58:28.3162528Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3162872Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3162876Z 2025-12-04T10:58:28.3162972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3163052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3163114Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3163349Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3163430Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3163469Z graph_break [] 2025-12-04T10:58:28.3163549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3163609Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.3163688Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3163884Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3163941Z graph_break [] 2025-12-04T10:58:28.3163998Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3164165Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3164214Z Traceback (most recent call last): 2025-12-04T10:58:28.3164383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3164426Z method(*args, **kwargs) 2025-12-04T10:58:28.3164593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3164635Z method(*args, **kwargs) 2025-12-04T10:58:28.3164802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3164859Z with policy(): 2025-12-04T10:58:28.3165029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3165073Z raise RuntimeError(msg) 2025-12-04T10:58:28.3165512Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3165515Z 2025-12-04T10:58:28.3165595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3165912Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3165915Z 2025-12-04T10:58:28.3166012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3166092Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3166153Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3166346Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3166425Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3166464Z graph_break [] 2025-12-04T10:58:28.3166544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3166603Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3166681Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3166903Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3166944Z graph_break [] 2025-12-04T10:58:28.3167022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3167083Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3167160Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3167350Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3167390Z graph_break [] 2025-12-04T10:58:28.3167655Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2cee415d280e1a84.xml - 2025-12-04T10:58:28.3167723Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3168419Z FAILED [0.4456s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3168435Z 2025-12-04T10:58:28.3168516Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3168835Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3168851Z 2025-12-04T10:58:28.3168948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3169016Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3169089Z ================== 1 failed, 17 deselected, 2 rerun in 3.44s =================== 2025-12-04T10:58:28.3169130Z Got exit code 1 2025-12-04T10:58:28.3169173Z Retrying single test... 
2025-12-04T10:58:28.3169388Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f462c38f99129622.xml 2025-12-04T10:58:28.3169449Z ============================= test session starts ============================== 2025-12-04T10:58:28.3169570Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3169614Z cachedir: .pytest_cache 2025-12-04T10:58:28.3169789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3169839Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3169884Z configfile: pytest.ini 2025-12-04T10:58:28.3170059Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3170141Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3170452Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3170501Z Running 1 items in this shard 2025-12-04T10:58:28.3170503Z 2025-12-04T10:58:28.3170923Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:14.580066350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3170927Z 2025-12-04T10:58:28.3171099Z [W1204 10:36:14.850541358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3171102Z 2025-12-04T10:58:28.3171270Z [W1204 10:36:14.850692325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3171272Z 2025-12-04T10:58:28.3171438Z [W1204 10:36:14.854319960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3171440Z 2025-12-04T10:58:28.3171604Z [W1204 10:36:14.854625885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3171606Z 2025-12-04T10:58:28.3171768Z [W1204 10:36:14.854686794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3171773Z 2025-12-04T10:58:28.3171936Z [W1204 10:36:14.856781292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3171949Z 2025-12-04T10:58:28.3172112Z [W1204 10:36:14.857054657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3172115Z 2025-12-04T10:58:28.3172277Z [W1204 10:36:14.857115816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3172279Z 2025-12-04T10:58:28.3172332Z ('RERUN', {'yellow': True}) [2.8414s] [100%] 2025-12-04T10:58:28.3172727Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:15.920888457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3172741Z 2025-12-04T10:58:28.3172906Z [W1204 10:36:15.921292381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3172910Z 2025-12-04T10:58:28.3173072Z [W1204 10:36:15.921359060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173075Z 2025-12-04T10:58:28.3173239Z [W1204 10:36:15.922692809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173241Z 2025-12-04T10:58:28.3173454Z [W1204 10:36:15.922949005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173456Z 2025-12-04T10:58:28.3173618Z [W1204 10:36:15.923014044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173623Z 2025-12-04T10:58:28.3173787Z [W1204 10:36:15.924966674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173790Z 2025-12-04T10:58:28.3173953Z [W1204 10:36:15.925312449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3173956Z 2025-12-04T10:58:28.3174119Z [W1204 10:36:15.925377878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3174121Z 2025-12-04T10:58:28.3174173Z ('RERUN', {'yellow': True}) [0.5775s] [100%] 2025-12-04T10:58:28.3174565Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:16.494740796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3174602Z 2025-12-04T10:58:28.3174767Z [W1204 10:36:16.495130670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3174771Z 2025-12-04T10:58:28.3174932Z [W1204 10:36:16.495198689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3174934Z 2025-12-04T10:58:28.3175096Z [W1204 10:36:16.496444330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3175098Z 2025-12-04T10:58:28.3175258Z [W1204 10:36:16.496699656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3175260Z 2025-12-04T10:58:28.3175423Z [W1204 10:36:16.496761465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3175426Z 2025-12-04T10:58:28.3175590Z [W1204 10:36:16.498730485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3175608Z 2025-12-04T10:58:28.3175772Z [W1204 10:36:16.499070390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3175775Z 2025-12-04T10:58:28.3175938Z [W1204 10:36:16.499134109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3175940Z 2025-12-04T10:58:28.3175981Z FAILED [0.5756s] [100%] 2025-12-04T10:58:28.3175984Z 2025-12-04T10:58:28.3176040Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3176206Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3176256Z Traceback (most recent call last): 2025-12-04T10:58:28.3176455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3176502Z method(*args, **kwargs) 2025-12-04T10:58:28.3176668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3176712Z method(*args, **kwargs) 2025-12-04T10:58:28.3176880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3176921Z with policy(): 2025-12-04T10:58:28.3177090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3177135Z raise RuntimeError(msg) 2025-12-04T10:58:28.3177566Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3177572Z 2025-12-04T10:58:28.3177652Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3177971Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3177974Z 2025-12-04T10:58:28.3178069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3178148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3178209Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3178426Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3178508Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3178549Z graph_break [] 2025-12-04T10:58:28.3178714Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3178764Z Traceback (most recent call last): 2025-12-04T10:58:28.3178931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3178975Z method(*args, **kwargs) 2025-12-04T10:58:28.3179139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3179183Z method(*args, **kwargs) 2025-12-04T10:58:28.3179348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3179390Z with policy(): 2025-12-04T10:58:28.3179561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3179618Z raise RuntimeError(msg) 2025-12-04T10:58:28.3180055Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3180059Z 2025-12-04T10:58:28.3180138Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3180453Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3180456Z 2025-12-04T10:58:28.3180565Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3180646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3180708Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3180902Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3180981Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3181021Z graph_break [] 2025-12-04T10:58:28.3181100Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3181160Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3181238Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3181433Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3181473Z graph_break [] 2025-12-04T10:58:28.3181532Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3181697Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3181747Z Traceback (most recent call last): 2025-12-04T10:58:28.3181914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3181958Z method(*args, **kwargs) 2025-12-04T10:58:28.3182122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3182166Z method(*args, **kwargs) 2025-12-04T10:58:28.3182329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3182393Z with policy(): 2025-12-04T10:58:28.3182560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3182607Z raise RuntimeError(msg) 2025-12-04T10:58:28.3183047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3183049Z 2025-12-04T10:58:28.3183128Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3183492Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3183496Z 2025-12-04T10:58:28.3183593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3183691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3183751Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3183943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3184022Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3184062Z graph_break [] 2025-12-04T10:58:28.3184141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3184201Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3184278Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3184471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3184525Z graph_break [] 2025-12-04T10:58:28.3184605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3184663Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3184741Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3184931Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3184970Z graph_break [] 2025-12-04T10:58:28.3185236Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f462c38f99129622.xml - 2025-12-04T10:58:28.3185300Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3185989Z FAILED [0.5756s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3185993Z 2025-12-04T10:58:28.3186072Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3186386Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3186389Z 2025-12-04T10:58:28.3186483Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3186579Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3186654Z ================== 1 failed, 57 deselected, 2 rerun in 4.14s =================== 2025-12-04T10:58:28.3186694Z Got exit code 1 2025-12-04T10:58:28.3186737Z Retrying single test... 
2025-12-04T10:58:28.3186953Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b112056dac8b41c.xml 2025-12-04T10:58:28.3187016Z ============================= test session starts ============================== 2025-12-04T10:58:28.3187135Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3187179Z cachedir: .pytest_cache 2025-12-04T10:58:28.3187352Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3187402Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3187449Z configfile: pytest.ini 2025-12-04T10:58:28.3187625Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3187718Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3188034Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3188083Z Running 1 items in this shard 2025-12-04T10:58:28.3188085Z 2025-12-04T10:58:28.3188480Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:25.735212190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3188497Z 2025-12-04T10:58:28.3188667Z [W1204 10:36:25.013927991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3188671Z 2025-12-04T10:58:28.3188839Z [W1204 10:36:25.014059039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3188841Z 2025-12-04T10:58:28.3189008Z [W1204 10:36:25.016960215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189010Z 2025-12-04T10:58:28.3189173Z [W1204 10:36:25.017254230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189175Z 2025-12-04T10:58:28.3189338Z [W1204 10:36:25.017317959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189340Z 2025-12-04T10:58:28.3189505Z [W1204 10:36:25.019521655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189508Z 2025-12-04T10:58:28.3189670Z [W1204 10:36:25.019791341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189672Z 2025-12-04T10:58:28.3189834Z [W1204 10:36:25.019850330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3189836Z 2025-12-04T10:58:28.3189889Z ('RERUN', {'yellow': True}) [2.9071s] [100%] 2025-12-04T10:58:28.3190281Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:26.147140105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3190283Z 2025-12-04T10:58:28.3190470Z [W1204 10:36:26.147542109 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3190473Z 2025-12-04T10:58:28.3190637Z [W1204 10:36:26.147608478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3190639Z 2025-12-04T10:58:28.3190800Z [W1204 10:36:26.148887968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3190802Z 2025-12-04T10:58:28.3190965Z [W1204 10:36:26.149155824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3190967Z 2025-12-04T10:58:28.3191130Z [W1204 10:36:26.149217923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3191132Z 2025-12-04T10:58:28.3191296Z [W1204 10:36:26.151222512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3191299Z 2025-12-04T10:58:28.3191479Z [W1204 10:36:26.151561857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3191481Z 2025-12-04T10:58:28.3191642Z [W1204 10:36:26.151624836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3191644Z 2025-12-04T10:58:28.3191697Z ('RERUN', {'yellow': True}) [0.6138s] [100%] 2025-12-04T10:58:28.3192086Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:36:27.799588477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192090Z 2025-12-04T10:58:28.3192268Z [W1204 10:36:27.799986180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192271Z 2025-12-04T10:58:28.3192435Z [W1204 10:36:27.800055439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192437Z 2025-12-04T10:58:28.3192599Z [W1204 10:36:27.801335640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192601Z 2025-12-04T10:58:28.3192763Z [W1204 10:36:27.801596986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192765Z 2025-12-04T10:58:28.3192928Z [W1204 10:36:27.801656265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3192930Z 2025-12-04T10:58:28.3193094Z [W1204 10:36:27.803676934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3193098Z 2025-12-04T10:58:28.3193335Z [W1204 10:36:27.804034298 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3193337Z 2025-12-04T10:58:28.3193500Z [W1204 10:36:27.804099197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3193502Z 2025-12-04T10:58:28.3193545Z FAILED [0.6765s] [100%] 2025-12-04T10:58:28.3193548Z 2025-12-04T10:58:28.3193604Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3193770Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3193819Z Traceback (most recent call last): 2025-12-04T10:58:28.3194022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3194067Z method(*args, **kwargs) 2025-12-04T10:58:28.3194236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3194279Z method(*args, **kwargs) 2025-12-04T10:58:28.3194444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3194483Z with policy(): 2025-12-04T10:58:28.3194650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3194693Z raise RuntimeError(msg) 2025-12-04T10:58:28.3195126Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3195130Z 2025-12-04T10:58:28.3195214Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3195550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3195552Z 2025-12-04T10:58:28.3195648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3195728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3195791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3195984Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3196081Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3196120Z graph_break [] 2025-12-04T10:58:28.3196286Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3196336Z Traceback (most recent call last): 2025-12-04T10:58:28.3196504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3196547Z method(*args, **kwargs) 2025-12-04T10:58:28.3196712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3196755Z method(*args, **kwargs) 2025-12-04T10:58:28.3196919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3196959Z with policy(): 2025-12-04T10:58:28.3197126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3197170Z raise RuntimeError(msg) 2025-12-04T10:58:28.3197609Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3197612Z 2025-12-04T10:58:28.3197691Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3198009Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3198012Z 2025-12-04T10:58:28.3198106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3198210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3198272Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3198468Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3198549Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3198588Z graph_break [] 2025-12-04T10:58:28.3198669Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3198728Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3198806Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3198997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3199041Z graph_break [] 2025-12-04T10:58:28.3199098Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3199275Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3199324Z Traceback (most recent call last): 2025-12-04T10:58:28.3199492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3199535Z method(*args, **kwargs) 2025-12-04T10:58:28.3199701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3199743Z method(*args, **kwargs) 2025-12-04T10:58:28.3199907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3199946Z with policy(): 2025-12-04T10:58:28.3200126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3200169Z raise RuntimeError(msg) 2025-12-04T10:58:28.3200614Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3200616Z 2025-12-04T10:58:28.3200697Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3201017Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3201020Z 2025-12-04T10:58:28.3201116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3201197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3201260Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3201451Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3201531Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3201570Z graph_break [] 2025-12-04T10:58:28.3201650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3201709Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3201787Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3201998Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3202041Z graph_break [] 2025-12-04T10:58:28.3202119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3202181Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3202259Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3202449Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3202489Z graph_break [] 2025-12-04T10:58:28.3202755Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b112056dac8b41c.xml - 2025-12-04T10:58:28.3202821Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3203564Z FAILED [0.6765s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3203586Z 2025-12-04T10:58:28.3203665Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3203979Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3203982Z 2025-12-04T10:58:28.3204076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3204146Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3204237Z ================== 1 failed, 57 deselected, 2 rerun in 4.34s =================== 2025-12-04T10:58:28.3204278Z Got exit code 1 2025-12-04T10:58:28.3204539Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3204678Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3204891Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c62cf1538c7aae6.xml 2025-12-04T10:58:28.3204954Z ============================= test session starts ============================== 2025-12-04T10:58:28.3205072Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3205117Z cachedir: .pytest_cache 2025-12-04T10:58:28.3205292Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3205343Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3205387Z configfile: pytest.ini 2025-12-04T10:58:28.3205562Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3205643Z collecting ... collected 58 items / 18 deselected / 40 selected 2025-12-04T10:58:28.3205701Z stepcurrent: skipping 18 already run items. 
2025-12-04T10:58:28.3205749Z Running 40 items in this shard 2025-12-04T10:58:28.3205751Z 2025-12-04T10:58:28.3206026Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8465s] [ 2%] 2025-12-04T10:58:28.3206332Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4361s] [ 2%] 2025-12-04T10:58:28.3206577Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4356s] [ 2%] 2025-12-04T10:58:28.3206579Z 2025-12-04T10:58:28.3206635Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3206799Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3206849Z Traceback (most recent call last): 2025-12-04T10:58:28.3207022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3207065Z method(*args, **kwargs) 2025-12-04T10:58:28.3207232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3207276Z method(*args, **kwargs) 2025-12-04T10:58:28.3207440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3207493Z with policy(): 2025-12-04T10:58:28.3207659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3207703Z raise RuntimeError(msg) 2025-12-04T10:58:28.3208134Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3208137Z 2025-12-04T10:58:28.3208217Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3208549Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3208552Z 2025-12-04T10:58:28.3208647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3208726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3208787Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3209086Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3209166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3209206Z graph_break [] 2025-12-04T10:58:28.3209373Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3209422Z Traceback (most recent call last): 2025-12-04T10:58:28.3209589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3209633Z method(*args, **kwargs) 2025-12-04T10:58:28.3209796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3209839Z method(*args, **kwargs) 2025-12-04T10:58:28.3210002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3210042Z with policy(): 2025-12-04T10:58:28.3210206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3210251Z raise RuntimeError(msg) 2025-12-04T10:58:28.3210710Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3210713Z 2025-12-04T10:58:28.3210794Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3211106Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3211109Z 2025-12-04T10:58:28.3211203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3211282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3211346Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3211646Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3211737Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3211777Z graph_break [] 2025-12-04T10:58:28.3211856Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.3211916Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3211993Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3212287Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3212342Z graph_break [] 2025-12-04T10:58:28.3212399Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3212565Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3212615Z Traceback (most recent call last): 2025-12-04T10:58:28.3212783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3212826Z method(*args, **kwargs) 2025-12-04T10:58:28.3212990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3213033Z method(*args, **kwargs) 2025-12-04T10:58:28.3213196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3213236Z with policy(): 2025-12-04T10:58:28.3213450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3213496Z raise RuntimeError(msg) 2025-12-04T10:58:28.3213932Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3213935Z 2025-12-04T10:58:28.3214015Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3214329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3214331Z 2025-12-04T10:58:28.3214452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3214534Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3214596Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3214893Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3214972Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3215012Z graph_break [] 2025-12-04T10:58:28.3215090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3215150Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3215227Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3215524Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3215580Z graph_break [] 2025-12-04T10:58:28.3215659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3215718Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.3215796Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3216087Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3216127Z graph_break [] 2025-12-04T10:58:28.3216391Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0c62cf1538c7aae6.xml - 2025-12-04T10:58:28.3216472Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3217158Z FAILED [0.4356s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3217161Z 2025-12-04T10:58:28.3217239Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3217556Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3217559Z 2025-12-04T10:58:28.3217653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3217722Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3217793Z ================== 1 failed, 18 deselected, 2 rerun in 3.88s =================== 2025-12-04T10:58:28.3217835Z Got exit code 1 2025-12-04T10:58:28.3217878Z Retrying single test... 2025-12-04T10:58:28.3218092Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49a22d77cd1f993b.xml 2025-12-04T10:58:28.3218154Z ============================= test session starts ============================== 2025-12-04T10:58:28.3218273Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3218317Z cachedir: .pytest_cache 2025-12-04T10:58:28.3218514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3218565Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3218610Z configfile: pytest.ini 2025-12-04T10:58:28.3218785Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3218865Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3219176Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3219223Z Running 1 items in this shard 2025-12-04T10:58:28.3219225Z 2025-12-04T10:58:28.3219621Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:36:47.084385135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3219625Z 2025-12-04T10:58:28.3219804Z [W1204 10:36:48.348333654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3219806Z 2025-12-04T10:58:28.3219971Z [W1204 10:36:48.348490442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3219973Z 2025-12-04T10:58:28.3220136Z [W1204 10:36:48.351977598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3220139Z 2025-12-04T10:58:28.3220301Z [W1204 10:36:48.352293123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3220303Z 2025-12-04T10:58:28.3220466Z [W1204 10:36:48.352358152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3220481Z 2025-12-04T10:58:28.3220643Z [W1204 10:36:48.354459480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3220646Z 2025-12-04T10:58:28.3220808Z [W1204 10:36:48.354736196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3220809Z 2025-12-04T10:58:28.3220971Z [W1204 10:36:48.354797485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3220973Z 2025-12-04T10:58:28.3221027Z ('RERUN', {'yellow': True}) [3.1439s] [100%] 2025-12-04T10:58:28.3221420Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:36:48.954326722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3221423Z 2025-12-04T10:58:28.3221588Z [W1204 10:36:48.954678127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3221590Z 2025-12-04T10:58:28.3221753Z [W1204 10:36:48.954748846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3221755Z 2025-12-04T10:58:28.3221917Z [W1204 10:36:48.956045706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3221919Z 2025-12-04T10:58:28.3222081Z [W1204 10:36:48.956305682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3222083Z 2025-12-04T10:58:28.3222272Z [W1204 10:36:48.956365331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3222277Z 2025-12-04T10:58:28.3222440Z [W1204 10:36:48.958360740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3222443Z 2025-12-04T10:58:28.3222604Z [W1204 10:36:48.958621526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3222606Z 2025-12-04T10:58:28.3222767Z [W1204 10:36:48.958680855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3222769Z 2025-12-04T10:58:28.3222821Z ('RERUN', {'yellow': True}) [0.4577s] [100%] 2025-12-04T10:58:28.3223211Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:36:49.404605826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3223215Z 2025-12-04T10:58:28.3223422Z [W1204 10:36:49.404967090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3223440Z 2025-12-04T10:58:28.3223603Z [W1204 10:36:49.405039119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3223605Z 2025-12-04T10:58:28.3223766Z [W1204 10:36:49.406291000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3223767Z 2025-12-04T10:58:28.3223931Z [W1204 10:36:49.406545576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3223933Z 2025-12-04T10:58:28.3224096Z [W1204 10:36:49.406605785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3224115Z 2025-12-04T10:58:28.3224279Z [W1204 10:36:49.408603275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3224282Z 2025-12-04T10:58:28.3224442Z [W1204 10:36:49.408869820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3224446Z 2025-12-04T10:58:28.3224606Z [W1204 10:36:49.408931249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3224609Z 2025-12-04T10:58:28.3224651Z FAILED [0.4591s] [100%] 2025-12-04T10:58:28.3224653Z 2025-12-04T10:58:28.3224708Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3224874Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3224926Z Traceback (most recent call last): 2025-12-04T10:58:28.3225098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3225144Z method(*args, **kwargs) 2025-12-04T10:58:28.3225311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3225354Z method(*args, **kwargs) 2025-12-04T10:58:28.3225519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3225559Z with policy(): 2025-12-04T10:58:28.3225724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3232544Z raise RuntimeError(msg) 2025-12-04T10:58:28.3233115Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3233124Z 2025-12-04T10:58:28.3233213Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3233609Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3233612Z 2025-12-04T10:58:28.3233712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3233801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3233866Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3234179Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3234285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3234325Z graph_break [] 2025-12-04T10:58:28.3234498Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3234550Z Traceback (most recent call last): 2025-12-04T10:58:28.3234725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3234771Z method(*args, **kwargs) 2025-12-04T10:58:28.3234937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3234982Z method(*args, **kwargs) 2025-12-04T10:58:28.3235151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3235206Z 
with policy(): 2025-12-04T10:58:28.3235378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3235423Z raise RuntimeError(msg) 2025-12-04T10:58:28.3235868Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3235871Z 2025-12-04T10:58:28.3235955Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3236285Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3236289Z 2025-12-04T10:58:28.3236386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3236472Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3236535Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3236838Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3236921Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3236961Z graph_break [] 2025-12-04T10:58:28.3237044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3237104Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3237214Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3237509Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3237552Z graph_break [] 2025-12-04T10:58:28.3237610Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3237778Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3237829Z Traceback (most recent call last): 2025-12-04T10:58:28.3238005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3238050Z method(*args, **kwargs) 2025-12-04T10:58:28.3238222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3238266Z method(*args, **kwargs) 2025-12-04T10:58:28.3238446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3238486Z with policy(): 2025-12-04T10:58:28.3238655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3238699Z raise RuntimeError(msg) 2025-12-04T10:58:28.3239148Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3239151Z 2025-12-04T10:58:28.3239233Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3239566Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3239569Z 2025-12-04T10:58:28.3239665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3239745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3239806Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3240105Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3240185Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3240224Z graph_break [] 2025-12-04T10:58:28.3240308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3240368Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3240447Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3240747Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3240791Z graph_break [] 2025-12-04T10:58:28.3240870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3240931Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3241008Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3241330Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3241373Z graph_break [] 2025-12-04T10:58:28.3241646Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49a22d77cd1f993b.xml - 2025-12-04T10:58:28.3241713Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3242415Z FAILED [0.4591s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3242421Z 2025-12-04T10:58:28.3242503Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3242832Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3242836Z 2025-12-04T10:58:28.3242931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3243000Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3243074Z ================== 1 failed, 57 deselected, 2 rerun in 4.22s =================== 2025-12-04T10:58:28.3243115Z Got exit code 1 2025-12-04T10:58:28.3243159Z Retrying single test... 2025-12-04T10:58:28.3243418Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a114c577bc38c4fd.xml 2025-12-04T10:58:28.3243500Z ============================= test session starts ============================== 2025-12-04T10:58:28.3243627Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3243672Z cachedir: .pytest_cache 2025-12-04T10:58:28.3243850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3243901Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3243946Z configfile: pytest.ini 2025-12-04T10:58:28.3244128Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3244210Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3244526Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3244576Z Running 1 items in this shard 2025-12-04T10:58:28.3244580Z 2025-12-04T10:58:28.3244980Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:36:58.810619982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3244983Z 2025-12-04T10:58:28.3245153Z [W1204 10:36:58.086053106 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3245155Z 2025-12-04T10:58:28.3245321Z [W1204 10:36:58.086243185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3245323Z 2025-12-04T10:58:28.3245516Z [W1204 10:36:58.089957321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3245519Z 2025-12-04T10:58:28.3245683Z [W1204 10:36:58.090285010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3245687Z 2025-12-04T10:58:28.3245848Z [W1204 10:36:58.090348270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3245850Z 2025-12-04T10:58:28.3266604Z [W1204 10:36:58.092619832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3266607Z 2025-12-04T10:58:28.3266775Z [W1204 10:36:58.092895371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3266778Z 2025-12-04T10:58:28.3266942Z [W1204 10:36:58.092955400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3266947Z 2025-12-04T10:58:28.3267002Z ('RERUN', {'yellow': True}) [3.2429s] [100%] 2025-12-04T10:58:28.3267435Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:36:59.865743926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3267437Z 2025-12-04T10:58:28.3267603Z [W1204 10:36:59.866163325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3267605Z 2025-12-04T10:58:28.3267766Z [W1204 10:36:59.866236294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3267768Z 2025-12-04T10:58:28.3267931Z [W1204 10:36:59.867547960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3267946Z 2025-12-04T10:58:28.3268109Z [W1204 10:36:59.867816399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3268112Z 2025-12-04T10:58:28.3268273Z [W1204 10:36:59.867875348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3268275Z 2025-12-04T10:58:28.3268438Z [W1204 10:36:59.870046900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3268440Z 2025-12-04T10:58:28.3268601Z [W1204 10:36:59.870309269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3268603Z 2025-12-04T10:58:28.3268765Z [W1204 10:36:59.870368769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3268770Z 2025-12-04T10:58:28.3268822Z ('RERUN', {'yellow': True}) [0.6599s] [100%] 2025-12-04T10:58:28.3269218Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:37:00.537930787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3269221Z 2025-12-04T10:58:28.3269384Z [W1204 10:37:00.538366905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3269386Z 2025-12-04T10:58:28.3269548Z [W1204 10:37:00.538447615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3269549Z 2025-12-04T10:58:28.3269711Z [W1204 10:37:00.539786470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3269742Z 2025-12-04T10:58:28.3269904Z [W1204 10:37:00.540067129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3269907Z 2025-12-04T10:58:28.3270069Z [W1204 10:37:00.540131648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3270071Z 2025-12-04T10:58:28.3270232Z [W1204 10:37:00.542265330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3270234Z 2025-12-04T10:58:28.3270394Z [W1204 10:37:00.542540469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3270396Z 2025-12-04T10:58:28.3270557Z [W1204 10:37:00.542600669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3270560Z 2025-12-04T10:58:28.3270604Z FAILED [0.6362s] [100%] 2025-12-04T10:58:28.3270606Z 2025-12-04T10:58:28.3270666Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3270846Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3270897Z Traceback (most recent call last): 2025-12-04T10:58:28.3271069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3271115Z method(*args, **kwargs) 2025-12-04T10:58:28.3271281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3271325Z method(*args, **kwargs) 2025-12-04T10:58:28.3271489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3271542Z with policy(): 2025-12-04T10:58:28.3271711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3271756Z raise RuntimeError(msg) 2025-12-04T10:58:28.3272191Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3272194Z 2025-12-04T10:58:28.3272277Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3272599Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3272602Z 2025-12-04T10:58:28.3272700Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3272783Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3272847Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3273152Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3273233Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3273335Z graph_break [] 2025-12-04T10:58:28.3273501Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3273551Z Traceback (most recent call last): 2025-12-04T10:58:28.3273747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3273793Z method(*args, **kwargs) 2025-12-04T10:58:28.3273956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3274000Z method(*args, **kwargs) 2025-12-04T10:58:28.3274163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3274203Z 
with policy(): 2025-12-04T10:58:28.3274368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3274413Z raise RuntimeError(msg) 2025-12-04T10:58:28.3274853Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3274857Z 2025-12-04T10:58:28.3274938Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3275275Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3275278Z 2025-12-04T10:58:28.3275372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3275453Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3275514Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3275811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3275908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3275950Z graph_break [] 2025-12-04T10:58:28.3276028Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3276088Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3276165Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3276461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3276500Z graph_break [] 2025-12-04T10:58:28.3276556Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3276722Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3276774Z Traceback (most recent call last): 2025-12-04T10:58:28.3276941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3276986Z method(*args, **kwargs) 2025-12-04T10:58:28.3277149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3277191Z method(*args, **kwargs) 2025-12-04T10:58:28.3277354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3277393Z with policy(): 2025-12-04T10:58:28.3277558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3277601Z raise RuntimeError(msg) 2025-12-04T10:58:28.3278063Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3278068Z 2025-12-04T10:58:28.3278147Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3278462Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3278464Z 2025-12-04T10:58:28.3278557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3278636Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3278696Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3278996Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3279089Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3279128Z graph_break [] 2025-12-04T10:58:28.3279207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3279267Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3279345Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3279640Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3279679Z graph_break [] 2025-12-04T10:58:28.3279757Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3279830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3279907Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3280202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3280240Z graph_break [] 2025-12-04T10:58:28.3280509Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a114c577bc38c4fd.xml - 2025-12-04T10:58:28.3280573Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3281268Z FAILED [0.6362s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3281273Z 2025-12-04T10:58:28.3281352Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3281667Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3281669Z 2025-12-04T10:58:28.3281762Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3281828Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3281934Z ================== 1 failed, 57 deselected, 2 rerun in 4.70s =================== 2025-12-04T10:58:28.3281974Z Got exit code 1 2025-12-04T10:58:28.3282239Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3282378Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3282596Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ccf76fd68313e821.xml 2025-12-04T10:58:28.3282659Z ============================= test session starts ============================== 2025-12-04T10:58:28.3282782Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3282828Z cachedir: .pytest_cache 2025-12-04T10:58:28.3283002Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3283055Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3283112Z configfile: pytest.ini 2025-12-04T10:58:28.3283335Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3283416Z collecting ... collected 58 items / 19 deselected / 39 selected 2025-12-04T10:58:28.3283474Z stepcurrent: skipping 19 already run items. 
2025-12-04T10:58:28.3283521Z Running 39 items in this shard 2025-12-04T10:58:28.3283524Z 2025-12-04T10:58:28.3283802Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5198s] [ 2%] 2025-12-04T10:58:28.3284076Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4520s] [ 2%] 2025-12-04T10:58:28.3284348Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4457s] [ 2%] 2025-12-04T10:58:28.3284352Z 2025-12-04T10:58:28.3284407Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3284572Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3284621Z Traceback (most recent call last): 2025-12-04T10:58:28.3284793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3284835Z method(*args, **kwargs) 2025-12-04T10:58:28.3285001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3285046Z method(*args, **kwargs) 2025-12-04T10:58:28.3285210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3285251Z with policy(): 2025-12-04T10:58:28.3285416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3285460Z raise RuntimeError(msg) 2025-12-04T10:58:28.3285898Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3285900Z 2025-12-04T10:58:28.3285980Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3286329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3286334Z 2025-12-04T10:58:28.3286428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3286507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3286569Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3286764Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3286843Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3286881Z graph_break [] 2025-12-04T10:58:28.3287046Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3287097Z Traceback (most recent call last): 2025-12-04T10:58:28.3287266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3287324Z method(*args, **kwargs) 2025-12-04T10:58:28.3287489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3287531Z method(*args, **kwargs) 
2025-12-04T10:58:28.3287694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3287733Z with policy(): 2025-12-04T10:58:28.3287898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3287941Z raise RuntimeError(msg) 2025-12-04T10:58:28.3288389Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3288404Z 2025-12-04T10:58:28.3288484Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3288804Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3288806Z 2025-12-04T10:58:28.3288900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3288979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3289039Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3289233Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3289314Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3289352Z graph_break [] 2025-12-04T10:58:28.3289431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3289490Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3289567Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3289757Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3289796Z graph_break [] 2025-12-04T10:58:28.3289852Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3290041Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3290092Z Traceback (most recent call last): 2025-12-04T10:58:28.3290261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3290306Z method(*args, **kwargs) 2025-12-04T10:58:28.3290470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3290514Z method(*args, **kwargs) 2025-12-04T10:58:28.3290676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3290717Z with policy(): 2025-12-04T10:58:28.3290880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3290925Z raise RuntimeError(msg) 2025-12-04T10:58:28.3291372Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3291389Z 2025-12-04T10:58:28.3291470Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3291787Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3291789Z 2025-12-04T10:58:28.3291883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3291962Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3292024Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3292228Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3292309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3292347Z graph_break [] 2025-12-04T10:58:28.3292426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3292484Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3292562Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3292753Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3292791Z graph_break [] 2025-12-04T10:58:28.3292869Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3292927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3293008Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3293198Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3293237Z graph_break [] 2025-12-04T10:58:28.3293537Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ccf76fd68313e821.xml - 2025-12-04T10:58:28.3293602Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3294349Z FAILED [0.4457s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3294354Z 2025-12-04T10:58:28.3294434Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3294754Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3294760Z 2025-12-04T10:58:28.3294852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3294919Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3294990Z ================== 1 failed, 19 deselected, 2 rerun in 3.58s =================== 2025-12-04T10:58:28.3295030Z Got exit code 1 2025-12-04T10:58:28.3295075Z Retrying single test... 
2025-12-04T10:58:28.3295293Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbd2a6cb0a7d9432.xml 2025-12-04T10:58:28.3295371Z ============================= test session starts ============================== 2025-12-04T10:58:28.3295493Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3295536Z cachedir: .pytest_cache 2025-12-04T10:58:28.3295710Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3295759Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3295803Z configfile: pytest.ini 2025-12-04T10:58:28.3295981Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3296062Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3296393Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3296442Z Running 1 items in this shard 2025-12-04T10:58:28.3296445Z 2025-12-04T10:58:28.3296846Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:19.871294289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3296849Z 2025-12-04T10:58:28.3297015Z [W1204 10:37:19.137085918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3297017Z 2025-12-04T10:58:28.3297185Z [W1204 10:37:19.137245187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3297188Z 2025-12-04T10:58:28.3297349Z [W1204 10:37:19.140170375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3297353Z 2025-12-04T10:58:28.3297516Z [W1204 10:37:19.140466524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3297518Z 2025-12-04T10:58:28.3297679Z [W1204 10:37:19.140526444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3297681Z 2025-12-04T10:58:28.3297842Z [W1204 10:37:19.142588916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3297844Z 2025-12-04T10:58:28.3298006Z [W1204 10:37:19.142860224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3298040Z 2025-12-04T10:58:28.3298203Z [W1204 10:37:19.142920724 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3298206Z 2025-12-04T10:58:28.3298260Z ('RERUN', {'yellow': True}) [2.7684s] [100%] 2025-12-04T10:58:28.3298660Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:20.091290345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3298662Z 2025-12-04T10:58:28.3298825Z [W1204 10:37:20.091679733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3298828Z 2025-12-04T10:58:28.3298988Z [W1204 10:37:20.091744813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3298993Z 2025-12-04T10:58:28.3299156Z [W1204 10:37:20.093066428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3299172Z 2025-12-04T10:58:28.3299335Z [W1204 10:37:20.093338996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3299337Z 2025-12-04T10:58:28.3299499Z [W1204 10:37:20.093400896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3299501Z 2025-12-04T10:58:28.3299664Z [W1204 10:37:20.095372368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3299665Z 2025-12-04T10:58:28.3299826Z [W1204 10:37:20.095711367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3299841Z 2025-12-04T10:58:28.3300006Z [W1204 10:37:20.095775726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3300009Z 2025-12-04T10:58:28.3300061Z ('RERUN', {'yellow': True}) [0.4739s] [100%] 2025-12-04T10:58:28.3300454Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:21.586327357 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3300456Z 2025-12-04T10:58:28.3300621Z [W1204 10:37:21.586696105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3300623Z 2025-12-04T10:58:28.3300784Z [W1204 10:37:21.586762965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3300788Z 2025-12-04T10:58:28.3300950Z [W1204 10:37:21.588046300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3300954Z 2025-12-04T10:58:28.3301115Z [W1204 10:37:21.588307459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3301119Z 2025-12-04T10:58:28.3301278Z [W1204 10:37:21.588367808 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3301281Z 2025-12-04T10:58:28.3301442Z [W1204 10:37:21.590359250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3301444Z 2025-12-04T10:58:28.3301604Z [W1204 10:37:21.590701509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3301607Z 2025-12-04T10:58:28.3301793Z [W1204 10:37:21.590764638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3301796Z 2025-12-04T10:58:28.3301838Z FAILED [0.4743s] [100%] 2025-12-04T10:58:28.3301840Z 2025-12-04T10:58:28.3301897Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3302063Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3302113Z Traceback (most recent call last): 2025-12-04T10:58:28.3302282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3302327Z method(*args, **kwargs) 2025-12-04T10:58:28.3302492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3302537Z method(*args, **kwargs) 2025-12-04T10:58:28.3302701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3302754Z with policy(): 2025-12-04T10:58:28.3302919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3302964Z raise RuntimeError(msg) 2025-12-04T10:58:28.3303436Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3303439Z 2025-12-04T10:58:28.3303519Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3303840Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3303861Z 2025-12-04T10:58:28.3303954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3304034Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3304095Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3304287Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3304365Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3304404Z graph_break [] 2025-12-04T10:58:28.3304567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3304616Z Traceback (most recent call last): 2025-12-04T10:58:28.3304781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3304826Z method(*args, **kwargs) 2025-12-04T10:58:28.3304988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3305031Z method(*args, **kwargs) 2025-12-04T10:58:28.3305192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3305232Z with policy(): 2025-12-04T10:58:28.3305395Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3305440Z raise RuntimeError(msg) 2025-12-04T10:58:28.3305909Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3305914Z 2025-12-04T10:58:28.3305993Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3306306Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3306309Z 2025-12-04T10:58:28.3306401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3306480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3306540Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3306734Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3306812Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3306867Z graph_break [] 2025-12-04T10:58:28.3306944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3307003Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3307078Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3307267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3307304Z graph_break [] 2025-12-04T10:58:28.3307361Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3307523Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3307587Z Traceback (most recent call last): 2025-12-04T10:58:28.3307753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3307797Z method(*args, **kwargs) 2025-12-04T10:58:28.3307959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3308001Z method(*args, **kwargs) 2025-12-04T10:58:28.3308162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3308201Z with policy(): 2025-12-04T10:58:28.3308365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3308408Z raise RuntimeError(msg) 2025-12-04T10:58:28.3308847Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3308851Z 2025-12-04T10:58:28.3308929Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3309242Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3309245Z 2025-12-04T10:58:28.3309336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3309415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3309473Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3309687Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3309765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3309804Z graph_break [] 2025-12-04T10:58:28.3309883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3309942Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3310017Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3310208Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3310247Z graph_break [] 2025-12-04T10:58:28.3310324Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3310381Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3310461Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3310648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3310699Z graph_break [] 2025-12-04T10:58:28.3310963Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cbd2a6cb0a7d9432.xml - 2025-12-04T10:58:28.3311026Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3311723Z FAILED [0.4743s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3311748Z 2025-12-04T10:58:28.3311825Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3312139Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3312141Z 2025-12-04T10:58:28.3312233Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3312298Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3312368Z ================== 1 failed, 57 deselected, 2 rerun in 3.88s =================== 2025-12-04T10:58:28.3312406Z Got exit code 1 2025-12-04T10:58:28.3312452Z Retrying single test... 
2025-12-04T10:58:28.3312667Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bfd295e930381e6e.xml 2025-12-04T10:58:28.3312729Z ============================= test session starts ============================== 2025-12-04T10:58:28.3312847Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3312892Z cachedir: .pytest_cache 2025-12-04T10:58:28.3313060Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3313109Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3313151Z configfile: pytest.ini 2025-12-04T10:58:28.3313375Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3313453Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3313793Z stepcurrent: skipping 19 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3313842Z Running 1 items in this shard 2025-12-04T10:58:28.3313844Z 2025-12-04T10:58:28.3314235Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:30.269948042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3314238Z 2025-12-04T10:58:28.3314402Z [W1204 10:37:30.544577867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3314405Z 2025-12-04T10:58:28.3314568Z [W1204 10:37:30.544734796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3314573Z 2025-12-04T10:58:28.3314736Z [W1204 10:37:30.547655753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3314753Z 2025-12-04T10:58:28.3314906Z [W1204 10:37:30.547954112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3314908Z 2025-12-04T10:58:28.3315059Z [W1204 10:37:30.548028682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3315061Z 2025-12-04T10:58:28.3315211Z [W1204 10:37:30.550121123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3315213Z 2025-12-04T10:58:28.3315363Z [W1204 10:37:30.550393841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3315380Z 2025-12-04T10:58:28.3315533Z [W1204 10:37:30.550453871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3315535Z 2025-12-04T10:58:28.3315584Z ('RERUN', {'yellow': True}) [2.7807s] [100%] 2025-12-04T10:58:28.3315951Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:31.506941209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3315954Z 2025-12-04T10:58:28.3316104Z [W1204 10:37:31.507319817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316105Z 2025-12-04T10:58:28.3316258Z [W1204 10:37:31.507385617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316261Z 2025-12-04T10:58:28.3316413Z [W1204 10:37:31.508630401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316415Z 2025-12-04T10:58:28.3316564Z [W1204 10:37:31.508887310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316566Z 2025-12-04T10:58:28.3316716Z [W1204 10:37:31.508947750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316718Z 2025-12-04T10:58:28.3316867Z [W1204 10:37:31.510841172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3316869Z 2025-12-04T10:58:28.3317020Z [W1204 10:37:31.511231940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3317023Z 2025-12-04T10:58:28.3317194Z [W1204 10:37:31.511298140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3317199Z 2025-12-04T10:58:28.3317247Z ('RERUN', {'yellow': True}) [0.4543s] [100%] 2025-12-04T10:58:28.3317611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:37:31.957681275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3317613Z 2025-12-04T10:58:28.3317762Z [W1204 10:37:31.958058563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3317764Z 2025-12-04T10:58:28.3317915Z [W1204 10:37:31.958135413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3317918Z 2025-12-04T10:58:28.3318069Z [W1204 10:37:31.959383767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3318082Z 2025-12-04T10:58:28.3318234Z [W1204 10:37:31.959648786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3318236Z 2025-12-04T10:58:28.3318386Z [W1204 10:37:31.959706976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3318388Z 2025-12-04T10:58:28.3318539Z [W1204 10:37:31.961662798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3318541Z 2025-12-04T10:58:28.3318691Z [W1204 10:37:31.961997056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3318693Z 2025-12-04T10:58:28.3318856Z [W1204 10:37:31.962063486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3318858Z 2025-12-04T10:58:28.3318898Z FAILED [0.4482s] [100%] 2025-12-04T10:58:28.3318900Z 2025-12-04T10:58:28.3318951Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3319105Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3319150Z Traceback (most recent call last): 2025-12-04T10:58:28.3319308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3319348Z method(*args, **kwargs) 2025-12-04T10:58:28.3319504Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3319543Z method(*args, **kwargs) 2025-12-04T10:58:28.3319699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3319736Z with policy(): 2025-12-04T10:58:28.3319892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3319933Z raise RuntimeError(msg) 2025-12-04T10:58:28.3320341Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3320344Z 2025-12-04T10:58:28.3320418Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3320736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3320739Z 2025-12-04T10:58:28.3320828Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3320900Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3320958Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3321137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3321211Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3321247Z graph_break [] 2025-12-04T10:58:28.3321401Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3321446Z Traceback (most recent call last): 2025-12-04T10:58:28.3321604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3321657Z method(*args, **kwargs) 2025-12-04T10:58:28.3321809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3321849Z method(*args, **kwargs) 2025-12-04T10:58:28.3322001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3322037Z with policy(): 2025-12-04T10:58:28.3322191Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3322231Z raise RuntimeError(msg) 2025-12-04T10:58:28.3322646Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3322659Z 2025-12-04T10:58:28.3322734Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3323028Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3323031Z 2025-12-04T10:58:28.3323118Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3323190Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3323247Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3323468Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3323543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3323580Z graph_break [] 2025-12-04T10:58:28.3323654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3323709Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3323781Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3323957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3323994Z graph_break [] 2025-12-04T10:58:28.3324045Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3324199Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3324244Z Traceback (most recent call last): 2025-12-04T10:58:28.3324427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3324469Z method(*args, **kwargs) 2025-12-04T10:58:28.3324619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3324658Z method(*args, **kwargs) 2025-12-04T10:58:28.3324807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3324844Z with policy(): 2025-12-04T10:58:28.3324995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3325037Z raise RuntimeError(msg) 2025-12-04T10:58:28.3325443Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3325473Z 2025-12-04T10:58:28.3325547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3325839Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3325841Z 2025-12-04T10:58:28.3325927Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3325998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3326053Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3326228Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3326315Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3326351Z graph_break [] 2025-12-04T10:58:28.3326424Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3326477Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3326547Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3326722Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3326758Z graph_break [] 2025-12-04T10:58:28.3326831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3326883Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3326954Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3327130Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3327167Z graph_break [] 2025-12-04T10:58:28.3327411Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bfd295e930381e6e.xml - 2025-12-04T10:58:28.3327471Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3328137Z FAILED [0.4482s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3328140Z 2025-12-04T10:58:28.3328214Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3328502Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3328505Z 2025-12-04T10:58:28.3328590Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3328651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3328716Z ================== 1 failed, 57 deselected, 2 rerun in 3.85s =================== 2025-12-04T10:58:28.3328752Z Got exit code 1 2025-12-04T10:58:28.3328995Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3329125Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3329337Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-20c2778cbf1df662.xml 2025-12-04T10:58:28.3329393Z ============================= test session starts ============================== 2025-12-04T10:58:28.3329501Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3329542Z cachedir: .pytest_cache 2025-12-04T10:58:28.3329699Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3329744Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3329783Z configfile: pytest.ini 2025-12-04T10:58:28.3329944Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3330030Z collecting ... collected 58 items / 20 deselected / 38 selected 2025-12-04T10:58:28.3330084Z stepcurrent: skipping 20 already run items. 
2025-12-04T10:58:28.3330127Z Running 38 items in this shard 2025-12-04T10:58:28.3330129Z 2025-12-04T10:58:28.3330382Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4748s] [ 2%] 2025-12-04T10:58:28.3330626Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4419s] [ 2%] 2025-12-04T10:58:28.3330848Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4348s] [ 2%] 2025-12-04T10:58:28.3330853Z 2025-12-04T10:58:28.3330903Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3331053Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3331099Z Traceback (most recent call last): 2025-12-04T10:58:28.3331255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3331295Z method(*args, **kwargs) 2025-12-04T10:58:28.3331446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3331486Z method(*args, **kwargs) 2025-12-04T10:58:28.3331635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3331672Z with policy(): 2025-12-04T10:58:28.3331845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3331888Z raise RuntimeError(msg) 2025-12-04T10:58:28.3332424Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3332427Z 2025-12-04T10:58:28.3332500Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3332791Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3332794Z 2025-12-04T10:58:28.3332880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3332954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3333023Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3333200Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3333308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3333345Z graph_break [] 2025-12-04T10:58:28.3333493Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3333538Z Traceback (most recent call last): 2025-12-04T10:58:28.3333690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3333730Z method(*args, **kwargs) 2025-12-04T10:58:28.3333897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3333941Z method(*args, **kwargs) 
2025-12-04T10:58:28.3334090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3334127Z with policy(): 2025-12-04T10:58:28.3334279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3334320Z raise RuntimeError(msg) 2025-12-04T10:58:28.3334726Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3334728Z 2025-12-04T10:58:28.3334805Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3335095Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3335100Z 2025-12-04T10:58:28.3335185Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3335258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3335313Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3335489Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3335560Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3335597Z graph_break [] 2025-12-04T10:58:28.3335701Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3335757Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.3335829Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3336004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3336038Z graph_break [] 2025-12-04T10:58:28.3336090Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3336238Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3336284Z Traceback (most recent call last): 2025-12-04T10:58:28.3336436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3336477Z method(*args, **kwargs) 2025-12-04T10:58:28.3336629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3336683Z method(*args, **kwargs) 2025-12-04T10:58:28.3336833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3336870Z with policy(): 2025-12-04T10:58:28.3337021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3337062Z raise RuntimeError(msg) 2025-12-04T10:58:28.3337463Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3337477Z 2025-12-04T10:58:28.3337553Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3337843Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3337846Z 2025-12-04T10:58:28.3337931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3338004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3338059Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3338237Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3338308Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3338345Z graph_break [] 2025-12-04T10:58:28.3338418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3338473Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3338543Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3338717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3338752Z graph_break [] 2025-12-04T10:58:28.3338824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3338877Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3338949Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3339124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3339190Z graph_break [] 2025-12-04T10:58:28.3339433Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-20c2778cbf1df662.xml - 2025-12-04T10:58:28.3339493Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3340121Z FAILED [0.4348s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3340124Z 2025-12-04T10:58:28.3340196Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3340487Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3340502Z 2025-12-04T10:58:28.3340586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3340647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3340712Z ================== 1 failed, 20 deselected, 2 rerun in 3.52s =================== 2025-12-04T10:58:28.3340748Z Got exit code 1 2025-12-04T10:58:28.3340787Z Retrying single test... 
2025-12-04T10:58:28.3340985Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-88fb7d820cb68a88.xml 2025-12-04T10:58:28.3341040Z ============================= test session starts ============================== 2025-12-04T10:58:28.3341164Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3341205Z cachedir: .pytest_cache 2025-12-04T10:58:28.3341362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3341407Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3341446Z configfile: pytest.ini 2025-12-04T10:58:28.3341606Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3341679Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3341963Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3342007Z Running 1 items in this shard 2025-12-04T10:58:28.3342010Z 2025-12-04T10:58:28.3342379Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:37:50.399005173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3342382Z 2025-12-04T10:58:28.3342535Z [W1204 10:37:50.670712840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3342537Z 2025-12-04T10:58:28.3342689Z [W1204 10:37:50.670840789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3342691Z 2025-12-04T10:58:28.3342840Z [W1204 10:37:50.673736206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3342842Z 2025-12-04T10:58:28.3343011Z [W1204 10:37:50.674033444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3343014Z 2025-12-04T10:58:28.3343164Z [W1204 10:37:50.674097544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3343165Z 2025-12-04T10:58:28.3343355Z [W1204 10:37:50.676210014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3343357Z 2025-12-04T10:58:28.3343508Z [W1204 10:37:50.676480263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3343510Z 2025-12-04T10:58:28.3343657Z [W1204 10:37:50.676541553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3343659Z 2025-12-04T10:58:28.3343708Z ('RERUN', {'yellow': True}) [2.7760s] [100%] 2025-12-04T10:58:28.3344068Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:37:51.624660442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3344086Z 2025-12-04T10:58:28.3344234Z [W1204 10:37:51.625043260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3344236Z 2025-12-04T10:58:28.3344385Z [W1204 10:37:51.625110010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3344386Z 2025-12-04T10:58:28.3344534Z [W1204 10:37:51.626467573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3344536Z 2025-12-04T10:58:28.3344686Z [W1204 10:37:51.626726942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3344712Z 2025-12-04T10:58:28.3344860Z [W1204 10:37:51.626787382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3344862Z 2025-12-04T10:58:28.3345010Z [W1204 10:37:51.628701702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3345012Z 2025-12-04T10:58:28.3345159Z [W1204 10:37:51.629050201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3345161Z 2025-12-04T10:58:28.3345308Z [W1204 10:37:51.629116561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3345310Z 2025-12-04T10:58:28.3345358Z ('RERUN', {'yellow': True}) [0.4426s] [100%] 2025-12-04T10:58:28.3345720Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:37:51.071617378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3345723Z 2025-12-04T10:58:28.3345873Z [W1204 10:37:51.071977347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3345875Z 2025-12-04T10:58:28.3346022Z [W1204 10:37:51.072045096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346025Z 2025-12-04T10:58:28.3346172Z [W1204 10:37:51.073300420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346174Z 2025-12-04T10:58:28.3346349Z [W1204 10:37:51.073555469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346353Z 2025-12-04T10:58:28.3346500Z [W1204 10:37:51.073615829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346503Z 2025-12-04T10:58:28.3346652Z [W1204 10:37:51.075620010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346654Z 2025-12-04T10:58:28.3346802Z [W1204 10:37:51.075962408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3346804Z 2025-12-04T10:58:28.3346953Z [W1204 10:37:51.076030008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3346954Z 2025-12-04T10:58:28.3346993Z FAILED [0.4429s] [100%] 2025-12-04T10:58:28.3346995Z 2025-12-04T10:58:28.3347049Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3347200Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3347257Z Traceback (most recent call last): 2025-12-04T10:58:28.3347414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3347453Z method(*args, **kwargs) 2025-12-04T10:58:28.3347605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3347644Z method(*args, **kwargs) 2025-12-04T10:58:28.3347794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3347830Z with policy(): 2025-12-04T10:58:28.3347984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3348036Z raise RuntimeError(msg) 2025-12-04T10:58:28.3348431Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3348434Z 2025-12-04T10:58:28.3348508Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3348799Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3348802Z 2025-12-04T10:58:28.3348887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3348963Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3349019Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3349196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3349269Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3349305Z graph_break [] 2025-12-04T10:58:28.3349455Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3349499Z Traceback (most recent call last): 2025-12-04T10:58:28.3349651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3349690Z method(*args, **kwargs) 2025-12-04T10:58:28.3349863Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3349903Z method(*args, **kwargs) 2025-12-04T10:58:28.3350054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3350090Z with policy(): 2025-12-04T10:58:28.3350242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3350282Z raise RuntimeError(msg) 2025-12-04T10:58:28.3350682Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3350684Z 2025-12-04T10:58:28.3350756Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3351048Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3351062Z 2025-12-04T10:58:28.3351147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3351221Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3351276Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3351451Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3351523Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3351558Z graph_break [] 2025-12-04T10:58:28.3351630Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3351698Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3351770Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3351947Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3351983Z graph_break [] 2025-12-04T10:58:28.3352033Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3352184Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3352228Z Traceback (most recent call last): 2025-12-04T10:58:28.3352381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3352420Z method(*args, **kwargs) 2025-12-04T10:58:28.3352572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3352612Z method(*args, **kwargs) 2025-12-04T10:58:28.3352762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3352797Z with policy(): 2025-12-04T10:58:28.3352949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3352989Z raise RuntimeError(msg) 2025-12-04T10:58:28.3353422Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3353425Z 2025-12-04T10:58:28.3353497Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3353822Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3353825Z 2025-12-04T10:58:28.3353911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3353983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3354038Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3354215Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3354288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3354323Z graph_break [] 2025-12-04T10:58:28.3354397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3354451Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3354538Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3354711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3354747Z graph_break [] 2025-12-04T10:58:28.3354818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3354872Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3354942Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3355116Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3355151Z graph_break [] 2025-12-04T10:58:28.3355409Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-88fb7d820cb68a88.xml - 2025-12-04T10:58:28.3355467Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3356096Z FAILED [0.4429s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3356099Z 2025-12-04T10:58:28.3356171Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3356463Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3356467Z 2025-12-04T10:58:28.3356554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3356614Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3356680Z ================== 1 failed, 57 deselected, 2 rerun in 3.83s =================== 2025-12-04T10:58:28.3356716Z Got exit code 1 2025-12-04T10:58:28.3356756Z Retrying single test... 
2025-12-04T10:58:28.3356952Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-345e4bfa87721364.xml 2025-12-04T10:58:28.3357008Z ============================= test session starts ============================== 2025-12-04T10:58:28.3357140Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3357183Z cachedir: .pytest_cache 2025-12-04T10:58:28.3357341Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3357389Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3357428Z configfile: pytest.ini 2025-12-04T10:58:28.3357587Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3357658Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3357944Z stepcurrent: skipping 20 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3357986Z Running 1 items in this shard 2025-12-04T10:58:28.3357990Z 2025-12-04T10:58:28.3358354Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:38:00.595503365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3358368Z 2025-12-04T10:58:28.3358523Z [W1204 10:38:00.860005767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3358525Z 2025-12-04T10:58:28.3358676Z [W1204 10:38:00.860160837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3358678Z 2025-12-04T10:58:28.3358828Z [W1204 10:38:00.863572130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3358830Z 2025-12-04T10:58:28.3358980Z [W1204 10:38:00.863873508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3358995Z 2025-12-04T10:58:28.3359143Z [W1204 10:38:00.863934468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3359146Z 2025-12-04T10:58:28.3359295Z [W1204 10:38:00.866258827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3359297Z 2025-12-04T10:58:28.3359444Z [W1204 10:38:00.866536225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3359446Z 2025-12-04T10:58:28.3359594Z [W1204 10:38:00.866596025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3359596Z 2025-12-04T10:58:28.3359643Z ('RERUN', {'yellow': True}) [2.8637s] [100%] 2025-12-04T10:58:28.3360007Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:38:01.029468731 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3360011Z 2025-12-04T10:58:28.3360160Z [W1204 10:38:01.029863679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360162Z 2025-12-04T10:58:28.3360309Z [W1204 10:38:01.029937719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360311Z 2025-12-04T10:58:28.3360460Z [W1204 10:38:01.031229572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360462Z 2025-12-04T10:58:28.3360610Z [W1204 10:38:01.031503691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360635Z 2025-12-04T10:58:28.3360785Z [W1204 10:38:01.031568321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360788Z 2025-12-04T10:58:28.3360936Z [W1204 10:38:01.033579761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3360939Z 2025-12-04T10:58:28.3361087Z [W1204 10:38:01.033924209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3361089Z 2025-12-04T10:58:28.3361237Z [W1204 10:38:01.033987159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3361239Z 2025-12-04T10:58:28.3361286Z ('RERUN', {'yellow': True}) [0.6617s] [100%] 2025-12-04T10:58:28.3361646Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:38:02.691190850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3361660Z 2025-12-04T10:58:28.3361809Z [W1204 10:38:02.691592108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3361811Z 2025-12-04T10:58:28.3361959Z [W1204 10:38:02.691662098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3361961Z 2025-12-04T10:58:28.3362109Z [W1204 10:38:02.692902042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3362111Z 2025-12-04T10:58:28.3362258Z [W1204 10:38:02.693171161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3362271Z 2025-12-04T10:58:28.3362422Z [W1204 10:38:02.693235570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3362425Z 2025-12-04T10:58:28.3362572Z [W1204 10:38:02.695273230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3362574Z 2025-12-04T10:58:28.3362723Z [W1204 10:38:02.695615579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3362725Z 2025-12-04T10:58:28.3362874Z [W1204 10:38:02.695679588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3362876Z 2025-12-04T10:58:28.3362914Z FAILED [0.6737s] [100%] 2025-12-04T10:58:28.3362916Z 2025-12-04T10:58:28.3362968Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3363120Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3363167Z Traceback (most recent call last): 2025-12-04T10:58:28.3363360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3363401Z method(*args, **kwargs) 2025-12-04T10:58:28.3363552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3363592Z method(*args, **kwargs) 2025-12-04T10:58:28.3363742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3363779Z with policy(): 2025-12-04T10:58:28.3363930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3363972Z raise RuntimeError(msg) 2025-12-04T10:58:28.3364395Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3364398Z 2025-12-04T10:58:28.3364473Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3364762Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3364764Z 2025-12-04T10:58:28.3364850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3364923Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3364982Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3365162Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3365248Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3365285Z graph_break [] 2025-12-04T10:58:28.3365434Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3365480Z Traceback (most recent call last): 2025-12-04T10:58:28.3365633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3365673Z method(*args, **kwargs) 2025-12-04T10:58:28.3365823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3365883Z method(*args, **kwargs) 2025-12-04T10:58:28.3366033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3366070Z with policy(): 2025-12-04T10:58:28.3366221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3366263Z raise RuntimeError(msg) 2025-12-04T10:58:28.3366661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3366663Z 2025-12-04T10:58:28.3366735Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3367024Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3367028Z 2025-12-04T10:58:28.3367114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3367187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3367244Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3367423Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3367495Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3367531Z graph_break [] 2025-12-04T10:58:28.3367602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3367657Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3367751Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3367926Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3367963Z graph_break [] 2025-12-04T10:58:28.3368014Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3368162Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3368207Z Traceback (most recent call last): 2025-12-04T10:58:28.3368360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3368400Z method(*args, **kwargs) 2025-12-04T10:58:28.3368550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3368593Z method(*args, **kwargs) 2025-12-04T10:58:28.3368742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3368791Z with policy(): 2025-12-04T10:58:28.3368941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3368982Z raise RuntimeError(msg) 2025-12-04T10:58:28.3369386Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3369388Z 2025-12-04T10:58:28.3369461Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3369751Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3369767Z 2025-12-04T10:58:28.3369853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3369926Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3369980Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3370156Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3370227Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3370264Z graph_break [] 2025-12-04T10:58:28.3370334Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3370391Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3370463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3370639Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3370673Z graph_break [] 2025-12-04T10:58:28.3370745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3370799Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3370870Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3371044Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3371080Z graph_break [] 2025-12-04T10:58:28.3371343Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-345e4bfa87721364.xml - 2025-12-04T10:58:28.3371403Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3372033Z FAILED [0.6737s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3372037Z 2025-12-04T10:58:28.3372108Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3372399Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3372402Z 2025-12-04T10:58:28.3372498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3372560Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3372625Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.3372662Z Got exit code 1 2025-12-04T10:58:28.3372901Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3373029Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3373226Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-46da1f91293f495a.xml 2025-12-04T10:58:28.3373336Z ============================= test session starts ============================== 2025-12-04T10:58:28.3373446Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3373488Z cachedir: .pytest_cache 2025-12-04T10:58:28.3373647Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3373693Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3373733Z configfile: pytest.ini 2025-12-04T10:58:28.3373896Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3373969Z collecting ... collected 58 items / 21 deselected / 37 selected 2025-12-04T10:58:28.3374023Z stepcurrent: skipping 21 already run items. 
2025-12-04T10:58:28.3374065Z Running 37 items in this shard 2025-12-04T10:58:28.3374068Z 2025-12-04T10:58:28.3374322Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8656s] [ 2%] 2025-12-04T10:58:28.3374568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4426s] [ 2%] 2025-12-04T10:58:28.3374790Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4415s] [ 2%] 2025-12-04T10:58:28.3374792Z 2025-12-04T10:58:28.3374843Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3374991Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3375036Z Traceback (most recent call last): 2025-12-04T10:58:28.3375220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3375262Z method(*args, **kwargs) 2025-12-04T10:58:28.3375414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3375454Z method(*args, **kwargs) 2025-12-04T10:58:28.3375603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3375643Z with policy(): 2025-12-04T10:58:28.3375795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3375839Z raise RuntimeError(msg) 2025-12-04T10:58:28.3376234Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3376253Z 2025-12-04T10:58:28.3376327Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3376617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3376619Z 2025-12-04T10:58:28.3376705Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3376779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3376834Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3377114Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3377202Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3377241Z graph_break [] 2025-12-04T10:58:28.3377390Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3377437Z Traceback (most recent call last): 2025-12-04T10:58:28.3377590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3377633Z method(*args, **kwargs) 2025-12-04T10:58:28.3377782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3377823Z method(*args, **kwargs) 2025-12-04T10:58:28.3377972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3378012Z with policy(): 2025-12-04T10:58:28.3378165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3378210Z raise RuntimeError(msg) 2025-12-04T10:58:28.3378615Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3378618Z 2025-12-04T10:58:28.3378691Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3378986Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3379012Z 2025-12-04T10:58:28.3379100Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3379177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3379231Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3379504Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3379577Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3379614Z graph_break [] 2025-12-04T10:58:28.3379687Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.3379745Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3379816Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3380092Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3380145Z graph_break [] 2025-12-04T10:58:28.3380197Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3380346Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3380393Z Traceback (most recent call last): 2025-12-04T10:58:28.3380546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3380587Z method(*args, **kwargs) 2025-12-04T10:58:28.3380737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3380791Z method(*args, **kwargs) 2025-12-04T10:58:28.3380941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3380978Z with policy(): 2025-12-04T10:58:28.3381130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3381170Z raise RuntimeError(msg) 2025-12-04T10:58:28.3381573Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3381575Z 2025-12-04T10:58:28.3381648Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3381939Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3381942Z 2025-12-04T10:58:28.3382027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3382101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3382155Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3382429Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3382501Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3382540Z graph_break [] 2025-12-04T10:58:28.3382634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3382691Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3382762Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3383034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3383070Z graph_break [] 2025-12-04T10:58:28.3383141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3383198Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.3383304Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3383576Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3383614Z graph_break [] 2025-12-04T10:58:28.3383858Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-46da1f91293f495a.xml - 2025-12-04T10:58:28.3383932Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3384565Z FAILED [0.4415s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3384567Z 2025-12-04T10:58:28.3384655Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3384946Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3384950Z 2025-12-04T10:58:28.3385037Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3385098Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3385165Z ================== 1 failed, 21 deselected, 2 rerun in 3.91s =================== 2025-12-04T10:58:28.3385202Z Got exit code 1 2025-12-04T10:58:28.3385242Z Retrying single test... 2025-12-04T10:58:28.3385440Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1da269a6df106946.xml 2025-12-04T10:58:28.3385498Z ============================= test session starts ============================== 2025-12-04T10:58:28.3385608Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3385653Z cachedir: .pytest_cache 2025-12-04T10:58:28.3385810Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3385856Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3385897Z configfile: pytest.ini 2025-12-04T10:58:28.3386056Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3386129Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3386415Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3386483Z Running 1 items in this shard 2025-12-04T10:58:28.3386485Z 2025-12-04T10:58:28.3386851Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:22.242705721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3386854Z 2025-12-04T10:58:28.3387009Z [W1204 10:38:23.515235965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387011Z 2025-12-04T10:58:28.3387165Z [W1204 10:38:23.515408074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387167Z 2025-12-04T10:58:28.3387318Z [W1204 10:38:23.518936386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387321Z 2025-12-04T10:58:28.3387470Z [W1204 10:38:23.519236114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387485Z 2025-12-04T10:58:28.3387637Z [W1204 10:38:23.519298604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387639Z 2025-12-04T10:58:28.3387788Z [W1204 10:38:23.521428032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387792Z 2025-12-04T10:58:28.3387940Z [W1204 10:38:23.521699321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3387942Z 2025-12-04T10:58:28.3388090Z [W1204 10:38:23.521759361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3388104Z 2025-12-04T10:58:28.3388155Z ('RERUN', {'yellow': True}) [3.1793s] [100%] 2025-12-04T10:58:28.3388518Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:23.141014667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3388521Z 2025-12-04T10:58:28.3388670Z [W1204 10:38:23.141361326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3388672Z 2025-12-04T10:58:28.3388823Z [W1204 10:38:23.141425465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3388824Z 2025-12-04T10:58:28.3388975Z [W1204 10:38:23.142690079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3388978Z 2025-12-04T10:58:28.3389129Z [W1204 10:38:23.142948987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3389132Z 2025-12-04T10:58:28.3389284Z [W1204 10:38:23.143012427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3389286Z 2025-12-04T10:58:28.3389435Z [W1204 10:38:23.145027366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3389437Z 2025-12-04T10:58:28.3389588Z [W1204 10:38:23.145293455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3389590Z 2025-12-04T10:58:28.3389739Z [W1204 10:38:23.145354715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3389740Z 2025-12-04T10:58:28.3389810Z ('RERUN', {'yellow': True}) [0.4859s] [100%] 2025-12-04T10:58:28.3390168Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:24.606775227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390171Z 2025-12-04T10:58:28.3390319Z [W1204 10:38:24.607158485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390321Z 2025-12-04T10:58:28.3390469Z [W1204 10:38:24.607225405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390471Z 2025-12-04T10:58:28.3390619Z [W1204 10:38:24.608473798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390620Z 2025-12-04T10:58:28.3390772Z [W1204 10:38:24.608724067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390789Z 2025-12-04T10:58:28.3390940Z [W1204 10:38:24.608781937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3390942Z 2025-12-04T10:58:28.3391091Z [W1204 10:38:24.610754256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3391093Z 2025-12-04T10:58:28.3391244Z [W1204 10:38:24.611017775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3391246Z 2025-12-04T10:58:28.3391396Z [W1204 10:38:24.611078374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3391398Z 2025-12-04T10:58:28.3391437Z FAILED [0.4583s] [100%] 2025-12-04T10:58:28.3391450Z 2025-12-04T10:58:28.3391504Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3391659Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3391706Z Traceback (most recent call last): 2025-12-04T10:58:28.3391865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3391906Z method(*args, **kwargs) 2025-12-04T10:58:28.3392060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3392100Z method(*args, **kwargs) 2025-12-04T10:58:28.3392251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3392290Z with policy(): 2025-12-04T10:58:28.3392444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3392486Z raise RuntimeError(msg) 2025-12-04T10:58:28.3392883Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3392885Z 2025-12-04T10:58:28.3392961Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3393288Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3393290Z 2025-12-04T10:58:28.3393380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3393489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3393548Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3393825Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3393898Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3393933Z graph_break [] 2025-12-04T10:58:28.3394086Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3394131Z Traceback (most recent call last): 2025-12-04T10:58:28.3394285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3394330Z method(*args, **kwargs) 2025-12-04T10:58:28.3394479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3394535Z method(*args, **kwargs) 2025-12-04T10:58:28.3394686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3394724Z 
with policy(): 2025-12-04T10:58:28.3394876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3394918Z raise RuntimeError(msg) 2025-12-04T10:58:28.3395317Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3395334Z 2025-12-04T10:58:28.3395409Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3395697Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3395701Z 2025-12-04T10:58:28.3395787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3395859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3395915Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3396190Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3396266Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3396304Z graph_break [] 2025-12-04T10:58:28.3396377Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3396433Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3396504Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3396774Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3396811Z graph_break [] 2025-12-04T10:58:28.3396863Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3397012Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3397058Z Traceback (most recent call last): 2025-12-04T10:58:28.3397233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3397278Z method(*args, **kwargs) 2025-12-04T10:58:28.3397429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3397473Z method(*args, **kwargs) 2025-12-04T10:58:28.3397623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3397663Z with policy(): 2025-12-04T10:58:28.3397815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3397860Z raise RuntimeError(msg) 2025-12-04T10:58:28.3398267Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3398281Z 2025-12-04T10:58:28.3398357Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3398644Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3398648Z 2025-12-04T10:58:28.3398734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3398808Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3398862Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3399138Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3399224Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3399262Z graph_break [] 2025-12-04T10:58:28.3399333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3399390Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3399463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3399734Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3399771Z graph_break [] 2025-12-04T10:58:28.3399845Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3399903Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3399978Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3400252Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3400292Z graph_break [] 2025-12-04T10:58:28.3400540Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1da269a6df106946.xml - 2025-12-04T10:58:28.3400601Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3401252Z FAILED [0.4583s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3401256Z 2025-12-04T10:58:28.3401330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3401621Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3401624Z 2025-12-04T10:58:28.3401709Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3401772Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3401838Z ================== 1 failed, 57 deselected, 2 rerun in 4.29s =================== 2025-12-04T10:58:28.3401877Z Got exit code 1 2025-12-04T10:58:28.3401917Z Retrying single test... 2025-12-04T10:58:28.3402129Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ba1ec49e456e9f6.xml 2025-12-04T10:58:28.3402185Z ============================= test session starts ============================== 2025-12-04T10:58:28.3402296Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3402337Z cachedir: .pytest_cache 2025-12-04T10:58:28.3402499Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3402543Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3402584Z configfile: pytest.ini 2025-12-04T10:58:28.3402744Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3402832Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3403120Z stepcurrent: skipping 21 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3403165Z Running 1 items in this shard 2025-12-04T10:58:28.3403167Z 2025-12-04T10:58:28.3403554Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:34.299216391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3403557Z 2025-12-04T10:58:28.3403709Z [W1204 10:38:34.580937944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3403712Z 2025-12-04T10:58:28.3403867Z [W1204 10:38:34.581114773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3403869Z 2025-12-04T10:58:28.3404018Z [W1204 10:38:34.584781733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3404022Z 2025-12-04T10:58:28.3404170Z [W1204 10:38:34.585094151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3404172Z 2025-12-04T10:58:28.3404320Z [W1204 10:38:34.585158841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3404321Z 2025-12-04T10:58:28.3404471Z [W1204 10:38:34.587296950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3404472Z 2025-12-04T10:58:28.3404652Z [W1204 10:38:34.587572878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3404655Z 2025-12-04T10:58:28.3404805Z [W1204 10:38:34.587632458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3404807Z 2025-12-04T10:58:28.3404856Z ('RERUN', {'yellow': True}) [3.2567s] [100%] 2025-12-04T10:58:28.3405216Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:34.197047233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405219Z 2025-12-04T10:58:28.3405367Z [W1204 10:38:34.197405171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405369Z 2025-12-04T10:58:28.3405519Z [W1204 10:38:34.197473331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405536Z 2025-12-04T10:58:28.3405684Z [W1204 10:38:34.198730814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405686Z 2025-12-04T10:58:28.3405835Z [W1204 10:38:34.198988813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405837Z 2025-12-04T10:58:28.3405986Z [W1204 10:38:34.199058822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3405990Z 2025-12-04T10:58:28.3406137Z [W1204 10:38:34.201070181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3406139Z 2025-12-04T10:58:28.3406290Z [W1204 10:38:34.201332410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3406307Z 2025-12-04T10:58:28.3406457Z [W1204 10:38:34.201393710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3406458Z 2025-12-04T10:58:28.3406507Z ('RERUN', {'yellow': True}) [0.4839s] [100%] 2025-12-04T10:58:28.3406867Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:38:35.675498416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3406870Z 2025-12-04T10:58:28.3407019Z [W1204 10:38:35.675872014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407021Z 2025-12-04T10:58:28.3407171Z [W1204 10:38:35.675944854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407174Z 2025-12-04T10:58:28.3407323Z [W1204 10:38:35.677203257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407324Z 2025-12-04T10:58:28.3407472Z [W1204 10:38:35.677457526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407474Z 2025-12-04T10:58:28.3407621Z [W1204 10:38:35.677517055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407623Z 2025-12-04T10:58:28.3407771Z [W1204 10:38:35.679520814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3407773Z 2025-12-04T10:58:28.3407948Z [W1204 10:38:35.679784253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3407953Z 2025-12-04T10:58:28.3408100Z [W1204 10:38:35.679845343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3408103Z 2025-12-04T10:58:28.3408143Z FAILED [0.4704s] [100%] 2025-12-04T10:58:28.3408145Z 2025-12-04T10:58:28.3408196Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3408347Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3408393Z Traceback (most recent call last): 2025-12-04T10:58:28.3408551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3408591Z method(*args, **kwargs) 2025-12-04T10:58:28.3408746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3408787Z method(*args, **kwargs) 2025-12-04T10:58:28.3408952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3408988Z with policy(): 2025-12-04T10:58:28.3409143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3409183Z raise RuntimeError(msg) 2025-12-04T10:58:28.3409580Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3409582Z 2025-12-04T10:58:28.3409656Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3409959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3409962Z 2025-12-04T10:58:28.3410050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3410123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3410181Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3410453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3410527Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3410563Z graph_break [] 2025-12-04T10:58:28.3410719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3410765Z Traceback (most recent call last): 2025-12-04T10:58:28.3410920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3410959Z method(*args, **kwargs) 2025-12-04T10:58:28.3411111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3411151Z method(*args, **kwargs) 2025-12-04T10:58:28.3411302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3411337Z 
with policy(): 2025-12-04T10:58:28.3411489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3411530Z raise RuntimeError(msg) 2025-12-04T10:58:28.3411955Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3411958Z 2025-12-04T10:58:28.3412032Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3412323Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3412325Z 2025-12-04T10:58:28.3412411Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3412483Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3412542Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3412813Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3412901Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3412937Z graph_break [] 2025-12-04T10:58:28.3413010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3413064Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3413136Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3413461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3413515Z graph_break [] 2025-12-04T10:58:28.3413566Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3413720Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3413764Z Traceback (most recent call last): 2025-12-04T10:58:28.3413919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3413958Z method(*args, **kwargs) 2025-12-04T10:58:28.3414111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3414150Z method(*args, **kwargs) 2025-12-04T10:58:28.3414301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3414337Z with policy(): 2025-12-04T10:58:28.3414492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3414533Z raise RuntimeError(msg) 2025-12-04T10:58:28.3414938Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3414940Z 2025-12-04T10:58:28.3415014Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3415303Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3415305Z 2025-12-04T10:58:28.3415416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3415490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3415549Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3415823Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3415897Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3415933Z graph_break [] 2025-12-04T10:58:28.3416005Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3416060Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3416132Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3416403Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3416455Z graph_break [] 2025-12-04T10:58:28.3416527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3416582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3416653Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3416923Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3416960Z graph_break [] 2025-12-04T10:58:28.3417205Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ba1ec49e456e9f6.xml - 2025-12-04T10:58:28.3417278Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3417910Z FAILED [0.4704s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3417912Z 2025-12-04T10:58:28.3417986Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3418275Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3418279Z 2025-12-04T10:58:28.3418363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3418427Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3418492Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.3418529Z Got exit code 1 2025-12-04T10:58:28.3418769Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3418898Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3419094Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b30b00220dcf498.xml 2025-12-04T10:58:28.3419174Z ============================= test session starts ============================== 2025-12-04T10:58:28.3419284Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3419328Z cachedir: .pytest_cache 2025-12-04T10:58:28.3419489Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3419536Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3419576Z configfile: pytest.ini 2025-12-04T10:58:28.3419739Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3419813Z collecting ... collected 58 items / 22 deselected / 36 selected 2025-12-04T10:58:28.3419867Z stepcurrent: skipping 22 already run items. 
2025-12-04T10:58:28.3419910Z Running 36 items in this shard 2025-12-04T10:58:28.3419912Z 2025-12-04T10:58:28.3420174Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5624s] [ 2%] 2025-12-04T10:58:28.3420434Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5422s] [ 2%] 2025-12-04T10:58:28.3420658Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.5279s] [ 2%] 2025-12-04T10:58:28.3420661Z 2025-12-04T10:58:28.3420712Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3420862Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3420909Z Traceback (most recent call last): 2025-12-04T10:58:28.3421067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3421127Z method(*args, **kwargs) 2025-12-04T10:58:28.3421281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3421321Z method(*args, **kwargs) 2025-12-04T10:58:28.3421470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3421511Z with policy(): 2025-12-04T10:58:28.3421662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3421704Z raise RuntimeError(msg) 2025-12-04T10:58:28.3422107Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3422110Z 2025-12-04T10:58:28.3422185Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3422482Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3422486Z 2025-12-04T10:58:28.3422572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3422646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3422702Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3422880Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3422976Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3423014Z graph_break [] 2025-12-04T10:58:28.3423166Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3423212Z Traceback (most recent call last): 2025-12-04T10:58:28.3423396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3423438Z method(*args, **kwargs) 2025-12-04T10:58:28.3423589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3423630Z method(*args, **kwargs) 
2025-12-04T10:58:28.3423779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3423817Z with policy(): 2025-12-04T10:58:28.3423971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3424012Z raise RuntimeError(msg) 2025-12-04T10:58:28.3424444Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3424446Z 2025-12-04T10:58:28.3424521Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3424813Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3424817Z 2025-12-04T10:58:28.3424904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3424993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3425049Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3425226Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3425298Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3425337Z graph_break [] 2025-12-04T10:58:28.3425408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3425463Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3425535Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3425714Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3425751Z graph_break [] 2025-12-04T10:58:28.3425804Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3425957Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3426003Z Traceback (most recent call last): 2025-12-04T10:58:28.3426156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3426197Z method(*args, **kwargs) 2025-12-04T10:58:28.3426349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3426389Z method(*args, **kwargs) 2025-12-04T10:58:28.3426538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3426576Z with policy(): 2025-12-04T10:58:28.3426754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3426797Z raise RuntimeError(msg) 2025-12-04T10:58:28.3427208Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3427211Z 2025-12-04T10:58:28.3427286Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3427580Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3427583Z 2025-12-04T10:58:28.3427671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3427744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3427811Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3427987Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3428059Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3428096Z graph_break [] 2025-12-04T10:58:28.3428168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3428223Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3430691Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3430885Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3430945Z graph_break [] 2025-12-04T10:58:28.3431022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3431077Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3431151Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3431329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3431365Z graph_break [] 2025-12-04T10:58:28.3431610Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-5b30b00220dcf498.xml - 2025-12-04T10:58:28.3431671Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3432324Z FAILED [0.5279s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3432329Z 2025-12-04T10:58:28.3432402Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3432693Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3432696Z 2025-12-04T10:58:28.3432782Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3432871Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3432937Z ================== 1 failed, 22 deselected, 2 rerun in 3.80s =================== 2025-12-04T10:58:28.3432976Z Got exit code 1 2025-12-04T10:58:28.3433017Z Retrying single test... 
2025-12-04T10:58:28.3433218Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b40a96ef07b17a88.xml 2025-12-04T10:58:28.3433340Z ============================= test session starts ============================== 2025-12-04T10:58:28.3433455Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3433495Z cachedir: .pytest_cache 2025-12-04T10:58:28.3433659Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3433705Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3433748Z configfile: pytest.ini 2025-12-04T10:58:28.3433911Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3434007Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3434294Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3434339Z Running 1 items in this shard 2025-12-04T10:58:28.3434341Z 2025-12-04T10:58:28.3434707Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:38:54.992849570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3434727Z 2025-12-04T10:58:28.3434884Z [W1204 10:38:54.264090141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3434887Z 2025-12-04T10:58:28.3435040Z [W1204 10:38:54.264246310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435042Z 2025-12-04T10:58:28.3435191Z [W1204 10:38:54.267553641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435193Z 2025-12-04T10:58:28.3435341Z [W1204 10:38:54.267866839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435343Z 2025-12-04T10:58:28.3435490Z [W1204 10:38:54.267927839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435493Z 2025-12-04T10:58:28.3435643Z [W1204 10:38:55.270091026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435645Z 2025-12-04T10:58:28.3435795Z [W1204 10:38:55.270365595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435797Z 2025-12-04T10:58:28.3435945Z [W1204 10:38:55.270425854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3435947Z 2025-12-04T10:58:28.3435997Z ('RERUN', {'yellow': True}) [2.9737s] [100%] 2025-12-04T10:58:28.3436362Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:38:56.434774640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3436364Z 2025-12-04T10:58:28.3436543Z [W1204 10:38:56.435145327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3436546Z 2025-12-04T10:58:28.3436695Z [W1204 10:38:56.435211277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3436697Z 2025-12-04T10:58:28.3436845Z [W1204 10:38:56.436481250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3436847Z 2025-12-04T10:58:28.3436994Z [W1204 10:38:56.436743198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3436996Z 2025-12-04T10:58:28.3437143Z [W1204 10:38:56.436804768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3437145Z 2025-12-04T10:58:28.3437295Z [W1204 10:38:56.438778046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3437298Z 2025-12-04T10:58:28.3437468Z [W1204 10:38:56.439124564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3437471Z 2025-12-04T10:58:28.3437619Z [W1204 10:38:56.439190884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3437620Z 2025-12-04T10:58:28.3437669Z ('RERUN', {'yellow': True}) [0.6719s] [100%] 2025-12-04T10:58:28.3438034Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:38:56.110923188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438036Z 2025-12-04T10:58:28.3438187Z [W1204 10:38:56.111303656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438201Z 2025-12-04T10:58:28.3438350Z [W1204 10:38:56.111373685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438351Z 2025-12-04T10:58:28.3438500Z [W1204 10:38:56.112655528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438502Z 2025-12-04T10:58:28.3438650Z [W1204 10:38:56.112915266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438652Z 2025-12-04T10:58:28.3438798Z [W1204 10:38:56.112978746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438800Z 2025-12-04T10:58:28.3438950Z [W1204 10:38:56.114891885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3438953Z 2025-12-04T10:58:28.3439100Z [W1204 10:38:56.115248153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3439103Z 2025-12-04T10:58:28.3439252Z [W1204 10:38:56.115313252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3439254Z 2025-12-04T10:58:28.3439292Z FAILED [0.6612s] [100%] 2025-12-04T10:58:28.3439294Z 2025-12-04T10:58:28.3439345Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3439498Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3439545Z Traceback (most recent call last): 2025-12-04T10:58:28.3439724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3439766Z method(*args, **kwargs) 2025-12-04T10:58:28.3439919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3439959Z method(*args, **kwargs) 2025-12-04T10:58:28.3440111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3440147Z with policy(): 2025-12-04T10:58:28.3440299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3440339Z raise RuntimeError(msg) 2025-12-04T10:58:28.3440746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3440749Z 2025-12-04T10:58:28.3440823Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3441131Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3441133Z 2025-12-04T10:58:28.3441219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3441293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3441349Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3441526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3441613Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3441650Z graph_break [] 2025-12-04T10:58:28.3441801Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3441847Z Traceback (most recent call last): 2025-12-04T10:58:28.3442001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3442040Z method(*args, **kwargs) 2025-12-04T10:58:28.3442192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3442232Z method(*args, **kwargs) 2025-12-04T10:58:28.3442382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3442418Z with policy(): 2025-12-04T10:58:28.3442571Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3442613Z raise RuntimeError(msg) 2025-12-04T10:58:28.3443026Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3443028Z 2025-12-04T10:58:28.3443100Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3443437Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3443439Z 2025-12-04T10:58:28.3443524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3443631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3443688Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3443866Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3443938Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3443974Z graph_break [] 2025-12-04T10:58:28.3444046Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3444100Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3444171Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3444345Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3444384Z graph_break [] 2025-12-04T10:58:28.3444436Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3444603Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3444647Z Traceback (most recent call last): 2025-12-04T10:58:28.3444801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3444840Z method(*args, **kwargs) 2025-12-04T10:58:28.3444991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3445030Z method(*args, **kwargs) 2025-12-04T10:58:28.3445179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3445215Z with policy(): 2025-12-04T10:58:28.3445383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3445425Z raise RuntimeError(msg) 2025-12-04T10:58:28.3445838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3445840Z 2025-12-04T10:58:28.3445912Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3446202Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3446204Z 2025-12-04T10:58:28.3446290Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3446365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3446421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3446597Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3446671Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3446707Z graph_break [] 2025-12-04T10:58:28.3446780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3446834Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3446906Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3447100Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3447139Z graph_break [] 2025-12-04T10:58:28.3447210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3447266Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3447337Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3447510Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3447546Z graph_break [] 2025-12-04T10:58:28.3447790Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b40a96ef07b17a88.xml - 2025-12-04T10:58:28.3447848Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3448494Z FAILED [0.6612s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3448512Z 2025-12-04T10:58:28.3448586Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3448877Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3448879Z 2025-12-04T10:58:28.3448965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3449038Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3449105Z ================== 1 failed, 57 deselected, 2 rerun in 4.45s =================== 2025-12-04T10:58:28.3449142Z Got exit code 1 2025-12-04T10:58:28.3449182Z Retrying single test... 
2025-12-04T10:58:28.3449381Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6a069bd46d72bb1.xml 2025-12-04T10:58:28.3449437Z ============================= test session starts ============================== 2025-12-04T10:58:28.3449548Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3449588Z cachedir: .pytest_cache 2025-12-04T10:58:28.3449746Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3449792Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3449832Z configfile: pytest.ini 2025-12-04T10:58:28.3449995Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3450068Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3450355Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3450398Z Running 1 items in this shard 2025-12-04T10:58:28.3450401Z 2025-12-04T10:58:28.3450765Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:39:05.191555138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3450767Z 2025-12-04T10:58:28.3450947Z [W1204 10:39:06.458095097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3450950Z 2025-12-04T10:58:28.3451103Z [W1204 10:39:06.458253226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451105Z 2025-12-04T10:58:28.3451255Z [W1204 10:39:06.461818155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451257Z 2025-12-04T10:58:28.3451405Z [W1204 10:39:06.462125063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451407Z 2025-12-04T10:58:28.3451555Z [W1204 10:39:06.462188873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451557Z 2025-12-04T10:58:28.3451707Z [W1204 10:39:06.464315970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451710Z 2025-12-04T10:58:28.3451857Z [W1204 10:39:06.464588008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3451875Z 2025-12-04T10:58:28.3452024Z [W1204 10:39:06.464647798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3452026Z 2025-12-04T10:58:28.3452074Z ('RERUN', {'yellow': True}) [2.9638s] [100%] 2025-12-04T10:58:28.3452440Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:39:07.723453831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3452442Z 2025-12-04T10:58:28.3452592Z [W1204 10:39:07.723828029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3452605Z 2025-12-04T10:58:28.3452755Z [W1204 10:39:07.723892669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3452757Z 2025-12-04T10:58:28.3452905Z [W1204 10:39:07.725154811 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3452907Z 2025-12-04T10:58:28.3453053Z [W1204 10:39:07.725415270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3453055Z 2025-12-04T10:58:28.3453205Z [W1204 10:39:07.725480709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3453207Z 2025-12-04T10:58:28.3453391Z [W1204 10:39:07.727507147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3453393Z 2025-12-04T10:58:28.3453541Z [W1204 10:39:07.727849415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3453544Z 2025-12-04T10:58:28.3453692Z [W1204 10:39:07.727912164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3453694Z 2025-12-04T10:58:28.3453742Z ('RERUN', {'yellow': True}) [0.7783s] [100%] 2025-12-04T10:58:28.3454103Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:39:08.478875123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454105Z 2025-12-04T10:58:28.3454281Z [W1204 10:39:08.479278241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454284Z 2025-12-04T10:58:28.3454433Z [W1204 10:39:08.479350361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454436Z 2025-12-04T10:58:28.3454583Z [W1204 10:39:08.480615953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454587Z 2025-12-04T10:58:28.3454735Z [W1204 10:39:08.480880621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454737Z 2025-12-04T10:58:28.3454885Z [W1204 10:39:08.480942991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3454887Z 2025-12-04T10:58:28.3455035Z [W1204 10:39:08.483014389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3455038Z 2025-12-04T10:58:28.3455186Z [W1204 10:39:08.483364027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3455203Z 2025-12-04T10:58:28.3455352Z [W1204 10:39:08.483426436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3455354Z 2025-12-04T10:58:28.3455392Z FAILED [0.7238s] [100%] 2025-12-04T10:58:28.3455394Z 2025-12-04T10:58:28.3455445Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3455598Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3455642Z Traceback (most recent call last): 2025-12-04T10:58:28.3455799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3455856Z method(*args, **kwargs) 2025-12-04T10:58:28.3456008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3456050Z method(*args, **kwargs) 2025-12-04T10:58:28.3456201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3456238Z with policy(): 2025-12-04T10:58:28.3456389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3456430Z raise RuntimeError(msg) 2025-12-04T10:58:28.3456834Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3456838Z 2025-12-04T10:58:28.3456912Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3457204Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3457206Z 2025-12-04T10:58:28.3457292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3457365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3457421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3457599Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3457672Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3457731Z graph_break [] 2025-12-04T10:58:28.3457883Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3457929Z Traceback (most recent call last): 2025-12-04T10:58:28.3458083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3458123Z method(*args, **kwargs) 2025-12-04T10:58:28.3458272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3458312Z method(*args, **kwargs) 2025-12-04T10:58:28.3458461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3458498Z with policy(): 2025-12-04T10:58:28.3458650Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3458693Z raise RuntimeError(msg) 2025-12-04T10:58:28.3459101Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3459116Z 2025-12-04T10:58:28.3459190Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3459480Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3459482Z 2025-12-04T10:58:28.3459568Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3459656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3459712Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3459890Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3459964Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3459999Z graph_break [] 2025-12-04T10:58:28.3460073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3460127Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3460199Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3460373Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3460409Z graph_break [] 2025-12-04T10:58:28.3460462Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3460612Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3460659Z Traceback (most recent call last): 2025-12-04T10:58:28.3460811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3460851Z method(*args, **kwargs) 2025-12-04T10:58:28.3461001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3461041Z method(*args, **kwargs) 2025-12-04T10:58:28.3461189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3461226Z with policy(): 2025-12-04T10:58:28.3461400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3461442Z raise RuntimeError(msg) 2025-12-04T10:58:28.3461857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3461859Z 2025-12-04T10:58:28.3461933Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3462223Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3462226Z 2025-12-04T10:58:28.3462312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3462386Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3462454Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3462631Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3462702Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3462739Z graph_break [] 2025-12-04T10:58:28.3462811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3462866Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3462936Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3463112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3463165Z graph_break [] 2025-12-04T10:58:28.3463238Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3463326Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3463401Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3463575Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3463611Z graph_break [] 2025-12-04T10:58:28.3463856Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f6a069bd46d72bb1.xml - 2025-12-04T10:58:28.3463916Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3464565Z FAILED [0.7238s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3464571Z 2025-12-04T10:58:28.3464644Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3464934Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3464936Z 2025-12-04T10:58:28.3465022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3465120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3465189Z ================== 1 failed, 57 deselected, 2 rerun in 4.63s =================== 2025-12-04T10:58:28.3465228Z Got exit code 1 2025-12-04T10:58:28.3465471Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3465601Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3465798Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cb00f055a151edf0.xml 2025-12-04T10:58:28.3465857Z ============================= test session starts ============================== 2025-12-04T10:58:28.3465967Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3466011Z cachedir: .pytest_cache 2025-12-04T10:58:28.3466173Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3466235Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3466275Z configfile: pytest.ini 2025-12-04T10:58:28.3466437Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3466512Z collecting ... collected 58 items / 23 deselected / 35 selected 2025-12-04T10:58:28.3466566Z stepcurrent: skipping 23 already run items. 
2025-12-04T10:58:28.3466609Z Running 35 items in this shard 2025-12-04T10:58:28.3466611Z 2025-12-04T10:58:28.3466863Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5204s] [ 2%] 2025-12-04T10:58:28.3467114Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4543s] [ 2%] 2025-12-04T10:58:28.3467353Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4464s] [ 2%] 2025-12-04T10:58:28.3467355Z 2025-12-04T10:58:28.3467407Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3467557Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3467604Z Traceback (most recent call last): 2025-12-04T10:58:28.3467761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3467802Z method(*args, **kwargs) 2025-12-04T10:58:28.3467955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3467998Z method(*args, **kwargs) 2025-12-04T10:58:28.3468148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3468187Z with policy(): 2025-12-04T10:58:28.3468339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3468382Z raise RuntimeError(msg) 2025-12-04T10:58:28.3468781Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3468784Z 2025-12-04T10:58:28.3468858Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3469173Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3469176Z 2025-12-04T10:58:28.3469262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3469337Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3469393Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3469574Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3469646Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3469684Z graph_break [] 2025-12-04T10:58:28.3469834Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3469884Z Traceback (most recent call last): 2025-12-04T10:58:28.3470037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3470090Z method(*args, **kwargs) 2025-12-04T10:58:28.3470240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3470280Z method(*args, **kwargs) 
2025-12-04T10:58:28.3470429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3470467Z with policy(): 2025-12-04T10:58:28.3470620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3470662Z raise RuntimeError(msg) 2025-12-04T10:58:28.3471067Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3471084Z 2025-12-04T10:58:28.3471158Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3471448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3471450Z 2025-12-04T10:58:28.3471536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3471610Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3471666Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3471845Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3471919Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3471955Z graph_break [] 2025-12-04T10:58:28.3472027Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3472083Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.3472153Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3472329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3472365Z graph_break [] 2025-12-04T10:58:28.3472416Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3472587Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3472635Z Traceback (most recent call last): 2025-12-04T10:58:28.3472789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3472829Z method(*args, **kwargs) 2025-12-04T10:58:28.3472980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3473020Z method(*args, **kwargs) 2025-12-04T10:58:28.3473168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3473206Z with policy(): 2025-12-04T10:58:28.3473400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3473442Z raise RuntimeError(msg) 2025-12-04T10:58:28.3473849Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3473868Z 2025-12-04T10:58:28.3473941Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3474230Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3474232Z 2025-12-04T10:58:28.3474318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3474391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3474445Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3474637Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3474709Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3474746Z graph_break [] 2025-12-04T10:58:28.3474816Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3474871Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3474941Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3475117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3475152Z graph_break [] 2025-12-04T10:58:28.3475224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3475280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3475353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3475527Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3475564Z graph_break [] 2025-12-04T10:58:28.3475807Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cb00f055a151edf0.xml - 2025-12-04T10:58:28.3475866Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3476527Z FAILED [0.4464s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3476531Z 2025-12-04T10:58:28.3476603Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3476891Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3476893Z 2025-12-04T10:58:28.3476977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3477039Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3477105Z ================== 1 failed, 23 deselected, 2 rerun in 3.59s =================== 2025-12-04T10:58:28.3477143Z Got exit code 1 2025-12-04T10:58:28.3477183Z Retrying single test... 
2025-12-04T10:58:28.3477384Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-449c76403cc13d01.xml 2025-12-04T10:58:28.3477452Z ============================= test session starts ============================== 2025-12-04T10:58:28.3477563Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3477603Z cachedir: .pytest_cache 2025-12-04T10:58:28.3477760Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3477805Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3477846Z configfile: pytest.ini 2025-12-04T10:58:28.3478005Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3478079Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3478382Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3478426Z Running 1 items in this shard 2025-12-04T10:58:28.3478429Z 2025-12-04T10:58:28.3478793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:27.814676142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3478795Z 2025-12-04T10:58:28.3478947Z [W1204 10:39:27.082037278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3478949Z 2025-12-04T10:58:28.3479102Z [W1204 10:39:27.082174058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479105Z 2025-12-04T10:58:28.3479255Z [W1204 10:39:27.085948424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479259Z 2025-12-04T10:58:28.3479408Z [W1204 10:39:27.086269342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479410Z 2025-12-04T10:58:28.3479560Z [W1204 10:39:27.086332231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479562Z 2025-12-04T10:58:28.3479710Z [W1204 10:39:27.088637767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479711Z 2025-12-04T10:58:28.3479860Z [W1204 10:39:27.088937725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3479883Z 2025-12-04T10:58:28.3480031Z [W1204 10:39:27.088998825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3480034Z 2025-12-04T10:58:28.3480083Z ('RERUN', {'yellow': True}) [2.8571s] [100%] 2025-12-04T10:58:28.3480444Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:28.237613788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3480446Z 2025-12-04T10:58:28.3480594Z [W1204 10:39:28.238037795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3480596Z 2025-12-04T10:58:28.3480746Z [W1204 10:39:28.238119704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3480749Z 2025-12-04T10:58:28.3480896Z [W1204 10:39:28.239442686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3480909Z 2025-12-04T10:58:28.3481059Z [W1204 10:39:28.239722894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3481061Z 2025-12-04T10:58:28.3481208Z [W1204 10:39:28.239786314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3481211Z 2025-12-04T10:58:28.3481358Z [W1204 10:39:28.241903971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3481360Z 2025-12-04T10:58:28.3481509Z [W1204 10:39:28.242270168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3481525Z 2025-12-04T10:58:28.3481676Z [W1204 10:39:28.242337268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3481679Z 2025-12-04T10:58:28.3481727Z ('RERUN', {'yellow': True}) [0.6468s] [100%] 2025-12-04T10:58:28.3482081Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:29.858323241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482084Z 2025-12-04T10:58:28.3482231Z [W1204 10:39:29.858774718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482233Z 2025-12-04T10:58:28.3482381Z [W1204 10:39:29.858838208 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482385Z 2025-12-04T10:58:28.3482533Z [W1204 10:39:29.860106870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482536Z 2025-12-04T10:58:28.3482685Z [W1204 10:39:29.860364068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482687Z 2025-12-04T10:58:28.3482833Z [W1204 10:39:29.860424488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482835Z 2025-12-04T10:58:28.3482982Z [W1204 10:39:29.862509905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3482984Z 2025-12-04T10:58:28.3483132Z [W1204 10:39:29.862849932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3483136Z 2025-12-04T10:58:28.3483342Z [W1204 10:39:29.862912772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3483345Z 2025-12-04T10:58:28.3483385Z FAILED [0.6157s] [100%] 2025-12-04T10:58:28.3483387Z 2025-12-04T10:58:28.3483437Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3483588Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3483634Z Traceback (most recent call last): 2025-12-04T10:58:28.3483793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3483833Z method(*args, **kwargs) 2025-12-04T10:58:28.3483986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3484026Z method(*args, **kwargs) 2025-12-04T10:58:28.3484178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3484230Z with policy(): 2025-12-04T10:58:28.3484382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3484422Z raise RuntimeError(msg) 2025-12-04T10:58:28.3484818Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3484820Z 2025-12-04T10:58:28.3484893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3485185Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3485201Z 2025-12-04T10:58:28.3485288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3485360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3485417Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3485593Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3485667Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3485703Z graph_break [] 2025-12-04T10:58:28.3485853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3485898Z Traceback (most recent call last): 2025-12-04T10:58:28.3486054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3486095Z method(*args, **kwargs) 2025-12-04T10:58:28.3486245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3486284Z method(*args, **kwargs) 2025-12-04T10:58:28.3486434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3486470Z with policy(): 2025-12-04T10:58:28.3486621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3486662Z raise RuntimeError(msg) 2025-12-04T10:58:28.3487085Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3487089Z 2025-12-04T10:58:28.3487162Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3487451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3487453Z 2025-12-04T10:58:28.3487539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3487611Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3487666Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3487843Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3487916Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3487964Z graph_break [] 2025-12-04T10:58:28.3488039Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3488093Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3488165Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3488340Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3488376Z graph_break [] 2025-12-04T10:58:28.3488426Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3488576Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3488634Z Traceback (most recent call last): 2025-12-04T10:58:28.3488787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3488828Z method(*args, **kwargs) 2025-12-04T10:58:28.3488977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3489016Z method(*args, **kwargs) 2025-12-04T10:58:28.3489165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3489201Z with policy(): 2025-12-04T10:58:28.3489353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3489393Z raise RuntimeError(msg) 2025-12-04T10:58:28.3489798Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3489802Z 2025-12-04T10:58:28.3489876Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3490164Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3490166Z 2025-12-04T10:58:28.3490252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3490325Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3490380Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3490576Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3490651Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3490688Z graph_break [] 2025-12-04T10:58:28.3490760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3490813Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3490886Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3491060Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3491096Z graph_break [] 2025-12-04T10:58:28.3491167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3491222Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3491294Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3491468Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3491523Z graph_break [] 2025-12-04T10:58:28.3491766Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-449c76403cc13d01.xml - 2025-12-04T10:58:28.3491824Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3492468Z FAILED [0.6157s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3492482Z 2025-12-04T10:58:28.3492555Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3492841Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3492843Z 2025-12-04T10:58:28.3492927Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3492988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3493053Z ================== 1 failed, 57 deselected, 2 rerun in 4.28s =================== 2025-12-04T10:58:28.3493089Z Got exit code 1 2025-12-04T10:58:28.3493129Z Retrying single test... 
2025-12-04T10:58:28.3493359Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb8cc7447648d933.xml 2025-12-04T10:58:28.3493417Z ============================= test session starts ============================== 2025-12-04T10:58:28.3493526Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3493567Z cachedir: .pytest_cache 2025-12-04T10:58:28.3493723Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3493769Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3493810Z configfile: pytest.ini 2025-12-04T10:58:28.3493973Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3494045Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3494359Z stepcurrent: skipping 23 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3494408Z Running 1 items in this shard 2025-12-04T10:58:28.3494410Z 2025-12-04T10:58:28.3494773Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:38.231795872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3494775Z 2025-12-04T10:58:28.3494927Z [W1204 10:39:39.496595228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3494929Z 2025-12-04T10:58:28.3495079Z [W1204 10:39:39.496773727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495082Z 2025-12-04T10:58:28.3495233Z [W1204 10:39:39.500546132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495250Z 2025-12-04T10:58:28.3495400Z [W1204 10:39:39.500851330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495403Z 2025-12-04T10:58:28.3495550Z [W1204 10:39:39.500914000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495552Z 2025-12-04T10:58:28.3495699Z [W1204 10:39:39.503035216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495701Z 2025-12-04T10:58:28.3495849Z [W1204 10:39:39.503307864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3495851Z 2025-12-04T10:58:28.3496017Z [W1204 10:39:39.503368394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3496019Z 2025-12-04T10:58:28.3496069Z ('RERUN', {'yellow': True}) [2.8039s] [100%] 2025-12-04T10:58:28.3496426Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:40.492188558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3496429Z 2025-12-04T10:58:28.3496576Z [W1204 10:39:40.492571755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3496578Z 2025-12-04T10:58:28.3496725Z [W1204 10:39:40.492644925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3496728Z 2025-12-04T10:58:28.3496878Z [W1204 10:39:40.493909667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3496881Z 2025-12-04T10:58:28.3497029Z [W1204 10:39:40.494176295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3497031Z 2025-12-04T10:58:28.3497179Z [W1204 10:39:40.494240425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3497181Z 2025-12-04T10:58:28.3497330Z [W1204 10:39:40.496184502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3497332Z 2025-12-04T10:58:28.3497479Z [W1204 10:39:40.496523410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3497481Z 2025-12-04T10:58:28.3497654Z [W1204 10:39:40.496586599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3497657Z 2025-12-04T10:58:28.3497707Z ('RERUN', {'yellow': True}) [0.4886s] [100%] 2025-12-04T10:58:28.3498063Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:39:40.972015764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498065Z 2025-12-04T10:58:28.3498212Z [W1204 10:39:40.972380251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498215Z 2025-12-04T10:58:28.3498361Z [W1204 10:39:40.972445071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498363Z 2025-12-04T10:58:28.3498513Z [W1204 10:39:40.973694403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498526Z 2025-12-04T10:58:28.3498674Z [W1204 10:39:40.973949741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498676Z 2025-12-04T10:58:28.3498826Z [W1204 10:39:40.974014641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498828Z 2025-12-04T10:58:28.3498976Z [W1204 10:39:40.975977078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3498978Z 2025-12-04T10:58:28.3499128Z [W1204 10:39:40.976322086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3499130Z 2025-12-04T10:58:28.3499280Z [W1204 10:39:40.976386985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3499294Z 2025-12-04T10:58:28.3499333Z FAILED [0.4702s] [100%] 2025-12-04T10:58:28.3499335Z 2025-12-04T10:58:28.3499386Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3499536Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3499582Z Traceback (most recent call last): 2025-12-04T10:58:28.3499738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3499778Z method(*args, **kwargs) 2025-12-04T10:58:28.3499929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3499969Z method(*args, **kwargs) 2025-12-04T10:58:28.3500120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3500157Z with policy(): 2025-12-04T10:58:28.3500312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3500353Z raise RuntimeError(msg) 2025-12-04T10:58:28.3500746Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3500748Z 2025-12-04T10:58:28.3500821Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3501133Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3501137Z 2025-12-04T10:58:28.3501223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3501297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3501352Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3501531Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3501602Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3501640Z graph_break [] 2025-12-04T10:58:28.3501789Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3501835Z Traceback (most recent call last): 2025-12-04T10:58:28.3501989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3502031Z method(*args, **kwargs) 2025-12-04T10:58:28.3502193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3502233Z method(*args, **kwargs) 2025-12-04T10:58:28.3502383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3502420Z with policy(): 2025-12-04T10:58:28.3502572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3502613Z raise RuntimeError(msg) 2025-12-04T10:58:28.3503019Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3503032Z 2025-12-04T10:58:28.3503107Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3503433Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3503436Z 2025-12-04T10:58:28.3503522Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3503597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3503651Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3503827Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3503901Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3503938Z graph_break [] 2025-12-04T10:58:28.3504010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3504065Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3504135Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3504311Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3504346Z graph_break [] 2025-12-04T10:58:28.3504398Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3504548Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3504593Z Traceback (most recent call last): 2025-12-04T10:58:28.3504785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3504827Z method(*args, **kwargs) 2025-12-04T10:58:28.3504976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3505015Z method(*args, **kwargs) 2025-12-04T10:58:28.3505163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3505201Z with policy(): 2025-12-04T10:58:28.3505352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3505393Z raise RuntimeError(msg) 2025-12-04T10:58:28.3505799Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3505817Z 2025-12-04T10:58:28.3505890Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3506179Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3506181Z 2025-12-04T10:58:28.3506266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3506338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3506394Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3506570Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3506656Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3506694Z graph_break [] 2025-12-04T10:58:28.3506765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3506822Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3506892Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3507067Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3507103Z graph_break [] 2025-12-04T10:58:28.3507175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3507228Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3507300Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3507475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3507512Z graph_break [] 2025-12-04T10:58:28.3507754Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb8cc7447648d933.xml - 2025-12-04T10:58:28.3507813Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3508471Z FAILED [0.4702s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3508475Z 2025-12-04T10:58:28.3508547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3508838Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3508840Z 2025-12-04T10:58:28.3508925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3508989Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3509053Z ================== 1 failed, 57 deselected, 2 rerun in 3.93s =================== 2025-12-04T10:58:28.3509090Z Got exit code 1 2025-12-04T10:58:28.3509330Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3509461Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3509673Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-29c4ead6da966b43.xml 2025-12-04T10:58:28.3509731Z ============================= test session starts ============================== 2025-12-04T10:58:28.3509841Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3509883Z cachedir: .pytest_cache 2025-12-04T10:58:28.3510039Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3510086Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3510126Z configfile: pytest.ini 2025-12-04T10:58:28.3510287Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3510373Z collecting ... collected 58 items / 24 deselected / 34 selected 2025-12-04T10:58:28.3510428Z stepcurrent: skipping 24 already run items. 
2025-12-04T10:58:28.3510472Z Running 34 items in this shard 2025-12-04T10:58:28.3510474Z 2025-12-04T10:58:28.3510724Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.9002s] [ 2%] 2025-12-04T10:58:28.3510968Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5311s] [ 2%] 2025-12-04T10:58:28.3511189Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4742s] [ 2%] 2025-12-04T10:58:28.3511193Z 2025-12-04T10:58:28.3511245Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3511393Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3511440Z Traceback (most recent call last): 2025-12-04T10:58:28.3511597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3511638Z method(*args, **kwargs) 2025-12-04T10:58:28.3511788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3511831Z method(*args, **kwargs) 2025-12-04T10:58:28.3511981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3512017Z with policy(): 2025-12-04T10:58:28.3512191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3512235Z raise RuntimeError(msg) 2025-12-04T10:58:28.3512632Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3512634Z 2025-12-04T10:58:28.3512706Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3512996Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3512998Z 2025-12-04T10:58:28.3513083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3513158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3513213Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3513537Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3513609Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3513646Z graph_break [] 2025-12-04T10:58:28.3513795Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3513840Z Traceback (most recent call last): 2025-12-04T10:58:28.3513992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3514032Z method(*args, **kwargs) 2025-12-04T10:58:28.3514202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3514243Z method(*args, **kwargs) 2025-12-04T10:58:28.3514392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3514430Z with policy(): 2025-12-04T10:58:28.3514579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3514620Z raise RuntimeError(msg) 2025-12-04T10:58:28.3515021Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3515024Z 2025-12-04T10:58:28.3515098Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3515389Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3515392Z 2025-12-04T10:58:28.3515477Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3515551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3515606Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3515882Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3515954Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3516020Z graph_break [] 2025-12-04T10:58:28.3516093Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3516150Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3516220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3516497Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3516534Z graph_break [] 2025-12-04T10:58:28.3516584Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3516735Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3516780Z Traceback (most recent call last): 2025-12-04T10:58:28.3516936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3516990Z method(*args, **kwargs) 2025-12-04T10:58:28.3517140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3517180Z method(*args, **kwargs) 2025-12-04T10:58:28.3517330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3517366Z with policy(): 2025-12-04T10:58:28.3517518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3517559Z raise RuntimeError(msg) 2025-12-04T10:58:28.3517966Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3517981Z 2025-12-04T10:58:28.3518054Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3518342Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3518344Z 2025-12-04T10:58:28.3518428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3518501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3518556Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3518831Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3518906Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3518941Z graph_break [] 2025-12-04T10:58:28.3519012Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3519066Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3519139Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3519409Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3519445Z graph_break [] 2025-12-04T10:58:28.3519516Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.3519599Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3519670Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3519939Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3519975Z graph_break [] 2025-12-04T10:58:28.3520219Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-29c4ead6da966b43.xml - 2025-12-04T10:58:28.3520277Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3520915Z FAILED [0.4742s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3520931Z 2025-12-04T10:58:28.3521003Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3521291Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3521293Z 2025-12-04T10:58:28.3521379Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3521439Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3521505Z ================== 1 failed, 24 deselected, 2 rerun in 4.07s =================== 2025-12-04T10:58:28.3521557Z Got exit code 1 2025-12-04T10:58:28.3521597Z Retrying single test... 2025-12-04T10:58:28.3521792Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-725e06f85442251d.xml 2025-12-04T10:58:28.3521849Z ============================= test session starts ============================== 2025-12-04T10:58:28.3521959Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3522001Z cachedir: .pytest_cache 2025-12-04T10:58:28.3522157Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3522204Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3522243Z configfile: pytest.ini 2025-12-04T10:58:28.3522404Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3522479Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3522767Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3522811Z Running 1 items in this shard 2025-12-04T10:58:28.3522814Z 2025-12-04T10:58:28.3523177Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:00.122405397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3523180Z 2025-12-04T10:58:28.3523370Z [W1204 10:40:01.398367274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3523372Z 2025-12-04T10:58:28.3523551Z [W1204 10:40:01.398521013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3523554Z 2025-12-04T10:58:28.3523704Z [W1204 10:40:01.402422747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3523706Z 2025-12-04T10:58:28.3523855Z [W1204 10:40:01.402732965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3523858Z 2025-12-04T10:58:28.3524005Z [W1204 10:40:01.402793154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3524007Z 2025-12-04T10:58:28.3524155Z [W1204 10:40:01.404944369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3524157Z 2025-12-04T10:58:28.3524306Z [W1204 10:40:01.405225838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3524309Z 2025-12-04T10:58:28.3524472Z [W1204 10:40:01.405286437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3524473Z 2025-12-04T10:58:28.3524521Z ('RERUN', {'yellow': True}) [3.3050s] [100%] 2025-12-04T10:58:28.3524881Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:01.169846404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3524883Z 2025-12-04T10:58:28.3525032Z [W1204 10:40:01.170220222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525034Z 2025-12-04T10:58:28.3525184Z [W1204 10:40:01.170286901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525200Z 2025-12-04T10:58:28.3525350Z [W1204 10:40:01.171548603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525352Z 2025-12-04T10:58:28.3525498Z [W1204 10:40:01.171800621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525500Z 2025-12-04T10:58:28.3525647Z [W1204 10:40:01.171858891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525649Z 2025-12-04T10:58:28.3525795Z [W1204 10:40:01.173861957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3525798Z 2025-12-04T10:58:28.3525946Z [W1204 10:40:01.174129015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3525949Z 2025-12-04T10:58:28.3526097Z [W1204 10:40:01.174190305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3526100Z 2025-12-04T10:58:28.3526147Z ('RERUN', {'yellow': True}) [0.6235s] [100%] 2025-12-04T10:58:28.3526503Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:02.791160367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3526506Z 2025-12-04T10:58:28.3526653Z [W1204 10:40:02.791525054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3526655Z 2025-12-04T10:58:28.3526827Z [W1204 10:40:02.791588854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3526830Z 2025-12-04T10:58:28.3526980Z [W1204 10:40:02.792837915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3526982Z 2025-12-04T10:58:28.3527130Z [W1204 10:40:02.793102514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3527131Z 2025-12-04T10:58:28.3527281Z [W1204 10:40:02.793165063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3527283Z 2025-12-04T10:58:28.3527429Z [W1204 10:40:02.795171359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3527432Z 2025-12-04T10:58:28.3527581Z [W1204 10:40:02.795430518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3527583Z 2025-12-04T10:58:28.3527731Z [W1204 10:40:02.795489707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3527747Z 2025-12-04T10:58:28.3527787Z FAILED [0.6253s] [100%] 2025-12-04T10:58:28.3527790Z 2025-12-04T10:58:28.3527841Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3527992Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3528038Z Traceback (most recent call last): 2025-12-04T10:58:28.3528196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3528237Z method(*args, **kwargs) 2025-12-04T10:58:28.3528393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3528446Z method(*args, **kwargs) 2025-12-04T10:58:28.3528597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3528635Z with policy(): 2025-12-04T10:58:28.3528787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3528828Z raise RuntimeError(msg) 2025-12-04T10:58:28.3529223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3529225Z 2025-12-04T10:58:28.3529299Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3529592Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3529595Z 2025-12-04T10:58:28.3529682Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3529754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3529810Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3530083Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3530156Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3530193Z graph_break [] 2025-12-04T10:58:28.3530372Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3530419Z Traceback (most recent call last): 2025-12-04T10:58:28.3530571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3530612Z method(*args, **kwargs) 2025-12-04T10:58:28.3530760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3530801Z method(*args, **kwargs) 2025-12-04T10:58:28.3530949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3530986Z 
with policy(): 2025-12-04T10:58:28.3531137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3531179Z raise RuntimeError(msg) 2025-12-04T10:58:28.3531588Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3531602Z 2025-12-04T10:58:28.3531675Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3531963Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3531967Z 2025-12-04T10:58:28.3532052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3532125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3532182Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3532471Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3532544Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3532582Z graph_break [] 2025-12-04T10:58:28.3532654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3532708Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3532778Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3533050Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3533088Z graph_break [] 2025-12-04T10:58:28.3533139Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3533322Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3533367Z Traceback (most recent call last): 2025-12-04T10:58:28.3533519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3533560Z method(*args, **kwargs) 2025-12-04T10:58:28.3533709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3533751Z method(*args, **kwargs) 2025-12-04T10:58:28.3533899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3533936Z with policy(): 2025-12-04T10:58:28.3534117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3534159Z raise RuntimeError(msg) 2025-12-04T10:58:28.3534562Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3534565Z 2025-12-04T10:58:28.3534638Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3534926Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3534928Z 2025-12-04T10:58:28.3535015Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3535089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3535159Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3535430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3535502Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3535538Z graph_break [] 2025-12-04T10:58:28.3535608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3535663Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3535733Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3536006Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3536055Z graph_break [] 2025-12-04T10:58:28.3536128Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3536181Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3536253Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3536521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3536559Z graph_break [] 2025-12-04T10:58:28.3536801Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-725e06f85442251d.xml - 2025-12-04T10:58:28.3536862Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3537499Z FAILED [0.6253s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3537502Z 2025-12-04T10:58:28.3537574Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3537884Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3537888Z 2025-12-04T10:58:28.3537973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3538036Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3538101Z ================== 1 failed, 57 deselected, 2 rerun in 4.70s =================== 2025-12-04T10:58:28.3538139Z Got exit code 1 2025-12-04T10:58:28.3538178Z Retrying single test... 2025-12-04T10:58:28.3538378Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-060cb42cc5b5f825.xml 2025-12-04T10:58:28.3538434Z ============================= test session starts ============================== 2025-12-04T10:58:28.3538546Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3538586Z cachedir: .pytest_cache 2025-12-04T10:58:28.3538745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3538792Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3538843Z configfile: pytest.ini 2025-12-04T10:58:28.3539004Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3539076Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3539363Z stepcurrent: skipping 24 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3539406Z Running 1 items in this shard 2025-12-04T10:58:28.3539408Z 2025-12-04T10:58:28.3539771Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:12.763506334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3539784Z 2025-12-04T10:58:28.3539938Z [W1204 10:40:12.025174267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3539940Z 2025-12-04T10:58:28.3540093Z [W1204 10:40:12.025304576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540095Z 2025-12-04T10:58:28.3540245Z [W1204 10:40:12.028252835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540246Z 2025-12-04T10:58:28.3540394Z [W1204 10:40:12.028548323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540396Z 2025-12-04T10:58:28.3540546Z [W1204 10:40:12.028611073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540548Z 2025-12-04T10:58:28.3540696Z [W1204 10:40:12.030747968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540698Z 2025-12-04T10:58:28.3540847Z [W1204 10:40:12.031032286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3540848Z 2025-12-04T10:58:28.3540996Z [W1204 10:40:12.031096835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3540999Z 2025-12-04T10:58:28.3541047Z ('RERUN', {'yellow': True}) [3.1429s] [100%] 2025-12-04T10:58:28.3541426Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:13.628993498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3541430Z 2025-12-04T10:58:28.3541578Z [W1204 10:40:13.629354375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3541581Z 2025-12-04T10:58:28.3541729Z [W1204 10:40:13.629417665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3541731Z 2025-12-04T10:58:28.3541879Z [W1204 10:40:13.630715436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3541880Z 2025-12-04T10:58:28.3542028Z [W1204 10:40:13.630974894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3542030Z 2025-12-04T10:58:28.3542180Z [W1204 10:40:13.631044233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3542183Z 2025-12-04T10:58:28.3542330Z [W1204 10:40:13.633060359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3542344Z 2025-12-04T10:58:28.3542493Z [W1204 10:40:13.633325068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3542495Z 2025-12-04T10:58:28.3542643Z [W1204 10:40:13.633385637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3542645Z 2025-12-04T10:58:28.3542692Z ('RERUN', {'yellow': True}) [0.4695s] [100%] 2025-12-04T10:58:28.3543049Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:40:13.071255670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543064Z 2025-12-04T10:58:28.3543212Z [W1204 10:40:13.071609737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543215Z 2025-12-04T10:58:28.3543394Z [W1204 10:40:13.071678827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543396Z 2025-12-04T10:58:28.3543543Z [W1204 10:40:13.072968288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543545Z 2025-12-04T10:58:28.3543693Z [W1204 10:40:13.073237886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543695Z 2025-12-04T10:58:28.3543843Z [W1204 10:40:13.073301146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3543846Z 2025-12-04T10:58:28.3543994Z [W1204 10:40:13.075326671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3543997Z 2025-12-04T10:58:28.3544147Z [W1204 10:40:13.075593330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3544148Z 2025-12-04T10:58:28.3544296Z [W1204 10:40:13.075653299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3544298Z 2025-12-04T10:58:28.3544337Z FAILED [0.4318s] [100%] 2025-12-04T10:58:28.3544339Z 2025-12-04T10:58:28.3544389Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3544540Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3544613Z Traceback (most recent call last): 2025-12-04T10:58:28.3544771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3544812Z method(*args, **kwargs) 2025-12-04T10:58:28.3544965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3545004Z method(*args, **kwargs) 2025-12-04T10:58:28.3545156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3545192Z with policy(): 2025-12-04T10:58:28.3545345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3545385Z raise RuntimeError(msg) 2025-12-04T10:58:28.3545781Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3545803Z 2025-12-04T10:58:28.3545877Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3546166Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3546168Z 2025-12-04T10:58:28.3546254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3546326Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3546382Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3546657Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3546744Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3546780Z graph_break [] 2025-12-04T10:58:28.3546930Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3546974Z Traceback (most recent call last): 2025-12-04T10:58:28.3547127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3547166Z method(*args, **kwargs) 2025-12-04T10:58:28.3547316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3547355Z method(*args, **kwargs) 2025-12-04T10:58:28.3547506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3547543Z 
with policy(): 2025-12-04T10:58:28.3547696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3547737Z raise RuntimeError(msg) 2025-12-04T10:58:28.3548139Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3548142Z 2025-12-04T10:58:28.3548216Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3548527Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3548530Z 2025-12-04T10:58:28.3548617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3548691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3548747Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3549017Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3549089Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3549125Z graph_break [] 2025-12-04T10:58:28.3549197Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3549251Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3549325Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3549594Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3549643Z graph_break [] 2025-12-04T10:58:28.3549694Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3549845Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3549890Z Traceback (most recent call last): 2025-12-04T10:58:28.3550043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3550082Z method(*args, **kwargs) 2025-12-04T10:58:28.3550233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3550284Z method(*args, **kwargs) 2025-12-04T10:58:28.3550435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3550472Z with policy(): 2025-12-04T10:58:28.3550623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3550664Z raise RuntimeError(msg) 2025-12-04T10:58:28.3551069Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3551071Z 2025-12-04T10:58:28.3551144Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3551435Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3551438Z 2025-12-04T10:58:28.3551524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3551596Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3551652Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3551923Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3551996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3552032Z graph_break [] 2025-12-04T10:58:28.3552130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3552184Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3552256Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3552527Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3552563Z graph_break [] 2025-12-04T10:58:28.3552636Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3552689Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3552761Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3553031Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3553081Z graph_break [] 2025-12-04T10:58:28.3553359Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-060cb42cc5b5f825.xml - 2025-12-04T10:58:28.3553419Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3554048Z FAILED [0.4318s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3554069Z 2025-12-04T10:58:28.3554141Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3554430Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3554432Z 2025-12-04T10:58:28.3554518Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3554580Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3554644Z ================== 1 failed, 57 deselected, 2 rerun in 4.21s =================== 2025-12-04T10:58:28.3554681Z Got exit code 1 2025-12-04T10:58:28.3554919Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3555049Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3555249Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-da0ebb988fc302fc.xml 2025-12-04T10:58:28.3555306Z ============================= test session starts ============================== 2025-12-04T10:58:28.3555414Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3555456Z cachedir: .pytest_cache 2025-12-04T10:58:28.3555613Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3555659Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3555698Z configfile: pytest.ini 2025-12-04T10:58:28.3555857Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3555958Z collecting ... collected 58 items / 25 deselected / 33 selected 2025-12-04T10:58:28.3556013Z stepcurrent: skipping 25 already run items. 
2025-12-04T10:58:28.3556057Z Running 33 items in this shard 2025-12-04T10:58:28.3556059Z 2025-12-04T10:58:28.3556309Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6083s] [ 3%] 2025-12-04T10:58:28.3556558Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6584s] [ 3%] 2025-12-04T10:58:28.3556780Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.6463s] [ 3%] 2025-12-04T10:58:28.3556783Z 2025-12-04T10:58:28.3556837Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3556986Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3557051Z Traceback (most recent call last): 2025-12-04T10:58:28.3557207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3557247Z method(*args, **kwargs) 2025-12-04T10:58:28.3557398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3557438Z method(*args, **kwargs) 2025-12-04T10:58:28.3557587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3557623Z with policy(): 2025-12-04T10:58:28.3557776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3557830Z raise RuntimeError(msg) 2025-12-04T10:58:28.3558225Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3558229Z 2025-12-04T10:58:28.3558303Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3558594Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3558596Z 2025-12-04T10:58:28.3558685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3558761Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3558816Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3558994Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3559065Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3559101Z graph_break [] 2025-12-04T10:58:28.3559250Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3559295Z Traceback (most recent call last): 2025-12-04T10:58:28.3559448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3559488Z method(*args, **kwargs) 2025-12-04T10:58:28.3559669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3559710Z method(*args, **kwargs) 
2025-12-04T10:58:28.3559859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3559897Z with policy(): 2025-12-04T10:58:28.3560048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3560090Z raise RuntimeError(msg) 2025-12-04T10:58:28.3560493Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3560495Z 2025-12-04T10:58:28.3560568Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3560860Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3560874Z 2025-12-04T10:58:28.3560960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3561033Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3561088Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3561264Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3561335Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3561372Z graph_break [] 2025-12-04T10:58:28.3561443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3561513Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3561583Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3561762Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3561797Z graph_break [] 2025-12-04T10:58:28.3561848Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3561996Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3562041Z Traceback (most recent call last): 2025-12-04T10:58:28.3562193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3562235Z method(*args, **kwargs) 2025-12-04T10:58:28.3562386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3562426Z method(*args, **kwargs) 2025-12-04T10:58:28.3562576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3562613Z with policy(): 2025-12-04T10:58:28.3562764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3562805Z raise RuntimeError(msg) 2025-12-04T10:58:28.3563209Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3563212Z 2025-12-04T10:58:28.3563347Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3563640Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3563644Z 2025-12-04T10:58:28.3563729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3563803Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3563857Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3564034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3564105Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3564142Z graph_break [] 2025-12-04T10:58:28.3564214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3564268Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3564357Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3564531Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3564566Z graph_break [] 2025-12-04T10:58:28.3564639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3564692Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3564763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3564936Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3564972Z graph_break [] 2025-12-04T10:58:28.3565232Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-da0ebb988fc302fc.xml - 2025-12-04T10:58:28.3565291Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3565930Z FAILED [0.6463s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3565932Z 2025-12-04T10:58:28.3566005Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3566298Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3566302Z 2025-12-04T10:58:28.3566387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3566448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3566513Z ================== 1 failed, 25 deselected, 2 rerun in 4.07s =================== 2025-12-04T10:58:28.3566550Z Got exit code 1 2025-12-04T10:58:28.3566590Z Retrying single test... 
2025-12-04T10:58:28.3566788Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7e745ec90cc6ed6a.xml 2025-12-04T10:58:28.3566844Z ============================= test session starts ============================== 2025-12-04T10:58:28.3566977Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3567019Z cachedir: .pytest_cache 2025-12-04T10:58:28.3567176Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3567221Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3567261Z configfile: pytest.ini 2025-12-04T10:58:28.3567419Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3567493Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3567778Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3567821Z Running 1 items in this shard 2025-12-04T10:58:28.3567823Z 2025-12-04T10:58:28.3568190Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:33.968058510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3568204Z 2025-12-04T10:58:28.3568359Z [W1204 10:40:33.233405421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3568361Z 2025-12-04T10:58:28.3568514Z [W1204 10:40:33.233546360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3568516Z 2025-12-04T10:58:28.3568664Z [W1204 10:40:33.237124144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3568667Z 2025-12-04T10:58:28.3568817Z [W1204 10:40:33.237416892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3568829Z 2025-12-04T10:58:28.3568977Z [W1204 10:40:33.237477732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3568980Z 2025-12-04T10:58:28.3569127Z [W1204 10:40:33.239675666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3569129Z 2025-12-04T10:58:28.3569277Z [W1204 10:40:33.239956044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3569279Z 2025-12-04T10:58:28.3569427Z [W1204 10:40:33.240022243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3569429Z 2025-12-04T10:58:28.3569479Z ('RERUN', {'yellow': True}) [2.9468s] [100%] 2025-12-04T10:58:28.3569841Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:35.411454597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3569845Z 2025-12-04T10:58:28.3569993Z [W1204 10:40:35.411841625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3569995Z 2025-12-04T10:58:28.3570144Z [W1204 10:40:35.411906334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570146Z 2025-12-04T10:58:28.3570294Z [W1204 10:40:35.413178345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570296Z 2025-12-04T10:58:28.3570468Z [W1204 10:40:35.413442963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570471Z 2025-12-04T10:58:28.3570620Z [W1204 10:40:35.413504302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570624Z 2025-12-04T10:58:28.3570772Z [W1204 10:40:35.415480868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570774Z 2025-12-04T10:58:28.3570923Z [W1204 10:40:35.415830406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3570924Z 2025-12-04T10:58:28.3571072Z [W1204 10:40:35.415893225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3571074Z 2025-12-04T10:58:28.3571124Z ('RERUN', {'yellow': True}) [0.6634s] [100%] 2025-12-04T10:58:28.3571483Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:35.058689831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3571497Z 2025-12-04T10:58:28.3571647Z [W1204 10:40:35.059093778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3571649Z 2025-12-04T10:58:28.3571798Z [W1204 10:40:35.059178138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3571800Z 2025-12-04T10:58:28.3571947Z [W1204 10:40:35.060459928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3571949Z 2025-12-04T10:58:28.3572098Z [W1204 10:40:35.060736366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3572113Z 2025-12-04T10:58:28.3572262Z [W1204 10:40:35.060797786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3572265Z 2025-12-04T10:58:28.3572413Z [W1204 10:40:35.062764532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3572415Z 2025-12-04T10:58:28.3572564Z [W1204 10:40:35.063159049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3572566Z 2025-12-04T10:58:28.3572714Z [W1204 10:40:35.063223948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3572716Z 2025-12-04T10:58:28.3572755Z FAILED [0.6435s] [100%] 2025-12-04T10:58:28.3572757Z 2025-12-04T10:58:28.3572807Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3572962Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3573007Z Traceback (most recent call last): 2025-12-04T10:58:28.3573163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3573203Z method(*args, **kwargs) 2025-12-04T10:58:28.3573388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3573427Z method(*args, **kwargs) 2025-12-04T10:58:28.3573579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3573615Z with policy(): 2025-12-04T10:58:28.3573768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3573844Z raise RuntimeError(msg) 2025-12-04T10:58:28.3574246Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3574249Z 2025-12-04T10:58:28.3574321Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3574611Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3574613Z 2025-12-04T10:58:28.3574699Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3574772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3574831Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3575007Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3575097Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3575133Z graph_break [] 2025-12-04T10:58:28.3575283Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3575327Z Traceback (most recent call last): 2025-12-04T10:58:28.3575480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3575519Z method(*args, **kwargs) 2025-12-04T10:58:28.3575669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3575724Z method(*args, **kwargs) 2025-12-04T10:58:28.3575874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3575911Z with policy(): 2025-12-04T10:58:28.3576062Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3576102Z raise RuntimeError(msg) 2025-12-04T10:58:28.3576507Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3576509Z 2025-12-04T10:58:28.3576581Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3576874Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3576877Z 2025-12-04T10:58:28.3576963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3577035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3577092Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3577268Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3577340Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3577375Z graph_break [] 2025-12-04T10:58:28.3577448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3577502Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3577597Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3577772Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3577808Z graph_break [] 2025-12-04T10:58:28.3577859Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3578008Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3578052Z Traceback (most recent call last): 2025-12-04T10:58:28.3578206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3578244Z method(*args, **kwargs) 2025-12-04T10:58:28.3578395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3578436Z method(*args, **kwargs) 2025-12-04T10:58:28.3578584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3578633Z with policy(): 2025-12-04T10:58:28.3578784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3578824Z raise RuntimeError(msg) 2025-12-04T10:58:28.3579232Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3579234Z 2025-12-04T10:58:28.3579309Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3579617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3579620Z 2025-12-04T10:58:28.3579708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3579780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3579837Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3580011Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3580082Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3580117Z graph_break [] 2025-12-04T10:58:28.3580189Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3580246Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3580317Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3580492Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3580528Z graph_break [] 2025-12-04T10:58:28.3580599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3580654Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3580724Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3580897Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3580932Z graph_break [] 2025-12-04T10:58:28.3581200Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7e745ec90cc6ed6a.xml - 2025-12-04T10:58:28.3581260Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3581895Z FAILED [0.6435s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3581897Z 2025-12-04T10:58:28.3581971Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3582261Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3582275Z 2025-12-04T10:58:28.3582362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3582422Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3582489Z ================== 1 failed, 57 deselected, 2 rerun in 4.40s =================== 2025-12-04T10:58:28.3582525Z Got exit code 1 2025-12-04T10:58:28.3582565Z Retrying single test... 
2025-12-04T10:58:28.3582760Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-22470a54131d25a8.xml 2025-12-04T10:58:28.3582817Z ============================= test session starts ============================== 2025-12-04T10:58:28.3582925Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3582979Z cachedir: .pytest_cache 2025-12-04T10:58:28.3583137Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3583184Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3583223Z configfile: pytest.ini 2025-12-04T10:58:28.3583413Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3583485Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3583773Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3583817Z Running 1 items in this shard 2025-12-04T10:58:28.3583819Z 2025-12-04T10:58:28.3584183Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:45.286103335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3584187Z 2025-12-04T10:58:28.3584340Z [W1204 10:40:45.553668209 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3584342Z 2025-12-04T10:58:28.3584492Z [W1204 10:40:45.553827258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3584494Z 2025-12-04T10:58:28.3584644Z [W1204 10:40:45.556919815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3584646Z 2025-12-04T10:58:28.3584795Z [W1204 10:40:45.557228473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3584798Z 2025-12-04T10:58:28.3584975Z [W1204 10:40:45.557291432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3584978Z 2025-12-04T10:58:28.3585127Z [W1204 10:40:45.559438286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3585129Z 2025-12-04T10:58:28.3585276Z [W1204 10:40:45.559710324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3585278Z 2025-12-04T10:58:28.3585427Z [W1204 10:40:45.559769784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3585429Z 2025-12-04T10:58:28.3585476Z ('RERUN', {'yellow': True}) [2.7752s] [100%] 2025-12-04T10:58:28.3585843Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:46.494176559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3585860Z 2025-12-04T10:58:28.3586010Z [W1204 10:40:46.494535356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586012Z 2025-12-04T10:58:28.3586159Z [W1204 10:40:46.494599506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586161Z 2025-12-04T10:58:28.3586310Z [W1204 10:40:46.495856747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586311Z 2025-12-04T10:58:28.3586459Z [W1204 10:40:46.496116425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586480Z 2025-12-04T10:58:28.3586630Z [W1204 10:40:46.496179314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586633Z 2025-12-04T10:58:28.3586781Z [W1204 10:40:46.498140370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586783Z 2025-12-04T10:58:28.3586930Z [W1204 10:40:46.498477337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3586932Z 2025-12-04T10:58:28.3587080Z [W1204 10:40:46.498541577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3587082Z 2025-12-04T10:58:28.3587128Z ('RERUN', {'yellow': True}) [0.4471s] [100%] 2025-12-04T10:58:28.3587490Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:40:46.955807053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3587493Z 2025-12-04T10:58:28.3587642Z [W1204 10:40:46.956182060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3587645Z 2025-12-04T10:58:28.3587794Z [W1204 10:40:46.956253980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3587796Z 2025-12-04T10:58:28.3587945Z [W1204 10:40:46.957606080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3587947Z 2025-12-04T10:58:28.3588095Z [W1204 10:40:46.957870728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3588098Z 2025-12-04T10:58:28.3588270Z [W1204 10:40:46.957937777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3588273Z 2025-12-04T10:58:28.3588421Z [W1204 10:40:46.959929913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3588423Z 2025-12-04T10:58:28.3588570Z [W1204 10:40:46.960284500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3588572Z 2025-12-04T10:58:28.3588721Z [W1204 10:40:46.960350230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3588723Z 2025-12-04T10:58:28.3588761Z FAILED [0.4580s] [100%] 2025-12-04T10:58:28.3588763Z 2025-12-04T10:58:28.3588814Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3588970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3589015Z Traceback (most recent call last): 2025-12-04T10:58:28.3589182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3589223Z method(*args, **kwargs) 2025-12-04T10:58:28.3589374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3589414Z method(*args, **kwargs) 2025-12-04T10:58:28.3589563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3589601Z with policy(): 2025-12-04T10:58:28.3589752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3589793Z raise RuntimeError(msg) 2025-12-04T10:58:28.3590207Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3590210Z 2025-12-04T10:58:28.3590284Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3590574Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3590577Z 2025-12-04T10:58:28.3590662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3590736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3590793Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3590972Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3591047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3591083Z graph_break [] 2025-12-04T10:58:28.3591233Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3591279Z Traceback (most recent call last): 2025-12-04T10:58:28.3591429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3591469Z method(*args, **kwargs) 2025-12-04T10:58:28.3591619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3591659Z method(*args, **kwargs) 2025-12-04T10:58:28.3591830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3591869Z with policy(): 2025-12-04T10:58:28.3592020Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3592062Z raise RuntimeError(msg) 2025-12-04T10:58:28.3592467Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3592469Z 2025-12-04T10:58:28.3592543Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3592833Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3592848Z 2025-12-04T10:58:28.3592934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3593006Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3593061Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3593238Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3593344Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3593380Z graph_break [] 2025-12-04T10:58:28.3593451Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3593506Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3593594Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3593768Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3593804Z graph_break [] 2025-12-04T10:58:28.3593855Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3594005Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3594051Z Traceback (most recent call last): 2025-12-04T10:58:28.3594204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3594245Z method(*args, **kwargs) 2025-12-04T10:58:28.3594394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3594437Z method(*args, **kwargs) 2025-12-04T10:58:28.3594585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3594623Z with policy(): 2025-12-04T10:58:28.3594773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3594815Z raise RuntimeError(msg) 2025-12-04T10:58:28.3595218Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3595222Z 2025-12-04T10:58:28.3595294Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3595611Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3595615Z 2025-12-04T10:58:28.3595700Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3595773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3595828Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3596004Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3596074Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3596111Z graph_break [] 2025-12-04T10:58:28.3596182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3596237Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3596310Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3596486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3596536Z graph_break [] 2025-12-04T10:58:28.3596608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3596662Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3596733Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3596906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3596942Z graph_break [] 2025-12-04T10:58:28.3597186Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-22470a54131d25a8.xml - 2025-12-04T10:58:28.3597257Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3597898Z FAILED [0.4580s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3597900Z 2025-12-04T10:58:28.3597973Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3598265Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3598268Z 2025-12-04T10:58:28.3598353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3598416Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3598481Z ================== 1 failed, 57 deselected, 2 rerun in 3.85s =================== 2025-12-04T10:58:28.3598519Z Got exit code 1 2025-12-04T10:58:28.3598760Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3598887Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3599083Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd51af51355f849c.xml 2025-12-04T10:58:28.3599162Z ============================= test session starts ============================== 2025-12-04T10:58:28.3599271Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3599314Z cachedir: .pytest_cache 2025-12-04T10:58:28.3599472Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3599517Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3599557Z configfile: pytest.ini 2025-12-04T10:58:28.3599716Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3599789Z collecting ... collected 58 items / 26 deselected / 32 selected 2025-12-04T10:58:28.3599842Z stepcurrent: skipping 26 already run items. 
2025-12-04T10:58:28.3599885Z Running 32 items in this shard 2025-12-04T10:58:28.3599887Z 2025-12-04T10:58:28.3601601Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4095s] [ 3%] 2025-12-04T10:58:28.3601877Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4563s] [ 3%] 2025-12-04T10:58:28.3602099Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4317s] [ 3%] 2025-12-04T10:58:28.3602102Z 2025-12-04T10:58:28.3602154Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3602303Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3602349Z Traceback (most recent call last): 2025-12-04T10:58:28.3602523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3602564Z method(*args, **kwargs) 2025-12-04T10:58:28.3602718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3602757Z method(*args, **kwargs) 2025-12-04T10:58:28.3602907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3602944Z with policy(): 2025-12-04T10:58:28.3603094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3603136Z raise RuntimeError(msg) 2025-12-04T10:58:28.3603571Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3603577Z 2025-12-04T10:58:28.3603651Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3603942Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3603945Z 2025-12-04T10:58:28.3604029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3604102Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3604158Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3604335Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3604439Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3604476Z graph_break [] 2025-12-04T10:58:28.3604627Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3604673Z Traceback (most recent call last): 2025-12-04T10:58:28.3604826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3604866Z method(*args, **kwargs) 2025-12-04T10:58:28.3605016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3605056Z method(*args, **kwargs) 2025-12-04T10:58:28.3605204Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3605241Z with policy(): 2025-12-04T10:58:28.3605396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3605456Z raise RuntimeError(msg) 2025-12-04T10:58:28.3605856Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3605858Z 2025-12-04T10:58:28.3605931Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3606221Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3606223Z 2025-12-04T10:58:28.3606310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3606399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3606455Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3606632Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3606703Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3606740Z graph_break [] 2025-12-04T10:58:28.3606811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3606866Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3606936Z 
aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3607112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3607148Z graph_break [] 2025-12-04T10:58:28.3607201Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3607351Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3607396Z Traceback (most recent call last): 2025-12-04T10:58:28.3607548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3607588Z method(*args, **kwargs) 2025-12-04T10:58:28.3607737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3607777Z method(*args, **kwargs) 2025-12-04T10:58:28.3607925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3607963Z with policy(): 2025-12-04T10:58:28.3608135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3608176Z raise RuntimeError(msg) 2025-12-04T10:58:28.3608576Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3608579Z 2025-12-04T10:58:28.3608651Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3608942Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3608944Z 2025-12-04T10:58:28.3609031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3609105Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3609173Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3609350Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3609421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3609457Z graph_break [] 2025-12-04T10:58:28.3609528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3609583Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3609653Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3609830Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3609877Z graph_break [] 2025-12-04T10:58:28.3609951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3610004Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3610075Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3610248Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3610283Z graph_break [] 2025-12-04T10:58:28.3610527Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cd51af51355f849c.xml - 2025-12-04T10:58:28.3610585Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3611221Z FAILED [0.4317s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3611225Z 2025-12-04T10:58:28.3611297Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3611584Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3611586Z 2025-12-04T10:58:28.3611671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3611755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3611821Z ================== 1 failed, 26 deselected, 2 rerun in 3.46s =================== 2025-12-04T10:58:28.3611859Z Got exit code 1 2025-12-04T10:58:28.3611899Z Retrying single test... 
2025-12-04T10:58:28.3612094Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-01e89f6914a0e7af.xml 2025-12-04T10:58:28.3612151Z ============================= test session starts ============================== 2025-12-04T10:58:28.3612260Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3612302Z cachedir: .pytest_cache 2025-12-04T10:58:28.3612459Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3612505Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3612545Z configfile: pytest.ini 2025-12-04T10:58:28.3612707Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3612794Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3613079Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3613122Z Running 1 items in this shard 2025-12-04T10:58:28.3613125Z 2025-12-04T10:58:28.3613513Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:05.351400864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3613515Z 2025-12-04T10:58:28.3613686Z [W1204 10:41:05.618413004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3613690Z 2025-12-04T10:58:28.3613840Z [W1204 10:41:05.618555653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3613842Z 2025-12-04T10:58:28.3613990Z [W1204 10:41:05.622375234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3613992Z 2025-12-04T10:58:28.3614138Z [W1204 10:41:05.622700411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3614140Z 2025-12-04T10:58:28.3614287Z [W1204 10:41:05.622763011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3614289Z 2025-12-04T10:58:28.3614437Z [W1204 10:41:05.625120423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3614440Z 2025-12-04T10:58:28.3614589Z [W1204 10:41:05.625402071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3614591Z 2025-12-04T10:58:28.3614740Z [W1204 10:41:05.625462570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3614742Z 2025-12-04T10:58:28.3614791Z ('RERUN', {'yellow': True}) [2.8110s] [100%] 2025-12-04T10:58:28.3615149Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:06.794753530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3615151Z 2025-12-04T10:58:28.3615331Z [W1204 10:41:06.795149127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3615334Z 2025-12-04T10:58:28.3615485Z [W1204 10:41:06.795216626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3615487Z 2025-12-04T10:58:28.3615634Z [W1204 10:41:06.796480706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3615637Z 2025-12-04T10:58:28.3615784Z [W1204 10:41:06.796739854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3615786Z 2025-12-04T10:58:28.3615934Z [W1204 10:41:06.796799694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3615936Z 2025-12-04T10:58:28.3616084Z [W1204 10:41:06.798796829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3616087Z 2025-12-04T10:58:28.3616235Z [W1204 10:41:06.799191326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3616256Z 2025-12-04T10:58:28.3616403Z [W1204 10:41:06.799256355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3616405Z 2025-12-04T10:58:28.3616454Z ('RERUN', {'yellow': True}) [0.6667s] [100%] 2025-12-04T10:58:28.3616809Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:07.434697830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3616812Z 2025-12-04T10:58:28.3616961Z [W1204 10:41:07.435101147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3616974Z 2025-12-04T10:58:28.3617123Z [W1204 10:41:07.435169187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3617125Z 2025-12-04T10:58:28.3617273Z [W1204 10:41:07.436436867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3617275Z 2025-12-04T10:58:28.3617424Z [W1204 10:41:07.436696675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3617426Z 2025-12-04T10:58:28.3617574Z [W1204 10:41:07.436758385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3617577Z 2025-12-04T10:58:28.3617726Z [W1204 10:41:07.438754969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3617728Z 2025-12-04T10:58:28.3617876Z [W1204 10:41:07.439100806 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3617879Z 2025-12-04T10:58:28.3618026Z [W1204 10:41:07.439165986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3618028Z 2025-12-04T10:58:28.3618067Z FAILED [0.6332s] [100%] 2025-12-04T10:58:28.3618069Z 2025-12-04T10:58:28.3618119Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3618269Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3618314Z Traceback (most recent call last): 2025-12-04T10:58:28.3618472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3618535Z method(*args, **kwargs) 2025-12-04T10:58:28.3618688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3618729Z method(*args, **kwargs) 2025-12-04T10:58:28.3618879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3618915Z with policy(): 2025-12-04T10:58:28.3619067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3619108Z raise RuntimeError(msg) 2025-12-04T10:58:28.3619500Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3619505Z 2025-12-04T10:58:28.3619578Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3619881Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3619883Z 2025-12-04T10:58:28.3619970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3620042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3620097Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3620273Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3620346Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3620394Z graph_break [] 2025-12-04T10:58:28.3620542Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3620588Z Traceback (most recent call last): 2025-12-04T10:58:28.3620740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3620779Z method(*args, **kwargs) 2025-12-04T10:58:28.3620929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3620968Z method(*args, **kwargs) 2025-12-04T10:58:28.3621117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3621153Z with policy(): 2025-12-04T10:58:28.3621305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3621348Z raise RuntimeError(msg) 2025-12-04T10:58:28.3621749Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3621752Z 2025-12-04T10:58:28.3621825Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3622118Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3622119Z 2025-12-04T10:58:28.3622205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3622299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3622357Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3622532Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3622605Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3622640Z graph_break [] 2025-12-04T10:58:28.3622713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3622766Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3622837Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3623011Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3623048Z 
graph_break [] 2025-12-04T10:58:28.3623101Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3623287Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3623348Z Traceback (most recent call last): 2025-12-04T10:58:28.3623501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3623540Z method(*args, **kwargs) 2025-12-04T10:58:28.3623691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3623730Z method(*args, **kwargs) 2025-12-04T10:58:28.3623879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3623915Z with policy(): 2025-12-04T10:58:28.3624069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3624125Z raise RuntimeError(msg) 2025-12-04T10:58:28.3624525Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3624528Z 2025-12-04T10:58:28.3624601Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3624888Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3624891Z 2025-12-04T10:58:28.3624976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3625050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3625106Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3625281Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3625353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3625389Z graph_break [] 2025-12-04T10:58:28.3625461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3625514Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3625586Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3625760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3625797Z graph_break [] 2025-12-04T10:58:28.3625894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3625950Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3626020Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3626196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3626231Z graph_break [] 2025-12-04T10:58:28.3626474Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-01e89f6914a0e7af.xml - 2025-12-04T10:58:28.3626533Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3627166Z FAILED [0.6332s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3627184Z 2025-12-04T10:58:28.3627256Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3627543Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3627545Z 2025-12-04T10:58:28.3627630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3627690Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3627768Z ================== 1 failed, 57 deselected, 2 rerun in 4.27s =================== 2025-12-04T10:58:28.3627804Z Got exit code 1 2025-12-04T10:58:28.3627845Z Retrying single test... 
2025-12-04T10:58:28.3628041Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63ee314691ebd3c6.xml 2025-12-04T10:58:28.3628097Z ============================= test session starts ============================== 2025-12-04T10:58:28.3628208Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3628248Z cachedir: .pytest_cache 2025-12-04T10:58:28.3628408Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3628452Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3628492Z configfile: pytest.ini 2025-12-04T10:58:28.3628653Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3628727Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3629012Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3629057Z Running 1 items in this shard 2025-12-04T10:58:28.3629059Z 2025-12-04T10:58:28.3629416Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:16.335811261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3629418Z 2025-12-04T10:58:28.3629570Z [W1204 10:41:16.606692535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3629600Z 2025-12-04T10:58:28.3629751Z [W1204 10:41:16.606830054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3629755Z 2025-12-04T10:58:28.3629904Z [W1204 10:41:16.610392386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3629906Z 2025-12-04T10:58:28.3630055Z [W1204 10:41:16.610712974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3630057Z 2025-12-04T10:58:28.3630203Z [W1204 10:41:16.610774733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3630205Z 2025-12-04T10:58:28.3630352Z [W1204 10:41:16.613038926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3630355Z 2025-12-04T10:58:28.3630505Z [W1204 10:41:16.613317524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3630517Z 2025-12-04T10:58:28.3630665Z [W1204 10:41:16.613379493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3630667Z 2025-12-04T10:58:28.3630716Z ('RERUN', {'yellow': True}) [2.8236s] [100%] 2025-12-04T10:58:28.3631073Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:17.742221730 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3631075Z 2025-12-04T10:58:28.3631223Z [W1204 10:41:17.742636887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631237Z 2025-12-04T10:58:28.3631386Z [W1204 10:41:17.742716776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631389Z 2025-12-04T10:58:28.3631537Z [W1204 10:41:17.744016256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631539Z 2025-12-04T10:58:28.3631686Z [W1204 10:41:17.744291164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631688Z 2025-12-04T10:58:28.3631834Z [W1204 10:41:17.744354773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631836Z 2025-12-04T10:58:28.3631984Z [W1204 10:41:17.746397887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3631986Z 2025-12-04T10:58:28.3632134Z [W1204 10:41:17.746747225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3632137Z 2025-12-04T10:58:28.3632285Z [W1204 10:41:17.746811204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3632286Z 2025-12-04T10:58:28.3632334Z ('RERUN', {'yellow': True}) [0.6424s] [100%] 2025-12-04T10:58:28.3632692Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:41:18.375330148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3632694Z 2025-12-04T10:58:28.3632844Z [W1204 10:41:18.375743195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3632847Z 2025-12-04T10:58:28.3633017Z [W1204 10:41:18.375821834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633021Z 2025-12-04T10:58:28.3633169Z [W1204 10:41:18.377121734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633171Z 2025-12-04T10:58:28.3633354Z [W1204 10:41:18.377405192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633356Z 2025-12-04T10:58:28.3633504Z [W1204 10:41:18.377469771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633505Z 2025-12-04T10:58:28.3633652Z [W1204 10:41:18.379547885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633655Z 2025-12-04T10:58:28.3633803Z [W1204 10:41:18.379906772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3633822Z 2025-12-04T10:58:28.3633970Z [W1204 10:41:18.379971852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3633972Z 2025-12-04T10:58:28.3634010Z FAILED [0.6246s] [100%] 2025-12-04T10:58:28.3634012Z 2025-12-04T10:58:28.3634063Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3634212Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3634258Z Traceback (most recent call last): 2025-12-04T10:58:28.3634414Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3634455Z method(*args, **kwargs) 2025-12-04T10:58:28.3634621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3634663Z method(*args, **kwargs) 2025-12-04T10:58:28.3634813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3634850Z with policy(): 2025-12-04T10:58:28.3635000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3635041Z raise RuntimeError(msg) 2025-12-04T10:58:28.3635434Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3635437Z 2025-12-04T10:58:28.3635512Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3635802Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3635805Z 2025-12-04T10:58:28.3635891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3635965Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3636020Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3636198Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3636270Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3636306Z graph_break [] 2025-12-04T10:58:28.3636484Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3636530Z Traceback (most recent call last): 2025-12-04T10:58:28.3636683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3636723Z method(*args, **kwargs) 2025-12-04T10:58:28.3636872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3636912Z method(*args, **kwargs) 2025-12-04T10:58:28.3637062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3637098Z with policy(): 2025-12-04T10:58:28.3637250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3637292Z raise RuntimeError(msg) 2025-12-04T10:58:28.3637692Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3637708Z 2025-12-04T10:58:28.3637781Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3638068Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3638070Z 2025-12-04T10:58:28.3638155Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3638227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3638282Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3638472Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3638545Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3638581Z graph_break [] 2025-12-04T10:58:28.3638652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3638707Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3638777Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3638952Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3638987Z 
graph_break [] 2025-12-04T10:58:28.3639039Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3639190Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3639236Z Traceback (most recent call last): 2025-12-04T10:58:28.3639387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3639427Z method(*args, **kwargs) 2025-12-04T10:58:28.3639576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3639616Z method(*args, **kwargs) 2025-12-04T10:58:28.3639764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3639801Z with policy(): 2025-12-04T10:58:28.3639951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3639993Z raise RuntimeError(msg) 2025-12-04T10:58:28.3640415Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3640418Z 2025-12-04T10:58:28.3640490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3640777Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3640779Z 2025-12-04T10:58:28.3640864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3640936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3640994Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3641169Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3641253Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3641289Z graph_break [] 2025-12-04T10:58:28.3641360Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3641415Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3641485Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3641659Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3641694Z graph_break [] 2025-12-04T10:58:28.3641767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3641843Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3641914Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3642088Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3642125Z graph_break [] 2025-12-04T10:58:28.3642366Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63ee314691ebd3c6.xml - 2025-12-04T10:58:28.3642425Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3643056Z FAILED [0.6246s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3643060Z 2025-12-04T10:58:28.3643131Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3643452Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3643455Z 2025-12-04T10:58:28.3643540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3643603Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3643667Z ================== 1 failed, 57 deselected, 2 rerun in 4.26s =================== 2025-12-04T10:58:28.3643734Z Got exit code 1 2025-12-04T10:58:28.3643971Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3644101Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3644298Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9310199d145a79ec.xml 2025-12-04T10:58:28.3644354Z ============================= test session starts ============================== 2025-12-04T10:58:28.3644464Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3644504Z cachedir: .pytest_cache 2025-12-04T10:58:28.3644662Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3644709Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3644749Z configfile: pytest.ini 2025-12-04T10:58:28.3644907Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3644995Z collecting ... collected 58 items / 27 deselected / 31 selected 2025-12-04T10:58:28.3645048Z stepcurrent: skipping 27 already run items. 
2025-12-04T10:58:28.3645092Z Running 31 items in this shard 2025-12-04T10:58:28.3645094Z 2025-12-04T10:58:28.3645341Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8582s] [ 3%] 2025-12-04T10:58:28.3645585Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4555s] [ 3%] 2025-12-04T10:58:28.3645825Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4523s] [ 3%] 2025-12-04T10:58:28.3645828Z 2025-12-04T10:58:28.3645879Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3646026Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3646072Z Traceback (most recent call last): 2025-12-04T10:58:28.3646227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3646267Z method(*args, **kwargs) 2025-12-04T10:58:28.3646419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3646458Z method(*args, **kwargs) 2025-12-04T10:58:28.3646610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3646647Z with policy(): 2025-12-04T10:58:28.3646799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3646839Z raise RuntimeError(msg) 2025-12-04T10:58:28.3647230Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3647232Z 2025-12-04T10:58:28.3647305Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3647616Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3647620Z 2025-12-04T10:58:28.3647705Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3647779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3647834Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3648113Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3648185Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3648220Z graph_break [] 2025-12-04T10:58:28.3648370Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3648414Z Traceback (most recent call last): 2025-12-04T10:58:28.3648569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3648620Z method(*args, **kwargs) 2025-12-04T10:58:28.3648770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3648808Z method(*args, **kwargs) 2025-12-04T10:58:28.3648958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3648994Z with policy(): 2025-12-04T10:58:28.3649146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3649186Z raise RuntimeError(msg) 2025-12-04T10:58:28.3649586Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3649600Z 2025-12-04T10:58:28.3649673Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3649961Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3649963Z 2025-12-04T10:58:28.3650048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3650123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3650178Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3650454Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3650528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3650563Z graph_break [] 2025-12-04T10:58:28.3650636Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.3650689Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3650761Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3651029Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3651065Z graph_break [] 2025-12-04T10:58:28.3651116Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3651286Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3651331Z Traceback (most recent call last): 2025-12-04T10:58:28.3651485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3651525Z method(*args, **kwargs) 2025-12-04T10:58:28.3651681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3651722Z method(*args, **kwargs) 2025-12-04T10:58:28.3651874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3651911Z with policy(): 2025-12-04T10:58:28.3652063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3652104Z raise RuntimeError(msg) 2025-12-04T10:58:28.3652510Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3652526Z 2025-12-04T10:58:28.3652601Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3652890Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3652892Z 2025-12-04T10:58:28.3652980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3653053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3653125Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3653426Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3653502Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3653538Z graph_break [] 2025-12-04T10:58:28.3653612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3653667Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3653739Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3654006Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3654045Z graph_break [] 2025-12-04T10:58:28.3654117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3654174Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.3654244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3654515Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3654551Z graph_break [] 2025-12-04T10:58:28.3654795Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9310199d145a79ec.xml - 2025-12-04T10:58:28.3654854Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3655511Z FAILED [0.4523s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3655515Z 2025-12-04T10:58:28.3655589Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3655875Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3655877Z 2025-12-04T10:58:28.3655963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3656027Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3656095Z ================== 1 failed, 27 deselected, 2 rerun in 3.93s =================== 2025-12-04T10:58:28.3656152Z Got exit code 1 2025-12-04T10:58:28.3656194Z Retrying single test... 2025-12-04T10:58:28.3656389Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-772da696c4a7ce03.xml 2025-12-04T10:58:28.3656446Z ============================= test session starts ============================== 2025-12-04T10:58:28.3656556Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3656597Z cachedir: .pytest_cache 2025-12-04T10:58:28.3656757Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3656803Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3656859Z configfile: pytest.ini 2025-12-04T10:58:28.3657023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3657095Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3657381Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3657425Z Running 1 items in this shard 2025-12-04T10:58:28.3657427Z 2025-12-04T10:58:28.3657787Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:38.803584956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3657789Z 2025-12-04T10:58:28.3657945Z [W1204 10:41:38.073116040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3657948Z 2025-12-04T10:58:28.3658099Z [W1204 10:41:38.073269629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658101Z 2025-12-04T10:58:28.3658251Z [W1204 10:41:38.076736881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658253Z 2025-12-04T10:58:28.3658401Z [W1204 10:41:38.077036688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658403Z 2025-12-04T10:58:28.3658550Z [W1204 10:41:38.077100668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658552Z 2025-12-04T10:58:28.3658722Z [W1204 10:41:38.079201051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658725Z 2025-12-04T10:58:28.3658875Z [W1204 10:41:38.079468799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3658877Z 2025-12-04T10:58:28.3659026Z [W1204 10:41:38.079529398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3659028Z 2025-12-04T10:58:28.3659076Z ('RERUN', {'yellow': True}) [3.1591s] [100%] 2025-12-04T10:58:28.3659432Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:39.672987833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3659434Z 2025-12-04T10:58:28.3659584Z [W1204 10:41:39.673367740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3659587Z 2025-12-04T10:58:28.3659745Z [W1204 10:41:39.673436970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3659747Z 2025-12-04T10:58:28.3659895Z [W1204 10:41:39.674723199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3659898Z 2025-12-04T10:58:28.3660045Z [W1204 10:41:39.674978497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3660047Z 2025-12-04T10:58:28.3660195Z [W1204 10:41:39.675042487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3660197Z 2025-12-04T10:58:28.3660346Z [W1204 10:41:39.677056111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3660359Z 2025-12-04T10:58:28.3660507Z [W1204 10:41:39.677323038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3660510Z 2025-12-04T10:58:28.3660658Z [W1204 10:41:39.677384408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3660660Z 2025-12-04T10:58:28.3660707Z ('RERUN', {'yellow': True}) [0.4599s] [100%] 2025-12-04T10:58:28.3661064Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:39.137878971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661066Z 2025-12-04T10:58:28.3661215Z [W1204 10:41:39.138281898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661219Z 2025-12-04T10:58:28.3661367Z [W1204 10:41:39.138361517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661370Z 2025-12-04T10:58:28.3661518Z [W1204 10:41:39.139636337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661520Z 2025-12-04T10:58:28.3661666Z [W1204 10:41:39.139904415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661668Z 2025-12-04T10:58:28.3661815Z [W1204 10:41:39.139964814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3661817Z 2025-12-04T10:58:28.3661987Z [W1204 10:41:39.141986038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3661990Z 2025-12-04T10:58:28.3662138Z [W1204 10:41:39.142254286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3662141Z 2025-12-04T10:58:28.3662288Z [W1204 10:41:39.142318816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3662290Z 2025-12-04T10:58:28.3662328Z FAILED [0.4581s] [100%] 2025-12-04T10:58:28.3662330Z 2025-12-04T10:58:28.3662381Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3662531Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3662577Z Traceback (most recent call last): 2025-12-04T10:58:28.3662733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3662777Z method(*args, **kwargs) 2025-12-04T10:58:28.3662929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3662982Z method(*args, **kwargs) 2025-12-04T10:58:28.3663134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3663172Z with policy(): 2025-12-04T10:58:28.3663356Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3663398Z raise RuntimeError(msg) 2025-12-04T10:58:28.3663792Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3663809Z 2025-12-04T10:58:28.3663884Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3664175Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3664179Z 2025-12-04T10:58:28.3664265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3664339Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3664395Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3664670Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3664746Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3664782Z graph_break [] 2025-12-04T10:58:28.3664931Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3664977Z Traceback (most recent call last): 2025-12-04T10:58:28.3665131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3665171Z method(*args, **kwargs) 2025-12-04T10:58:28.3665323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3665363Z method(*args, **kwargs) 2025-12-04T10:58:28.3665514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3665550Z with 
policy(): 2025-12-04T10:58:28.3665727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3665769Z raise RuntimeError(msg) 2025-12-04T10:58:28.3666166Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3666169Z 2025-12-04T10:58:28.3666242Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3666531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3666533Z 2025-12-04T10:58:28.3666620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3666696Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3666768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3667039Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3667112Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3667148Z graph_break [] 2025-12-04T10:58:28.3667220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3667275Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3667346Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3667619Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3667669Z graph_break [] 2025-12-04T10:58:28.3667722Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3667871Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3667917Z Traceback (most recent call last): 2025-12-04T10:58:28.3668071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3668112Z method(*args, **kwargs) 2025-12-04T10:58:28.3668262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3668301Z method(*args, **kwargs) 2025-12-04T10:58:28.3668453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3668491Z with policy(): 2025-12-04T10:58:28.3668644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3668685Z raise RuntimeError(msg) 2025-12-04T10:58:28.3669086Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3669089Z 2025-12-04T10:58:28.3669161Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3669481Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3669484Z 2025-12-04T10:58:28.3669571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3669645Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3669700Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3669974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3670047Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3670083Z graph_break [] 2025-12-04T10:58:28.3670154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3670209Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3670283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3670551Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3670601Z graph_break [] 2025-12-04T10:58:28.3670673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3670727Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3670799Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3671067Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3671103Z graph_break [] 2025-12-04T10:58:28.3671359Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-772da696c4a7ce03.xml - 2025-12-04T10:58:28.3671419Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3672053Z FAILED [0.4581s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3672056Z 2025-12-04T10:58:28.3672127Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3672419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3672422Z 2025-12-04T10:58:28.3672509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3672569Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3672635Z ================== 1 failed, 57 deselected, 2 rerun in 4.24s =================== 2025-12-04T10:58:28.3672671Z Got exit code 1 2025-12-04T10:58:28.3672711Z Retrying single test... 2025-12-04T10:58:28.3672907Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9882c7b3f3162be3.xml 2025-12-04T10:58:28.3672964Z ============================= test session starts ============================== 2025-12-04T10:58:28.3673100Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3673142Z cachedir: .pytest_cache 2025-12-04T10:58:28.3673336Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3673383Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3673423Z configfile: pytest.ini 2025-12-04T10:58:28.3673584Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3673657Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3673942Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3673985Z Running 1 items in this shard 2025-12-04T10:58:28.3673987Z 2025-12-04T10:58:28.3674349Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:49.683160449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3674369Z 2025-12-04T10:58:28.3674521Z [W1204 10:41:49.966848842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3674524Z 2025-12-04T10:58:28.3674674Z [W1204 10:41:49.967023351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3674676Z 2025-12-04T10:58:28.3674825Z [W1204 10:41:49.970744241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3674827Z 2025-12-04T10:58:28.3674976Z [W1204 10:41:49.971057358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3674992Z 2025-12-04T10:58:28.3675141Z [W1204 10:41:49.971122517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3675144Z 2025-12-04T10:58:28.3675290Z [W1204 10:41:49.973339879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3675292Z 2025-12-04T10:58:28.3675440Z [W1204 10:41:49.973612017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3675442Z 2025-12-04T10:58:28.3675590Z [W1204 10:41:49.973673987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3675592Z 2025-12-04T10:58:28.3675639Z ('RERUN', {'yellow': True}) [3.2060s] [100%] 2025-12-04T10:58:28.3676002Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:50.580474909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676006Z 2025-12-04T10:58:28.3676155Z [W1204 10:41:50.580843946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676157Z 2025-12-04T10:58:28.3676306Z [W1204 10:41:50.580910085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676308Z 2025-12-04T10:58:28.3676456Z [W1204 10:41:50.582176205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676458Z 2025-12-04T10:58:28.3676634Z [W1204 10:41:50.582432113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676637Z 2025-12-04T10:58:28.3676786Z [W1204 10:41:50.582496122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676789Z 2025-12-04T10:58:28.3676936Z [W1204 10:41:50.584576935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3676938Z 2025-12-04T10:58:28.3677086Z [W1204 10:41:50.584838683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3677087Z 2025-12-04T10:58:28.3677234Z [W1204 10:41:50.584899153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3677236Z 2025-12-04T10:58:28.3677284Z ('RERUN', {'yellow': True}) [0.4883s] [100%] 2025-12-04T10:58:28.3677645Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:41:50.080882397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3677660Z 2025-12-04T10:58:28.3677808Z [W1204 10:41:50.081265134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3677810Z 2025-12-04T10:58:28.3677958Z [W1204 10:41:50.081346743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3677959Z 2025-12-04T10:58:28.3678107Z [W1204 10:41:50.082624403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3678109Z 2025-12-04T10:58:28.3678257Z [W1204 10:41:50.082889241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3678272Z 2025-12-04T10:58:28.3678421Z [W1204 10:41:50.082950580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3678425Z 2025-12-04T10:58:28.3678572Z [W1204 10:41:50.085034523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3678574Z 2025-12-04T10:58:28.3678723Z [W1204 10:41:50.085298871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3678725Z 2025-12-04T10:58:28.3678872Z [W1204 10:41:50.085358741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3678873Z 2025-12-04T10:58:28.3678912Z FAILED [0.4748s] [100%] 2025-12-04T10:58:28.3678914Z 2025-12-04T10:58:28.3678966Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3679118Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3679164Z Traceback (most recent call last): 2025-12-04T10:58:28.3679321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3679361Z method(*args, **kwargs) 2025-12-04T10:58:28.3679513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3679552Z method(*args, **kwargs) 2025-12-04T10:58:28.3679704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3679740Z with policy(): 2025-12-04T10:58:28.3679893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3679957Z raise RuntimeError(msg) 2025-12-04T10:58:28.3680355Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3680358Z 2025-12-04T10:58:28.3680433Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3680721Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3680723Z 2025-12-04T10:58:28.3680810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3680882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3680941Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3681212Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3681299Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3681335Z graph_break [] 2025-12-04T10:58:28.3681485Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3681530Z Traceback (most recent call last): 2025-12-04T10:58:28.3681683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3681722Z method(*args, **kwargs) 2025-12-04T10:58:28.3681873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3681923Z method(*args, **kwargs) 2025-12-04T10:58:28.3682074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3682111Z with 
policy(): 2025-12-04T10:58:28.3682264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3682305Z raise RuntimeError(msg) 2025-12-04T10:58:28.3682710Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3682712Z 2025-12-04T10:58:28.3682786Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3683076Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3683079Z 2025-12-04T10:58:28.3683166Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3683238Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3683327Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3683600Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3683674Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3683710Z graph_break [] 2025-12-04T10:58:28.3683820Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3683875Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3683948Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3684217Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3684254Z graph_break [] 2025-12-04T10:58:28.3684306Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3684453Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3684499Z Traceback (most recent call last): 2025-12-04T10:58:28.3684654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3684697Z method(*args, **kwargs) 2025-12-04T10:58:28.3684847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3684905Z method(*args, **kwargs) 2025-12-04T10:58:28.3685054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3685091Z with policy(): 2025-12-04T10:58:28.3685242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3685284Z raise RuntimeError(msg) 2025-12-04T10:58:28.3685684Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3685700Z 2025-12-04T10:58:28.3685774Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3686189Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3686192Z 2025-12-04T10:58:28.3686279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3686351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3686407Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3686681Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3686756Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3686793Z graph_break [] 2025-12-04T10:58:28.3686872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3686926Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3686998Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3687267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3687303Z graph_break [] 2025-12-04T10:58:28.3687376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3687430Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3687502Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3687794Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3687832Z graph_break [] 2025-12-04T10:58:28.3688076Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9882c7b3f3162be3.xml - 2025-12-04T10:58:28.3688136Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3688762Z FAILED [0.4748s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3688780Z 2025-12-04T10:58:28.3688853Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3689141Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3689143Z 2025-12-04T10:58:28.3689229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3689291Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3689357Z ================== 1 failed, 57 deselected, 2 rerun in 4.31s =================== 2025-12-04T10:58:28.3689395Z Got exit code 1 2025-12-04T10:58:28.3689632Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3689777Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3689974Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6609d8fcca16e2ff.xml 2025-12-04T10:58:28.3690032Z ============================= test session starts ============================== 2025-12-04T10:58:28.3690141Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3690182Z cachedir: .pytest_cache 2025-12-04T10:58:28.3690339Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3690385Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3690425Z configfile: pytest.ini 2025-12-04T10:58:28.3690587Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3690662Z collecting ... collected 58 items / 28 deselected / 30 selected 2025-12-04T10:58:28.3690715Z stepcurrent: skipping 28 already run items. 
2025-12-04T10:58:28.3690759Z Running 30 items in this shard 2025-12-04T10:58:28.3690761Z 2025-12-04T10:58:28.3691012Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5146s] [ 3%] 2025-12-04T10:58:28.3691259Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4666s] [ 3%] 2025-12-04T10:58:28.3691506Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4541s] [ 3%] 2025-12-04T10:58:28.3691509Z 2025-12-04T10:58:28.3691562Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3691711Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3691757Z Traceback (most recent call last): 2025-12-04T10:58:28.3691912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3691953Z method(*args, **kwargs) 2025-12-04T10:58:28.3692103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3692143Z method(*args, **kwargs) 2025-12-04T10:58:28.3692293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3692332Z with policy(): 2025-12-04T10:58:28.3692484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3692539Z raise RuntimeError(msg) 2025-12-04T10:58:28.3692937Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3692939Z 2025-12-04T10:58:28.3693011Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3693334Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3693353Z 2025-12-04T10:58:28.3693440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3693515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3693571Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3693748Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3693820Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3693857Z graph_break [] 2025-12-04T10:58:28.3694008Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3694054Z Traceback (most recent call last): 2025-12-04T10:58:28.3694207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3694250Z method(*args, **kwargs) 2025-12-04T10:58:28.3694400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3694440Z method(*args, **kwargs) 
2025-12-04T10:58:28.3694589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3694626Z with policy(): 2025-12-04T10:58:28.3694777Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3694818Z raise RuntimeError(msg) 2025-12-04T10:58:28.3695228Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3695261Z 2025-12-04T10:58:28.3695334Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3695626Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3695629Z 2025-12-04T10:58:28.3695713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3695788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3695842Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3696021Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3696093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3696132Z graph_break [] 2025-12-04T10:58:28.3696204Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3696275Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3696346Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3696521Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3696557Z graph_break [] 2025-12-04T10:58:28.3696609Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3696758Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3696803Z Traceback (most recent call last): 2025-12-04T10:58:28.3696957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3697017Z method(*args, **kwargs) 2025-12-04T10:58:28.3697167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3697208Z method(*args, **kwargs) 2025-12-04T10:58:28.3697357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3697394Z with policy(): 2025-12-04T10:58:28.3697546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3697586Z raise RuntimeError(msg) 2025-12-04T10:58:28.3697995Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3697998Z 2025-12-04T10:58:28.3698070Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3698361Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3698363Z 2025-12-04T10:58:28.3698449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3698522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3698577Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3698752Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3698848Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3698886Z graph_break [] 2025-12-04T10:58:28.3698958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3699013Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3699083Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3699258Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3699293Z graph_break [] 2025-12-04T10:58:28.3699366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3699419Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3699490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3699667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3699703Z graph_break [] 2025-12-04T10:58:28.3699963Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6609d8fcca16e2ff.xml - 2025-12-04T10:58:28.3700021Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3700664Z FAILED [0.4541s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3700677Z 2025-12-04T10:58:28.3700751Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3701040Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3701043Z 2025-12-04T10:58:28.3701129Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3701189Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3701256Z ================== 1 failed, 28 deselected, 2 rerun in 3.60s =================== 2025-12-04T10:58:28.3701293Z Got exit code 1 2025-12-04T10:58:28.3701333Z Retrying single test... 
2025-12-04T10:58:28.3701531Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e8cd154a026f81ba.xml 2025-12-04T10:58:28.3701591Z ============================= test session starts ============================== 2025-12-04T10:58:28.3701700Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3701742Z cachedir: .pytest_cache 2025-12-04T10:58:28.3701898Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3701944Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3701984Z configfile: pytest.ini 2025-12-04T10:58:28.3702146Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3702218Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3702505Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3702569Z Running 1 items in this shard 2025-12-04T10:58:28.3702572Z 2025-12-04T10:58:28.3702938Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:09.871688138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3702941Z 2025-12-04T10:58:28.3703094Z [W1204 10:42:09.135211528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3703097Z 2025-12-04T10:58:28.3703269Z [W1204 10:42:09.135388326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3703271Z 2025-12-04T10:58:28.3703423Z [W1204 10:42:09.138954226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3703427Z 2025-12-04T10:58:28.3703576Z [W1204 10:42:09.139344593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3703596Z 2025-12-04T10:58:28.3703746Z [W1204 10:42:09.139410163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3703747Z 2025-12-04T10:58:28.3703895Z [W1204 10:42:09.141809152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3703897Z 2025-12-04T10:58:28.3704045Z [W1204 10:42:09.142111400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3704048Z 2025-12-04T10:58:28.3704195Z [W1204 10:42:09.142174809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3704212Z 2025-12-04T10:58:28.3704262Z ('RERUN', {'yellow': True}) [2.8705s] [100%] 2025-12-04T10:58:28.3704627Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:11.314907410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3704630Z 2025-12-04T10:58:28.3704779Z [W1204 10:42:11.315277177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3704781Z 2025-12-04T10:58:28.3704929Z [W1204 10:42:11.315342246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3704931Z 2025-12-04T10:58:28.3705077Z [W1204 10:42:11.316605336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3705081Z 2025-12-04T10:58:28.3705229Z [W1204 10:42:11.316860573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3705231Z 2025-12-04T10:58:28.3705379Z [W1204 10:42:11.316920403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3705381Z 2025-12-04T10:58:28.3705529Z [W1204 10:42:11.319056395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3705531Z 2025-12-04T10:58:28.3705680Z [W1204 10:42:11.319395532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3705682Z 2025-12-04T10:58:28.3705829Z [W1204 10:42:11.319463022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3705832Z 2025-12-04T10:58:28.3705910Z ('RERUN', {'yellow': True}) [0.7071s] [100%] 2025-12-04T10:58:28.3706267Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:11.021301939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3706272Z 2025-12-04T10:58:28.3706422Z [W1204 10:42:11.021722835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3706424Z 2025-12-04T10:58:28.3706572Z [W1204 10:42:11.021796145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3706574Z 2025-12-04T10:58:28.3706722Z [W1204 10:42:11.023131923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3706724Z 2025-12-04T10:58:28.3706874Z [W1204 10:42:11.023397431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3706891Z 2025-12-04T10:58:28.3707038Z [W1204 10:42:11.023457001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3707041Z 2025-12-04T10:58:28.3707189Z [W1204 10:42:11.025496443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3707190Z 2025-12-04T10:58:28.3707338Z [W1204 10:42:11.025835820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3707340Z 2025-12-04T10:58:28.3707487Z [W1204 10:42:11.025898080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3707488Z 2025-12-04T10:58:28.3707540Z FAILED [0.6609s] [100%] 2025-12-04T10:58:28.3707543Z 2025-12-04T10:58:28.3707594Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3707747Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3707793Z Traceback (most recent call last): 2025-12-04T10:58:28.3707949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3707991Z method(*args, **kwargs) 2025-12-04T10:58:28.3708145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3708184Z method(*args, **kwargs) 2025-12-04T10:58:28.3708334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3708370Z with policy(): 2025-12-04T10:58:28.3708527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3708568Z raise RuntimeError(msg) 2025-12-04T10:58:28.3708966Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3708968Z 2025-12-04T10:58:28.3709043Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3709332Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3709334Z 2025-12-04T10:58:28.3709445Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3709519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3709578Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3709754Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3709827Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3709864Z graph_break [] 2025-12-04T10:58:28.3710014Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3710059Z Traceback (most recent call last): 2025-12-04T10:58:28.3710211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3710250Z method(*args, **kwargs) 2025-12-04T10:58:28.3710405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3710463Z method(*args, **kwargs) 2025-12-04T10:58:28.3710613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3710649Z with policy(): 2025-12-04T10:58:28.3710804Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3710844Z raise RuntimeError(msg) 2025-12-04T10:58:28.3711250Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3711264Z 2025-12-04T10:58:28.3711339Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3711629Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3711632Z 2025-12-04T10:58:28.3711719Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3711791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3711848Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3712023Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3712095Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3712131Z graph_break [] 2025-12-04T10:58:28.3712206Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3712260Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3712333Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3712507Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3712543Z graph_break [] 2025-12-04T10:58:28.3712595Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3712745Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3712790Z Traceback (most recent call last): 2025-12-04T10:58:28.3712945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3713007Z method(*args, **kwargs) 2025-12-04T10:58:28.3713159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3713200Z method(*args, **kwargs) 2025-12-04T10:58:28.3713382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3713419Z with policy(): 2025-12-04T10:58:28.3713571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3713611Z raise RuntimeError(msg) 2025-12-04T10:58:28.3714020Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3714024Z 2025-12-04T10:58:28.3714098Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3714406Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3714408Z 2025-12-04T10:58:28.3714496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3714568Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3714625Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3714799Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3714871Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3714923Z graph_break [] 2025-12-04T10:58:28.3714997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3715054Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3715127Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3715302Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3715339Z graph_break [] 2025-12-04T10:58:28.3715411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3715465Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3715536Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3715711Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3715752Z graph_break [] 2025-12-04T10:58:28.3715993Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e8cd154a026f81ba.xml - 2025-12-04T10:58:28.3716054Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3716686Z FAILED [0.6609s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3716688Z 2025-12-04T10:58:28.3716788Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3717079Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3717082Z 2025-12-04T10:58:28.3717168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3717229Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3717297Z ================== 1 failed, 57 deselected, 2 rerun in 4.40s =================== 2025-12-04T10:58:28.3717334Z Got exit code 1 2025-12-04T10:58:28.3717374Z Retrying single test... 
2025-12-04T10:58:28.3717571Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2fc909bd641be371.xml 2025-12-04T10:58:28.3717627Z ============================= test session starts ============================== 2025-12-04T10:58:28.3717740Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3717792Z cachedir: .pytest_cache 2025-12-04T10:58:28.3717950Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3717995Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3718035Z configfile: pytest.ini 2025-12-04T10:58:28.3718193Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3718266Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3718555Z stepcurrent: skipping 28 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3718612Z Running 1 items in this shard 2025-12-04T10:58:28.3718616Z 2025-12-04T10:58:28.3718978Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:21.472884875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3718981Z 2025-12-04T10:58:28.3719138Z [W1204 10:42:21.743978235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3719140Z 2025-12-04T10:58:28.3719292Z [W1204 10:42:21.744122934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3719294Z 2025-12-04T10:58:28.3719444Z [W1204 10:42:21.747518605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3719447Z 2025-12-04T10:58:28.3719599Z [W1204 10:42:21.747833543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3719602Z 2025-12-04T10:58:28.3719751Z [W1204 10:42:21.747896882 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3719753Z 2025-12-04T10:58:28.3719902Z [W1204 10:42:21.750229202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3719904Z 2025-12-04T10:58:28.3720052Z [W1204 10:42:21.750534020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3720055Z 2025-12-04T10:58:28.3720202Z [W1204 10:42:21.750597019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3720204Z 2025-12-04T10:58:28.3720254Z ('RERUN', {'yellow': True}) [2.9279s] [100%] 2025-12-04T10:58:28.3720635Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:22.962115381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3720638Z 2025-12-04T10:58:28.3720789Z [W1204 10:42:22.962528867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3720790Z 2025-12-04T10:58:28.3720939Z [W1204 10:42:22.962606157 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3720940Z 2025-12-04T10:58:28.3721089Z [W1204 10:42:22.963907626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3721091Z 2025-12-04T10:58:28.3721241Z [W1204 10:42:22.964186363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3721255Z 2025-12-04T10:58:28.3721403Z [W1204 10:42:22.964253333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3721405Z 2025-12-04T10:58:28.3721554Z [W1204 10:42:22.966296165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3721556Z 2025-12-04T10:58:28.3721704Z [W1204 10:42:22.966640842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3721706Z 2025-12-04T10:58:28.3721854Z [W1204 10:42:22.966704732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3721856Z 2025-12-04T10:58:28.3721904Z ('RERUN', {'yellow': True}) [0.6792s] [100%] 2025-12-04T10:58:28.3722276Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:42:23.638201740 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3722279Z 2025-12-04T10:58:28.3722428Z [W1204 10:42:23.638598667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3722430Z 2025-12-04T10:58:28.3722577Z [W1204 10:42:23.638668426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3722579Z 2025-12-04T10:58:28.3722729Z [W1204 10:42:23.639932126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3722730Z 2025-12-04T10:58:28.3722878Z [W1204 10:42:23.640195483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3722881Z 2025-12-04T10:58:28.3723030Z [W1204 10:42:23.640257503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3723032Z 2025-12-04T10:58:28.3723181Z [W1204 10:42:23.642344745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3723182Z 2025-12-04T10:58:28.3723365Z [W1204 10:42:23.642686692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3723367Z 2025-12-04T10:58:28.3723516Z [W1204 10:42:23.642749251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3723517Z 2025-12-04T10:58:28.3723555Z FAILED [0.6747s] [100%] 2025-12-04T10:58:28.3723558Z 2025-12-04T10:58:28.3723646Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3723798Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3723844Z Traceback (most recent call last): 2025-12-04T10:58:28.3724001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3724042Z method(*args, **kwargs) 2025-12-04T10:58:28.3724193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3724234Z method(*args, **kwargs) 2025-12-04T10:58:28.3724384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3724421Z with policy(): 2025-12-04T10:58:28.3724575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3724617Z raise RuntimeError(msg) 2025-12-04T10:58:28.3725034Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3725037Z 2025-12-04T10:58:28.3725110Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3725402Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3725404Z 2025-12-04T10:58:28.3725490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3725580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3725636Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3725814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3725887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3725924Z graph_break [] 2025-12-04T10:58:28.3726077Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3726122Z Traceback (most recent call last): 2025-12-04T10:58:28.3726274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3726315Z method(*args, **kwargs) 2025-12-04T10:58:28.3726467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3726508Z method(*args, **kwargs) 2025-12-04T10:58:28.3726658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3726695Z with policy(): 2025-12-04T10:58:28.3726846Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3726888Z raise RuntimeError(msg) 2025-12-04T10:58:28.3727292Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3727295Z 2025-12-04T10:58:28.3727367Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3727679Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3727682Z 2025-12-04T10:58:28.3727768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3727843Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3727899Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3728077Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3728149Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3728185Z graph_break [] 2025-12-04T10:58:28.3728259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3728315Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3728386Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3728572Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3728608Z graph_break [] 2025-12-04T10:58:28.3728661Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3728810Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3728855Z Traceback (most recent call last): 2025-12-04T10:58:28.3729007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3729049Z method(*args, **kwargs) 2025-12-04T10:58:28.3729213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3729256Z method(*args, **kwargs) 2025-12-04T10:58:28.3729404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3729441Z with policy(): 2025-12-04T10:58:28.3729592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3729633Z raise RuntimeError(msg) 2025-12-04T10:58:28.3730039Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3730042Z 2025-12-04T10:58:28.3730118Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3730409Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3730412Z 2025-12-04T10:58:28.3730498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3730572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3730628Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3730805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3730880Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3730916Z graph_break [] 2025-12-04T10:58:28.3731013Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3731069Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3731142Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3731317Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3731353Z graph_break [] 2025-12-04T10:58:28.3731429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3731483Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3731556Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3731730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3731768Z graph_break [] 2025-12-04T10:58:28.3732016Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2fc909bd641be371.xml - 2025-12-04T10:58:28.3732088Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3732728Z FAILED [0.6747s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3732731Z 2025-12-04T10:58:28.3732802Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3733106Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3733109Z 2025-12-04T10:58:28.3733195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3733282Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3733348Z ================== 1 failed, 57 deselected, 2 rerun in 4.44s =================== 2025-12-04T10:58:28.3733386Z Got exit code 1 2025-12-04T10:58:28.3733627Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3733754Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3733954Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77716e089feec6ff.xml 2025-12-04T10:58:28.3734011Z ============================= test session starts ============================== 2025-12-04T10:58:28.3734121Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3734162Z cachedir: .pytest_cache 2025-12-04T10:58:28.3734321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3734366Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3734407Z configfile: pytest.ini 2025-12-04T10:58:28.3734566Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3734640Z collecting ... collected 58 items / 29 deselected / 29 selected 2025-12-04T10:58:28.3734693Z stepcurrent: skipping 29 already run items. 
2025-12-04T10:58:28.3734767Z Running 29 items in this shard 2025-12-04T10:58:28.3734769Z 2025-12-04T10:58:28.3735018Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6068s] [ 3%] 2025-12-04T10:58:28.3735265Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5974s] [ 3%] 2025-12-04T10:58:28.3735487Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.5676s] [ 3%] 2025-12-04T10:58:28.3735490Z 2025-12-04T10:58:28.3735541Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3735692Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3735738Z Traceback (most recent call last): 2025-12-04T10:58:28.3735895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3735954Z method(*args, **kwargs) 2025-12-04T10:58:28.3736106Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3736145Z method(*args, **kwargs) 2025-12-04T10:58:28.3736296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3736333Z with policy(): 2025-12-04T10:58:28.3736485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3736525Z raise RuntimeError(msg) 2025-12-04T10:58:28.3736925Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3736942Z 2025-12-04T10:58:28.3737015Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3737305Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3737307Z 2025-12-04T10:58:28.3737392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3737465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3737521Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3737700Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3737774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3737809Z graph_break [] 2025-12-04T10:58:28.3737958Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3738003Z Traceback (most recent call last): 2025-12-04T10:58:28.3738156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3738195Z method(*args, **kwargs) 2025-12-04T10:58:28.3738346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3738385Z method(*args, **kwargs) 2025-12-04T10:58:28.3738563Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3738601Z with policy(): 2025-12-04T10:58:28.3738754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3738794Z raise RuntimeError(msg) 2025-12-04T10:58:28.3739195Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3739197Z 2025-12-04T10:58:28.3739269Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3739558Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3739561Z 2025-12-04T10:58:28.3739647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3739736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3739791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3739967Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3740040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3740075Z graph_break [] 2025-12-04T10:58:28.3740149Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3740203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3740275Z 
aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3740460Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3740498Z graph_break [] 2025-12-04T10:58:28.3740549Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3740698Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3740742Z Traceback (most recent call last): 2025-12-04T10:58:28.3740896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3740936Z method(*args, **kwargs) 2025-12-04T10:58:28.3741088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3741128Z method(*args, **kwargs) 2025-12-04T10:58:28.3741280Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3741316Z with policy(): 2025-12-04T10:58:28.3741469Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3741509Z raise RuntimeError(msg) 2025-12-04T10:58:28.3741910Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3741912Z 2025-12-04T10:58:28.3741985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3742297Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3742300Z 2025-12-04T10:58:28.3742389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3742462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3742518Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3742692Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3742765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3742800Z graph_break [] 2025-12-04T10:58:28.3742873Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3742927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3743000Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3743176Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3743226Z graph_break [] 2025-12-04T10:58:28.3743322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3743377Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3743448Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3743623Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3743659Z graph_break [] 2025-12-04T10:58:28.3743902Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77716e089feec6ff.xml - 2025-12-04T10:58:28.3743980Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3744617Z FAILED [0.5676s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3744620Z 2025-12-04T10:58:28.3744693Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3744981Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3744984Z 2025-12-04T10:58:28.3745072Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3745134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3745200Z ================== 1 failed, 29 deselected, 2 rerun in 3.92s =================== 2025-12-04T10:58:28.3745237Z Got exit code 1 2025-12-04T10:58:28.3745278Z Retrying single test... 
2025-12-04T10:58:28.3745476Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6a5d2830f31d9465.xml 2025-12-04T10:58:28.3745533Z ============================= test session starts ============================== 2025-12-04T10:58:28.3745641Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3745682Z cachedir: .pytest_cache 2025-12-04T10:58:28.3745867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3745915Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3745955Z configfile: pytest.ini 2025-12-04T10:58:28.3746116Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3746189Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3746475Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3746518Z Running 1 items in this shard 2025-12-04T10:58:28.3746520Z 2025-12-04T10:58:28.3746883Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:43.799679123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3746886Z 2025-12-04T10:58:28.3747041Z [W1204 10:42:43.068264643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3747061Z 2025-12-04T10:58:28.3747211Z [W1204 10:42:43.068399491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747213Z 2025-12-04T10:58:28.3747362Z [W1204 10:42:43.071747022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747364Z 2025-12-04T10:58:28.3747514Z [W1204 10:42:43.072053719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747516Z 2025-12-04T10:58:28.3747666Z [W1204 10:42:43.072117389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747680Z 2025-12-04T10:58:28.3747829Z [W1204 10:42:43.074362809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747832Z 2025-12-04T10:58:28.3747979Z [W1204 10:42:43.074631877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3747981Z 2025-12-04T10:58:28.3748129Z [W1204 10:42:43.074693336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3748131Z 2025-12-04T10:58:28.3748180Z ('RERUN', {'yellow': True}) [2.8693s] [100%] 2025-12-04T10:58:28.3748540Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:44.233809155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3748543Z 2025-12-04T10:58:28.3748691Z [W1204 10:42:44.234223861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3748696Z 2025-12-04T10:58:28.3748843Z [W1204 10:42:44.234291890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3748845Z 2025-12-04T10:58:28.3748993Z [W1204 10:42:44.235563339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3748995Z 2025-12-04T10:58:28.3749141Z [W1204 10:42:44.235818637 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3749143Z 2025-12-04T10:58:28.3749291Z [W1204 10:42:44.235878266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3749316Z 2025-12-04T10:58:28.3749465Z [W1204 10:42:44.237901859 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3749468Z 2025-12-04T10:58:28.3749616Z [W1204 10:42:44.238252996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3749618Z 2025-12-04T10:58:28.3749767Z [W1204 10:42:44.238317715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3749768Z 2025-12-04T10:58:28.3749816Z ('RERUN', {'yellow': True}) [0.6649s] [100%] 2025-12-04T10:58:28.3750173Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:45.902349412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750177Z 2025-12-04T10:58:28.3750325Z [W1204 10:42:45.902747428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750339Z 2025-12-04T10:58:28.3750488Z [W1204 10:42:45.902817048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750490Z 2025-12-04T10:58:28.3750638Z [W1204 10:42:45.904139436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750640Z 2025-12-04T10:58:28.3750788Z [W1204 10:42:45.904399644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750790Z 2025-12-04T10:58:28.3750938Z [W1204 10:42:45.904460763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3750959Z 2025-12-04T10:58:28.3751107Z [W1204 10:42:45.906491705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3751110Z 2025-12-04T10:58:28.3751258Z [W1204 10:42:45.906833082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3751260Z 2025-12-04T10:58:28.3751407Z [W1204 10:42:45.906895272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3751409Z 2025-12-04T10:58:28.3751448Z FAILED [0.6657s] [100%] 2025-12-04T10:58:28.3751450Z 2025-12-04T10:58:28.3751501Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3751651Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3751697Z Traceback (most recent call last): 2025-12-04T10:58:28.3751856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3751898Z method(*args, **kwargs) 2025-12-04T10:58:28.3752051Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3752092Z method(*args, **kwargs) 2025-12-04T10:58:28.3752242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3752279Z with policy(): 2025-12-04T10:58:28.3752429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3752470Z raise RuntimeError(msg) 2025-12-04T10:58:28.3752886Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3752890Z 2025-12-04T10:58:28.3752965Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3753290Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3753292Z 2025-12-04T10:58:28.3753381Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3753454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3753511Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3753688Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3753763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3753820Z graph_break [] 2025-12-04T10:58:28.3753970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3754015Z Traceback (most recent call last): 2025-12-04T10:58:28.3754168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3754208Z method(*args, **kwargs) 2025-12-04T10:58:28.3754359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3754399Z method(*args, **kwargs) 2025-12-04T10:58:28.3754547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3754600Z with policy(): 2025-12-04T10:58:28.3754753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3754797Z raise RuntimeError(msg) 2025-12-04T10:58:28.3755193Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3755196Z 2025-12-04T10:58:28.3755269Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3755557Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3755560Z 2025-12-04T10:58:28.3755651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3755723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3755781Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3755956Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3756029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3756065Z graph_break [] 2025-12-04T10:58:28.3756139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3756193Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3756266Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3756469Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3756506Z 
graph_break [] 2025-12-04T10:58:28.3756559Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3756706Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3756751Z Traceback (most recent call last): 2025-12-04T10:58:28.3756904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3756945Z method(*args, **kwargs) 2025-12-04T10:58:28.3757096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3757136Z method(*args, **kwargs) 2025-12-04T10:58:28.3757285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3757325Z with policy(): 2025-12-04T10:58:28.3757476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3757533Z raise RuntimeError(msg) 2025-12-04T10:58:28.3757930Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3757932Z 2025-12-04T10:58:28.3758006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3758295Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3758308Z 2025-12-04T10:58:28.3758397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3758471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3758527Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3758702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3758775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3758811Z graph_break [] 2025-12-04T10:58:28.3758883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3758938Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3759008Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3759185Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3759222Z graph_break [] 2025-12-04T10:58:28.3759295Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3759348Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3759421Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3759595Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3759633Z graph_break [] 2025-12-04T10:58:28.3759877Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6a5d2830f31d9465.xml - 2025-12-04T10:58:28.3759937Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3760597Z FAILED [0.6657s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3760602Z 2025-12-04T10:58:28.3760675Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3760964Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3760966Z 2025-12-04T10:58:28.3761051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3761117Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3761196Z ================== 1 failed, 57 deselected, 2 rerun in 4.37s =================== 2025-12-04T10:58:28.3761234Z Got exit code 1 2025-12-04T10:58:28.3761273Z Retrying single test... 
2025-12-04T10:58:28.3761470Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-203dcb1c11c21baa.xml 2025-12-04T10:58:28.3761526Z ============================= test session starts ============================== 2025-12-04T10:58:28.3761636Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3761677Z cachedir: .pytest_cache 2025-12-04T10:58:28.3761835Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3761892Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3761936Z configfile: pytest.ini 2025-12-04T10:58:28.3762097Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3762171Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3762455Z stepcurrent: skipping 29 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3762501Z Running 1 items in this shard 2025-12-04T10:58:28.3762503Z 2025-12-04T10:58:28.3762865Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:54.263169598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3762870Z 2025-12-04T10:58:28.3763026Z [W1204 10:42:55.521588116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3763029Z 2025-12-04T10:58:28.3763182Z [W1204 10:42:55.521735025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3763184Z 2025-12-04T10:58:28.3763358Z [W1204 10:42:55.525734439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3763360Z 2025-12-04T10:58:28.3763509Z [W1204 10:42:55.526139285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3763511Z 2025-12-04T10:58:28.3763659Z [W1204 10:42:55.526206895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3763661Z 2025-12-04T10:58:28.3763850Z [W1204 10:42:55.528661353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3763853Z 2025-12-04T10:58:28.3764002Z [W1204 10:42:55.528968780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3764004Z 2025-12-04T10:58:28.3764150Z [W1204 10:42:55.529035060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3764152Z 2025-12-04T10:58:28.3764201Z ('RERUN', {'yellow': True}) [2.8403s] [100%] 2025-12-04T10:58:28.3764556Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:56.669111482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3764558Z 2025-12-04T10:58:28.3764710Z [W1204 10:42:56.669610047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3764736Z 2025-12-04T10:58:28.3764885Z [W1204 10:42:56.669680587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3764888Z 2025-12-04T10:58:28.3765036Z [W1204 10:42:56.671062944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3765038Z 2025-12-04T10:58:28.3765186Z [W1204 10:42:56.671336962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3765188Z 2025-12-04T10:58:28.3765335Z [W1204 10:42:56.671397771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3765337Z 2025-12-04T10:58:28.3765486Z [W1204 10:42:56.673475373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3765503Z 2025-12-04T10:58:28.3765652Z [W1204 10:42:56.673838580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3765654Z 2025-12-04T10:58:28.3765806Z [W1204 10:42:56.673901639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3765807Z 2025-12-04T10:58:28.3765856Z ('RERUN', {'yellow': True}) [0.6682s] [100%] 2025-12-04T10:58:28.3766210Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:42:57.362410463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766212Z 2025-12-04T10:58:28.3766364Z [W1204 10:42:57.362802460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766366Z 2025-12-04T10:58:28.3766514Z [W1204 10:42:57.362877829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766516Z 2025-12-04T10:58:28.3766665Z [W1204 10:42:57.364148268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766667Z 2025-12-04T10:58:28.3766815Z [W1204 10:42:57.364413855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766817Z 2025-12-04T10:58:28.3766966Z [W1204 10:42:57.364474515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3766968Z 2025-12-04T10:58:28.3767137Z [W1204 10:42:57.366489627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3767140Z 2025-12-04T10:58:28.3767290Z [W1204 10:42:57.366829004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3767292Z 2025-12-04T10:58:28.3767441Z [W1204 10:42:57.366891733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3767443Z 2025-12-04T10:58:28.3767481Z FAILED [0.6719s] [100%] 2025-12-04T10:58:28.3767483Z 2025-12-04T10:58:28.3767536Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3767685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3767732Z Traceback (most recent call last): 2025-12-04T10:58:28.3767891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3767933Z method(*args, **kwargs) 2025-12-04T10:58:28.3768102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3768142Z method(*args, **kwargs) 2025-12-04T10:58:28.3768296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3768332Z with policy(): 2025-12-04T10:58:28.3768487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3768527Z raise RuntimeError(msg) 2025-12-04T10:58:28.3768922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3768936Z 2025-12-04T10:58:28.3769011Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3769304Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3769306Z 2025-12-04T10:58:28.3769393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3769466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3769522Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3769699Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3769774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3769811Z graph_break [] 2025-12-04T10:58:28.3769960Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3770008Z Traceback (most recent call last): 2025-12-04T10:58:28.3770160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3770201Z method(*args, **kwargs) 2025-12-04T10:58:28.3770353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3770392Z method(*args, **kwargs) 2025-12-04T10:58:28.3770541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3771939Z with policy(): 2025-12-04T10:58:28.3772129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3772172Z raise RuntimeError(msg) 2025-12-04T10:58:28.3772575Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3772578Z 2025-12-04T10:58:28.3772650Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3772944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3772946Z 2025-12-04T10:58:28.3773032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3773112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3773168Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3773387Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3773460Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3773497Z graph_break [] 2025-12-04T10:58:28.3773570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3773625Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3773696Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3773872Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3773929Z 
graph_break [] 2025-12-04T10:58:28.3773982Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3774134Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3774178Z Traceback (most recent call last): 2025-12-04T10:58:28.3774333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3774373Z method(*args, **kwargs) 2025-12-04T10:58:28.3774523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3774562Z method(*args, **kwargs) 2025-12-04T10:58:28.3774712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3774749Z with policy(): 2025-12-04T10:58:28.3774907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3774947Z raise RuntimeError(msg) 2025-12-04T10:58:28.3775352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3775354Z 2025-12-04T10:58:28.3775426Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3775715Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3775717Z 2025-12-04T10:58:28.3775803Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3775905Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3775961Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3776137Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3776209Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3776245Z graph_break [] 2025-12-04T10:58:28.3776318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3776373Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3776443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3776618Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3776659Z graph_break [] 2025-12-04T10:58:28.3776730Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3776802Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3776872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3777047Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3777083Z graph_break [] 2025-12-04T10:58:28.3777327Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-203dcb1c11c21baa.xml - 2025-12-04T10:58:28.3777386Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3778015Z FAILED [0.6719s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3778033Z 2025-12-04T10:58:28.3778105Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3778392Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3778394Z 2025-12-04T10:58:28.3778480Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3778545Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3778613Z ================== 1 failed, 57 deselected, 2 rerun in 4.34s =================== 2025-12-04T10:58:28.3778650Z Got exit code 1 2025-12-04T10:58:28.3778889Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3779016Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3779214Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f3c6f9decd24c7bb.xml 2025-12-04T10:58:28.3779271Z ============================= test session starts ============================== 2025-12-04T10:58:28.3779382Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3779423Z cachedir: .pytest_cache 2025-12-04T10:58:28.3779613Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3779660Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3779702Z configfile: pytest.ini 2025-12-04T10:58:28.3779861Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3779936Z collecting ... collected 58 items / 30 deselected / 28 selected 2025-12-04T10:58:28.3779988Z stepcurrent: skipping 30 already run items. 
2025-12-04T10:58:28.3780033Z Running 28 items in this shard 2025-12-04T10:58:28.3780035Z 2025-12-04T10:58:28.3780283Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.9869s] [ 3%] 2025-12-04T10:58:28.3780530Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6477s] [ 3%] 2025-12-04T10:58:28.3780766Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.6091s] [ 3%] 2025-12-04T10:58:28.3780768Z 2025-12-04T10:58:28.3780818Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3780968Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3781013Z Traceback (most recent call last): 2025-12-04T10:58:28.3781170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3781209Z method(*args, **kwargs) 2025-12-04T10:58:28.3781363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3781412Z method(*args, **kwargs) 2025-12-04T10:58:28.3781564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3781603Z with policy(): 2025-12-04T10:58:28.3781754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3781794Z raise RuntimeError(msg) 2025-12-04T10:58:28.3782185Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3782187Z 2025-12-04T10:58:28.3782260Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3782551Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3782556Z 2025-12-04T10:58:28.3782643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3782716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3782773Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3783050Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3783124Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3783159Z graph_break [] 2025-12-04T10:58:28.3783379Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3783425Z Traceback (most recent call last): 2025-12-04T10:58:28.3783579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3783618Z method(*args, **kwargs) 2025-12-04T10:58:28.3783770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3783808Z method(*args, **kwargs) 2025-12-04T10:58:28.3783960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3783996Z with policy(): 2025-12-04T10:58:28.3784148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3784188Z raise RuntimeError(msg) 2025-12-04T10:58:28.3784588Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3784609Z 2025-12-04T10:58:28.3784681Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3784970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3784972Z 2025-12-04T10:58:28.3785058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3785130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3785186Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3785475Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3785550Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3785586Z graph_break [] 2025-12-04T10:58:28.3785658Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.3785712Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3785783Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3786055Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3786093Z graph_break [] 2025-12-04T10:58:28.3786146Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3786296Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3786340Z Traceback (most recent call last): 2025-12-04T10:58:28.3786493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3786533Z method(*args, **kwargs) 2025-12-04T10:58:28.3786683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3786722Z method(*args, **kwargs) 2025-12-04T10:58:28.3786873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3786910Z with policy(): 2025-12-04T10:58:28.3787084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3787126Z raise RuntimeError(msg) 2025-12-04T10:58:28.3787528Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3787530Z 2025-12-04T10:58:28.3787603Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3787889Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3787891Z 2025-12-04T10:58:28.3787977Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3788052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3788124Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3788399Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3788472Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3788508Z graph_break [] 2025-12-04T10:58:28.3788580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3788634Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3788706Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3788977Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3789026Z graph_break [] 2025-12-04T10:58:28.3789098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3789152Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.3789223Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3789491Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3789527Z graph_break [] 2025-12-04T10:58:28.3789772Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f3c6f9decd24c7bb.xml - 2025-12-04T10:58:28.3789833Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3790465Z FAILED [0.6091s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3790468Z 2025-12-04T10:58:28.3790541Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3790827Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3790854Z 2025-12-04T10:58:28.3790940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3791003Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3791068Z ================== 1 failed, 30 deselected, 2 rerun in 4.39s =================== 2025-12-04T10:58:28.3791106Z Got exit code 1 2025-12-04T10:58:28.3791145Z Retrying single test... 2025-12-04T10:58:28.3791344Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7feb3a661ebb3a9f.xml 2025-12-04T10:58:28.3791400Z ============================= test session starts ============================== 2025-12-04T10:58:28.3791511Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3791551Z cachedir: .pytest_cache 2025-12-04T10:58:28.3791711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3791757Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3791813Z configfile: pytest.ini 2025-12-04T10:58:28.3791973Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3792046Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3792332Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3792376Z Running 1 items in this shard 2025-12-04T10:58:28.3792379Z 2025-12-04T10:58:28.3792743Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:18.071899977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3792757Z 2025-12-04T10:58:28.3792911Z [W1204 10:43:19.346353799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3792914Z 2025-12-04T10:58:28.3793066Z [W1204 10:43:19.346505238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793068Z 2025-12-04T10:58:28.3793216Z [W1204 10:43:19.350035476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793218Z 2025-12-04T10:58:28.3793411Z [W1204 10:43:19.350349293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793413Z 2025-12-04T10:58:28.3793563Z [W1204 10:43:19.350410032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793565Z 2025-12-04T10:58:28.3793715Z [W1204 10:43:19.352546013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793718Z 2025-12-04T10:58:28.3793865Z [W1204 10:43:19.352818820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3793868Z 2025-12-04T10:58:28.3794016Z [W1204 10:43:19.352878830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3794018Z 2025-12-04T10:58:28.3794067Z ('RERUN', {'yellow': True}) [3.1978s] [100%] 2025-12-04T10:58:28.3794461Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:19.964099714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3794464Z 2025-12-04T10:58:28.3794615Z [W1204 10:43:19.964459100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3794618Z 2025-12-04T10:58:28.3794766Z [W1204 10:43:19.964523690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3794768Z 2025-12-04T10:58:28.3794917Z [W1204 10:43:19.965785658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3794919Z 2025-12-04T10:58:28.3795068Z [W1204 10:43:19.966049016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3795070Z 2025-12-04T10:58:28.3795219Z [W1204 10:43:19.966113265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3795222Z 2025-12-04T10:58:28.3795371Z [W1204 10:43:19.968121207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3795387Z 2025-12-04T10:58:28.3795535Z [W1204 10:43:19.968383365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3795537Z 2025-12-04T10:58:28.3795686Z [W1204 10:43:19.968444314 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3795688Z 2025-12-04T10:58:28.3795735Z ('RERUN', {'yellow': True}) [0.4759s] [100%] 2025-12-04T10:58:28.3796095Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:20.443289618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796112Z 2025-12-04T10:58:28.3796261Z [W1204 10:43:20.443646845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796264Z 2025-12-04T10:58:28.3796412Z [W1204 10:43:20.443711924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796414Z 2025-12-04T10:58:28.3796562Z [W1204 10:43:20.445030592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796565Z 2025-12-04T10:58:28.3796712Z [W1204 10:43:20.445291130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796714Z 2025-12-04T10:58:28.3796863Z [W1204 10:43:20.445351569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3796868Z 2025-12-04T10:58:28.3797019Z [W1204 10:43:20.447376091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3797022Z 2025-12-04T10:58:28.3797171Z [W1204 10:43:20.447639888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3797172Z 2025-12-04T10:58:28.3797320Z [W1204 10:43:20.447700108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3797322Z 2025-12-04T10:58:28.3797360Z FAILED [0.4729s] [100%] 2025-12-04T10:58:28.3797362Z 2025-12-04T10:58:28.3797414Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3797563Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3797610Z Traceback (most recent call last): 2025-12-04T10:58:28.3797788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3797830Z method(*args, **kwargs) 2025-12-04T10:58:28.3797982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3798022Z method(*args, **kwargs) 2025-12-04T10:58:28.3798172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3798210Z with policy(): 2025-12-04T10:58:28.3798362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3798403Z raise RuntimeError(msg) 2025-12-04T10:58:28.3798797Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3798819Z 2025-12-04T10:58:28.3798892Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3799184Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3799186Z 2025-12-04T10:58:28.3799272Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3799346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3799402Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3799680Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3799765Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3799802Z graph_break [] 2025-12-04T10:58:28.3799952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3799998Z Traceback (most recent call last): 2025-12-04T10:58:28.3800153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3800193Z method(*args, **kwargs) 2025-12-04T10:58:28.3800343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3800383Z method(*args, **kwargs) 2025-12-04T10:58:28.3800534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3800571Z with 
policy(): 2025-12-04T10:58:28.3800723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3800766Z raise RuntimeError(msg) 2025-12-04T10:58:28.3801165Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3801167Z 2025-12-04T10:58:28.3801239Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3801556Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3801560Z 2025-12-04T10:58:28.3801646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3801722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3801777Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3802050Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3802123Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3802159Z graph_break [] 2025-12-04T10:58:28.3802232Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3802287Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3802360Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3802631Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3802681Z graph_break [] 2025-12-04T10:58:28.3802733Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3802882Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3802927Z Traceback (most recent call last): 2025-12-04T10:58:28.3803080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3803119Z method(*args, **kwargs) 2025-12-04T10:58:28.3803304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3803358Z method(*args, **kwargs) 2025-12-04T10:58:28.3803508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3803547Z with policy(): 2025-12-04T10:58:28.3803699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3803739Z raise RuntimeError(msg) 2025-12-04T10:58:28.3804138Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3804140Z 2025-12-04T10:58:28.3804213Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3804506Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3804509Z 2025-12-04T10:58:28.3804595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3804668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3804723Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3804994Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3805068Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3805104Z graph_break [] 2025-12-04T10:58:28.3805207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3805261Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3805334Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3805606Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3805642Z graph_break [] 2025-12-04T10:58:28.3805714Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3805768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3805840Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3806112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3806148Z graph_break [] 2025-12-04T10:58:28.3806407Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7feb3a661ebb3a9f.xml - 2025-12-04T10:58:28.3806465Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3807095Z FAILED [0.4729s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3807115Z 2025-12-04T10:58:28.3807189Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3807475Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3807478Z 2025-12-04T10:58:28.3807563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3807624Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3807691Z ================== 1 failed, 57 deselected, 2 rerun in 4.31s =================== 2025-12-04T10:58:28.3807727Z Got exit code 1 2025-12-04T10:58:28.3807768Z Retrying single test... 2025-12-04T10:58:28.3807965Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9c3667b4d3b908ee.xml 2025-12-04T10:58:28.3808026Z ============================= test session starts ============================== 2025-12-04T10:58:28.3808135Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3808177Z cachedir: .pytest_cache 2025-12-04T10:58:28.3808335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3808380Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3808420Z configfile: pytest.ini 2025-12-04T10:58:28.3808580Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3808653Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3808938Z stepcurrent: skipping 30 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3809004Z Running 1 items in this shard 2025-12-04T10:58:28.3809006Z 2025-12-04T10:58:28.3809365Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:29.901182692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3809368Z 2025-12-04T10:58:28.3809521Z [W1204 10:43:29.161613215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3809523Z 2025-12-04T10:58:28.3809673Z [W1204 10:43:29.161772604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3809675Z 2025-12-04T10:58:28.3809825Z [W1204 10:43:29.165739997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3809829Z 2025-12-04T10:58:28.3809976Z [W1204 10:43:29.166061024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3809991Z 2025-12-04T10:58:28.3810141Z [W1204 10:43:29.166122924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3810142Z 2025-12-04T10:58:28.3810291Z [W1204 10:43:29.168416953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3810293Z 2025-12-04T10:58:28.3810440Z [W1204 10:43:29.168695190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3810442Z 2025-12-04T10:58:28.3810591Z [W1204 10:43:29.168755490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3810604Z 2025-12-04T10:58:28.3810654Z ('RERUN', {'yellow': True}) [3.2765s] [100%] 2025-12-04T10:58:28.3811009Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:30.002094865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811013Z 2025-12-04T10:58:28.3811160Z [W1204 10:43:30.002568110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811163Z 2025-12-04T10:58:28.3811310Z [W1204 10:43:30.002647000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811312Z 2025-12-04T10:58:28.3811460Z [W1204 10:43:30.004164216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811463Z 2025-12-04T10:58:28.3811612Z [W1204 10:43:30.004442113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811614Z 2025-12-04T10:58:28.3811763Z [W1204 10:43:30.004502762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811765Z 2025-12-04T10:58:28.3811913Z [W1204 10:43:30.006761932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3811915Z 2025-12-04T10:58:28.3812064Z [W1204 10:43:30.007035039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3812066Z 2025-12-04T10:58:28.3812215Z [W1204 10:43:30.007098789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3812218Z 2025-12-04T10:58:28.3812286Z ('RERUN', {'yellow': True}) [0.6822s] [100%] 2025-12-04T10:58:28.3812642Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:43:31.732893169 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3812645Z 2025-12-04T10:58:28.3812793Z [W1204 10:43:31.733292815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3812795Z 2025-12-04T10:58:28.3812942Z [W1204 10:43:31.733370034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3812944Z 2025-12-04T10:58:28.3813090Z [W1204 10:43:31.734677162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3813094Z 2025-12-04T10:58:28.3813243Z [W1204 10:43:31.734950960 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3813289Z 2025-12-04T10:58:28.3813437Z [W1204 10:43:31.735024229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3813439Z 2025-12-04T10:58:28.3813586Z [W1204 10:43:31.737181919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3813588Z 2025-12-04T10:58:28.3813736Z [W1204 10:43:31.737453217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3813737Z 2025-12-04T10:58:28.3813884Z [W1204 10:43:31.737515956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3813886Z 2025-12-04T10:58:28.3813947Z FAILED [0.7473s] [100%] 2025-12-04T10:58:28.3813951Z 2025-12-04T10:58:28.3814002Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3814151Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3814197Z Traceback (most recent call last): 2025-12-04T10:58:28.3814353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3814394Z method(*args, **kwargs) 2025-12-04T10:58:28.3814546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3814586Z method(*args, **kwargs) 2025-12-04T10:58:28.3814736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3814773Z with policy(): 2025-12-04T10:58:28.3814927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3814972Z raise RuntimeError(msg) 2025-12-04T10:58:28.3815369Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3815372Z 2025-12-04T10:58:28.3815446Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3815736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3815738Z 2025-12-04T10:58:28.3815853Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3815928Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3815986Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3816259Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3816333Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3816370Z graph_break [] 2025-12-04T10:58:28.3816518Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3816564Z Traceback (most recent call last): 2025-12-04T10:58:28.3816717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3816760Z method(*args, **kwargs) 2025-12-04T10:58:28.3816910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3816964Z method(*args, **kwargs) 2025-12-04T10:58:28.3817113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3817150Z with 
policy(): 2025-12-04T10:58:28.3817301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3817342Z raise RuntimeError(msg) 2025-12-04T10:58:28.3817739Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3817755Z 2025-12-04T10:58:28.3817828Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3818118Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3818120Z 2025-12-04T10:58:28.3818207Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3818280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3818336Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3818607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3818683Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3818720Z graph_break [] 2025-12-04T10:58:28.3818792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3818848Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3818918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3819187Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3819223Z graph_break [] 2025-12-04T10:58:28.3819274Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3819422Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3819469Z Traceback (most recent call last): 2025-12-04T10:58:28.3819642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3819683Z method(*args, **kwargs) 2025-12-04T10:58:28.3819833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3819873Z method(*args, **kwargs) 2025-12-04T10:58:28.3820023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3820059Z with policy(): 2025-12-04T10:58:28.3820210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3820251Z raise RuntimeError(msg) 2025-12-04T10:58:28.3820652Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3820673Z 2025-12-04T10:58:28.3820747Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3821036Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3821039Z 2025-12-04T10:58:28.3821124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3821198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3821252Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3821526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3821610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3821647Z graph_break [] 2025-12-04T10:58:28.3821719Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3821773Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3821844Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3822115Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3822151Z graph_break [] 2025-12-04T10:58:28.3822223Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3822279Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3822350Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3822619Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3822655Z graph_break [] 2025-12-04T10:58:28.3822897Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9c3667b4d3b908ee.xml - 2025-12-04T10:58:28.3822956Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3823647Z FAILED [0.7473s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3823652Z 2025-12-04T10:58:28.3823724Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3824014Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3824016Z 2025-12-04T10:58:28.3824101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3824163Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3824230Z ================== 1 failed, 57 deselected, 2 rerun in 4.87s =================== 2025-12-04T10:58:28.3824268Z Got exit code 1 2025-12-04T10:58:28.3824507Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3824652Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3824851Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34c4fd167c043707.xml 2025-12-04T10:58:28.3824907Z ============================= test session starts ============================== 2025-12-04T10:58:28.3825016Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3825057Z cachedir: .pytest_cache 2025-12-04T10:58:28.3825216Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3825276Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3825316Z configfile: pytest.ini 2025-12-04T10:58:28.3825477Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3825552Z collecting ... collected 58 items / 31 deselected / 27 selected 2025-12-04T10:58:28.3825605Z stepcurrent: skipping 31 already run items. 
2025-12-04T10:58:28.3825649Z Running 27 items in this shard 2025-12-04T10:58:28.3825651Z 2025-12-04T10:58:28.3825904Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5457s] [ 3%] 2025-12-04T10:58:28.3826153Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5289s] [ 3%] 2025-12-04T10:58:28.3826378Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.5036s] [ 3%] 2025-12-04T10:58:28.3826381Z 2025-12-04T10:58:28.3826432Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3826583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3826629Z Traceback (most recent call last): 2025-12-04T10:58:28.3826783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3826823Z method(*args, **kwargs) 2025-12-04T10:58:28.3826975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3827015Z method(*args, **kwargs) 2025-12-04T10:58:28.3827190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3827227Z with policy(): 2025-12-04T10:58:28.3827379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3827419Z raise RuntimeError(msg) 2025-12-04T10:58:28.3827824Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3827826Z 2025-12-04T10:58:28.3827898Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3828191Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3828206Z 2025-12-04T10:58:28.3828292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3828366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3828421Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3828599Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3828671Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3828707Z graph_break [] 2025-12-04T10:58:28.3828858Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3828903Z Traceback (most recent call last): 2025-12-04T10:58:28.3829069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3829110Z method(*args, **kwargs) 2025-12-04T10:58:28.3829261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3829299Z method(*args, **kwargs) 
2025-12-04T10:58:28.3829448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3829484Z with policy(): 2025-12-04T10:58:28.3829636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3829676Z raise RuntimeError(msg) 2025-12-04T10:58:28.3830095Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3830099Z 2025-12-04T10:58:28.3830171Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3830462Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3830465Z 2025-12-04T10:58:28.3830550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3830624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3830678Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3830876Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3830950Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3830988Z graph_break [] 2025-12-04T10:58:28.3831059Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3831114Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3831185Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3831361Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3831398Z graph_break [] 2025-12-04T10:58:28.3831449Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3831601Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3831647Z Traceback (most recent call last): 2025-12-04T10:58:28.3831800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3831854Z method(*args, **kwargs) 2025-12-04T10:58:28.3832007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3832046Z method(*args, **kwargs) 2025-12-04T10:58:28.3832197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3832233Z with policy(): 2025-12-04T10:58:28.3832384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3832424Z raise RuntimeError(msg) 2025-12-04T10:58:28.3832835Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3832850Z 2025-12-04T10:58:28.3832923Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3833213Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3833215Z 2025-12-04T10:58:28.3833332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3833406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3833460Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3833637Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3833710Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3833746Z graph_break [] 2025-12-04T10:58:28.3833818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3833872Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3833944Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3834119Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3834155Z graph_break [] 2025-12-04T10:58:28.3834226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3834280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3834387Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3834562Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3834599Z graph_break [] 2025-12-04T10:58:28.3834841Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34c4fd167c043707.xml - 2025-12-04T10:58:28.3834900Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3835544Z FAILED [0.5036s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3835567Z 2025-12-04T10:58:28.3835640Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3835931Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3835933Z 2025-12-04T10:58:28.3836019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3836080Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3836146Z ================== 1 failed, 31 deselected, 2 rerun in 3.74s =================== 2025-12-04T10:58:28.3836182Z Got exit code 1 2025-12-04T10:58:28.3836223Z Retrying single test... 
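Annotation (not part of the runner output): the leak messages in the session above each report a before/after pair for the caching allocator and for the driver. A minimal sketch, assuming the message wording stays exactly as printed in this log, that pulls out those numbers and computes the growth per attempt. The names `LEAK_RE` and `leak_deltas` are hypothetical helpers introduced for illustration; they are not part of PyTorch or its test harness.

```python
import re

# Matches the numeric part of the leak message as it appears in this log.
# The wording is copied from the log; it may differ in other PyTorch versions.
LEAK_RE = re.compile(
    r"Caching allocator allocated memory was (\d+) and is now reported as (\d+) "
    r"on device (\d+)\. CUDA driver allocated memory was (\d+) and is now (\d+)\."
)


def leak_deltas(message):
    """Return (device, allocator_delta, driver_delta) in bytes, or None if no match."""
    match = LEAK_RE.search(message)
    if match is None:
        return None
    alloc_before, alloc_after, device, drv_before, drv_after = map(int, match.groups())
    return device, alloc_after - alloc_before, drv_after - drv_before


# Numeric tails of the three leak messages from the first attempt, copied from the log.
messages = [
    "Caching allocator allocated memory was 0 and is now reported as 1048576 "
    "on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552.",
    "Caching allocator allocated memory was 1048576 and is now reported as 2097152 "
    "on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616.",
    "Caching allocator allocated memory was 2097152 and is now reported as 3145728 "
    "on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680.",
]

for msg in messages:
    device, alloc_delta, drv_delta = leak_deltas(msg)
    mib = 2 ** 20  # bytes per MiB
    print(f"device {device}: allocator +{alloc_delta // mib} MiB, driver +{drv_delta // mib} MiB")
```

Read this way, every attempt in the session leaves the caching allocator 1 MiB (1048576 bytes) larger than before. The driver-side figure grows by 400556032 bytes on the first run and by 14680064 bytes on each of the two reruns.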
2025-12-04T10:58:28.3836436Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-248680a860d6c70b.xml 2025-12-04T10:58:28.3836494Z ============================= test session starts ============================== 2025-12-04T10:58:28.3836602Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3836643Z cachedir: .pytest_cache 2025-12-04T10:58:28.3836800Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3836846Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3836886Z configfile: pytest.ini 2025-12-04T10:58:28.3837046Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3837118Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3837405Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3837449Z Running 1 items in this shard 2025-12-04T10:58:28.3837451Z 2025-12-04T10:58:28.3837819Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:43:51.295863855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3837822Z 2025-12-04T10:58:28.3837975Z [W1204 10:43:51.575126868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3837977Z 2025-12-04T10:58:28.3838128Z [W1204 10:43:51.575284147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838131Z 2025-12-04T10:58:28.3838303Z [W1204 10:43:51.579034122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838306Z 2025-12-04T10:58:28.3838455Z [W1204 10:43:51.579445458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838457Z 2025-12-04T10:58:28.3838607Z [W1204 10:43:51.579509847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838609Z 2025-12-04T10:58:28.3838756Z [W1204 10:43:51.582055283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838759Z 2025-12-04T10:58:28.3838906Z [W1204 10:43:51.582358360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3838909Z 2025-12-04T10:58:28.3839058Z [W1204 10:43:51.582419240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3839071Z 2025-12-04T10:58:28.3839119Z ('RERUN', {'yellow': True}) [2.9345s] [100%] 2025-12-04T10:58:28.3839482Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:43:52.788817641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3839484Z 2025-12-04T10:58:28.3839632Z [W1204 10:43:52.789204917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3839634Z 2025-12-04T10:58:28.3839782Z [W1204 10:43:52.789270886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3839794Z 2025-12-04T10:58:28.3839944Z [W1204 10:43:52.790547134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3839947Z 2025-12-04T10:58:28.3840094Z [W1204 10:43:52.790810772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3840096Z 2025-12-04T10:58:28.3840244Z [W1204 10:43:52.790871361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3840246Z 2025-12-04T10:58:28.3840394Z [W1204 10:43:52.792866982 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3840396Z 2025-12-04T10:58:28.3840544Z [W1204 10:43:52.793211589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3840546Z 2025-12-04T10:58:28.3840697Z [W1204 10:43:52.793277969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3840701Z 2025-12-04T10:58:28.3840749Z ('RERUN', {'yellow': True}) [0.7083s] [100%] 2025-12-04T10:58:28.3841109Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:43:53.464144102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841112Z 2025-12-04T10:58:28.3841260Z [W1204 10:43:53.464539279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841263Z 2025-12-04T10:58:28.3841411Z [W1204 10:43:53.464607748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841414Z 2025-12-04T10:58:28.3841584Z [W1204 10:43:53.465890406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841587Z 2025-12-04T10:58:28.3841736Z [W1204 10:43:53.466151503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841738Z 2025-12-04T10:58:28.3841887Z [W1204 10:43:53.466213413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3841889Z 2025-12-04T10:58:28.3842036Z [W1204 10:43:53.468287073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3842038Z 2025-12-04T10:58:28.3842186Z [W1204 10:43:53.468628260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3842188Z 2025-12-04T10:58:28.3842338Z [W1204 10:43:53.468690699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3842351Z 2025-12-04T10:58:28.3842390Z FAILED [0.6765s] [100%] 2025-12-04T10:58:28.3842392Z 2025-12-04T10:58:28.3842443Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3842596Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3842641Z Traceback (most recent call last): 2025-12-04T10:58:28.3842798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3842838Z method(*args, **kwargs) 2025-12-04T10:58:28.3842990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3843029Z method(*args, **kwargs) 2025-12-04T10:58:28.3843192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3843230Z with policy(): 2025-12-04T10:58:28.3843417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3843459Z raise RuntimeError(msg) 2025-12-04T10:58:28.3843859Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3843862Z 2025-12-04T10:58:28.3843936Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3844228Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3844231Z 2025-12-04T10:58:28.3844320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3844392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3844448Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3844625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3844697Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3844733Z graph_break [] 2025-12-04T10:58:28.3844887Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3844931Z Traceback (most recent call last): 2025-12-04T10:58:28.3845113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3845154Z method(*args, **kwargs) 2025-12-04T10:58:28.3845305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3845345Z method(*args, **kwargs) 2025-12-04T10:58:28.3845493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3845530Z with policy(): 2025-12-04T10:58:28.3845680Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3845721Z raise RuntimeError(msg) 2025-12-04T10:58:28.3846131Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3846150Z 2025-12-04T10:58:28.3846224Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3846513Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3846515Z 2025-12-04T10:58:28.3846602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3846674Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3846730Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3846906Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3846993Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3847030Z graph_break [] 2025-12-04T10:58:28.3847103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3847157Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3847229Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3847403Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3847440Z graph_break [] 2025-12-04T10:58:28.3847492Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3847644Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3847690Z Traceback (most recent call last): 2025-12-04T10:58:28.3847844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3847885Z method(*args, **kwargs) 2025-12-04T10:58:28.3848035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3848075Z method(*args, **kwargs) 2025-12-04T10:58:28.3848223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3848260Z with policy(): 2025-12-04T10:58:28.3848410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3848451Z raise RuntimeError(msg) 2025-12-04T10:58:28.3848894Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3848899Z 2025-12-04T10:58:28.3848973Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3849264Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3849266Z 2025-12-04T10:58:28.3849352Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3849425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3849479Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3849655Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3849729Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3849778Z graph_break [] 2025-12-04T10:58:28.3849850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3849905Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3849975Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3850150Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3850186Z graph_break [] 2025-12-04T10:58:28.3850258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3850312Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3850384Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3850570Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3850607Z graph_break [] 2025-12-04T10:58:28.3850847Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-248680a860d6c70b.xml - 2025-12-04T10:58:28.3850906Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3851552Z FAILED [0.6765s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3851555Z 2025-12-04T10:58:28.3851629Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3851918Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3851920Z 2025-12-04T10:58:28.3852005Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3852067Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3852132Z ================== 1 failed, 57 deselected, 2 rerun in 4.48s =================== 2025-12-04T10:58:28.3852170Z Got exit code 1 2025-12-04T10:58:28.3852209Z Retrying single test... 
2025-12-04T10:58:28.3852427Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ae35ad306f00234.xml 2025-12-04T10:58:28.3852485Z ============================= test session starts ============================== 2025-12-04T10:58:28.3852596Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3852636Z cachedir: .pytest_cache 2025-12-04T10:58:28.3852794Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3852838Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3852880Z configfile: pytest.ini 2025-12-04T10:58:28.3853039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3853112Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3853439Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3853503Z Running 1 items in this shard 2025-12-04T10:58:28.3853506Z 2025-12-04T10:58:28.3853871Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:44:02.750400132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3853874Z 2025-12-04T10:58:28.3854025Z [W1204 10:44:02.021226349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3854027Z 2025-12-04T10:58:28.3854178Z [W1204 10:44:02.021376407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854193Z 2025-12-04T10:58:28.3854344Z [W1204 10:44:02.025167151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854347Z 2025-12-04T10:58:28.3854495Z [W1204 10:44:02.025568807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854497Z 2025-12-04T10:58:28.3854645Z [W1204 10:44:02.025636877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854647Z 2025-12-04T10:58:28.3854795Z [W1204 10:44:02.027932255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854797Z 2025-12-04T10:58:28.3854945Z [W1204 10:44:02.028246772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3854947Z 2025-12-04T10:58:28.3855096Z [W1204 10:44:02.028312491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3855099Z 2025-12-04T10:58:28.3855147Z ('RERUN', {'yellow': True}) [2.9547s] [100%] 2025-12-04T10:58:28.3855509Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:44:03.266335719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3855511Z 2025-12-04T10:58:28.3855660Z [W1204 10:44:03.266728886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3855662Z 2025-12-04T10:58:28.3855811Z [W1204 10:44:03.266793565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3855814Z 2025-12-04T10:58:28.3855990Z [W1204 10:44:04.268064723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3855993Z 2025-12-04T10:58:28.3856142Z [W1204 10:44:04.268327460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3856144Z 2025-12-04T10:58:28.3856291Z [W1204 10:44:04.268388880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3856293Z 2025-12-04T10:58:28.3856442Z [W1204 10:44:04.270402061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3856443Z 2025-12-04T10:58:28.3856591Z [W1204 10:44:04.270737337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3856593Z 2025-12-04T10:58:28.3856743Z [W1204 10:44:04.270799927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3856745Z 2025-12-04T10:58:28.3856805Z ('RERUN', {'yellow': True}) [0.7456s] [100%] 2025-12-04T10:58:28.3857162Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:44:04.998159293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857164Z 2025-12-04T10:58:28.3857312Z [W1204 10:44:04.998552299 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857314Z 2025-12-04T10:58:28.3857460Z [W1204 10:44:04.998616839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857462Z 2025-12-04T10:58:28.3857612Z [W1204 10:44:04.999874077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857625Z 2025-12-04T10:58:28.3857774Z [W1204 10:44:04.000135414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857776Z 2025-12-04T10:58:28.3857925Z [W1204 10:44:04.000197984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3857927Z 2025-12-04T10:58:28.3858077Z [W1204 10:44:04.002224304 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3858079Z 2025-12-04T10:58:28.3858226Z [W1204 10:44:04.002559911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3858228Z 2025-12-04T10:58:28.3858378Z [W1204 10:44:04.002621781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3858380Z 2025-12-04T10:58:28.3858418Z FAILED [0.7287s] [100%] 2025-12-04T10:58:28.3858421Z 2025-12-04T10:58:28.3858473Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3858625Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3858672Z Traceback (most recent call last): 2025-12-04T10:58:28.3858826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3858867Z method(*args, **kwargs) 2025-12-04T10:58:28.3859019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3859059Z method(*args, **kwargs) 2025-12-04T10:58:28.3859234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3859271Z with policy(): 2025-12-04T10:58:28.3859425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3859466Z raise RuntimeError(msg) 2025-12-04T10:58:28.3859868Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3859871Z 2025-12-04T10:58:28.3859944Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3860240Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3860244Z 2025-12-04T10:58:28.3860330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3860415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3860470Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3860647Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3860720Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3860757Z graph_break [] 2025-12-04T10:58:28.3860908Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3860954Z Traceback (most recent call last): 2025-12-04T10:58:28.3861107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3861162Z method(*args, **kwargs) 2025-12-04T10:58:28.3861313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3861353Z method(*args, **kwargs) 2025-12-04T10:58:28.3861503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3861539Z with policy(): 2025-12-04T10:58:28.3861690Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3861730Z raise RuntimeError(msg) 2025-12-04T10:58:28.3862141Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3862144Z 2025-12-04T10:58:28.3862217Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3862511Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3862513Z 2025-12-04T10:58:28.3862599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3862672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3862726Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3862902Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3863001Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3863039Z graph_break [] 2025-12-04T10:58:28.3863112Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3863167Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3863237Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3863437Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3863473Z graph_break [] 2025-12-04T10:58:28.3863526Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3863678Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3863723Z Traceback (most recent call last): 2025-12-04T10:58:28.3863879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3863936Z method(*args, **kwargs) 2025-12-04T10:58:28.3864087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3864126Z method(*args, **kwargs) 2025-12-04T10:58:28.3864276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3864312Z with policy(): 2025-12-04T10:58:28.3864465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3864505Z raise RuntimeError(msg) 2025-12-04T10:58:28.3864916Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3864932Z 2025-12-04T10:58:28.3865006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3865297Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3865299Z 2025-12-04T10:58:28.3865386Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3865459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3865512Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3865690Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3865763Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3865800Z graph_break [] 2025-12-04T10:58:28.3865872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3865925Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3865997Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3866169Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3866205Z graph_break [] 2025-12-04T10:58:28.3866277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3866331Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3866402Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3866607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3866644Z graph_break [] 2025-12-04T10:58:28.3866889Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ae35ad306f00234.xml - 2025-12-04T10:58:28.3866948Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3867586Z FAILED [0.7287s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3867589Z 2025-12-04T10:58:28.3867661Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3867966Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3867968Z 2025-12-04T10:58:28.3868055Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3868116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3868182Z ================== 1 failed, 57 deselected, 2 rerun in 4.59s =================== 2025-12-04T10:58:28.3868218Z Got exit code 1 2025-12-04T10:58:28.3868462Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3868600Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3868799Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77f8c2e4c9f07d22.xml 2025-12-04T10:58:28.3868856Z ============================= test session starts ============================== 2025-12-04T10:58:28.3868966Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3869006Z cachedir: .pytest_cache 2025-12-04T10:58:28.3869164Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3869210Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3869250Z configfile: pytest.ini 2025-12-04T10:58:28.3869411Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3869486Z collecting ... collected 58 items / 32 deselected / 26 selected 2025-12-04T10:58:28.3869539Z stepcurrent: skipping 32 already run items. 
2025-12-04T10:58:28.3869584Z Running 26 items in this shard 2025-12-04T10:58:28.3869585Z 2025-12-04T10:58:28.3869835Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5961s] [ 3%] 2025-12-04T10:58:28.3870081Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6180s] [ 3%] 2025-12-04T10:58:28.3870305Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.6539s] [ 3%] 2025-12-04T10:58:28.3870308Z 2025-12-04T10:58:28.3870381Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3870533Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3870577Z Traceback (most recent call last): 2025-12-04T10:58:28.3870734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3870773Z method(*args, **kwargs) 2025-12-04T10:58:28.3870924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3870963Z method(*args, **kwargs) 2025-12-04T10:58:28.3871114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3871150Z with policy(): 2025-12-04T10:58:28.3871305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3871346Z raise RuntimeError(msg) 2025-12-04T10:58:28.3871758Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3871760Z 2025-12-04T10:58:28.3871832Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3872122Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3872125Z 2025-12-04T10:58:28.3872211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3872297Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3872354Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3872530Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3872602Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3872638Z graph_break [] 2025-12-04T10:58:28.3872789Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3872833Z Traceback (most recent call last): 2025-12-04T10:58:28.3872987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3873026Z method(*args, **kwargs) 2025-12-04T10:58:28.3873178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3873217Z method(*args, **kwargs) 
2025-12-04T10:58:28.3873398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3873434Z with policy(): 2025-12-04T10:58:28.3873586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3873626Z raise RuntimeError(msg) 2025-12-04T10:58:28.3874028Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3874031Z 2025-12-04T10:58:28.3874129Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3874422Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3874426Z 2025-12-04T10:58:28.3874512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3874584Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3874639Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3874814Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3874887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3874922Z graph_break [] 2025-12-04T10:58:28.3874997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3875052Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.3875146Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3875320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3875357Z graph_break [] 2025-12-04T10:58:28.3875408Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3875559Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3875604Z Traceback (most recent call last): 2025-12-04T10:58:28.3875757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3875796Z method(*args, **kwargs) 2025-12-04T10:58:28.3875963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3876003Z method(*args, **kwargs) 2025-12-04T10:58:28.3876153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3876189Z with policy(): 2025-12-04T10:58:28.3876340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3876380Z raise RuntimeError(msg) 2025-12-04T10:58:28.3876783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3876786Z 2025-12-04T10:58:28.3876861Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3877150Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3877154Z 2025-12-04T10:58:28.3877240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3877312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3877369Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3877542Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3877614Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3877649Z graph_break [] 2025-12-04T10:58:28.3877745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3877799Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3877872Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3878045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3878081Z graph_break [] 2025-12-04T10:58:28.3878153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3878207Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3878278Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3878453Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3878490Z graph_break [] 2025-12-04T10:58:28.3878735Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-77f8c2e4c9f07d22.xml - 2025-12-04T10:58:28.3878805Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3879439Z FAILED [0.6539s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3879442Z 2025-12-04T10:58:28.3879515Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3879818Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3879821Z 2025-12-04T10:58:28.3879908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3879968Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3880034Z ================== 1 failed, 32 deselected, 2 rerun in 4.03s =================== 2025-12-04T10:58:28.3880071Z Got exit code 1 2025-12-04T10:58:28.3880111Z Retrying single test... 
2025-12-04T10:58:28.3880308Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49e09e572afc16fd.xml 2025-12-04T10:58:28.3880365Z ============================= test session starts ============================== 2025-12-04T10:58:28.3880477Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3880518Z cachedir: .pytest_cache 2025-12-04T10:58:28.3880678Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3880724Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3880764Z configfile: pytest.ini 2025-12-04T10:58:28.3880924Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3880996Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3881281Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3881324Z Running 1 items in this shard 2025-12-04T10:58:28.3881327Z 2025-12-04T10:58:28.3881710Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:25.415151659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3881713Z 2025-12-04T10:58:28.3881867Z [W1204 10:44:25.695526710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3881869Z 2025-12-04T10:58:28.3882019Z [W1204 10:44:25.695677718 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882021Z 2025-12-04T10:58:28.3882171Z [W1204 10:44:25.699185414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882172Z 2025-12-04T10:58:28.3882323Z [W1204 10:44:25.699484901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882326Z 2025-12-04T10:58:28.3882486Z [W1204 10:44:25.699544751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882488Z 2025-12-04T10:58:28.3882636Z [W1204 10:44:25.701748749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882638Z 2025-12-04T10:58:28.3882785Z [W1204 10:44:25.702029437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882787Z 2025-12-04T10:58:28.3882935Z [W1204 10:44:25.702091486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3882937Z 2025-12-04T10:58:28.3882985Z ('RERUN', {'yellow': True}) [2.7773s] [100%] 2025-12-04T10:58:28.3883385Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:26.647563511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3883389Z 2025-12-04T10:58:28.3883539Z [W1204 10:44:26.647941927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3883541Z 2025-12-04T10:58:28.3883689Z [W1204 10:44:26.648015166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3883691Z 2025-12-04T10:58:28.3883838Z [W1204 10:44:26.649283844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3883840Z 2025-12-04T10:58:28.3883989Z [W1204 10:44:26.649549102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3883992Z 2025-12-04T10:58:28.3884141Z [W1204 10:44:26.649610371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3884144Z 2025-12-04T10:58:28.3884293Z [W1204 10:44:26.651618591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3884295Z 2025-12-04T10:58:28.3884442Z [W1204 10:44:26.651960018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3884444Z 2025-12-04T10:58:28.3884592Z [W1204 10:44:26.652028067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3884594Z 2025-12-04T10:58:28.3884641Z ('RERUN', {'yellow': True}) [0.4426s] [100%] 2025-12-04T10:58:28.3885029Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:26.088774968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885033Z 2025-12-04T10:58:28.3885181Z [W1204 10:44:26.089149144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885184Z 2025-12-04T10:58:28.3885332Z [W1204 10:44:26.089214293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885333Z 2025-12-04T10:58:28.3885481Z [W1204 10:44:26.090463201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885483Z 2025-12-04T10:58:28.3885632Z [W1204 10:44:26.090717459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885635Z 2025-12-04T10:58:28.3885783Z [W1204 10:44:26.090776348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885799Z 2025-12-04T10:58:28.3885947Z [W1204 10:44:26.092703909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3885949Z 2025-12-04T10:58:28.3886096Z [W1204 10:44:26.093044476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3886098Z 2025-12-04T10:58:28.3886247Z [W1204 10:44:26.093108735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3886248Z 2025-12-04T10:58:28.3886287Z FAILED [0.4287s] [100%] 2025-12-04T10:58:28.3886289Z 2025-12-04T10:58:28.3886342Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3886507Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3886554Z Traceback (most recent call last): 2025-12-04T10:58:28.3886711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3886751Z method(*args, **kwargs) 2025-12-04T10:58:28.3886903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3886943Z method(*args, **kwargs) 2025-12-04T10:58:28.3887093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3887130Z with policy(): 2025-12-04T10:58:28.3887282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3887325Z raise RuntimeError(msg) 2025-12-04T10:58:28.3887720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3887723Z 2025-12-04T10:58:28.3887797Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3888084Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3888088Z 2025-12-04T10:58:28.3888175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3888278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3888335Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3888515Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3888587Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3888623Z graph_break [] 2025-12-04T10:58:28.3888773Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3888818Z Traceback (most recent call last): 2025-12-04T10:58:28.3888970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3889010Z method(*args, **kwargs) 2025-12-04T10:58:28.3889160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3889202Z method(*args, **kwargs) 2025-12-04T10:58:28.3889351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3889399Z with policy(): 2025-12-04T10:58:28.3889550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3889590Z raise RuntimeError(msg) 2025-12-04T10:58:28.3889991Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3889994Z 2025-12-04T10:58:28.3890067Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3890369Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3890373Z 2025-12-04T10:58:28.3890460Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3890533Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3890588Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3890765Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3890837Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3890873Z graph_break [] 2025-12-04T10:58:28.3890945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3891002Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3891073Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3891249Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3891284Z graph_break [] 2025-12-04T10:58:28.3891336Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3891485Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3891530Z Traceback (most recent call last): 2025-12-04T10:58:28.3891684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3891724Z method(*args, **kwargs) 2025-12-04T10:58:28.3891897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3891939Z method(*args, **kwargs) 2025-12-04T10:58:28.3892087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3892125Z with policy(): 2025-12-04T10:58:28.3892276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3892316Z raise RuntimeError(msg) 2025-12-04T10:58:28.3892720Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3892723Z 2025-12-04T10:58:28.3892795Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3893089Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3893104Z 2025-12-04T10:58:28.3893190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3893287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3893343Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3893518Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3893589Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3893626Z graph_break [] 2025-12-04T10:58:28.3893697Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3893769Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3893840Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3894015Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3894051Z graph_break [] 2025-12-04T10:58:28.3894123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3894177Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3894248Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3894421Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3894457Z graph_break [] 2025-12-04T10:58:28.3894699Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49e09e572afc16fd.xml - 2025-12-04T10:58:28.3894761Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3895397Z FAILED [0.4287s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3895399Z 2025-12-04T10:58:28.3895472Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3895789Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3895793Z 2025-12-04T10:58:28.3895878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3895941Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3896006Z ================== 1 failed, 57 deselected, 2 rerun in 3.81s =================== 2025-12-04T10:58:28.3896043Z Got exit code 1 2025-12-04T10:58:28.3896083Z Retrying single test... 
2025-12-04T10:58:28.3896279Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2355542669fbdb3.xml 2025-12-04T10:58:28.3896335Z ============================= test session starts ============================== 2025-12-04T10:58:28.3896445Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3896488Z cachedir: .pytest_cache 2025-12-04T10:58:28.3896647Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3896707Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3896747Z configfile: pytest.ini 2025-12-04T10:58:28.3896906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3896980Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3897268Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3897311Z Running 1 items in this shard 2025-12-04T10:58:28.3897314Z 2025-12-04T10:58:28.3897680Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:35.823930016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3897694Z 2025-12-04T10:58:28.3897847Z [W1204 10:44:35.089117472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3897849Z 2025-12-04T10:58:28.3898001Z [W1204 10:44:35.089293660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898003Z 2025-12-04T10:58:28.3898152Z [W1204 10:44:35.092791976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898154Z 2025-12-04T10:58:28.3898303Z [W1204 10:44:35.093107233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898308Z 2025-12-04T10:58:28.3898456Z [W1204 10:44:35.093171092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898459Z 2025-12-04T10:58:28.3898607Z [W1204 10:44:35.095305491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898609Z 2025-12-04T10:58:28.3898758Z [W1204 10:44:35.095577869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898760Z 2025-12-04T10:58:28.3898908Z [W1204 10:44:35.095638548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3898910Z 2025-12-04T10:58:28.3898959Z ('RERUN', {'yellow': True}) [2.7815s] [100%] 2025-12-04T10:58:28.3899345Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:36.038203441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3899350Z 2025-12-04T10:58:28.3899499Z [W1204 10:44:36.038587827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3899500Z 2025-12-04T10:58:28.3899649Z [W1204 10:44:36.038652996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3899650Z 2025-12-04T10:58:28.3899798Z [W1204 10:44:36.039903834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3899800Z 2025-12-04T10:58:28.3899947Z [W1204 10:44:36.040172231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3899950Z 2025-12-04T10:58:28.3900098Z [W1204 10:44:36.040235801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3900111Z 2025-12-04T10:58:28.3900259Z [W1204 10:44:36.042235051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3900261Z 2025-12-04T10:58:28.3900408Z [W1204 10:44:36.042582038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3900410Z 2025-12-04T10:58:28.3900556Z [W1204 10:44:36.042644517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3900558Z 2025-12-04T10:58:28.3900605Z ('RERUN', {'yellow': True}) [0.4374s] [100%] 2025-12-04T10:58:28.3900963Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:44:37.473261529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3900977Z 2025-12-04T10:58:28.3901127Z [W1204 10:44:37.473778084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901129Z 2025-12-04T10:58:28.3901277Z [W1204 10:44:37.473841844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901280Z 2025-12-04T10:58:28.3901429Z [W1204 10:44:37.475107151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901430Z 2025-12-04T10:58:28.3901580Z [W1204 10:44:37.475368059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901583Z 2025-12-04T10:58:28.3901732Z [W1204 10:44:37.475427678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901735Z 2025-12-04T10:58:28.3901883Z [W1204 10:44:37.477381009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3901885Z 2025-12-04T10:58:28.3902032Z [W1204 10:44:37.477719656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3902034Z 2025-12-04T10:58:28.3902183Z [W1204 10:44:37.477781055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3902185Z 2025-12-04T10:58:28.3902224Z FAILED [0.4302s] [100%] 2025-12-04T10:58:28.3902226Z 2025-12-04T10:58:28.3902277Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3902466Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3902511Z Traceback (most recent call last): 2025-12-04T10:58:28.3902669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3902709Z method(*args, **kwargs) 2025-12-04T10:58:28.3902861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3902900Z method(*args, **kwargs) 2025-12-04T10:58:28.3903049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3903086Z with policy(): 2025-12-04T10:58:28.3903238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3903381Z raise RuntimeError(msg) 2025-12-04T10:58:28.3903785Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3903806Z 2025-12-04T10:58:28.3903879Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3904169Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3904171Z 2025-12-04T10:58:28.3904257Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3904330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3904403Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3904578Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3904652Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3904688Z graph_break [] 2025-12-04T10:58:28.3904839Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3904884Z Traceback (most recent call last): 2025-12-04T10:58:28.3905037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3905076Z method(*args, **kwargs) 2025-12-04T10:58:28.3905227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3905266Z method(*args, **kwargs) 2025-12-04T10:58:28.3905417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3905454Z with policy(): 2025-12-04T10:58:28.3905606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.3905646Z raise RuntimeError(msg) 2025-12-04T10:58:28.3906050Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3906052Z 2025-12-04T10:58:28.3906124Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3906441Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3906445Z 2025-12-04T10:58:28.3906532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3906605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3906661Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3906835Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3906908Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3906944Z graph_break [] 2025-12-04T10:58:28.3907017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3907070Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3907145Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3907318Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.3907367Z graph_break [] 2025-12-04T10:58:28.3907418Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3907568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.3907612Z Traceback (most recent call last): 2025-12-04T10:58:28.3907765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3907805Z method(*args, **kwargs) 2025-12-04T10:58:28.3907957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3908007Z method(*args, **kwargs) 2025-12-04T10:58:28.3908157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3908194Z with policy(): 2025-12-04T10:58:28.3908346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3908386Z raise RuntimeError(msg) 2025-12-04T10:58:28.3908789Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3908791Z 2025-12-04T10:58:28.3908863Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3909153Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3909156Z 2025-12-04T10:58:28.3909243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3909315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3909371Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3909545Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3909617Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3909652Z graph_break [] 2025-12-04T10:58:28.3909725Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3909778Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3909874Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3910049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3910087Z graph_break [] 2025-12-04T10:58:28.3910159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3910212Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3910283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3910457Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3910492Z graph_break [] 2025-12-04T10:58:28.3910738Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d2355542669fbdb3.xml - 2025-12-04T10:58:28.3910798Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3911445Z FAILED [0.4302s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3911447Z 2025-12-04T10:58:28.3911520Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3911811Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3911824Z 2025-12-04T10:58:28.3911911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3911973Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3912039Z ================== 1 failed, 57 deselected, 2 rerun in 3.82s =================== 2025-12-04T10:58:28.3912075Z Got exit code 1 2025-12-04T10:58:28.3912315Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.3912441Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3912639Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-13e0eb09aac22077.xml 2025-12-04T10:58:28.3912698Z ============================= test session starts ============================== 2025-12-04T10:58:28.3912808Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3912850Z cachedir: .pytest_cache 2025-12-04T10:58:28.3913009Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3913054Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3913095Z configfile: pytest.ini 2025-12-04T10:58:28.3913279Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3913354Z collecting ... collected 58 items / 33 deselected / 25 selected 2025-12-04T10:58:28.3913406Z stepcurrent: skipping 33 already run items. 
2025-12-04T10:58:28.3913449Z Running 25 items in this shard 2025-12-04T10:58:28.3913451Z 2025-12-04T10:58:28.3913732Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8942s] [ 4%] 2025-12-04T10:58:28.3913978Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4643s] [ 4%] 2025-12-04T10:58:28.3914200Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4607s] [ 4%] 2025-12-04T10:58:28.3914202Z 2025-12-04T10:58:28.3914253Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3914403Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3914447Z Traceback (most recent call last): 2025-12-04T10:58:28.3914608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3914665Z method(*args, **kwargs) 2025-12-04T10:58:28.3914817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3914856Z method(*args, **kwargs) 2025-12-04T10:58:28.3915007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3915042Z with policy(): 2025-12-04T10:58:28.3915195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3915235Z raise RuntimeError(msg) 2025-12-04T10:58:28.3915631Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3915653Z 2025-12-04T10:58:28.3915727Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3916015Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3916017Z 2025-12-04T10:58:28.3916103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3916175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3916230Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3916505Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3916579Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3916615Z graph_break [] 2025-12-04T10:58:28.3916768Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3916812Z Traceback (most recent call last): 2025-12-04T10:58:28.3916966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3917005Z method(*args, **kwargs) 2025-12-04T10:58:28.3917155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.3917194Z method(*args, **kwargs) 2025-12-04T10:58:28.3917345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3917404Z with policy(): 2025-12-04T10:58:28.3917558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3917598Z raise RuntimeError(msg) 2025-12-04T10:58:28.3918001Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3918003Z 2025-12-04T10:58:28.3918076Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3918367Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3918370Z 2025-12-04T10:58:28.3918458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3918545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3918600Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3918873Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3918947Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3918982Z graph_break [] 2025-12-04T10:58:28.3919055Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3919108Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3919179Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3919461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3919499Z graph_break [] 2025-12-04T10:58:28.3919550Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3919701Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3919746Z Traceback (most recent call last): 2025-12-04T10:58:28.3919898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3919938Z method(*args, **kwargs) 2025-12-04T10:58:28.3920087Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3920129Z method(*args, **kwargs) 2025-12-04T10:58:28.3920278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3920316Z with policy(): 2025-12-04T10:58:28.3920466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3920507Z raise RuntimeError(msg) 2025-12-04T10:58:28.3920909Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3920911Z 2025-12-04T10:58:28.3920985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3921299Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3921303Z 2025-12-04T10:58:28.3921390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3921462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3921517Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3921788Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3921861Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3921897Z graph_break [] 2025-12-04T10:58:28.3921969Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3922026Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3922097Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3922378Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3922414Z graph_break [] 2025-12-04T10:58:28.3922486Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.3922539Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3922610Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3922879Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3922928Z graph_break [] 2025-12-04T10:58:28.3923172Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-13e0eb09aac22077.xml - 2025-12-04T10:58:28.3923232Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3923903Z FAILED [0.4607s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3923906Z 2025-12-04T10:58:28.3923980Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3924268Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3924271Z 2025-12-04T10:58:28.3924356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3924418Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3924483Z ================== 1 failed, 33 deselected, 2 rerun in 3.98s =================== 2025-12-04T10:58:28.3924520Z Got exit code 1 2025-12-04T10:58:28.3924559Z Retrying single test... 2025-12-04T10:58:28.3924758Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0676ebac742d50ba.xml 2025-12-04T10:58:28.3924839Z ============================= test session starts ============================== 2025-12-04T10:58:28.3924950Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3924991Z cachedir: .pytest_cache 2025-12-04T10:58:28.3925149Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3925194Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3925234Z configfile: pytest.ini 2025-12-04T10:58:28.3925393Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3925466Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3925750Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3925797Z Running 1 items in this shard 2025-12-04T10:58:28.3925799Z 2025-12-04T10:58:28.3926164Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:44:57.321846788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3926184Z 2025-12-04T10:58:28.3926337Z [W1204 10:44:57.582299094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3926339Z 2025-12-04T10:58:28.3926489Z [W1204 10:44:57.582459102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3926491Z 2025-12-04T10:58:28.3926638Z [W1204 10:44:57.586320464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3926654Z 2025-12-04T10:58:28.3926805Z [W1204 10:44:57.586634041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3926808Z 2025-12-04T10:58:28.3926955Z [W1204 10:44:57.586694260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3926958Z 2025-12-04T10:58:28.3927105Z [W1204 10:44:57.588914668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3927107Z 2025-12-04T10:58:28.3927254Z [W1204 10:44:57.589213195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3927256Z 2025-12-04T10:58:28.3927402Z [W1204 10:44:57.589277044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3927404Z 2025-12-04T10:58:28.3927455Z ('RERUN', {'yellow': True}) [3.2849s] [100%] 2025-12-04T10:58:28.3927814Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:44:58.420322565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3927817Z 2025-12-04T10:58:28.3927966Z [W1204 10:44:58.420711922 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3927968Z 2025-12-04T10:58:28.3928116Z [W1204 10:44:58.420789311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3928118Z 2025-12-04T10:58:28.3928265Z [W1204 10:44:58.422067988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3928268Z 2025-12-04T10:58:28.3928437Z [W1204 10:44:58.422340135 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3928440Z 2025-12-04T10:58:28.3928588Z [W1204 10:44:58.422399855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3928590Z 2025-12-04T10:58:28.3928738Z [W1204 10:44:58.424494854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3928739Z 2025-12-04T10:58:28.3928885Z [W1204 10:44:58.424758891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3928888Z 2025-12-04T10:58:28.3929034Z [W1204 10:44:58.424818271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3929036Z 2025-12-04T10:58:28.3929085Z ('RERUN', {'yellow': True}) [0.6795s] [100%] 2025-12-04T10:58:28.3929442Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:44:58.112851242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3929461Z 2025-12-04T10:58:28.3929610Z [W1204 10:44:58.113253218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3929612Z 2025-12-04T10:58:28.3929759Z [W1204 10:44:58.113339407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3929761Z 2025-12-04T10:58:28.3929909Z [W1204 10:44:58.114639834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3929911Z 2025-12-04T10:58:28.3930071Z [W1204 10:44:58.114908072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3930074Z 2025-12-04T10:58:28.3930221Z [W1204 10:44:58.114969231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3930223Z 2025-12-04T10:58:28.3930373Z [W1204 10:44:58.117074930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3930374Z 2025-12-04T10:58:28.3930522Z [W1204 10:44:58.117350087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3930523Z 2025-12-04T10:58:28.3930671Z [W1204 10:44:58.117412077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3930673Z 2025-12-04T10:58:28.3930711Z FAILED [0.6761s] [100%] 2025-12-04T10:58:28.3930715Z 2025-12-04T10:58:28.3930767Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3930920Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3930965Z Traceback (most recent call last): 2025-12-04T10:58:28.3931122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3931162Z method(*args, **kwargs) 2025-12-04T10:58:28.3931315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3931354Z method(*args, **kwargs) 2025-12-04T10:58:28.3931505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3931541Z with policy(): 2025-12-04T10:58:28.3931713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3931755Z raise RuntimeError(msg) 2025-12-04T10:58:28.3932152Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3932155Z 2025-12-04T10:58:28.3932228Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3932519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3932521Z 2025-12-04T10:58:28.3932608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3932684Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3932739Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3933025Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3933099Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3933135Z graph_break [] 2025-12-04T10:58:28.3933329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3933374Z Traceback (most recent call last): 2025-12-04T10:58:28.3933527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3933581Z method(*args, **kwargs) 2025-12-04T10:58:28.3933733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3933773Z method(*args, **kwargs) 2025-12-04T10:58:28.3933924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3933960Z 
with policy(): 2025-12-04T10:58:28.3934111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3934151Z raise RuntimeError(msg) 2025-12-04T10:58:28.3934554Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3934557Z 2025-12-04T10:58:28.3934631Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3934923Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3934927Z 2025-12-04T10:58:28.3935013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3935087Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3935142Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3935414Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3935515Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3935552Z graph_break [] 2025-12-04T10:58:28.3935625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3935680Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3935751Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3936022Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3936058Z graph_break [] 2025-12-04T10:58:28.3936110Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3936261Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3936305Z Traceback (most recent call last): 2025-12-04T10:58:28.3936460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3936515Z method(*args, **kwargs) 2025-12-04T10:58:28.3936665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3936704Z method(*args, **kwargs) 2025-12-04T10:58:28.3936854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3936890Z with policy(): 2025-12-04T10:58:28.3937045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3937085Z raise RuntimeError(msg) 2025-12-04T10:58:28.3937493Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3937508Z 2025-12-04T10:58:28.3937580Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3937869Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3937871Z 2025-12-04T10:58:28.3937958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3938030Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3938085Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3938358Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3938432Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3938467Z graph_break [] 2025-12-04T10:58:28.3938539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3938593Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3938665Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3938935Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3938971Z graph_break [] 2025-12-04T10:58:28.3939043Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3939122Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3939193Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3939465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3939500Z graph_break [] 2025-12-04T10:58:28.3939745Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0676ebac742d50ba.xml - 2025-12-04T10:58:28.3939804Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3940440Z FAILED [0.6761s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3940455Z 2025-12-04T10:58:28.3940528Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3940814Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3940816Z 2025-12-04T10:58:28.3940902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3940963Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3941029Z ================== 1 failed, 57 deselected, 2 rerun in 4.80s =================== 2025-12-04T10:58:28.3941079Z Got exit code 1 2025-12-04T10:58:28.3941120Z Retrying single test... 2025-12-04T10:58:28.3942626Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-09bb8902ce1fc14f.xml 2025-12-04T10:58:28.3942686Z ============================= test session starts ============================== 2025-12-04T10:58:28.3942797Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3942838Z cachedir: .pytest_cache 2025-12-04T10:58:28.3942997Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3943043Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3943083Z configfile: pytest.ini 2025-12-04T10:58:28.3943245Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3943360Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3943644Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3943691Z Running 1 items in this shard 2025-12-04T10:58:28.3943693Z 2025-12-04T10:58:28.3944054Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:45:08.189075044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3944056Z 2025-12-04T10:58:28.3944209Z [W1204 10:45:09.470260528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3944212Z 2025-12-04T10:58:28.3944407Z [W1204 10:45:09.470429466 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3944410Z 2025-12-04T10:58:28.3944560Z [W1204 10:45:09.474646754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3944562Z 2025-12-04T10:58:28.3944711Z [W1204 10:45:09.474969950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3944713Z 2025-12-04T10:58:28.3944861Z [W1204 10:45:09.475039880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3944863Z 2025-12-04T10:58:28.3945011Z [W1204 10:45:09.477275527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3945013Z 2025-12-04T10:58:28.3945164Z [W1204 10:45:09.477556364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3945183Z 2025-12-04T10:58:28.3945331Z [W1204 10:45:09.477617834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3945333Z 2025-12-04T10:58:28.3945382Z ('RERUN', {'yellow': True}) [3.2594s] [100%] 2025-12-04T10:58:28.3945742Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:45:10.273692920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3945744Z 2025-12-04T10:58:28.3945894Z [W1204 10:45:10.274105146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3945895Z 2025-12-04T10:58:28.3946058Z [W1204 10:45:10.274176485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946061Z 2025-12-04T10:58:28.3946209Z [W1204 10:45:10.275458542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946211Z 2025-12-04T10:58:28.3946359Z [W1204 10:45:10.275716569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946361Z 2025-12-04T10:58:28.3946509Z [W1204 10:45:10.275775839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946510Z 2025-12-04T10:58:28.3946658Z [W1204 10:45:10.277797699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946660Z 2025-12-04T10:58:28.3946809Z [W1204 10:45:10.278066966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3946812Z 2025-12-04T10:58:28.3946961Z [W1204 10:45:10.278129205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3946963Z 2025-12-04T10:58:28.3947011Z ('RERUN', {'yellow': True}) [0.6540s] [100%] 2025-12-04T10:58:28.3947367Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:45:10.912297594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3947370Z 2025-12-04T10:58:28.3947518Z [W1204 10:45:10.912686670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3947521Z 2025-12-04T10:58:28.3947691Z [W1204 10:45:10.912761630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3947694Z 2025-12-04T10:58:28.3947844Z [W1204 10:45:10.914066436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3947846Z 2025-12-04T10:58:28.3947993Z [W1204 10:45:10.914331434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3947995Z 2025-12-04T10:58:28.3948143Z [W1204 10:45:10.914391733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3948145Z 2025-12-04T10:58:28.3948291Z [W1204 10:45:10.916427483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3948293Z 2025-12-04T10:58:28.3948443Z [W1204 10:45:10.916696180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3948446Z 2025-12-04T10:58:28.3948594Z [W1204 10:45:10.916756389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3948608Z 2025-12-04T10:58:28.3948646Z FAILED [0.6416s] [100%] 2025-12-04T10:58:28.3948649Z 2025-12-04T10:58:28.3948701Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3948852Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3948898Z Traceback (most recent call last): 2025-12-04T10:58:28.3949055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3949096Z method(*args, **kwargs) 2025-12-04T10:58:28.3949249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3949302Z method(*args, **kwargs) 2025-12-04T10:58:28.3949451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3949489Z with policy(): 2025-12-04T10:58:28.3949640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3949682Z raise RuntimeError(msg) 2025-12-04T10:58:28.3950080Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.3950082Z 2025-12-04T10:58:28.3950156Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3950451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3950454Z 2025-12-04T10:58:28.3950541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3950615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3950671Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3950947Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3951019Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3951056Z graph_break [] 2025-12-04T10:58:28.3951230Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3951277Z Traceback (most recent call last): 2025-12-04T10:58:28.3951431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3951471Z method(*args, **kwargs) 2025-12-04T10:58:28.3951620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3951659Z method(*args, **kwargs) 2025-12-04T10:58:28.3951807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3951844Z 
with policy(): 2025-12-04T10:58:28.3951994Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3952036Z raise RuntimeError(msg) 2025-12-04T10:58:28.3952443Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.3952457Z 2025-12-04T10:58:28.3952531Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3952823Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3952825Z 2025-12-04T10:58:28.3952911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3952985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3953053Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3953356Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3953430Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3953467Z graph_break [] 2025-12-04T10:58:28.3953539Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3953594Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3953665Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.3953936Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3953975Z graph_break [] 2025-12-04T10:58:28.3954027Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3954179Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.3954224Z Traceback (most recent call last): 2025-12-04T10:58:28.3954377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3954417Z method(*args, **kwargs) 2025-12-04T10:58:28.3954566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3954606Z method(*args, **kwargs) 2025-12-04T10:58:28.3954754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3954791Z with policy(): 2025-12-04T10:58:28.3954971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3955014Z raise RuntimeError(msg) 2025-12-04T10:58:28.3955418Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.3955421Z 2025-12-04T10:58:28.3955493Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3955787Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3955789Z 2025-12-04T10:58:28.3955876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3955950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3956018Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3956290Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3956362Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3956398Z graph_break [] 2025-12-04T10:58:28.3956470Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3956524Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3956595Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3956864Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3956916Z graph_break [] 2025-12-04T10:58:28.3956988Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3957043Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3957113Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3957381Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3957416Z graph_break [] 2025-12-04T10:58:28.3957660Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-09bb8902ce1fc14f.xml - 2025-12-04T10:58:28.3957721Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3958360Z FAILED [0.6416s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.3958363Z 2025-12-04T10:58:28.3958434Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3958751Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3958754Z 2025-12-04T10:58:28.3958841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3958903Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.3958970Z ================== 1 failed, 57 deselected, 2 rerun in 4.72s =================== 2025-12-04T10:58:28.3959006Z Got exit code 1 2025-12-04T10:58:28.3959244Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.3959371Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.3959570Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab061dad8ee042c9.xml 2025-12-04T10:58:28.3959630Z ============================= test session starts ============================== 2025-12-04T10:58:28.3959741Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3959794Z cachedir: .pytest_cache 2025-12-04T10:58:28.3959952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3959998Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3960038Z configfile: pytest.ini 2025-12-04T10:58:28.3960199Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3960273Z collecting ... collected 58 items / 34 deselected / 24 selected 2025-12-04T10:58:28.3960325Z stepcurrent: skipping 34 already run items. 
2025-12-04T10:58:28.3960369Z Running 24 items in this shard 2025-12-04T10:58:28.3960372Z 2025-12-04T10:58:28.3960624Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4778s] [ 4%] 2025-12-04T10:58:28.3960883Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4866s] [ 4%] 2025-12-04T10:58:28.3961106Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4808s] [ 4%] 2025-12-04T10:58:28.3961108Z 2025-12-04T10:58:28.3961159Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3961310Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3961355Z Traceback (most recent call last): 2025-12-04T10:58:28.3961513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3961553Z method(*args, **kwargs) 2025-12-04T10:58:28.3961706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3961744Z method(*args, **kwargs) 2025-12-04T10:58:28.3961894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3961930Z with policy(): 2025-12-04T10:58:28.3962082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3962122Z raise RuntimeError(msg) 2025-12-04T10:58:28.3962545Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.3962548Z 2025-12-04T10:58:28.3962622Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3962913Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3962915Z 2025-12-04T10:58:28.3963001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3963073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3963129Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3963355Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3963431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3963466Z graph_break [] 2025-12-04T10:58:28.3963630Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3963674Z Traceback (most recent call last): 2025-12-04T10:58:28.3963826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3963866Z method(*args, **kwargs) 2025-12-04T10:58:28.3964015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3964054Z method(*args, **kwargs) 
2025-12-04T10:58:28.3964203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3964239Z with policy(): 2025-12-04T10:58:28.3964408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3964447Z raise RuntimeError(msg) 2025-12-04T10:58:28.3964861Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3964864Z 2025-12-04T10:58:28.3964935Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3965226Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3965228Z 2025-12-04T10:58:28.3965316Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3965390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3965446Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3965621Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3965693Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3965728Z graph_break [] 2025-12-04T10:58:28.3965801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3965854Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.3965925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3966101Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3966166Z graph_break [] 2025-12-04T10:58:28.3966218Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3966368Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3966412Z Traceback (most recent call last): 2025-12-04T10:58:28.3966566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3966605Z method(*args, **kwargs) 2025-12-04T10:58:28.3966757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3966796Z method(*args, **kwargs) 2025-12-04T10:58:28.3966946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3966982Z with policy(): 2025-12-04T10:58:28.3967135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3967188Z raise RuntimeError(msg) 2025-12-04T10:58:28.3967593Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3967595Z 2025-12-04T10:58:28.3967667Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3967959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3967961Z 2025-12-04T10:58:28.3968061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3968134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3968191Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3968365Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3968436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3968472Z graph_break [] 2025-12-04T10:58:28.3968544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3968597Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3968669Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3968846Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3968883Z graph_break [] 2025-12-04T10:58:28.3968955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3969010Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3969081Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3969255Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3969291Z graph_break [] 2025-12-04T10:58:28.3969536Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ab061dad8ee042c9.xml - 2025-12-04T10:58:28.3969594Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3970256Z FAILED [0.4808s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3970260Z 2025-12-04T10:58:28.3970334Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3970622Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3970625Z 2025-12-04T10:58:28.3970711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3970774Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3970840Z ================== 1 failed, 34 deselected, 2 rerun in 3.61s =================== 2025-12-04T10:58:28.3970889Z Got exit code 1 2025-12-04T10:58:28.3970929Z Retrying single test... 
2025-12-04T10:58:28.3971127Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-61a43a1e59f2b921.xml 2025-12-04T10:58:28.3971185Z ============================= test session starts ============================== 2025-12-04T10:58:28.3971293Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3971335Z cachedir: .pytest_cache 2025-12-04T10:58:28.3971491Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3971538Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3971594Z configfile: pytest.ini 2025-12-04T10:58:28.3971756Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3971829Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3972115Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3972159Z Running 1 items in this shard 2025-12-04T10:58:28.3972161Z 2025-12-04T10:58:28.3972524Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:29.254417513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3972526Z 2025-12-04T10:58:28.3972682Z [W1204 10:45:30.525917230 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3972684Z 2025-12-04T10:58:28.3972836Z [W1204 10:45:30.526056569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3972838Z 2025-12-04T10:58:28.3972988Z [W1204 10:45:30.529822510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3972990Z 2025-12-04T10:58:28.3973139Z [W1204 10:45:30.530128557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3973142Z 2025-12-04T10:58:28.3973328Z [W1204 10:45:30.530192016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3973330Z 2025-12-04T10:58:28.3973508Z [W1204 10:45:30.532332544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3973511Z 2025-12-04T10:58:28.3973660Z [W1204 10:45:30.532605212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3973661Z 2025-12-04T10:58:28.3973809Z [W1204 10:45:30.532665411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3973811Z 2025-12-04T10:58:28.3973859Z ('RERUN', {'yellow': True}) [2.7945s] [100%] 2025-12-04T10:58:28.3974217Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:31.488828014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3974220Z 2025-12-04T10:58:28.3974370Z [W1204 10:45:31.489210270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3974373Z 2025-12-04T10:58:28.3974535Z [W1204 10:45:31.489286549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3974537Z 2025-12-04T10:58:28.3974685Z [W1204 10:45:31.490535437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3974687Z 2025-12-04T10:58:28.3974834Z [W1204 10:45:31.490796204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3974836Z 2025-12-04T10:58:28.3974984Z [W1204 10:45:31.490856673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3974986Z 2025-12-04T10:58:28.3975135Z [W1204 10:45:31.492761894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3975151Z 2025-12-04T10:58:28.3975299Z [W1204 10:45:31.493154580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3975302Z 2025-12-04T10:58:28.3975451Z [W1204 10:45:31.493220569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3975453Z 2025-12-04T10:58:28.3975500Z ('RERUN', {'yellow': True}) [0.4532s] [100%] 2025-12-04T10:58:28.3975858Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:31.936040764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3975860Z 2025-12-04T10:58:28.3976010Z [W1204 10:45:31.936420410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976012Z 2025-12-04T10:58:28.3976161Z [W1204 10:45:31.936489519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976163Z 2025-12-04T10:58:28.3976311Z [W1204 10:45:31.937737726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976313Z 2025-12-04T10:58:28.3976461Z [W1204 10:45:31.937994393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976463Z 2025-12-04T10:58:28.3976611Z [W1204 10:45:31.938059443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976613Z 2025-12-04T10:58:28.3976783Z [W1204 10:45:31.939953503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3976786Z 2025-12-04T10:58:28.3976935Z [W1204 10:45:31.940293890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3976937Z 2025-12-04T10:58:28.3977084Z [W1204 10:45:31.940358179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3977088Z 2025-12-04T10:58:28.3977126Z FAILED [0.4432s] [100%] 2025-12-04T10:58:28.3977127Z 2025-12-04T10:58:28.3977180Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3977330Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3977376Z Traceback (most recent call last): 2025-12-04T10:58:28.3977532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3977575Z method(*args, **kwargs) 2025-12-04T10:58:28.3977727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3977779Z method(*args, **kwargs) 2025-12-04T10:58:28.3977929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3977967Z with policy(): 2025-12-04T10:58:28.3978118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3978159Z raise RuntimeError(msg) 2025-12-04T10:58:28.3978558Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3978572Z 2025-12-04T10:58:28.3978646Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3978937Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3978940Z 2025-12-04T10:58:28.3979025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3979099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3979154Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3979330Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3979403Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3979441Z graph_break [] 2025-12-04T10:58:28.3979592Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3979638Z Traceback (most recent call last): 2025-12-04T10:58:28.3979791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3979831Z method(*args, **kwargs) 2025-12-04T10:58:28.3979980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3980020Z method(*args, **kwargs) 2025-12-04T10:58:28.3980167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3980204Z with policy(): 2025-12-04T10:58:28.3980378Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3980421Z raise RuntimeError(msg) 2025-12-04T10:58:28.3980826Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3980829Z 2025-12-04T10:58:28.3980902Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3981190Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3981193Z 2025-12-04T10:58:28.3981279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3981356Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3981411Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3981598Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3981671Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3981708Z graph_break [] 2025-12-04T10:58:28.3981780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3981835Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3981906Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3982081Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3982129Z graph_break [] 2025-12-04T10:58:28.3982183Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3982335Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3982380Z Traceback (most recent call last): 2025-12-04T10:58:28.3982533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3982573Z method(*args, **kwargs) 2025-12-04T10:58:28.3982723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3982763Z method(*args, **kwargs) 2025-12-04T10:58:28.3982912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3982949Z with policy(): 2025-12-04T10:58:28.3983101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3983142Z raise RuntimeError(msg) 2025-12-04T10:58:28.3983567Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3983569Z 2025-12-04T10:58:28.3983643Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3983934Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3983936Z 2025-12-04T10:58:28.3984022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3984126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3984183Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3984357Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3984429Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3984465Z graph_break [] 2025-12-04T10:58:28.3984537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3984591Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3984661Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3984837Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3984874Z graph_break [] 2025-12-04T10:58:28.3984947Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3985021Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3985093Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3985267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3985302Z graph_break [] 2025-12-04T10:58:28.3985545Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-61a43a1e59f2b921.xml - 2025-12-04T10:58:28.3985604Z =========================== short test summary info ============================ 2025-12-04T10:58:28.3986245Z FAILED [0.4432s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.3986263Z 2025-12-04T10:58:28.3986335Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3986625Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3986627Z 2025-12-04T10:58:28.3986712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3986776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.3986841Z ================== 1 failed, 57 deselected, 2 rerun in 3.86s =================== 2025-12-04T10:58:28.3986879Z Got exit code 1 2025-12-04T10:58:28.3986919Z Retrying single test... 
2025-12-04T10:58:28.3987117Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2019c2a0fa926bab.xml 2025-12-04T10:58:28.3987173Z ============================= test session starts ============================== 2025-12-04T10:58:28.3987283Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.3987323Z cachedir: .pytest_cache 2025-12-04T10:58:28.3987481Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.3987526Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.3987567Z configfile: pytest.ini 2025-12-04T10:58:28.3987749Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.3987824Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.3988112Z stepcurrent: skipping 34 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3988156Z Running 1 items in this shard 2025-12-04T10:58:28.3988158Z 2025-12-04T10:58:28.3988523Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:40.567599341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3988525Z 2025-12-04T10:58:28.3988679Z [W1204 10:45:40.838912670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3988682Z 2025-12-04T10:58:28.3988834Z [W1204 10:45:40.839056249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3988847Z 2025-12-04T10:58:28.3988996Z [W1204 10:45:40.842419474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3988998Z 2025-12-04T10:58:28.3989147Z [W1204 10:45:40.842727661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3989148Z 2025-12-04T10:58:28.3989297Z [W1204 10:45:40.842788160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3989299Z 2025-12-04T10:58:28.3989447Z [W1204 10:45:40.845222785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3989461Z 2025-12-04T10:58:28.3989610Z [W1204 10:45:40.845503042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3989613Z 2025-12-04T10:58:28.3989760Z [W1204 10:45:40.845563512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3989762Z 2025-12-04T10:58:28.3989810Z ('RERUN', {'yellow': True}) [2.9050s] [100%] 2025-12-04T10:58:28.3990168Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:41.006196118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3990171Z 2025-12-04T10:58:28.3990320Z [W1204 10:45:41.006580574 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3990322Z 2025-12-04T10:58:28.3990473Z [W1204 10:45:41.006645073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3990476Z 2025-12-04T10:58:28.3990625Z [W1204 10:45:41.007924060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3990626Z 2025-12-04T10:58:28.3990775Z [W1204 10:45:41.008196267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3990777Z 2025-12-04T10:58:28.3990924Z [W1204 10:45:41.008260186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3990926Z 2025-12-04T10:58:28.3991074Z [W1204 10:45:41.010306395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3991100Z 2025-12-04T10:58:28.3991250Z [W1204 10:45:41.010648012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3991253Z 2025-12-04T10:58:28.3991401Z [W1204 10:45:41.010711161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3991403Z 2025-12-04T10:58:28.3991451Z ('RERUN', {'yellow': True}) [0.6562s] [100%] 2025-12-04T10:58:28.3991807Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:45:42.666293720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3991809Z 2025-12-04T10:58:28.3991958Z [W1204 10:45:42.666690766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3991962Z 2025-12-04T10:58:28.3992110Z [W1204 10:45:42.666762855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3992124Z 2025-12-04T10:58:28.3992272Z [W1204 10:45:42.668054552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3992274Z 2025-12-04T10:58:28.3992423Z [W1204 10:45:42.668317199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3992425Z 2025-12-04T10:58:28.3992573Z [W1204 10:45:42.668376449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3992575Z 2025-12-04T10:58:28.3992725Z [W1204 10:45:42.670451327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3992738Z 2025-12-04T10:58:28.3992888Z [W1204 10:45:42.670790894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.3992891Z 2025-12-04T10:58:28.3993039Z [W1204 10:45:42.670852733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.3993041Z 2025-12-04T10:58:28.3993079Z FAILED [0.6561s] [100%] 2025-12-04T10:58:28.3993081Z 2025-12-04T10:58:28.3993132Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.3993315Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3993360Z Traceback (most recent call last): 2025-12-04T10:58:28.3993517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3993557Z method(*args, **kwargs) 2025-12-04T10:58:28.3993712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3993751Z method(*args, **kwargs) 2025-12-04T10:58:28.3993902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3993938Z with policy(): 2025-12-04T10:58:28.3994089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3994129Z raise RuntimeError(msg) 2025-12-04T10:58:28.3994527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.3994530Z 2025-12-04T10:58:28.3994631Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3994925Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3994929Z 2025-12-04T10:58:28.3995016Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3995088Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3995144Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3995320Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3995394Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3995430Z graph_break [] 2025-12-04T10:58:28.3995582Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3995643Z Traceback (most recent call last): 2025-12-04T10:58:28.3995794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3995833Z method(*args, **kwargs) 2025-12-04T10:58:28.3995983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3996021Z method(*args, **kwargs) 2025-12-04T10:58:28.3996170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3996206Z with policy(): 2025-12-04T10:58:28.3996357Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3996414Z raise RuntimeError(msg) 2025-12-04T10:58:28.3996818Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.3996821Z 2025-12-04T10:58:28.3996893Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3997185Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3997187Z 2025-12-04T10:58:28.3997274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.3997347Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3997405Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3997580Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3997654Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3997689Z graph_break [] 2025-12-04T10:58:28.3997763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.3997816Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.3997887Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.3998061Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.3998097Z graph_break [] 2025-12-04T10:58:28.3998187Z =================================== FAILURES =================================== 2025-12-04T10:58:28.3998338Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.3998383Z Traceback (most recent call last): 2025-12-04T10:58:28.3998537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3998575Z method(*args, **kwargs) 2025-12-04T10:58:28.3998725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.3998764Z method(*args, **kwargs) 2025-12-04T10:58:28.3998913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.3998949Z with policy(): 2025-12-04T10:58:28.3999103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.3999144Z raise RuntimeError(msg) 2025-12-04T10:58:28.3999552Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.3999567Z 2025-12-04T10:58:28.3999640Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.3999929Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.3999931Z 2025-12-04T10:58:28.4000018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4000103Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4000158Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4000333Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4000405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4000441Z graph_break [] 2025-12-04T10:58:28.4000513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4000567Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4000638Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4000811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4000847Z graph_break [] 2025-12-04T10:58:28.4000922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4000977Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4001049Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4001224Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4001260Z graph_break [] 2025-12-04T10:58:28.4001503Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2019c2a0fa926bab.xml - 2025-12-04T10:58:28.4001562Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4002224Z FAILED [0.6561s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4002228Z 2025-12-04T10:58:28.4002301Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4002589Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4002591Z 2025-12-04T10:58:28.4002677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4002738Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4002806Z ================== 1 failed, 57 deselected, 2 rerun in 4.38s =================== 2025-12-04T10:58:28.4002842Z Got exit code 1 2025-12-04T10:58:28.4003095Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4003222Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4003458Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6d7e650e3d56ff05.xml 2025-12-04T10:58:28.4003514Z ============================= test session starts ============================== 2025-12-04T10:58:28.4003625Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4003666Z cachedir: .pytest_cache 2025-12-04T10:58:28.4003826Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4003886Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4003928Z configfile: pytest.ini 2025-12-04T10:58:28.4004087Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4004161Z collecting ... collected 58 items / 35 deselected / 23 selected 2025-12-04T10:58:28.4004213Z stepcurrent: skipping 35 already run items. 
2025-12-04T10:58:28.4004256Z Running 23 items in this shard 2025-12-04T10:58:28.4004259Z 2025-12-04T10:58:28.4004508Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4369s] [ 4%] 2025-12-04T10:58:28.4004755Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4277s] [ 4%] 2025-12-04T10:58:28.4004978Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4312s] [ 4%] 2025-12-04T10:58:28.4004982Z 2025-12-04T10:58:28.4005032Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4005181Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4005226Z Traceback (most recent call last): 2025-12-04T10:58:28.4005382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4005421Z method(*args, **kwargs) 2025-12-04T10:58:28.4005573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4005613Z method(*args, **kwargs) 2025-12-04T10:58:28.4005793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4005831Z with policy(): 2025-12-04T10:58:28.4005984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4006024Z raise RuntimeError(msg) 2025-12-04T10:58:28.4006415Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4006417Z 2025-12-04T10:58:28.4006490Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4006780Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4006797Z 2025-12-04T10:58:28.4006884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4006956Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4007012Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4007186Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4007258Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4007294Z graph_break [] 2025-12-04T10:58:28.4007443Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4007500Z Traceback (most recent call last): 2025-12-04T10:58:28.4007654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4007694Z method(*args, **kwargs) 2025-12-04T10:58:28.4007844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4007883Z method(*args, **kwargs) 2025-12-04T10:58:28.4008035Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4008071Z with policy(): 2025-12-04T10:58:28.4008225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4008264Z raise RuntimeError(msg) 2025-12-04T10:58:28.4008663Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4008667Z 2025-12-04T10:58:28.4008739Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4009027Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4009029Z 2025-12-04T10:58:28.4009114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4009186Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4009242Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4009439Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4009513Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4009550Z graph_break [] 2025-12-04T10:58:28.4009623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4011142Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4011217Z 
aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4011392Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4011429Z graph_break [] 2025-12-04T10:58:28.4011480Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4011632Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4011680Z Traceback (most recent call last): 2025-12-04T10:58:28.4011835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4011891Z method(*args, **kwargs) 2025-12-04T10:58:28.4012060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4012100Z method(*args, **kwargs) 2025-12-04T10:58:28.4012250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4012286Z with policy(): 2025-12-04T10:58:28.4012438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4012478Z raise RuntimeError(msg) 2025-12-04T10:58:28.4012882Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4012909Z 2025-12-04T10:58:28.4012983Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4013314Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4013316Z 2025-12-04T10:58:28.4013404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4013476Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4013532Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4013708Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4013782Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4013819Z graph_break [] 2025-12-04T10:58:28.4013892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4013947Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4014019Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4014192Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4014228Z graph_break [] 2025-12-04T10:58:28.4014300Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4014354Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4014445Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4014619Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4014656Z graph_break [] 2025-12-04T10:58:28.4014946Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6d7e650e3d56ff05.xml - 2025-12-04T10:58:28.4015006Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4015642Z FAILED [0.4312s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4015663Z 2025-12-04T10:58:28.4015736Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4016024Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4016026Z 2025-12-04T10:58:28.4016113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4016174Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4016240Z ================== 1 failed, 35 deselected, 2 rerun in 3.46s =================== 2025-12-04T10:58:28.4016277Z Got exit code 1 2025-12-04T10:58:28.4016317Z Retrying single test... 
2025-12-04T10:58:28.4016518Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f9a1c47033f4fd8c.xml 2025-12-04T10:58:28.4016590Z ============================= test session starts ============================== 2025-12-04T10:58:28.4016700Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4016741Z cachedir: .pytest_cache 2025-12-04T10:58:28.4016900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4016946Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4016986Z configfile: pytest.ini 2025-12-04T10:58:28.4017146Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4017218Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4017504Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4017548Z Running 1 items in this shard 2025-12-04T10:58:28.4017552Z 2025-12-04T10:58:28.4017916Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:01.751095641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4017918Z 2025-12-04T10:58:28.4018073Z [W1204 10:46:01.020090014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4018075Z 2025-12-04T10:58:28.4018226Z [W1204 10:46:01.020256832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4018229Z 2025-12-04T10:58:28.4018392Z [W1204 10:46:01.023831125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4018395Z 2025-12-04T10:58:28.4018544Z [W1204 10:46:01.024144742 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4018547Z 2025-12-04T10:58:28.4018718Z [W1204 10:46:01.024209061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4018720Z 2025-12-04T10:58:28.4018870Z [W1204 10:46:01.026282270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4018872Z 2025-12-04T10:58:28.4019019Z [W1204 10:46:01.026549377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4019021Z 2025-12-04T10:58:28.4019172Z [W1204 10:46:01.026610246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4019185Z 2025-12-04T10:58:28.4019233Z ('RERUN', {'yellow': True}) [2.7015s] [100%] 2025-12-04T10:58:28.4019594Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:02.963884280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4019596Z 2025-12-04T10:58:28.4019746Z [W1204 10:46:02.964265156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4019748Z 2025-12-04T10:58:28.4019895Z [W1204 10:46:02.964333045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4019897Z 2025-12-04T10:58:28.4020058Z [W1204 10:46:02.965582592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4020061Z 2025-12-04T10:58:28.4020209Z [W1204 10:46:02.965845110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4020210Z 2025-12-04T10:58:28.4020361Z [W1204 10:46:02.965906099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4020363Z 2025-12-04T10:58:28.4020511Z [W1204 10:46:02.967880648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4020514Z 2025-12-04T10:58:28.4020661Z [W1204 10:46:02.968222785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4020663Z 2025-12-04T10:58:28.4020813Z [W1204 10:46:02.968287084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4020815Z 2025-12-04T10:58:28.4020863Z ('RERUN', {'yellow': True}) [0.4379s] [100%] 2025-12-04T10:58:28.4021219Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:03.397972386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021221Z 2025-12-04T10:58:28.4021368Z [W1204 10:46:03.398352112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021370Z 2025-12-04T10:58:28.4021518Z [W1204 10:46:03.398418511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021519Z 2025-12-04T10:58:28.4021680Z [W1204 10:46:03.399758597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021682Z 2025-12-04T10:58:28.4021832Z [W1204 10:46:03.400023165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021834Z 2025-12-04T10:58:28.4021995Z [W1204 10:46:03.400084674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4021997Z 2025-12-04T10:58:28.4022145Z [W1204 10:46:03.401994444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4022147Z 2025-12-04T10:58:28.4022295Z [W1204 10:46:03.402334590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4022297Z 2025-12-04T10:58:28.4022445Z [W1204 10:46:03.402398660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4022449Z 2025-12-04T10:58:28.4022487Z FAILED [0.4281s] [100%] 2025-12-04T10:58:28.4022501Z 2025-12-04T10:58:28.4022554Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4022704Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4022750Z Traceback (most recent call last): 2025-12-04T10:58:28.4022906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4022947Z method(*args, **kwargs) 2025-12-04T10:58:28.4023098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4023139Z method(*args, **kwargs) 2025-12-04T10:58:28.4023327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4023381Z with policy(): 2025-12-04T10:58:28.4023533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4023574Z raise RuntimeError(msg) 2025-12-04T10:58:28.4023968Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4023970Z 2025-12-04T10:58:28.4024044Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4024334Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4024337Z 2025-12-04T10:58:28.4024425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4024500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4024556Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4024736Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4024809Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4024846Z graph_break [] 2025-12-04T10:58:28.4024994Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4025040Z Traceback (most recent call last): 2025-12-04T10:58:28.4025206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4025248Z method(*args, **kwargs) 2025-12-04T10:58:28.4025400Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4025440Z method(*args, **kwargs) 2025-12-04T10:58:28.4025604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4025641Z with policy(): 2025-12-04T10:58:28.4025792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4025833Z raise RuntimeError(msg) 2025-12-04T10:58:28.4026231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4026234Z 2025-12-04T10:58:28.4026307Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4026617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4026619Z 2025-12-04T10:58:28.4026706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4026779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4026834Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4027012Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4027098Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4027134Z graph_break [] 2025-12-04T10:58:28.4027206Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4027262Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4027332Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4027508Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4027544Z 
graph_break [] 2025-12-04T10:58:28.4027596Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4027742Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4027788Z Traceback (most recent call last): 2025-12-04T10:58:28.4027941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4027983Z method(*args, **kwargs) 2025-12-04T10:58:28.4028135Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4028175Z method(*args, **kwargs) 2025-12-04T10:58:28.4028326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4028363Z with policy(): 2025-12-04T10:58:28.4028514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4028555Z raise RuntimeError(msg) 2025-12-04T10:58:28.4028965Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4028969Z 2025-12-04T10:58:28.4029043Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4029346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4029349Z 2025-12-04T10:58:28.4029436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4029509Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4029563Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4029739Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4029813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4029849Z graph_break [] 2025-12-04T10:58:28.4029933Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4029988Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4030060Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4030234Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4030270Z graph_break [] 2025-12-04T10:58:28.4030343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4030395Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4030467Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4030640Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4030689Z graph_break [] 2025-12-04T10:58:28.4030932Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f9a1c47033f4fd8c.xml - 2025-12-04T10:58:28.4030993Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4031625Z FAILED [0.4281s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4031630Z 2025-12-04T10:58:28.4031702Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4031990Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4031993Z 2025-12-04T10:58:28.4032078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4032140Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4032204Z ================== 1 failed, 57 deselected, 2 rerun in 3.73s =================== 2025-12-04T10:58:28.4032241Z Got exit code 1 2025-12-04T10:58:28.4032281Z Retrying single test... 
2025-12-04T10:58:28.4032479Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1ff87baee62f2e19.xml 2025-12-04T10:58:28.4032548Z ============================= test session starts ============================== 2025-12-04T10:58:28.4032659Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4032699Z cachedir: .pytest_cache 2025-12-04T10:58:28.4032869Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4032915Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4032955Z configfile: pytest.ini 2025-12-04T10:58:28.4033114Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4033187Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4033507Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4033552Z Running 1 items in this shard 2025-12-04T10:58:28.4033575Z 2025-12-04T10:58:28.4033937Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:11.943183780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4033939Z 2025-12-04T10:58:28.4034091Z [W1204 10:46:11.222959571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4034093Z 2025-12-04T10:58:28.4034244Z [W1204 10:46:11.223095739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4034246Z 2025-12-04T10:58:28.4034395Z [W1204 10:46:11.226536703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4034412Z 2025-12-04T10:58:28.4034561Z [W1204 10:46:11.226854240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4034564Z 2025-12-04T10:58:28.4034713Z [W1204 10:46:11.226916039 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4034715Z 2025-12-04T10:58:28.4034863Z [W1204 10:46:11.229148225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4034865Z 2025-12-04T10:58:28.4035013Z [W1204 10:46:11.229422513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4035015Z 2025-12-04T10:58:28.4035161Z [W1204 10:46:11.229483112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4035164Z 2025-12-04T10:58:28.4035212Z ('RERUN', {'yellow': True}) [2.8345s] [100%] 2025-12-04T10:58:28.4035570Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:13.402067955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4035573Z 2025-12-04T10:58:28.4035723Z [W1204 10:46:13.402466681 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4035725Z 2025-12-04T10:58:28.4035873Z [W1204 10:46:13.402541330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4035875Z 2025-12-04T10:58:28.4036035Z [W1204 10:46:13.403813997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4036038Z 2025-12-04T10:58:28.4036187Z [W1204 10:46:13.404090264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4036190Z 2025-12-04T10:58:28.4036354Z [W1204 10:46:13.404156493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4036356Z 2025-12-04T10:58:28.4036504Z [W1204 10:46:13.406153642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4036506Z 2025-12-04T10:58:28.4036655Z [W1204 10:46:13.406494639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4036656Z 2025-12-04T10:58:28.4036804Z [W1204 10:46:13.406557798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4036808Z 2025-12-04T10:58:28.4036856Z ('RERUN', {'yellow': True}) [0.6765s] [100%] 2025-12-04T10:58:28.4037211Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:46:13.094511292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037226Z 2025-12-04T10:58:28.4037375Z [W1204 10:46:13.094899348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037377Z 2025-12-04T10:58:28.4037525Z [W1204 10:46:13.094966947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037527Z 2025-12-04T10:58:28.4037676Z [W1204 10:46:13.096216084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037691Z 2025-12-04T10:58:28.4037841Z [W1204 10:46:13.096474641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037844Z 2025-12-04T10:58:28.4037991Z [W1204 10:46:13.096534661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4037993Z 2025-12-04T10:58:28.4038141Z [W1204 10:46:13.098521250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4038143Z 2025-12-04T10:58:28.4038289Z [W1204 10:46:13.098853876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4038291Z 2025-12-04T10:58:28.4038438Z [W1204 10:46:13.098921125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4038441Z 2025-12-04T10:58:28.4038480Z FAILED [0.6889s] [100%] 2025-12-04T10:58:28.4038483Z 2025-12-04T10:58:28.4038534Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4038685Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4038731Z Traceback (most recent call last): 2025-12-04T10:58:28.4038889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4038929Z method(*args, **kwargs) 2025-12-04T10:58:28.4039082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4039121Z method(*args, **kwargs) 2025-12-04T10:58:28.4039272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4039309Z with policy(): 2025-12-04T10:58:28.4039473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4039515Z raise RuntimeError(msg) 2025-12-04T10:58:28.4039930Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4039932Z 2025-12-04T10:58:28.4040005Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4040295Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4040298Z 2025-12-04T10:58:28.4040385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4040458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4040526Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4040704Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4040778Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4040814Z graph_break [] 2025-12-04T10:58:28.4040963Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4041008Z Traceback (most recent call last): 2025-12-04T10:58:28.4041161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4041212Z method(*args, **kwargs) 2025-12-04T10:58:28.4041363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4041402Z method(*args, **kwargs) 2025-12-04T10:58:28.4041552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4041589Z with policy(): 2025-12-04T10:58:28.4041741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4041781Z raise RuntimeError(msg) 2025-12-04T10:58:28.4042182Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4042185Z 2025-12-04T10:58:28.4042259Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4042550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4042553Z 2025-12-04T10:58:28.4042640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4042713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4042768Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4042944Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4043017Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4043054Z graph_break [] 2025-12-04T10:58:28.4043139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4043194Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4043297Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4043488Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4043525Z 
graph_break [] 2025-12-04T10:58:28.4043576Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4043725Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4043769Z Traceback (most recent call last): 2025-12-04T10:58:28.4043922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4043964Z method(*args, **kwargs) 2025-12-04T10:58:28.4044115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4044168Z method(*args, **kwargs) 2025-12-04T10:58:28.4044323Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4044359Z with policy(): 2025-12-04T10:58:28.4044512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4044552Z raise RuntimeError(msg) 2025-12-04T10:58:28.4044949Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4044966Z 2025-12-04T10:58:28.4045039Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4045328Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4045331Z 2025-12-04T10:58:28.4045418Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4045491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4045546Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4045719Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4045792Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4045829Z graph_break [] 2025-12-04T10:58:28.4045901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4045956Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4046027Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4046201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4046238Z graph_break [] 2025-12-04T10:58:28.4046311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4046365Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4046436Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4046625Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4046662Z graph_break [] 2025-12-04T10:58:28.4046907Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1ff87baee62f2e19.xml - 2025-12-04T10:58:28.4046967Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4047610Z FAILED [0.6889s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4047613Z 2025-12-04T10:58:28.4047687Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4047974Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4047988Z 2025-12-04T10:58:28.4048075Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4048137Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4048202Z ================== 1 failed, 57 deselected, 2 rerun in 4.37s =================== 2025-12-04T10:58:28.4048239Z Got exit code 1 2025-12-04T10:58:28.4048475Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4048602Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4048817Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98ee4568f4e3cfd2.xml 2025-12-04T10:58:28.4048874Z ============================= test session starts ============================== 2025-12-04T10:58:28.4048986Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4049026Z cachedir: .pytest_cache 2025-12-04T10:58:28.4049184Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4049231Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4049271Z configfile: pytest.ini 2025-12-04T10:58:28.4049429Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4049503Z collecting ... collected 58 items / 36 deselected / 22 selected 2025-12-04T10:58:28.4049557Z stepcurrent: skipping 36 already run items. 
2025-12-04T10:58:28.4049601Z Running 22 items in this shard 2025-12-04T10:58:28.4049604Z 2025-12-04T10:58:28.4049853Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8911s] [ 4%] 2025-12-04T10:58:28.4050096Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4596s] [ 4%] 2025-12-04T10:58:28.4050319Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4527s] [ 4%] 2025-12-04T10:58:28.4050322Z 2025-12-04T10:58:28.4050372Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4050532Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4050579Z Traceback (most recent call last): 2025-12-04T10:58:28.4050736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4050776Z method(*args, **kwargs) 2025-12-04T10:58:28.4050941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4050981Z method(*args, **kwargs) 2025-12-04T10:58:28.4051132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4051168Z with policy(): 2025-12-04T10:58:28.4051321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4051361Z raise RuntimeError(msg) 2025-12-04T10:58:28.4051753Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4051767Z 2025-12-04T10:58:28.4051842Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4052130Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4052132Z 2025-12-04T10:58:28.4052218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4052290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4052359Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4052636Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4052711Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4052747Z graph_break [] 2025-12-04T10:58:28.4052897Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4052941Z Traceback (most recent call last): 2025-12-04T10:58:28.4053096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4053135Z method(*args, **kwargs) 2025-12-04T10:58:28.4053317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.4053358Z method(*args, **kwargs) 2025-12-04T10:58:28.4053507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4053544Z with policy(): 2025-12-04T10:58:28.4053697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4053737Z raise RuntimeError(msg) 2025-12-04T10:58:28.4054135Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4054138Z 2025-12-04T10:58:28.4054211Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4054520Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4054524Z 2025-12-04T10:58:28.4054611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4054699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4054755Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4055028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4055101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4055137Z graph_break [] 2025-12-04T10:58:28.4055212Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.4055267Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4055339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4055624Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4055661Z graph_break [] 2025-12-04T10:58:28.4055712Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4055861Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4055905Z Traceback (most recent call last): 2025-12-04T10:58:28.4056057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4056111Z method(*args, **kwargs) 2025-12-04T10:58:28.4056262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4056302Z method(*args, **kwargs) 2025-12-04T10:58:28.4056453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4056490Z with policy(): 2025-12-04T10:58:28.4056640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4056681Z raise RuntimeError(msg) 2025-12-04T10:58:28.4057079Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4057083Z 2025-12-04T10:58:28.4057156Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4057446Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4057449Z 2025-12-04T10:58:28.4057536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4057609Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4057664Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4057935Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4058021Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4058057Z graph_break [] 2025-12-04T10:58:28.4058130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4058184Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4058256Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4058540Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4058576Z graph_break [] 2025-12-04T10:58:28.4058649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4058703Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.4058776Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4059045Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4059094Z graph_break [] 2025-12-04T10:58:28.4059338Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-98ee4568f4e3cfd2.xml - 2025-12-04T10:58:28.4059398Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4060033Z FAILED [0.4527s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4060048Z 2025-12-04T10:58:28.4060122Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4060409Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4060411Z 2025-12-04T10:58:28.4060497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4060558Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4060623Z ================== 1 failed, 36 deselected, 2 rerun in 3.97s =================== 2025-12-04T10:58:28.4060660Z Got exit code 1 2025-12-04T10:58:28.4060699Z Retrying single test... 2025-12-04T10:58:28.4060898Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3dd27572e7e2af8a.xml 2025-12-04T10:58:28.4060955Z ============================= test session starts ============================== 2025-12-04T10:58:28.4061064Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4061105Z cachedir: .pytest_cache 2025-12-04T10:58:28.4061263Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4061307Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4061348Z configfile: pytest.ini 2025-12-04T10:58:28.4061506Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4061579Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4061875Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4061921Z Running 1 items in this shard 2025-12-04T10:58:28.4061923Z 2025-12-04T10:58:28.4062294Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:34.693363789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4062298Z 2025-12-04T10:58:28.4062451Z [W1204 10:46:34.972347216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4062453Z 2025-12-04T10:58:28.4062604Z [W1204 10:46:34.972504414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4062607Z 2025-12-04T10:58:28.4062756Z [W1204 10:46:34.975698020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4062770Z 2025-12-04T10:58:28.4062919Z [W1204 10:46:34.975997757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4062921Z 2025-12-04T10:58:28.4063070Z [W1204 10:46:34.976063886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4063072Z 2025-12-04T10:58:28.4063221Z [W1204 10:46:34.978324322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4063222Z 2025-12-04T10:58:28.4063399Z [W1204 10:46:34.978597779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4063401Z 2025-12-04T10:58:28.4063566Z [W1204 10:46:34.978656348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4063569Z 2025-12-04T10:58:28.4063618Z ('RERUN', {'yellow': True}) [3.2838s] [100%] 2025-12-04T10:58:28.4063976Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:35.801338476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4063978Z 2025-12-04T10:58:28.4064129Z [W1204 10:46:35.801724072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064131Z 2025-12-04T10:58:28.4064278Z [W1204 10:46:35.801793651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064281Z 2025-12-04T10:58:28.4064431Z [W1204 10:46:35.803082478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064434Z 2025-12-04T10:58:28.4064583Z [W1204 10:46:35.803344805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064585Z 2025-12-04T10:58:28.4064733Z [W1204 10:46:35.803405264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064734Z 2025-12-04T10:58:28.4064883Z [W1204 10:46:35.805511342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4064885Z 2025-12-04T10:58:28.4065032Z [W1204 10:46:35.805779009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4065033Z 2025-12-04T10:58:28.4065198Z [W1204 10:46:35.805838188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4065201Z 2025-12-04T10:58:28.4065250Z ('RERUN', {'yellow': True}) [0.6983s] [100%] 2025-12-04T10:58:28.4065620Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:36.519421786 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4065622Z 2025-12-04T10:58:28.4065772Z [W1204 10:46:36.519795372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4065774Z 2025-12-04T10:58:28.4065921Z [W1204 10:46:36.519861271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4065923Z 2025-12-04T10:58:28.4066072Z [W1204 10:46:36.521141228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4066088Z 2025-12-04T10:58:28.4066236Z [W1204 10:46:36.521403755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4066239Z 2025-12-04T10:58:28.4066387Z [W1204 10:46:36.521465804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4066389Z 2025-12-04T10:58:28.4066537Z [W1204 10:46:36.523593012 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4066539Z 2025-12-04T10:58:28.4066685Z [W1204 10:46:36.523861619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4066687Z 2025-12-04T10:58:28.4066835Z [W1204 10:46:36.523925138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4066854Z 2025-12-04T10:58:28.4066893Z FAILED [0.7033s] [100%] 2025-12-04T10:58:28.4066895Z 2025-12-04T10:58:28.4066946Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4067095Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4067142Z Traceback (most recent call last): 2025-12-04T10:58:28.4067297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4067338Z method(*args, **kwargs) 2025-12-04T10:58:28.4067490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4067530Z method(*args, **kwargs) 2025-12-04T10:58:28.4067680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4067718Z with policy(): 2025-12-04T10:58:28.4067871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4067911Z raise RuntimeError(msg) 2025-12-04T10:58:28.4068315Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4068318Z 2025-12-04T10:58:28.4068391Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4068693Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4068696Z 2025-12-04T10:58:28.4068782Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4068857Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4068913Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4069202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4069275Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4069312Z graph_break [] 2025-12-04T10:58:28.4069460Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4069506Z Traceback (most recent call last): 2025-12-04T10:58:28.4069662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4069713Z method(*args, **kwargs) 2025-12-04T10:58:28.4069864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4069904Z method(*args, **kwargs) 2025-12-04T10:58:28.4070054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4070090Z with 
policy(): 2025-12-04T10:58:28.4070242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4070282Z raise RuntimeError(msg) 2025-12-04T10:58:28.4070685Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4070698Z 2025-12-04T10:58:28.4070771Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4071058Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4071060Z 2025-12-04T10:58:28.4071147Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4071222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4071277Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4071551Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4071626Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4071771Z graph_break [] 2025-12-04T10:58:28.4071846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4071901Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4071974Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4072243Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4072280Z graph_break [] 2025-12-04T10:58:28.4072331Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4072496Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4072543Z Traceback (most recent call last): 2025-12-04T10:58:28.4072696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4072736Z method(*args, **kwargs) 2025-12-04T10:58:28.4072907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4072947Z method(*args, **kwargs) 2025-12-04T10:58:28.4073098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4073134Z with policy(): 2025-12-04T10:58:28.4073317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4073358Z raise RuntimeError(msg) 2025-12-04T10:58:28.4073762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4073781Z 2025-12-04T10:58:28.4073854Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4074143Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4074145Z 2025-12-04T10:58:28.4074230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4074302Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4074376Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4074649Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4074725Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4074761Z graph_break [] 2025-12-04T10:58:28.4074834Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4074888Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4074960Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4075230Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4075269Z graph_break [] 2025-12-04T10:58:28.4075340Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4075395Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4075466Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4075734Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4075770Z graph_break [] 2025-12-04T10:58:28.4076014Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3dd27572e7e2af8a.xml - 2025-12-04T10:58:28.4076073Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4076730Z FAILED [0.7033s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4076735Z 2025-12-04T10:58:28.4076808Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4077094Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4077096Z 2025-12-04T10:58:28.4077182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4077246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4077313Z ================== 1 failed, 57 deselected, 2 rerun in 4.85s =================== 2025-12-04T10:58:28.4077363Z Got exit code 1 2025-12-04T10:58:28.4077403Z Retrying single test... 2025-12-04T10:58:28.4077601Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cea7e54616b277ce.xml 2025-12-04T10:58:28.4077659Z ============================= test session starts ============================== 2025-12-04T10:58:28.4077768Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4077809Z cachedir: .pytest_cache 2025-12-04T10:58:28.4077966Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4078012Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4078064Z configfile: pytest.ini 2025-12-04T10:58:28.4078225Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4078299Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4078585Z stepcurrent: skipping 36 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4078628Z Running 1 items in this shard 2025-12-04T10:58:28.4078630Z 2025-12-04T10:58:28.4078989Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:46.711329860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4078991Z 2025-12-04T10:58:28.4079146Z [W1204 10:46:46.984165801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079148Z 2025-12-04T10:58:28.4079298Z [W1204 10:46:46.984306319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079300Z 2025-12-04T10:58:28.4079452Z [W1204 10:46:46.988242317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079454Z 2025-12-04T10:58:28.4079606Z [W1204 10:46:46.988558534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079608Z 2025-12-04T10:58:28.4079755Z [W1204 10:46:46.988639383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079757Z 2025-12-04T10:58:28.4079917Z [W1204 10:46:46.990876649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4079920Z 2025-12-04T10:58:28.4080068Z [W1204 10:46:46.991213105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4080070Z 2025-12-04T10:58:28.4080230Z [W1204 10:46:46.991278565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4080232Z 2025-12-04T10:58:28.4080280Z ('RERUN', {'yellow': True}) [3.2771s] [100%] 2025-12-04T10:58:28.4080639Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:47.828594531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4080641Z 2025-12-04T10:58:28.4080791Z [W1204 10:46:47.828985707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4080794Z 2025-12-04T10:58:28.4080959Z [W1204 10:46:47.829068726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4080961Z 2025-12-04T10:58:28.4081109Z [W1204 10:46:47.830357712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4081111Z 2025-12-04T10:58:28.4081258Z [W1204 10:46:47.830625379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4081260Z 2025-12-04T10:58:28.4081408Z [W1204 10:46:47.830686459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4081409Z 2025-12-04T10:58:28.4081559Z [W1204 10:46:47.832818556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4081572Z 2025-12-04T10:58:28.4081720Z [W1204 10:46:47.833092563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4081723Z 2025-12-04T10:58:28.4081873Z [W1204 10:46:47.833155682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4081874Z 2025-12-04T10:58:28.4081922Z ('RERUN', {'yellow': True}) [0.6987s] [100%] 2025-12-04T10:58:28.4082278Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:46:48.511505263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4082280Z 2025-12-04T10:58:28.4082428Z [W1204 10:46:48.511880549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4082431Z 2025-12-04T10:58:28.4082579Z [W1204 10:46:48.511951649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4082581Z 2025-12-04T10:58:28.4082730Z [W1204 10:46:48.513270124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4082732Z 2025-12-04T10:58:28.4082879Z [W1204 10:46:48.513528192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4082880Z 2025-12-04T10:58:28.4083029Z [W1204 10:46:48.513589211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4083031Z 2025-12-04T10:58:28.4083188Z [W1204 10:46:48.515695368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4083191Z 2025-12-04T10:58:28.4083373Z [W1204 10:46:48.515962465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4083376Z 2025-12-04T10:58:28.4083539Z [W1204 10:46:48.516025995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4083541Z 2025-12-04T10:58:28.4083579Z FAILED [0.6705s] [100%] 2025-12-04T10:58:28.4083581Z 2025-12-04T10:58:28.4083632Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4083783Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4083829Z Traceback (most recent call last): 2025-12-04T10:58:28.4083985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4084028Z method(*args, **kwargs) 2025-12-04T10:58:28.4084181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4084236Z method(*args, **kwargs) 2025-12-04T10:58:28.4084387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4084424Z with policy(): 2025-12-04T10:58:28.4084576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4084618Z raise RuntimeError(msg) 2025-12-04T10:58:28.4085011Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4085027Z 2025-12-04T10:58:28.4085102Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4085394Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4085398Z 2025-12-04T10:58:28.4085484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4085557Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4085613Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4085888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4085964Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4086001Z graph_break [] 2025-12-04T10:58:28.4086151Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4086196Z Traceback (most recent call last): 2025-12-04T10:58:28.4086350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4086390Z method(*args, **kwargs) 2025-12-04T10:58:28.4086540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4086579Z method(*args, **kwargs) 2025-12-04T10:58:28.4086729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4086765Z with 
policy(): 2025-12-04T10:58:28.4086932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4086974Z raise RuntimeError(msg) 2025-12-04T10:58:28.4087384Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4087387Z 2025-12-04T10:58:28.4087460Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4087747Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4087750Z 2025-12-04T10:58:28.4087835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4087910Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4087978Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4088252Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4088325Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4088361Z graph_break [] 2025-12-04T10:58:28.4088432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4088487Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4088558Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4088830Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4088876Z graph_break [] 2025-12-04T10:58:28.4088928Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4089079Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4089125Z Traceback (most recent call last): 2025-12-04T10:58:28.4089277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4089317Z method(*args, **kwargs) 2025-12-04T10:58:28.4089467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4089506Z method(*args, **kwargs) 2025-12-04T10:58:28.4089656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4089694Z with policy(): 2025-12-04T10:58:28.4089847Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4089888Z raise RuntimeError(msg) 2025-12-04T10:58:28.4090292Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4090295Z 2025-12-04T10:58:28.4090368Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4090670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4090673Z 2025-12-04T10:58:28.4090759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4090833Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4090887Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4091171Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4091244Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4091281Z graph_break [] 2025-12-04T10:58:28.4091353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4091408Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4091481Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4091751Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4091798Z graph_break [] 2025-12-04T10:58:28.4091872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4091926Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4091999Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4092268Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4092303Z graph_break [] 2025-12-04T10:58:28.4092561Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cea7e54616b277ce.xml - 2025-12-04T10:58:28.4092621Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4093291Z FAILED [0.6705s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4093294Z 2025-12-04T10:58:28.4093366Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4093654Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4093658Z 2025-12-04T10:58:28.4093744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4093806Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4093873Z ================== 1 failed, 57 deselected, 2 rerun in 4.81s =================== 2025-12-04T10:58:28.4093910Z Got exit code 1 2025-12-04T10:58:28.4094146Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4094273Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4094485Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c317dbf4de91dc94.xml 2025-12-04T10:58:28.4094543Z ============================= test session starts ============================== 2025-12-04T10:58:28.4094655Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4094695Z cachedir: .pytest_cache 2025-12-04T10:58:28.4094876Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4094921Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4094962Z configfile: pytest.ini 2025-12-04T10:58:28.4095120Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4095194Z collecting ... collected 58 items / 37 deselected / 21 selected 2025-12-04T10:58:28.4095246Z stepcurrent: skipping 37 already run items. 
2025-12-04T10:58:28.4095292Z Running 21 items in this shard 2025-12-04T10:58:28.4095294Z 2025-12-04T10:58:28.4095545Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5280s] [ 4%] 2025-12-04T10:58:28.4095809Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4643s] [ 4%] 2025-12-04T10:58:28.4096032Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4682s] [ 4%] 2025-12-04T10:58:28.4096035Z 2025-12-04T10:58:28.4096085Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4096237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4096297Z Traceback (most recent call last): 2025-12-04T10:58:28.4096453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4096494Z method(*args, **kwargs) 2025-12-04T10:58:28.4096647Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4096685Z method(*args, **kwargs) 2025-12-04T10:58:28.4096837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4096873Z with policy(): 2025-12-04T10:58:28.4097026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4097066Z raise RuntimeError(msg) 2025-12-04T10:58:28.4097465Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4097469Z 2025-12-04T10:58:28.4097542Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4097834Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4097836Z 2025-12-04T10:58:28.4097922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4097996Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4098051Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4098240Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4098315Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4098351Z graph_break [] 2025-12-04T10:58:28.4098512Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4098557Z Traceback (most recent call last): 2025-12-04T10:58:28.4098709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4098748Z method(*args, **kwargs) 2025-12-04T10:58:28.4098898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4098936Z method(*args, **kwargs) 
2025-12-04T10:58:28.4099088Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4099125Z with policy(): 2025-12-04T10:58:28.4099278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4099330Z raise RuntimeError(msg) 2025-12-04T10:58:28.4099736Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4099738Z 2025-12-04T10:58:28.4099811Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4100105Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4100119Z 2025-12-04T10:58:28.4100205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4100280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4100335Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4100511Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4100584Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4100620Z graph_break [] 2025-12-04T10:58:28.4100693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4100747Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4100819Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4100995Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4101032Z graph_break [] 2025-12-04T10:58:28.4101084Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4101235Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4101279Z Traceback (most recent call last): 2025-12-04T10:58:28.4101433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4101472Z method(*args, **kwargs) 2025-12-04T10:58:28.4101622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4101661Z method(*args, **kwargs) 2025-12-04T10:58:28.4101824Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4101860Z with policy(): 2025-12-04T10:58:28.4102013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4102053Z raise RuntimeError(msg) 2025-12-04T10:58:28.4102469Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4102471Z 2025-12-04T10:58:28.4102544Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4102835Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4102838Z 2025-12-04T10:58:28.4102937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4103009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4103065Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4103240Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4103353Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4103388Z graph_break [] 2025-12-04T10:58:28.4103463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4103517Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4103588Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4103779Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4103816Z graph_break [] 2025-12-04T10:58:28.4103888Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4103943Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4104013Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4104186Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4104222Z graph_break [] 2025-12-04T10:58:28.4104466Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c317dbf4de91dc94.xml - 2025-12-04T10:58:28.4104526Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4105167Z FAILED [0.4682s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4105170Z 2025-12-04T10:58:28.4105243Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4105533Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4105551Z 2025-12-04T10:58:28.4105638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4105700Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4105767Z ================== 1 failed, 37 deselected, 2 rerun in 3.62s =================== 2025-12-04T10:58:28.4105803Z Got exit code 1 2025-12-04T10:58:28.4105858Z Retrying single test... 
2025-12-04T10:58:28.4106055Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7723bdf109fa4817.xml 2025-12-04T10:58:28.4106113Z ============================= test session starts ============================== 2025-12-04T10:58:28.4106222Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4106264Z cachedir: .pytest_cache 2025-12-04T10:58:28.4106422Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4106470Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4106524Z configfile: pytest.ini 2025-12-04T10:58:28.4106685Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4106758Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4107044Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4107087Z Running 1 items in this shard 2025-12-04T10:58:28.4107089Z 2025-12-04T10:58:28.4107454Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:07.103860140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4107469Z 2025-12-04T10:58:28.4107622Z [W1204 10:47:08.374408059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4107625Z 2025-12-04T10:58:28.4107777Z [W1204 10:47:08.374541038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4107779Z 2025-12-04T10:58:28.4107930Z [W1204 10:47:08.378152628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4107932Z 2025-12-04T10:58:28.4108081Z [W1204 10:47:08.378465635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4108083Z 2025-12-04T10:58:28.4108232Z [W1204 10:47:08.378526834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4108235Z 2025-12-04T10:58:28.4108383Z [W1204 10:47:08.380863849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4108386Z 2025-12-04T10:58:28.4108535Z [W1204 10:47:08.381143556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4108537Z 2025-12-04T10:58:28.4108686Z [W1204 10:47:08.381205845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4108687Z 2025-12-04T10:58:28.4108736Z ('RERUN', {'yellow': True}) [2.8742s] [100%] 2025-12-04T10:58:28.4109113Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:09.505058013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4109117Z 2025-12-04T10:58:28.4109266Z [W1204 10:47:09.505454569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4109269Z 2025-12-04T10:58:28.4109429Z [W1204 10:47:09.505525428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4109431Z 2025-12-04T10:58:28.4109580Z [W1204 10:47:09.506804844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4109582Z 2025-12-04T10:58:28.4109729Z [W1204 10:47:09.507076041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4109731Z 2025-12-04T10:58:28.4109881Z [W1204 10:47:09.507139900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4109884Z 2025-12-04T10:58:28.4110032Z [W1204 10:47:09.509182458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4110049Z 2025-12-04T10:58:28.4110199Z [W1204 10:47:09.509524195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4110200Z 2025-12-04T10:58:28.4110349Z [W1204 10:47:09.509587024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4110351Z 2025-12-04T10:58:28.4110398Z ('RERUN', {'yellow': True}) [0.6280s] [100%] 2025-12-04T10:58:28.4110757Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:09.132284809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4110772Z 2025-12-04T10:58:28.4110921Z [W1204 10:47:09.132673685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4110924Z 2025-12-04T10:58:28.4111074Z [W1204 10:47:09.132742704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111075Z 2025-12-04T10:58:28.4111224Z [W1204 10:47:09.134038220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111226Z 2025-12-04T10:58:28.4111373Z [W1204 10:47:09.134301027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111375Z 2025-12-04T10:58:28.4111524Z [W1204 10:47:09.134362206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111527Z 2025-12-04T10:58:28.4111676Z [W1204 10:47:09.136387694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111678Z 2025-12-04T10:58:28.4111828Z [W1204 10:47:09.136725081 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4111830Z 2025-12-04T10:58:28.4111978Z [W1204 10:47:09.136785680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4111981Z 2025-12-04T10:58:28.4112019Z FAILED [0.6333s] [100%] 2025-12-04T10:58:28.4112021Z 2025-12-04T10:58:28.4112073Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4112225Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4112272Z Traceback (most recent call last): 2025-12-04T10:58:28.4112438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4112481Z method(*args, **kwargs) 2025-12-04T10:58:28.4112633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4112683Z method(*args, **kwargs) 2025-12-04T10:58:28.4112834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4112871Z with policy(): 2025-12-04T10:58:28.4113023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4113064Z raise RuntimeError(msg) 2025-12-04T10:58:28.4113498Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4113517Z 2025-12-04T10:58:28.4113591Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4115473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4115477Z 2025-12-04T10:58:28.4115569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4115644Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4115701Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4115883Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4115983Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4116022Z graph_break [] 2025-12-04T10:58:28.4116175Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4116223Z Traceback (most recent call last): 2025-12-04T10:58:28.4116378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4116419Z method(*args, **kwargs) 2025-12-04T10:58:28.4116570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4116609Z method(*args, **kwargs) 2025-12-04T10:58:28.4116758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4116798Z with policy(): 2025-12-04T10:58:28.4116949Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4116992Z raise RuntimeError(msg) 2025-12-04T10:58:28.4117400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4117402Z 2025-12-04T10:58:28.4117477Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4117767Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4117770Z 2025-12-04T10:58:28.4117876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4117952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4118009Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4118202Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4118277Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4118314Z graph_break [] 2025-12-04T10:58:28.4118387Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4118442Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4118514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4118690Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4118741Z graph_break [] 2025-12-04T10:58:28.4118793Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4118945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4118991Z Traceback (most recent call last): 2025-12-04T10:58:28.4119143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4119184Z method(*args, **kwargs) 2025-12-04T10:58:28.4119334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4119374Z method(*args, **kwargs) 2025-12-04T10:58:28.4119524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4119574Z with policy(): 2025-12-04T10:58:28.4119725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4119768Z raise RuntimeError(msg) 2025-12-04T10:58:28.4120173Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4120175Z 2025-12-04T10:58:28.4120249Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4120541Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4120545Z 2025-12-04T10:58:28.4120632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4120706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4120761Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4120939Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4121012Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4121048Z graph_break [] 2025-12-04T10:58:28.4121121Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4121175Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4121247Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4121434Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4121472Z graph_break [] 2025-12-04T10:58:28.4121545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4121599Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4121683Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4121857Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4121894Z graph_break [] 2025-12-04T10:58:28.4122137Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7723bdf109fa4817.xml - 2025-12-04T10:58:28.4122197Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4122845Z FAILED [0.6333s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4122861Z 2025-12-04T10:58:28.4122933Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4123222Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4123224Z 2025-12-04T10:58:28.4123360Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4123423Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4123491Z ================== 1 failed, 57 deselected, 2 rerun in 4.30s =================== 2025-12-04T10:58:28.4123529Z Got exit code 1 2025-12-04T10:58:28.4123568Z Retrying single test... 
2025-12-04T10:58:28.4123766Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-53cc422704518616.xml 2025-12-04T10:58:28.4123823Z ============================= test session starts ============================== 2025-12-04T10:58:28.4123936Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4123976Z cachedir: .pytest_cache 2025-12-04T10:58:28.4124138Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4124186Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4124227Z configfile: pytest.ini 2025-12-04T10:58:28.4124389Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4124464Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4124755Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4124800Z Running 1 items in this shard 2025-12-04T10:58:28.4124802Z 2025-12-04T10:58:28.4125170Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:19.351253599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4125190Z 2025-12-04T10:58:28.4125344Z [W1204 10:47:19.633960376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4125347Z 2025-12-04T10:58:28.4125512Z [W1204 10:47:19.634096325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4125515Z 2025-12-04T10:58:28.4125664Z [W1204 10:47:19.637557147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4125666Z 2025-12-04T10:58:28.4125816Z [W1204 10:47:19.637870103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4125818Z 2025-12-04T10:58:28.4125966Z [W1204 10:47:19.637930213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4125970Z 2025-12-04T10:58:28.4126119Z [W1204 10:47:19.640183118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4126135Z 2025-12-04T10:58:28.4126285Z [W1204 10:47:19.640454985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4126287Z 2025-12-04T10:58:28.4126436Z [W1204 10:47:19.640515794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4126437Z 2025-12-04T10:58:28.4126486Z ('RERUN', {'yellow': True}) [2.8925s] [100%] 2025-12-04T10:58:28.4126850Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:20.800351952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4126867Z 2025-12-04T10:58:28.4127017Z [W1204 10:47:20.800737518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127020Z 2025-12-04T10:58:28.4127170Z [W1204 10:47:20.800801947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127172Z 2025-12-04T10:58:28.4127321Z [W1204 10:47:20.802132473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127323Z 2025-12-04T10:58:28.4127471Z [W1204 10:47:20.802425509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127473Z 2025-12-04T10:58:28.4127620Z [W1204 10:47:20.802487929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127623Z 2025-12-04T10:58:28.4127772Z [W1204 10:47:20.804528146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127775Z 2025-12-04T10:58:28.4127923Z [W1204 10:47:20.804865553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4127926Z 2025-12-04T10:58:28.4128074Z [W1204 10:47:20.804928792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4128076Z 2025-12-04T10:58:28.4128125Z ('RERUN', {'yellow': True}) [0.6767s] [100%] 2025-12-04T10:58:28.4128485Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:47:21.459186439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4128488Z 2025-12-04T10:58:28.4128650Z [W1204 10:47:21.459557395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4128653Z 2025-12-04T10:58:28.4128802Z [W1204 10:47:21.459631544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4128803Z 2025-12-04T10:58:28.4128966Z [W1204 10:47:21.460886870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4128968Z 2025-12-04T10:58:28.4129118Z [W1204 10:47:21.461159617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4129120Z 2025-12-04T10:58:28.4129268Z [W1204 10:47:21.461223017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4129269Z 2025-12-04T10:58:28.4129421Z [W1204 10:47:21.463249865 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4129433Z 2025-12-04T10:58:28.4129582Z [W1204 10:47:21.463585711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4129584Z 2025-12-04T10:58:28.4129733Z [W1204 10:47:21.463646470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4129735Z 2025-12-04T10:58:28.4129773Z FAILED [0.6511s] [100%] 2025-12-04T10:58:28.4129775Z 2025-12-04T10:58:28.4129827Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4129980Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4130026Z Traceback (most recent call last): 2025-12-04T10:58:28.4130198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4130239Z method(*args, **kwargs) 2025-12-04T10:58:28.4130392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4130432Z method(*args, **kwargs) 2025-12-04T10:58:28.4130585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4130621Z with policy(): 2025-12-04T10:58:28.4130774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4130814Z raise RuntimeError(msg) 2025-12-04T10:58:28.4131218Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4131222Z 2025-12-04T10:58:28.4131296Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4131588Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4131590Z 2025-12-04T10:58:28.4131676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4131750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4131806Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4131983Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4132071Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4132109Z graph_break [] 2025-12-04T10:58:28.4132261Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4132306Z Traceback (most recent call last): 2025-12-04T10:58:28.4132474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4132514Z method(*args, **kwargs) 2025-12-04T10:58:28.4132665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4132704Z method(*args, **kwargs) 2025-12-04T10:58:28.4132854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4132890Z with policy(): 2025-12-04T10:58:28.4133044Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4133097Z raise RuntimeError(msg) 2025-12-04T10:58:28.4133540Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4133543Z 2025-12-04T10:58:28.4133615Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4133904Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4133906Z 2025-12-04T10:58:28.4134015Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4134089Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4134146Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4134322Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4134396Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4134432Z graph_break [] 2025-12-04T10:58:28.4134504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4134559Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4134630Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4134806Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4134843Z graph_break [] 2025-12-04T10:58:28.4134895Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4135047Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4135094Z Traceback (most recent call last): 2025-12-04T10:58:28.4135249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4135289Z method(*args, **kwargs) 2025-12-04T10:58:28.4135442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4135481Z method(*args, **kwargs) 2025-12-04T10:58:28.4135632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4135668Z with policy(): 2025-12-04T10:58:28.4135835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4135877Z raise RuntimeError(msg) 2025-12-04T10:58:28.4136297Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4136299Z 2025-12-04T10:58:28.4136373Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4136662Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4136666Z 2025-12-04T10:58:28.4136754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4136826Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4136907Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4137082Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4137155Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4137191Z graph_break [] 2025-12-04T10:58:28.4137263Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4137317Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4137389Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4137565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4137613Z graph_break [] 2025-12-04T10:58:28.4137686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4137740Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4137813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4137987Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4138023Z graph_break [] 2025-12-04T10:58:28.4138265Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-53cc422704518616.xml - 2025-12-04T10:58:28.4138324Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4138968Z FAILED [0.6511s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4138971Z 2025-12-04T10:58:28.4139044Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4139331Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4139334Z 2025-12-04T10:58:28.4139420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4139494Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4139563Z ================== 1 failed, 57 deselected, 2 rerun in 4.38s =================== 2025-12-04T10:58:28.4139600Z Got exit code 1 2025-12-04T10:58:28.4139854Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4139983Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4140181Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f57a5e2679e5efa3.xml 2025-12-04T10:58:28.4140238Z ============================= test session starts ============================== 2025-12-04T10:58:28.4140349Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4140391Z cachedir: .pytest_cache 2025-12-04T10:58:28.4140552Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4140610Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4140651Z configfile: pytest.ini 2025-12-04T10:58:28.4140811Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4140886Z collecting ... collected 58 items / 38 deselected / 20 selected 2025-12-04T10:58:28.4140938Z stepcurrent: skipping 38 already run items. 
2025-12-04T10:58:28.4140983Z Running 20 items in this shard 2025-12-04T10:58:28.4140985Z 2025-12-04T10:58:28.4141233Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4944s] [ 5%] 2025-12-04T10:58:28.4141489Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4605s] [ 5%] 2025-12-04T10:58:28.4141714Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4579s] [ 5%] 2025-12-04T10:58:28.4141716Z 2025-12-04T10:58:28.4141767Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4141917Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4141962Z Traceback (most recent call last): 2025-12-04T10:58:28.4142120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4142159Z method(*args, **kwargs) 2025-12-04T10:58:28.4142314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4142353Z method(*args, **kwargs) 2025-12-04T10:58:28.4142505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4142541Z with policy(): 2025-12-04T10:58:28.4142696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4142736Z raise RuntimeError(msg) 2025-12-04T10:58:28.4143133Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4143135Z 2025-12-04T10:58:28.4143219Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4143527Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4143530Z 2025-12-04T10:58:28.4143636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4143710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4143767Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4143943Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4144016Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4144051Z graph_break [] 2025-12-04T10:58:28.4144203Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4144249Z Traceback (most recent call last): 2025-12-04T10:58:28.4144418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4144458Z method(*args, **kwargs) 2025-12-04T10:58:28.4144611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4144649Z method(*args, **kwargs) 2025-12-04T10:58:28.4144801Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4144837Z with policy(): 2025-12-04T10:58:28.4144989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4145029Z raise RuntimeError(msg) 2025-12-04T10:58:28.4145445Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4145448Z 2025-12-04T10:58:28.4145523Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4145814Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4145816Z 2025-12-04T10:58:28.4145903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4145976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4146033Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4146209Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4146283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4146319Z graph_break [] 2025-12-04T10:58:28.4146393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4146447Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4146519Z 
aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4146694Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4146730Z graph_break [] 2025-12-04T10:58:28.4146781Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4146945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4146991Z Traceback (most recent call last): 2025-12-04T10:58:28.4147145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4147196Z method(*args, **kwargs) 2025-12-04T10:58:28.4147348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4147387Z method(*args, **kwargs) 2025-12-04T10:58:28.4147538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4147574Z with policy(): 2025-12-04T10:58:28.4147728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4147768Z raise RuntimeError(msg) 2025-12-04T10:58:28.4148170Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4148184Z 2025-12-04T10:58:28.4148258Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4148550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4148552Z 2025-12-04T10:58:28.4148640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4148712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4148781Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4148955Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4149029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4149064Z graph_break [] 2025-12-04T10:58:28.4149138Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4149192Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4149264Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4149436Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4149473Z graph_break [] 2025-12-04T10:58:28.4149545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4149601Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4149671Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4149846Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4149883Z graph_break [] 2025-12-04T10:58:28.4150127Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f57a5e2679e5efa3.xml - 2025-12-04T10:58:28.4150186Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4150836Z FAILED [0.4579s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4150841Z 2025-12-04T10:58:28.4150914Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4151217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4151220Z 2025-12-04T10:58:28.4151306Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4151367Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4151434Z ================== 1 failed, 38 deselected, 2 rerun in 3.58s =================== 2025-12-04T10:58:28.4151472Z Got exit code 1 2025-12-04T10:58:28.4151513Z Retrying single test... 
2025-12-04T10:58:28.4151711Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f73ae57a3e6da2f5.xml 2025-12-04T10:58:28.4151782Z ============================= test session starts ============================== 2025-12-04T10:58:28.4151892Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4151934Z cachedir: .pytest_cache 2025-12-04T10:58:28.4152092Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4152138Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4152179Z configfile: pytest.ini 2025-12-04T10:58:28.4152338Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4152426Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4152711Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4152757Z Running 1 items in this shard 2025-12-04T10:58:28.4152760Z 2025-12-04T10:58:28.4153125Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:40.823251916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4153127Z 2025-12-04T10:58:28.4153315Z [W1204 10:47:40.093814343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4153317Z 2025-12-04T10:58:28.4153468Z [W1204 10:47:40.093947732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4153471Z 2025-12-04T10:58:28.4153622Z [W1204 10:47:40.097611051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4153624Z 2025-12-04T10:58:28.4153774Z [W1204 10:47:40.097927218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4153776Z 2025-12-04T10:58:28.4153925Z [W1204 10:47:40.097988057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4153927Z 2025-12-04T10:58:28.4154075Z [W1204 10:47:40.100351061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4154077Z 2025-12-04T10:58:28.4154239Z [W1204 10:47:40.100630348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4154242Z 2025-12-04T10:58:28.4154391Z [W1204 10:47:40.100691087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4154394Z 2025-12-04T10:58:28.4154442Z ('RERUN', {'yellow': True}) [2.8783s] [100%] 2025-12-04T10:58:28.4154814Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:41.259033356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4154816Z 2025-12-04T10:58:28.4154967Z [W1204 10:47:41.259419352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4154969Z 2025-12-04T10:58:28.4155119Z [W1204 10:47:41.259493361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155122Z 2025-12-04T10:58:28.4155286Z [W1204 10:47:41.260741067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155288Z 2025-12-04T10:58:28.4155437Z [W1204 10:47:41.261020824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155439Z 2025-12-04T10:58:28.4155588Z [W1204 10:47:41.261083623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155590Z 2025-12-04T10:58:28.4155737Z [W1204 10:47:41.263099281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155739Z 2025-12-04T10:58:28.4155888Z [W1204 10:47:41.263433527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4155903Z 2025-12-04T10:58:28.4156052Z [W1204 10:47:41.263495756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4156056Z 2025-12-04T10:58:28.4156102Z ('RERUN', {'yellow': True}) [0.6694s] [100%] 2025-12-04T10:58:28.4156459Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:42.895957572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4156461Z 2025-12-04T10:58:28.4156610Z [W1204 10:47:42.896340728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4156613Z 2025-12-04T10:58:28.4156761Z [W1204 10:47:42.896410757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4156764Z 2025-12-04T10:58:28.4156912Z [W1204 10:47:42.897671373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4156915Z 2025-12-04T10:58:28.4157064Z [W1204 10:47:42.897927040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4157066Z 2025-12-04T10:58:28.4157215Z [W1204 10:47:42.897986779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4157217Z 2025-12-04T10:58:28.4157366Z [W1204 10:47:42.900060216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4157368Z 2025-12-04T10:58:28.4157527Z [W1204 10:47:42.900402823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4157531Z 2025-12-04T10:58:28.4157680Z [W1204 10:47:42.900464722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4157683Z 2025-12-04T10:58:28.4157721Z FAILED [0.6169s] [100%] 2025-12-04T10:58:28.4157722Z 2025-12-04T10:58:28.4157787Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4157936Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4157981Z Traceback (most recent call last): 2025-12-04T10:58:28.4158138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4158179Z method(*args, **kwargs) 2025-12-04T10:58:28.4158331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4158373Z method(*args, **kwargs) 2025-12-04T10:58:28.4158525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4158574Z with policy(): 2025-12-04T10:58:28.4158726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4158768Z raise RuntimeError(msg) 2025-12-04T10:58:28.4159161Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4159163Z 2025-12-04T10:58:28.4159237Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4159530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4159545Z 2025-12-04T10:58:28.4159632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4159706Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4159763Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4159941Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4160013Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4160049Z graph_break [] 2025-12-04T10:58:28.4160198Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4160246Z Traceback (most recent call last): 2025-12-04T10:58:28.4160401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4160442Z method(*args, **kwargs) 2025-12-04T10:58:28.4160594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4160634Z method(*args, **kwargs) 2025-12-04T10:58:28.4160783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4160820Z with policy(): 2025-12-04T10:58:28.4160972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4161013Z raise RuntimeError(msg) 2025-12-04T10:58:28.4161423Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4161427Z 2025-12-04T10:58:28.4161501Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4161804Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4161806Z 2025-12-04T10:58:28.4161893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4161966Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4162022Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4162199Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4162283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4162319Z graph_break [] 2025-12-04T10:58:28.4162391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4162446Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4162517Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4162693Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4162728Z 
graph_break [] 2025-12-04T10:58:28.4162780Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4162928Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4162984Z Traceback (most recent call last): 2025-12-04T10:58:28.4163136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4163178Z method(*args, **kwargs) 2025-12-04T10:58:28.4163362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4163402Z method(*args, **kwargs) 2025-12-04T10:58:28.4163553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4163589Z with policy(): 2025-12-04T10:58:28.4163742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4163783Z raise RuntimeError(msg) 2025-12-04T10:58:28.4164184Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4164189Z 2025-12-04T10:58:28.4164262Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4164550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4164552Z 2025-12-04T10:58:28.4164638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4164711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4164767Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4164966Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4165040Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4165076Z graph_break [] 2025-12-04T10:58:28.4165161Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4165217Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4165288Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4165463Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4165498Z graph_break [] 2025-12-04T10:58:28.4165571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4165624Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4165698Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4165872Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4165923Z graph_break [] 2025-12-04T10:58:28.4166171Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f73ae57a3e6da2f5.xml - 2025-12-04T10:58:28.4166231Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4166870Z FAILED [0.6169s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4166889Z 2025-12-04T10:58:28.4166961Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4167251Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4167253Z 2025-12-04T10:58:28.4167339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4167401Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4167467Z ================== 1 failed, 57 deselected, 2 rerun in 4.33s =================== 2025-12-04T10:58:28.4167505Z Got exit code 1 2025-12-04T10:58:28.4167544Z Retrying single test... 
2025-12-04T10:58:28.4167745Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-324e1e2cf28a5a86.xml 2025-12-04T10:58:28.4167802Z ============================= test session starts ============================== 2025-12-04T10:58:28.4167913Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4167954Z cachedir: .pytest_cache 2025-12-04T10:58:28.4168114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4168160Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4168200Z configfile: pytest.ini 2025-12-04T10:58:28.4168360Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4168434Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4168732Z stepcurrent: skipping 38 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4168777Z Running 1 items in this shard 2025-12-04T10:58:28.4168779Z 2025-12-04T10:58:28.4169151Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:51.221915542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4169153Z 2025-12-04T10:58:28.4169306Z [W1204 10:47:52.479609474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4169308Z 2025-12-04T10:58:28.4169460Z [W1204 10:47:52.479739953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4169464Z 2025-12-04T10:58:28.4169613Z [W1204 10:47:52.483179194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4169628Z 2025-12-04T10:58:28.4169777Z [W1204 10:47:52.483482021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4169780Z 2025-12-04T10:58:28.4169928Z [W1204 10:47:52.483543560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4169930Z 2025-12-04T10:58:28.4170078Z [W1204 10:47:52.485655267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4170080Z 2025-12-04T10:58:28.4170230Z [W1204 10:47:52.485923254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4170243Z 2025-12-04T10:58:28.4170394Z [W1204 10:47:52.485984183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4170397Z 2025-12-04T10:58:28.4170446Z ('RERUN', {'yellow': True}) [2.8469s] [100%] 2025-12-04T10:58:28.4170804Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:53.662603893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4170808Z 2025-12-04T10:58:28.4170957Z [W1204 10:47:53.663100338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4170958Z 2025-12-04T10:58:28.4171106Z [W1204 10:47:53.663190527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171109Z 2025-12-04T10:58:28.4171258Z [W1204 10:47:53.664482562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171260Z 2025-12-04T10:58:28.4171409Z [W1204 10:47:53.664761909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171411Z 2025-12-04T10:58:28.4171559Z [W1204 10:47:53.664824628 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171561Z 2025-12-04T10:58:28.4171708Z [W1204 10:47:53.666840166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171710Z 2025-12-04T10:58:28.4171858Z [W1204 10:47:53.667186032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4171861Z 2025-12-04T10:58:28.4172020Z [W1204 10:47:53.667250651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4172022Z 2025-12-04T10:58:28.4172071Z ('RERUN', {'yellow': True}) [0.6727s] [100%] 2025-12-04T10:58:28.4172440Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:47:54.308327671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4172442Z 2025-12-04T10:58:28.4172592Z [W1204 10:47:54.308726687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4172594Z 2025-12-04T10:58:28.4172743Z [W1204 10:47:54.308803136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4172746Z 2025-12-04T10:58:28.4172895Z [W1204 10:47:54.310091022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4172908Z 2025-12-04T10:58:28.4173057Z [W1204 10:47:54.310357329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4173059Z 2025-12-04T10:58:28.4173208Z [W1204 10:47:54.310417318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4173210Z 2025-12-04T10:58:28.4173419Z [W1204 10:47:54.312446185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4173421Z 2025-12-04T10:58:28.4173568Z [W1204 10:47:54.312784452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4173571Z 2025-12-04T10:58:28.4173737Z [W1204 10:47:54.312845991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4173739Z 2025-12-04T10:58:28.4173778Z FAILED [0.6328s] [100%] 2025-12-04T10:58:28.4173780Z 2025-12-04T10:58:28.4173831Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4173982Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4174027Z Traceback (most recent call last): 2025-12-04T10:58:28.4174185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4174225Z method(*args, **kwargs) 2025-12-04T10:58:28.4174378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4174417Z method(*args, **kwargs) 2025-12-04T10:58:28.4174572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4174610Z with policy(): 2025-12-04T10:58:28.4174763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4174804Z raise RuntimeError(msg) 2025-12-04T10:58:28.4175201Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4175204Z 2025-12-04T10:58:28.4175276Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4175579Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4175583Z 2025-12-04T10:58:28.4175672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4175744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4175815Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4175990Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4176064Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4176100Z graph_break [] 2025-12-04T10:58:28.4176247Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4176292Z Traceback (most recent call last): 2025-12-04T10:58:28.4176447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4176486Z method(*args, **kwargs) 2025-12-04T10:58:28.4176653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4176692Z method(*args, **kwargs) 2025-12-04T10:58:28.4176843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4176880Z with policy(): 2025-12-04T10:58:28.4177031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4177072Z raise RuntimeError(msg) 2025-12-04T10:58:28.4177472Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4177494Z 2025-12-04T10:58:28.4177567Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4177855Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4177857Z 2025-12-04T10:58:28.4177944Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4178016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4178073Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4178247Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4178322Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4178359Z graph_break [] 2025-12-04T10:58:28.4178432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4178486Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4178559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4178733Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4178770Z 
graph_break [] 2025-12-04T10:58:28.4178821Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4178970Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4179015Z Traceback (most recent call last): 2025-12-04T10:58:28.4179183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4179224Z method(*args, **kwargs) 2025-12-04T10:58:28.4179375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4179426Z method(*args, **kwargs) 2025-12-04T10:58:28.4179578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4179614Z with policy(): 2025-12-04T10:58:28.4179767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4179807Z raise RuntimeError(msg) 2025-12-04T10:58:28.4180212Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4180228Z 2025-12-04T10:58:28.4180301Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4180592Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4180594Z 2025-12-04T10:58:28.4180681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4180753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4180808Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4180983Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4181068Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4181105Z graph_break [] 2025-12-04T10:58:28.4181178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4181231Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4181304Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4181478Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4181513Z graph_break [] 2025-12-04T10:58:28.4181585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4181641Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4181712Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4181888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4181925Z graph_break [] 2025-12-04T10:58:28.4182171Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-324e1e2cf28a5a86.xml - 2025-12-04T10:58:28.4182230Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4182874Z FAILED [0.6328s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4182878Z 2025-12-04T10:58:28.4182952Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4183287Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4183290Z 2025-12-04T10:58:28.4183376Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4183438Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4183505Z ================== 1 failed, 57 deselected, 2 rerun in 4.31s =================== 2025-12-04T10:58:28.4183541Z Got exit code 1 2025-12-04T10:58:28.4183781Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4183914Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4184131Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6d7e6b487f8c231e.xml 2025-12-04T10:58:28.4184189Z ============================= test session starts ============================== 2025-12-04T10:58:28.4184300Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4184341Z cachedir: .pytest_cache 2025-12-04T10:58:28.4184499Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4184545Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4184586Z configfile: pytest.ini 2025-12-04T10:58:28.4184748Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4184838Z collecting ... collected 58 items / 39 deselected / 19 selected 2025-12-04T10:58:28.4184892Z stepcurrent: skipping 39 already run items. 
2025-12-04T10:58:28.4184935Z Running 19 items in this shard 2025-12-04T10:58:28.4184937Z 2025-12-04T10:58:28.4185191Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8767s] [ 5%] 2025-12-04T10:58:28.4185438Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4582s] [ 5%] 2025-12-04T10:58:28.4185661Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4643s] [ 5%] 2025-12-04T10:58:28.4185664Z 2025-12-04T10:58:28.4185715Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4185866Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4185911Z Traceback (most recent call last): 2025-12-04T10:58:28.4186070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4186110Z method(*args, **kwargs) 2025-12-04T10:58:28.4186265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4186304Z method(*args, **kwargs) 2025-12-04T10:58:28.4186456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4186492Z with policy(): 2025-12-04T10:58:28.4186664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4186705Z raise RuntimeError(msg) 2025-12-04T10:58:28.4187117Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4187119Z 2025-12-04T10:58:28.4187194Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4187483Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4187485Z 2025-12-04T10:58:28.4187571Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4187646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4187702Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4187992Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4188066Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4188101Z graph_break [] 2025-12-04T10:58:28.4188252Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4188297Z Traceback (most recent call last): 2025-12-04T10:58:28.4188452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4188503Z method(*args, **kwargs) 2025-12-04T10:58:28.4188664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.4188704Z method(*args, **kwargs) 2025-12-04T10:58:28.4188854Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4188891Z with policy(): 2025-12-04T10:58:28.4189047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4189087Z raise RuntimeError(msg) 2025-12-04T10:58:28.4189489Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4189493Z 2025-12-04T10:58:28.4189567Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4189860Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4189863Z 2025-12-04T10:58:28.4189951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4190023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4190080Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4190355Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4190443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4190480Z graph_break [] 2025-12-04T10:58:28.4190554Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.4190609Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4190686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4190968Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4191005Z graph_break [] 2025-12-04T10:58:28.4191058Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4191206Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4191252Z Traceback (most recent call last): 2025-12-04T10:58:28.4191407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4191465Z method(*args, **kwargs) 2025-12-04T10:58:28.4191615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4191655Z method(*args, **kwargs) 2025-12-04T10:58:28.4191805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4191842Z with policy(): 2025-12-04T10:58:28.4191993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4192034Z raise RuntimeError(msg) 2025-12-04T10:58:28.4192434Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4192447Z 2025-12-04T10:58:28.4192521Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4192809Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4192811Z 2025-12-04T10:58:28.4192898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4192971Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4193027Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4193336Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4193410Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4193447Z graph_break [] 2025-12-04T10:58:28.4193519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4193574Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4193647Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4193918Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4193954Z graph_break [] 2025-12-04T10:58:28.4194026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4194096Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.4194171Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4194454Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4194491Z graph_break [] 2025-12-04T10:58:28.4194734Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-6d7e6b487f8c231e.xml - 2025-12-04T10:58:28.4194794Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4195433Z FAILED [0.4643s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4195449Z 2025-12-04T10:58:28.4195522Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4195810Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4195812Z 2025-12-04T10:58:28.4195897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4195960Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4196025Z ================== 1 failed, 39 deselected, 2 rerun in 3.96s =================== 2025-12-04T10:58:28.4196076Z Got exit code 1 2025-12-04T10:58:28.4196116Z Retrying single test... 2025-12-04T10:58:28.4196316Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50f799d1a623082c.xml 2025-12-04T10:58:28.4196373Z ============================= test session starts ============================== 2025-12-04T10:58:28.4196485Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4196525Z cachedir: .pytest_cache 2025-12-04T10:58:28.4196685Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4196730Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4196771Z configfile: pytest.ini 2025-12-04T10:58:28.4196931Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4197006Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4197290Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4197335Z Running 1 items in this shard 2025-12-04T10:58:28.4197337Z 2025-12-04T10:58:28.4197701Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:14.818102122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4197704Z 2025-12-04T10:58:28.4197858Z [W1204 10:48:14.096944474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4197860Z 2025-12-04T10:58:28.4198025Z [W1204 10:48:14.097108722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198028Z 2025-12-04T10:58:28.4198177Z [W1204 10:48:14.100976579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198179Z 2025-12-04T10:58:28.4198338Z [W1204 10:48:14.101291725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198340Z 2025-12-04T10:58:28.4198491Z [W1204 10:48:14.101355395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198493Z 2025-12-04T10:58:28.4198640Z [W1204 10:48:14.103518700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198642Z 2025-12-04T10:58:28.4198791Z [W1204 10:48:14.103788707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4198794Z 2025-12-04T10:58:28.4198954Z [W1204 10:48:14.103848977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4198955Z 2025-12-04T10:58:28.4199005Z ('RERUN', {'yellow': True}) [3.2596s] [100%] 2025-12-04T10:58:28.4199363Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:15.871969940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4199367Z 2025-12-04T10:58:28.4199516Z [W1204 10:48:15.872390435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4199518Z 2025-12-04T10:58:28.4199667Z [W1204 10:48:15.872463104 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4199681Z 2025-12-04T10:58:28.4199834Z [W1204 10:48:15.873719420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4199836Z 2025-12-04T10:58:28.4199986Z [W1204 10:48:15.873974087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4199988Z 2025-12-04T10:58:28.4200135Z [W1204 10:48:15.874041726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4200137Z 2025-12-04T10:58:28.4200285Z [W1204 10:48:15.876070894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4200287Z 2025-12-04T10:58:28.4200436Z [W1204 10:48:15.876331271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4200439Z 2025-12-04T10:58:28.4200586Z [W1204 10:48:15.876390830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4200589Z 2025-12-04T10:58:28.4200636Z ('RERUN', {'yellow': True}) [0.6315s] [100%] 2025-12-04T10:58:28.4200992Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:16.507882799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4200994Z 2025-12-04T10:58:28.4201143Z [W1204 10:48:16.508276165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4201145Z 2025-12-04T10:58:28.4201305Z [W1204 10:48:16.508354084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4201309Z 2025-12-04T10:58:28.4201457Z [W1204 10:48:16.509636140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4201460Z 2025-12-04T10:58:28.4201624Z [W1204 10:48:16.509895417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4201626Z 2025-12-04T10:58:28.4201775Z [W1204 10:48:16.509955326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4201777Z 2025-12-04T10:58:28.4201927Z [W1204 10:48:16.511983303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4201929Z 2025-12-04T10:58:28.4202077Z [W1204 10:48:16.512251030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4202080Z 2025-12-04T10:58:28.4202233Z [W1204 10:48:16.512313070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4202246Z 2025-12-04T10:58:28.4202284Z FAILED [0.6274s] [100%] 2025-12-04T10:58:28.4202287Z 2025-12-04T10:58:28.4202338Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4202488Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4202534Z Traceback (most recent call last): 2025-12-04T10:58:28.4202691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4202731Z method(*args, **kwargs) 2025-12-04T10:58:28.4202886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4202936Z method(*args, **kwargs) 2025-12-04T10:58:28.4203086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4203123Z with policy(): 2025-12-04T10:58:28.4203309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4203350Z raise RuntimeError(msg) 2025-12-04T10:58:28.4203745Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4203747Z 2025-12-04T10:58:28.4203820Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4204114Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4204117Z 2025-12-04T10:58:28.4204203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4204277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4204332Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4204606Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4204680Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4204715Z graph_break [] 2025-12-04T10:58:28.4204882Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4204927Z Traceback (most recent call last): 2025-12-04T10:58:28.4205082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4205121Z method(*args, **kwargs) 2025-12-04T10:58:28.4205291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4205331Z method(*args, **kwargs) 2025-12-04T10:58:28.4205484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4205520Z with 
policy(): 2025-12-04T10:58:28.4205673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4205713Z raise RuntimeError(msg) 2025-12-04T10:58:28.4206113Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4206128Z 2025-12-04T10:58:28.4206203Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4206493Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4206495Z 2025-12-04T10:58:28.4206581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4206654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4206711Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4206996Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4207073Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4207109Z graph_break [] 2025-12-04T10:58:28.4207183Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4207237Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4207309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4207578Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4207615Z graph_break [] 2025-12-04T10:58:28.4207667Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4207818Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4207862Z Traceback (most recent call last): 2025-12-04T10:58:28.4208017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4208057Z method(*args, **kwargs) 2025-12-04T10:58:28.4208208Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4208247Z method(*args, **kwargs) 2025-12-04T10:58:28.4208398Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4208434Z with policy(): 2025-12-04T10:58:28.4208598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4208639Z raise RuntimeError(msg) 2025-12-04T10:58:28.4209053Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4209056Z 2025-12-04T10:58:28.4209129Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4209418Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4209420Z 2025-12-04T10:58:28.4209506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4209584Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4209652Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4209925Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4209999Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4210034Z graph_break [] 2025-12-04T10:58:28.4210107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4210161Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4210233Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4210502Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4210551Z graph_break [] 2025-12-04T10:58:28.4210624Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4210678Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4210749Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4211018Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4211054Z graph_break [] 2025-12-04T10:58:28.4211299Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-50f799d1a623082c.xml - 2025-12-04T10:58:28.4211361Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4211994Z FAILED [0.6274s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4211997Z 2025-12-04T10:58:28.4212070Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4212356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4212372Z 2025-12-04T10:58:28.4212459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4212521Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4212590Z ================== 1 failed, 57 deselected, 2 rerun in 4.68s =================== 2025-12-04T10:58:28.4212626Z Got exit code 1 2025-12-04T10:58:28.4212678Z Retrying single test... 2025-12-04T10:58:28.4212877Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f9718ae754c2176.xml 2025-12-04T10:58:28.4212933Z ============================= test session starts ============================== 2025-12-04T10:58:28.4213044Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4213084Z cachedir: .pytest_cache 2025-12-04T10:58:28.4213243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4213322Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4213378Z configfile: pytest.ini 2025-12-04T10:58:28.4213542Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4213617Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4213900Z stepcurrent: skipping 39 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4213944Z Running 1 items in this shard 2025-12-04T10:58:28.4213945Z 2025-12-04T10:58:28.4214308Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:26.615877401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4214324Z 2025-12-04T10:58:28.4214479Z [W1204 10:48:26.876129686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4214482Z 2025-12-04T10:58:28.4214633Z [W1204 10:48:26.876293844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4214636Z 2025-12-04T10:58:28.4214785Z [W1204 10:48:26.879605887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4214787Z 2025-12-04T10:58:28.4214936Z [W1204 10:48:26.879937723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4214938Z 2025-12-04T10:58:28.4215087Z [W1204 10:48:26.880005042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4215089Z 2025-12-04T10:58:28.4215238Z [W1204 10:48:26.882317716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4215241Z 2025-12-04T10:58:28.4215389Z [W1204 10:48:26.882588853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4215391Z 2025-12-04T10:58:28.4215539Z [W1204 10:48:26.882649272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4215541Z 2025-12-04T10:58:28.4215590Z ('RERUN', {'yellow': True}) [3.2535s] [100%] 2025-12-04T10:58:28.4215962Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:27.698309280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4215965Z 2025-12-04T10:58:28.4216116Z [W1204 10:48:27.698699435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216119Z 2025-12-04T10:58:28.4216280Z [W1204 10:48:27.698772585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216282Z 2025-12-04T10:58:28.4216433Z [W1204 10:48:27.700061390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216435Z 2025-12-04T10:58:28.4216583Z [W1204 10:48:27.700322607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216586Z 2025-12-04T10:58:28.4216734Z [W1204 10:48:27.700382836 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216737Z 2025-12-04T10:58:28.4216886Z [W1204 10:48:27.702471643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4216900Z 2025-12-04T10:58:28.4217049Z [W1204 10:48:27.702735470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4217051Z 2025-12-04T10:58:28.4217199Z [W1204 10:48:27.702796569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4217201Z 2025-12-04T10:58:28.4217248Z ('RERUN', {'yellow': True}) [0.7049s] [100%] 2025-12-04T10:58:28.4217606Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:48:28.391546741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4217619Z 2025-12-04T10:58:28.4217769Z [W1204 10:48:28.391916536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4217772Z 2025-12-04T10:58:28.4217920Z [W1204 10:48:28.391988296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4217922Z 2025-12-04T10:58:28.4218071Z [W1204 10:48:28.393293991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4218073Z 2025-12-04T10:58:28.4218221Z [W1204 10:48:28.393549328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4218222Z 2025-12-04T10:58:28.4218371Z [W1204 10:48:28.393608217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4218375Z 2025-12-04T10:58:28.4218524Z [W1204 10:48:28.395691364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4218528Z 2025-12-04T10:58:28.4218677Z [W1204 10:48:28.395950341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4218679Z 2025-12-04T10:58:28.4218828Z [W1204 10:48:28.396012850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4218830Z 2025-12-04T10:58:28.4218868Z FAILED [0.6695s] [100%] 2025-12-04T10:58:28.4218870Z 2025-12-04T10:58:28.4218922Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4219072Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4219119Z Traceback (most recent call last): 2025-12-04T10:58:28.4219294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4219337Z method(*args, **kwargs) 2025-12-04T10:58:28.4219489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4219542Z method(*args, **kwargs) 2025-12-04T10:58:28.4219694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4219731Z with policy(): 2025-12-04T10:58:28.4219883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4219925Z raise RuntimeError(msg) 2025-12-04T10:58:28.4220322Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4220338Z 2025-12-04T10:58:28.4220411Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4220704Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4220706Z 2025-12-04T10:58:28.4220791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4220865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4220920Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4221196Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4221283Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4221320Z graph_break [] 2025-12-04T10:58:28.4221470Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4221516Z Traceback (most recent call last): 2025-12-04T10:58:28.4221669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4221710Z method(*args, **kwargs) 2025-12-04T10:58:28.4221861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4221901Z method(*args, **kwargs) 2025-12-04T10:58:28.4222052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4222089Z with 
policy(): 2025-12-04T10:58:28.4222242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4222284Z raise RuntimeError(msg) 2025-12-04T10:58:28.4222683Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4222687Z 2025-12-04T10:58:28.4222759Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4223059Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4223062Z 2025-12-04T10:58:28.4223149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4223224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4223304Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4223595Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4223669Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4223705Z graph_break [] 2025-12-04T10:58:28.4223777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4223831Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4223904Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4224176Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4224227Z graph_break [] 2025-12-04T10:58:28.4224280Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4224430Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4224475Z Traceback (most recent call last): 2025-12-04T10:58:28.4224628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4224668Z method(*args, **kwargs) 2025-12-04T10:58:28.4224820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4224874Z method(*args, **kwargs) 2025-12-04T10:58:28.4225024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4225061Z with policy(): 2025-12-04T10:58:28.4225214Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4225255Z raise RuntimeError(msg) 2025-12-04T10:58:28.4225655Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4225657Z 2025-12-04T10:58:28.4225729Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4226021Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4226024Z 2025-12-04T10:58:28.4226110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4226184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4226239Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4226513Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4226585Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4226621Z graph_break [] 2025-12-04T10:58:28.4226710Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4226766Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4226838Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4227120Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4227156Z graph_break [] 2025-12-04T10:58:28.4227229Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4227284Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4227354Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4227623Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4227660Z graph_break [] 2025-12-04T10:58:28.4227915Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1f9718ae754c2176.xml - 2025-12-04T10:58:28.4227975Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4228602Z FAILED [0.6695s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4228616Z 2025-12-04T10:58:28.4228689Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4228981Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4228984Z 2025-12-04T10:58:28.4229071Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4229132Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4229199Z ================== 1 failed, 57 deselected, 2 rerun in 4.79s =================== 2025-12-04T10:58:28.4229236Z Got exit code 1 2025-12-04T10:58:28.4229474Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4229603Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4229801Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-78d9343e5e8a8868.xml 2025-12-04T10:58:28.4229858Z ============================= test session starts ============================== 2025-12-04T10:58:28.4229969Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4230009Z cachedir: .pytest_cache 2025-12-04T10:58:28.4230167Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4230212Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4230252Z configfile: pytest.ini 2025-12-04T10:58:28.4230410Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4230496Z collecting ... collected 58 items / 40 deselected / 18 selected 2025-12-04T10:58:28.4230549Z stepcurrent: skipping 40 already run items. 
2025-12-04T10:58:28.4230594Z Running 18 items in this shard 2025-12-04T10:58:28.4230596Z 2025-12-04T10:58:28.4230862Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.7044s] [ 5%] 2025-12-04T10:58:28.4231110Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.7241s] [ 5%] 2025-12-04T10:58:28.4231337Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.7511s] [ 5%] 2025-12-04T10:58:28.4231339Z 2025-12-04T10:58:28.4231391Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4231544Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4231602Z Traceback (most recent call last): 2025-12-04T10:58:28.4231759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4231799Z method(*args, **kwargs) 2025-12-04T10:58:28.4231952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4231991Z method(*args, **kwargs) 2025-12-04T10:58:28.4232142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4232178Z with policy(): 2025-12-04T10:58:28.4232331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4232390Z raise RuntimeError(msg) 2025-12-04T10:58:28.4232793Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4232796Z 2025-12-04T10:58:28.4232869Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4233161Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4233164Z 2025-12-04T10:58:28.4233277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4233351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4233409Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4233586Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4233662Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4233698Z graph_break [] 2025-12-04T10:58:28.4233854Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4233899Z Traceback (most recent call last): 2025-12-04T10:58:28.4234052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4234090Z method(*args, **kwargs) 2025-12-04T10:58:28.4234241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4234304Z method(*args, **kwargs) 
2025-12-04T10:58:28.4234455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4234493Z with policy(): 2025-12-04T10:58:28.4234660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4234700Z raise RuntimeError(msg) 2025-12-04T10:58:28.4235113Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4235115Z 2025-12-04T10:58:28.4235187Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4235485Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4235500Z 2025-12-04T10:58:28.4235587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4235661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4235717Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4235893Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4235966Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4236001Z graph_break [] 2025-12-04T10:58:28.4236073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4236144Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4236216Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4236392Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4236428Z graph_break [] 2025-12-04T10:58:28.4236480Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4236633Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4236677Z Traceback (most recent call last): 2025-12-04T10:58:28.4236830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4236869Z method(*args, **kwargs) 2025-12-04T10:58:28.4237022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4237061Z method(*args, **kwargs) 2025-12-04T10:58:28.4237213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4237248Z with policy(): 2025-12-04T10:58:28.4237402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4237442Z raise RuntimeError(msg) 2025-12-04T10:58:28.4237852Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4237855Z 2025-12-04T10:58:28.4237928Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4238232Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4238235Z 2025-12-04T10:58:28.4238334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4238407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4238462Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4238637Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4238710Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4238746Z graph_break [] 2025-12-04T10:58:28.4238820Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4238875Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4238946Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4239132Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4239170Z graph_break [] 2025-12-04T10:58:28.4239241Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4239296Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4239367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4239543Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4239578Z graph_break [] 2025-12-04T10:58:28.4239832Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-78d9343e5e8a8868.xml - 2025-12-04T10:58:28.4239893Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4240540Z FAILED [0.7511s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4240543Z 2025-12-04T10:58:28.4240615Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4240908Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4240912Z 2025-12-04T10:58:28.4240998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4241061Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4241127Z ================== 1 failed, 40 deselected, 2 rerun in 4.34s =================== 2025-12-04T10:58:28.4241164Z Got exit code 1 2025-12-04T10:58:28.4241204Z Retrying single test... 
2025-12-04T10:58:28.4241401Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0646415a875cfaf3.xml 2025-12-04T10:58:28.4241458Z ============================= test session starts ============================== 2025-12-04T10:58:28.4241579Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4241623Z cachedir: .pytest_cache 2025-12-04T10:58:28.4241781Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4241829Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4241868Z configfile: pytest.ini 2025-12-04T10:58:28.4242040Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4242114Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4242404Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4242448Z Running 1 items in this shard 2025-12-04T10:58:28.4242450Z 2025-12-04T10:58:28.4242817Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:48:48.126443539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4242832Z 2025-12-04T10:58:28.4242988Z [W1204 10:48:49.390140034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4242990Z 2025-12-04T10:58:28.4243141Z [W1204 10:48:49.390264303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243143Z 2025-12-04T10:58:28.4243321Z [W1204 10:48:49.393556235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243323Z 2025-12-04T10:58:28.4243476Z [W1204 10:48:49.393863211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243493Z 2025-12-04T10:58:28.4243642Z [W1204 10:48:49.393923831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243644Z 2025-12-04T10:58:28.4243794Z [W1204 10:48:49.396178695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243796Z 2025-12-04T10:58:28.4243944Z [W1204 10:48:49.396457952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4243945Z 2025-12-04T10:58:28.4244094Z [W1204 10:48:49.396517381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4244096Z 2025-12-04T10:58:28.4244144Z ('RERUN', {'yellow': True}) [2.9525s] [100%] 2025-12-04T10:58:28.4244508Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:48:50.612976470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4244511Z 2025-12-04T10:58:28.4244662Z [W1204 10:48:50.613354245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4244664Z 2025-12-04T10:58:28.4244812Z [W1204 10:48:50.613420045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4244814Z 2025-12-04T10:58:28.4244962Z [W1204 10:48:50.614685820 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4244964Z 2025-12-04T10:58:28.4245126Z [W1204 10:48:50.614946937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4245128Z 2025-12-04T10:58:28.4245277Z [W1204 10:48:50.615049646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4245279Z 2025-12-04T10:58:28.4245443Z [W1204 10:48:50.617101992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4245445Z 2025-12-04T10:58:28.4245597Z [W1204 10:48:50.617442259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4245599Z 2025-12-04T10:58:28.4245748Z [W1204 10:48:50.617505158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4245750Z 2025-12-04T10:58:28.4245797Z ('RERUN', {'yellow': True}) [0.7204s] [100%] 2025-12-04T10:58:28.4246158Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:48:51.337777598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246186Z 2025-12-04T10:58:28.4246336Z [W1204 10:48:51.338154364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246339Z 2025-12-04T10:58:28.4246486Z [W1204 10:48:51.338221473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246488Z 2025-12-04T10:58:28.4246637Z [W1204 10:48:51.339478508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246639Z 2025-12-04T10:58:28.4246786Z [W1204 10:48:51.339730776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246801Z 2025-12-04T10:58:28.4246951Z [W1204 10:48:51.339788825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4246954Z 2025-12-04T10:58:28.4247102Z [W1204 10:48:51.341821112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4247104Z 2025-12-04T10:58:28.4247253Z [W1204 10:48:51.342166138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4247255Z 2025-12-04T10:58:28.4247504Z [W1204 10:48:51.342230007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4247506Z 2025-12-04T10:58:28.4247544Z FAILED [0.7518s] [100%] 2025-12-04T10:58:28.4247546Z 2025-12-04T10:58:28.4247599Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4247754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4247801Z Traceback (most recent call last): 2025-12-04T10:58:28.4247957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4247999Z method(*args, **kwargs) 2025-12-04T10:58:28.4248151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4248191Z method(*args, **kwargs) 2025-12-04T10:58:28.4248342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4248379Z with policy(): 2025-12-04T10:58:28.4248531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4248587Z raise RuntimeError(msg) 2025-12-04T10:58:28.4248993Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4249020Z 2025-12-04T10:58:28.4249094Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4249387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4249389Z 2025-12-04T10:58:28.4249476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4249550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4249607Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4249784Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4249870Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4249908Z graph_break [] 2025-12-04T10:58:28.4250059Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4250105Z Traceback (most recent call last): 2025-12-04T10:58:28.4250258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4250297Z method(*args, **kwargs) 2025-12-04T10:58:28.4250448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4250501Z method(*args, **kwargs) 2025-12-04T10:58:28.4250650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4250688Z with policy(): 2025-12-04T10:58:28.4250841Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4250883Z raise RuntimeError(msg) 2025-12-04T10:58:28.4251296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4251299Z 2025-12-04T10:58:28.4251372Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4251667Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4251670Z 2025-12-04T10:58:28.4251757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4251832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4251888Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4252063Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4252136Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4252173Z graph_break [] 2025-12-04T10:58:28.4252245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4252315Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4252387Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4252563Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4252599Z graph_break [] 2025-12-04T10:58:28.4252663Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4252815Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4252861Z Traceback (most recent call last): 2025-12-04T10:58:28.4253014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4253054Z method(*args, **kwargs) 2025-12-04T10:58:28.4253205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4253246Z method(*args, **kwargs) 2025-12-04T10:58:28.4253424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4253477Z with policy(): 2025-12-04T10:58:28.4253630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4253672Z raise RuntimeError(msg) 2025-12-04T10:58:28.4254084Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4254087Z 2025-12-04T10:58:28.4254160Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4254468Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4254471Z 2025-12-04T10:58:28.4254557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4254631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4254686Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4254861Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4254933Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4254969Z graph_break [] 2025-12-04T10:58:28.4255042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4255099Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4255170Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4255346Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4255382Z graph_break [] 2025-12-04T10:58:28.4255455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4255509Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4255580Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4255754Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4255790Z graph_break [] 2025-12-04T10:58:28.4256047Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0646415a875cfaf3.xml - 2025-12-04T10:58:28.4256109Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4256774Z FAILED [0.7518s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4256776Z 2025-12-04T10:58:28.4256848Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4257138Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4257152Z 2025-12-04T10:58:28.4257238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4257301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4257368Z ================== 1 failed, 57 deselected, 2 rerun in 4.59s =================== 2025-12-04T10:58:28.4257406Z Got exit code 1 2025-12-04T10:58:28.4257445Z Retrying single test... 
2025-12-04T10:58:28.4257643Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82ac466fa0e931af.xml 2025-12-04T10:58:28.4257699Z ============================= test session starts ============================== 2025-12-04T10:58:28.4257809Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4257864Z cachedir: .pytest_cache 2025-12-04T10:58:28.4258023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4258069Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4258110Z configfile: pytest.ini 2025-12-04T10:58:28.4258271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4258344Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4258633Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4258677Z Running 1 items in this shard 2025-12-04T10:58:28.4258679Z 2025-12-04T10:58:28.4259047Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:49:00.807484278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259050Z 2025-12-04T10:58:28.4259204Z [W1204 10:49:00.081369341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4259206Z 2025-12-04T10:58:28.4259359Z [W1204 10:49:00.081499010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259361Z 2025-12-04T10:58:28.4259510Z [W1204 10:49:00.085159988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259513Z 2025-12-04T10:58:28.4259662Z [W1204 10:49:00.085464024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259677Z 2025-12-04T10:58:28.4259826Z [W1204 10:49:00.085525043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259829Z 2025-12-04T10:58:28.4259996Z [W1204 10:49:00.087785797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4259998Z 2025-12-04T10:58:28.4260148Z [W1204 10:49:00.088063874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4260150Z 2025-12-04T10:58:28.4260297Z [W1204 10:49:00.088124384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4260299Z 2025-12-04T10:58:28.4260348Z ('RERUN', {'yellow': True}) [2.9781s] [100%] 2025-12-04T10:58:28.4260712Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:49:02.269973430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4260727Z 2025-12-04T10:58:28.4260877Z [W1204 10:49:02.270358866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4260879Z 2025-12-04T10:58:28.4261027Z [W1204 10:49:02.270426555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261029Z 2025-12-04T10:58:28.4261177Z [W1204 10:49:02.271705330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261179Z 2025-12-04T10:58:28.4261327Z [W1204 10:49:02.271966407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261342Z 2025-12-04T10:58:28.4261490Z [W1204 10:49:02.272032117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261494Z 2025-12-04T10:58:28.4261641Z [W1204 10:49:02.274128783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261644Z 2025-12-04T10:58:28.4261792Z [W1204 10:49:02.274476588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4261794Z 2025-12-04T10:58:28.4261943Z [W1204 10:49:02.274540108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4261944Z 2025-12-04T10:58:28.4261993Z ('RERUN', {'yellow': True}) [0.7246s] [100%] 2025-12-04T10:58:28.4262355Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:49:02.020224743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4262359Z 2025-12-04T10:58:28.4262509Z [W1204 10:49:02.020611749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4262511Z 2025-12-04T10:58:28.4262660Z [W1204 10:49:02.020676128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4262662Z 2025-12-04T10:58:28.4262809Z [W1204 10:49:02.021939193 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4262811Z 2025-12-04T10:58:28.4262959Z [W1204 10:49:02.022200080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4262962Z 2025-12-04T10:58:28.4263120Z [W1204 10:49:02.022266050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4263123Z 2025-12-04T10:58:28.4263308Z [W1204 10:49:02.024284496 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4263324Z 2025-12-04T10:58:28.4263473Z [W1204 10:49:02.024621273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4263477Z 2025-12-04T10:58:28.4263625Z [W1204 10:49:02.024683582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4263627Z 2025-12-04T10:58:28.4263666Z FAILED [0.7226s] [100%] 2025-12-04T10:58:28.4263668Z 2025-12-04T10:58:28.4263718Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4263874Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4263934Z Traceback (most recent call last): 2025-12-04T10:58:28.4264091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4264131Z method(*args, **kwargs) 2025-12-04T10:58:28.4264287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4264326Z method(*args, **kwargs) 2025-12-04T10:58:28.4264479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4264516Z with policy(): 2025-12-04T10:58:28.4264670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4264729Z raise RuntimeError(msg) 2025-12-04T10:58:28.4265134Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4265137Z 2025-12-04T10:58:28.4265212Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4265503Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4265505Z 2025-12-04T10:58:28.4265592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4265665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4265724Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4265900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4265975Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4266011Z graph_break [] 2025-12-04T10:58:28.4266166Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4266211Z Traceback (most recent call last): 2025-12-04T10:58:28.4266364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4266403Z method(*args, **kwargs) 2025-12-04T10:58:28.4266555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4266595Z method(*args, **kwargs) 2025-12-04T10:58:28.4266762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4266800Z with policy(): 2025-12-04T10:58:28.4266954Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4267006Z raise RuntimeError(msg) 2025-12-04T10:58:28.4267421Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4267423Z 2025-12-04T10:58:28.4267496Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4267789Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4267806Z 2025-12-04T10:58:28.4267893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4267967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4268023Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4268199Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4268271Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4268307Z graph_break [] 2025-12-04T10:58:28.4268380Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4268434Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4268519Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4268693Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4268731Z graph_break [] 2025-12-04T10:58:28.4268783Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4268936Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4268980Z Traceback (most recent call last): 2025-12-04T10:58:28.4269134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4269173Z method(*args, **kwargs) 2025-12-04T10:58:28.4269324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4269365Z method(*args, **kwargs) 2025-12-04T10:58:28.4269516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4269553Z with policy(): 2025-12-04T10:58:28.4269707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4269748Z raise RuntimeError(msg) 2025-12-04T10:58:28.4270162Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4270164Z 2025-12-04T10:58:28.4270238Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4270543Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4270546Z 2025-12-04T10:58:28.4270634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4270718Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4270774Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4270949Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4271021Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4271057Z graph_break [] 2025-12-04T10:58:28.4271130Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4271185Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4271257Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4271444Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4271480Z graph_break [] 2025-12-04T10:58:28.4271553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4271608Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4271678Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4271852Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4271888Z graph_break [] 2025-12-04T10:58:28.4272131Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-82ac466fa0e931af.xml - 2025-12-04T10:58:28.4272201Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4272851Z FAILED [0.7226s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4272853Z 2025-12-04T10:58:28.4272926Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4273217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4273220Z 2025-12-04T10:58:28.4273339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4273400Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4273468Z ================== 1 failed, 57 deselected, 2 rerun in 4.59s =================== 2025-12-04T10:58:28.4273505Z Got exit code 1 2025-12-04T10:58:28.4273748Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4273875Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4274094Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dd873762bbafcb62.xml 2025-12-04T10:58:28.4274153Z ============================= test session starts ============================== 2025-12-04T10:58:28.4274264Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4274306Z cachedir: .pytest_cache 2025-12-04T10:58:28.4274477Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4274524Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4274564Z configfile: pytest.ini 2025-12-04T10:58:28.4274723Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4274797Z collecting ... collected 58 items / 41 deselected / 17 selected 2025-12-04T10:58:28.4274850Z stepcurrent: skipping 41 already run items. 
2025-12-04T10:58:28.4274893Z Running 17 items in this shard 2025-12-04T10:58:28.4274896Z 2025-12-04T10:58:28.4275152Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5107s] [ 5%] 2025-12-04T10:58:28.4275413Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4574s] [ 5%] 2025-12-04T10:58:28.4275636Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4546s] [ 5%] 2025-12-04T10:58:28.4275638Z 2025-12-04T10:58:28.4275690Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4275841Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4275901Z Traceback (most recent call last): 2025-12-04T10:58:28.4276058Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4276099Z method(*args, **kwargs) 2025-12-04T10:58:28.4276251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4276291Z method(*args, **kwargs) 2025-12-04T10:58:28.4276443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4276480Z with policy(): 2025-12-04T10:58:28.4276633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4276674Z raise RuntimeError(msg) 2025-12-04T10:58:28.4277073Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4277076Z 2025-12-04T10:58:28.4277150Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4277442Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4277444Z 2025-12-04T10:58:28.4277532Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4277605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4277662Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4277850Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4277925Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4277962Z graph_break [] 2025-12-04T10:58:28.4278114Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4278170Z Traceback (most recent call last): 2025-12-04T10:58:28.4278324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4278363Z method(*args, **kwargs) 2025-12-04T10:58:28.4278514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4278553Z method(*args, **kwargs) 
2025-12-04T10:58:28.4278704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4278742Z with policy(): 2025-12-04T10:58:28.4278893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4278945Z raise RuntimeError(msg) 2025-12-04T10:58:28.4279351Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4279353Z 2025-12-04T10:58:28.4279426Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4279715Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4279729Z 2025-12-04T10:58:28.4279817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4279891Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4279947Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4280124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4280197Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4280232Z graph_break [] 2025-12-04T10:58:28.4280305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4280359Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4280430Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4280606Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4280644Z graph_break [] 2025-12-04T10:58:28.4280696Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4280847Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4280892Z Traceback (most recent call last): 2025-12-04T10:58:28.4281045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4281085Z method(*args, **kwargs) 2025-12-04T10:58:28.4281236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4281278Z method(*args, **kwargs) 2025-12-04T10:58:28.4281439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4281478Z with policy(): 2025-12-04T10:58:28.4281629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4281671Z raise RuntimeError(msg) 2025-12-04T10:58:28.4282091Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4282093Z 2025-12-04T10:58:28.4282167Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4282462Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4282465Z 2025-12-04T10:58:28.4282552Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4282638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4282693Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4282868Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4282942Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4282978Z graph_break [] 2025-12-04T10:58:28.4283050Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4283104Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4283175Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4283420Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4283456Z graph_break [] 2025-12-04T10:58:28.4283529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4284959Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4285036Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4285210Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4285246Z graph_break [] 2025-12-04T10:58:28.4285492Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-dd873762bbafcb62.xml - 2025-12-04T10:58:28.4285554Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4286197Z FAILED [0.4546s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4286201Z 2025-12-04T10:58:28.4286274Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4286562Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4286565Z 2025-12-04T10:58:28.4286671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4286735Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4286803Z ================== 1 failed, 41 deselected, 2 rerun in 3.59s =================== 2025-12-04T10:58:28.4286841Z Got exit code 1 2025-12-04T10:58:28.4286881Z Retrying single test... 
2025-12-04T10:58:28.4287096Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d3726d1926a081ed.xml 2025-12-04T10:58:28.4287154Z ============================= test session starts ============================== 2025-12-04T10:58:28.4287267Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4287307Z cachedir: .pytest_cache 2025-12-04T10:58:28.4287467Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4287514Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4287556Z configfile: pytest.ini 2025-12-04T10:58:28.4287732Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4287807Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4288094Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4288139Z Running 1 items in this shard 2025-12-04T10:58:28.4288141Z 2025-12-04T10:58:28.4288509Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:22.494500469 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4288535Z 2025-12-04T10:58:28.4288692Z [W1204 10:49:22.772900941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4288695Z 2025-12-04T10:58:28.4288848Z [W1204 10:49:22.773038029 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4288850Z 2025-12-04T10:58:28.4288999Z [W1204 10:49:22.776978643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289001Z 2025-12-04T10:58:28.4289151Z [W1204 10:49:22.777302250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289153Z 2025-12-04T10:58:28.4289300Z [W1204 10:49:22.777366979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289304Z 2025-12-04T10:58:28.4289452Z [W1204 10:49:22.779626173 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289455Z 2025-12-04T10:58:28.4289604Z [W1204 10:49:22.779894739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289607Z 2025-12-04T10:58:28.4289754Z [W1204 10:49:22.779954449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4289756Z 2025-12-04T10:58:28.4289805Z ('RERUN', {'yellow': True}) [2.8895s] [100%] 2025-12-04T10:58:28.4290170Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:23.932027897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4290185Z 2025-12-04T10:58:28.4290335Z [W1204 10:49:23.932421103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4290338Z 2025-12-04T10:58:28.4290485Z [W1204 10:49:23.932485972 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4290498Z 2025-12-04T10:58:28.4290647Z [W1204 10:49:23.933837626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4290649Z 2025-12-04T10:58:28.4290799Z [W1204 10:49:23.934111723 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4290801Z 2025-12-04T10:58:28.4290948Z [W1204 10:49:23.934177642 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4290951Z 2025-12-04T10:58:28.4291101Z [W1204 10:49:23.936223749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4291114Z 2025-12-04T10:58:28.4291263Z [W1204 10:49:23.936572295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4291264Z 2025-12-04T10:58:28.4291414Z [W1204 10:49:23.936635504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4291415Z 2025-12-04T10:58:28.4291463Z ('RERUN', {'yellow': True}) [0.6668s] [100%] 2025-12-04T10:58:28.4291821Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:24.639692057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4291834Z 2025-12-04T10:58:28.4291985Z [W1204 10:49:24.640079322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4291987Z 2025-12-04T10:58:28.4292134Z [W1204 10:49:24.640148672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4292137Z 2025-12-04T10:58:28.4292286Z [W1204 10:49:24.641421067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4292288Z 2025-12-04T10:58:28.4292435Z [W1204 10:49:24.641681534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4292438Z 2025-12-04T10:58:28.4292584Z [W1204 10:49:24.641742803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4292587Z 2025-12-04T10:58:28.4292737Z [W1204 10:49:24.643760220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4292740Z 2025-12-04T10:58:28.4292889Z [W1204 10:49:24.644145705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4292891Z 2025-12-04T10:58:28.4293040Z [W1204 10:49:24.644212994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4293042Z 2025-12-04T10:58:28.4293080Z FAILED [0.6887s] [100%] 2025-12-04T10:58:28.4293082Z 2025-12-04T10:58:28.4293135Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4293321Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4293367Z Traceback (most recent call last): 2025-12-04T10:58:28.4293540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4293582Z method(*args, **kwargs) 2025-12-04T10:58:28.4293735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4293775Z method(*args, **kwargs) 2025-12-04T10:58:28.4293939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4293976Z with policy(): 2025-12-04T10:58:28.4294128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4294169Z raise RuntimeError(msg) 2025-12-04T10:58:28.4294569Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4294586Z 2025-12-04T10:58:28.4294661Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4294956Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4294958Z 2025-12-04T10:58:28.4295046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4295119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4295175Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4295357Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4295444Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4295482Z graph_break [] 2025-12-04T10:58:28.4295632Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4295677Z Traceback (most recent call last): 2025-12-04T10:58:28.4295831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4295871Z method(*args, **kwargs) 2025-12-04T10:58:28.4296021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4296061Z method(*args, **kwargs) 2025-12-04T10:58:28.4296210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4296247Z with policy(): 2025-12-04T10:58:28.4296400Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4296442Z raise RuntimeError(msg) 2025-12-04T10:58:28.4296853Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4296856Z 2025-12-04T10:58:28.4296929Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4297222Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4297224Z 2025-12-04T10:58:28.4297322Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4297397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4297454Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4297644Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4297718Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4297755Z graph_break [] 2025-12-04T10:58:28.4297827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4297882Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4297953Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4298129Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4298166Z graph_break [] 2025-12-04T10:58:28.4298218Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4298380Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4298426Z Traceback (most recent call last): 2025-12-04T10:58:28.4298579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4298619Z method(*args, **kwargs) 2025-12-04T10:58:28.4298770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4298809Z method(*args, **kwargs) 2025-12-04T10:58:28.4298958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4299007Z with policy(): 2025-12-04T10:58:28.4299160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4299202Z raise RuntimeError(msg) 2025-12-04T10:58:28.4299608Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4299610Z 2025-12-04T10:58:28.4299683Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4299972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4299976Z 2025-12-04T10:58:28.4300063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4300137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4300193Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4300371Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4300443Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4300479Z graph_break [] 2025-12-04T10:58:28.4300551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4300605Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4300676Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4300863Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4300901Z graph_break [] 2025-12-04T10:58:28.4300975Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4301030Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4301112Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4301287Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4301322Z graph_break [] 2025-12-04T10:58:28.4301568Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d3726d1926a081ed.xml - 2025-12-04T10:58:28.4301626Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4302267Z FAILED [0.6887s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4302286Z 2025-12-04T10:58:28.4302358Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4302648Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4302650Z 2025-12-04T10:58:28.4302736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4302810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4302879Z ================== 1 failed, 57 deselected, 2 rerun in 4.41s =================== 2025-12-04T10:58:28.4302916Z Got exit code 1 2025-12-04T10:58:28.4302956Z Retrying single test... 
2025-12-04T10:58:28.4303156Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e59d43973e67107e.xml 2025-12-04T10:58:28.4303212Z ============================= test session starts ============================== 2025-12-04T10:58:28.4303352Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4303394Z cachedir: .pytest_cache 2025-12-04T10:58:28.4303552Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4303598Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4303640Z configfile: pytest.ini 2025-12-04T10:58:28.4303803Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4303876Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4304166Z stepcurrent: skipping 41 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4304209Z Running 1 items in this shard 2025-12-04T10:58:28.4304211Z 2025-12-04T10:58:28.4304575Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:33.119426636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4304579Z 2025-12-04T10:58:28.4304747Z [W1204 10:49:34.393461344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4304752Z 2025-12-04T10:58:28.4304903Z [W1204 10:49:34.393590673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4304905Z 2025-12-04T10:58:28.4305069Z [W1204 10:49:34.396791506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305072Z 2025-12-04T10:58:28.4305222Z [W1204 10:49:34.397107622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305224Z 2025-12-04T10:58:28.4305373Z [W1204 10:49:34.397171561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305375Z 2025-12-04T10:58:28.4305525Z [W1204 10:49:34.399379745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305541Z 2025-12-04T10:58:28.4305689Z [W1204 10:49:34.399650912 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305691Z 2025-12-04T10:58:28.4305841Z [W1204 10:49:34.399712361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4305843Z 2025-12-04T10:58:28.4305891Z ('RERUN', {'yellow': True}) [2.9215s] [100%] 2025-12-04T10:58:28.4306253Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:35.500546455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4306255Z 2025-12-04T10:58:28.4306420Z [W1204 10:49:35.500926220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4306422Z 2025-12-04T10:58:28.4306571Z [W1204 10:49:35.500991579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4306573Z 2025-12-04T10:58:28.4306722Z [W1204 10:49:35.502279505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4306725Z 2025-12-04T10:58:28.4306873Z [W1204 10:49:35.502542062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4306875Z 2025-12-04T10:58:28.4307024Z [W1204 10:49:35.502602951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4307026Z 2025-12-04T10:58:28.4307176Z [W1204 10:49:35.504592668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4307179Z 2025-12-04T10:58:28.4307328Z [W1204 10:49:35.504933674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4307330Z 2025-12-04T10:58:28.4307481Z [W1204 10:49:35.504996833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4307482Z 2025-12-04T10:58:28.4307530Z ('RERUN', {'yellow': True}) [0.6143s] [100%] 2025-12-04T10:58:28.4307891Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:49:35.110148060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4307894Z 2025-12-04T10:58:28.4308056Z [W1204 10:49:35.110530385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308059Z 2025-12-04T10:58:28.4308209Z [W1204 10:49:35.110594744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308211Z 2025-12-04T10:58:28.4308373Z [W1204 10:49:35.111863530 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308375Z 2025-12-04T10:58:28.4308524Z [W1204 10:49:35.112126457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308526Z 2025-12-04T10:58:28.4308672Z [W1204 10:49:35.112189396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308675Z 2025-12-04T10:58:28.4308823Z [W1204 10:49:35.114194512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4308825Z 2025-12-04T10:58:28.4308986Z [W1204 10:49:35.114533908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4308988Z 2025-12-04T10:58:28.4309137Z [W1204 10:49:35.114595268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4309139Z 2025-12-04T10:58:28.4309178Z FAILED [0.5971s] [100%] 2025-12-04T10:58:28.4309180Z 2025-12-04T10:58:28.4309231Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4309384Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4309429Z Traceback (most recent call last): 2025-12-04T10:58:28.4309587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4309639Z method(*args, **kwargs) 2025-12-04T10:58:28.4309793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4309833Z method(*args, **kwargs) 2025-12-04T10:58:28.4309986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4310022Z with policy(): 2025-12-04T10:58:28.4310174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4310215Z raise RuntimeError(msg) 2025-12-04T10:58:28.4310615Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4310618Z 2025-12-04T10:58:28.4310692Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4310984Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4310986Z 2025-12-04T10:58:28.4311073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4311146Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4311202Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4311380Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4311465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4311502Z graph_break [] 2025-12-04T10:58:28.4311653Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4311698Z Traceback (most recent call last): 2025-12-04T10:58:28.4311864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4311904Z method(*args, **kwargs) 2025-12-04T10:58:28.4312056Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4312095Z method(*args, **kwargs) 2025-12-04T10:58:28.4312245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4312282Z with policy(): 2025-12-04T10:58:28.4312435Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4312476Z raise RuntimeError(msg) 2025-12-04T10:58:28.4312896Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4312898Z 2025-12-04T10:58:28.4312972Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4313293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4313295Z 2025-12-04T10:58:28.4313382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4313471Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4313527Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4313704Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4313778Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4313814Z graph_break [] 2025-12-04T10:58:28.4313887Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4313940Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4314013Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4314191Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4314229Z graph_break [] 2025-12-04T10:58:28.4314281Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4314432Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4314477Z Traceback (most recent call last): 2025-12-04T10:58:28.4314631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4314670Z method(*args, **kwargs) 2025-12-04T10:58:28.4314821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4314860Z method(*args, **kwargs) 2025-12-04T10:58:28.4315009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4315046Z with policy(): 2025-12-04T10:58:28.4315212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4315254Z raise RuntimeError(msg) 2025-12-04T10:58:28.4315679Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4315681Z 2025-12-04T10:58:28.4315755Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4316043Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4316045Z 2025-12-04T10:58:28.4316134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4316207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4316278Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4316455Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4316528Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4316564Z graph_break [] 2025-12-04T10:58:28.4316638Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4316691Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4316762Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4316937Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4316985Z graph_break [] 2025-12-04T10:58:28.4317057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4317114Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4317186Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4317360Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4317397Z graph_break [] 2025-12-04T10:58:28.4317641Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-e59d43973e67107e.xml - 2025-12-04T10:58:28.4317700Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4318338Z FAILED [0.5971s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4318342Z 2025-12-04T10:58:28.4318415Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4318702Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4318706Z 2025-12-04T10:58:28.4318791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4318866Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4318933Z ================== 1 failed, 57 deselected, 2 rerun in 4.28s =================== 2025-12-04T10:58:28.4318972Z Got exit code 1 2025-12-04T10:58:28.4319225Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4319355Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4319553Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-45dc23d154c7ddc2.xml 2025-12-04T10:58:28.4319610Z ============================= test session starts ============================== 2025-12-04T10:58:28.4319719Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4319762Z cachedir: .pytest_cache 2025-12-04T10:58:28.4319921Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4319979Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4320019Z configfile: pytest.ini 2025-12-04T10:58:28.4320180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4320254Z collecting ... collected 58 items / 42 deselected / 16 selected 2025-12-04T10:58:28.4320307Z stepcurrent: skipping 42 already run items. 
2025-12-04T10:58:28.4320352Z Running 16 items in this shard 2025-12-04T10:58:28.4320355Z 2025-12-04T10:58:28.4320608Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8803s] [ 6%] 2025-12-04T10:58:28.4320858Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4667s] [ 6%] 2025-12-04T10:58:28.4321094Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4656s] [ 6%] 2025-12-04T10:58:28.4321096Z 2025-12-04T10:58:28.4321148Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4321301Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4321346Z Traceback (most recent call last): 2025-12-04T10:58:28.4321503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4321543Z method(*args, **kwargs) 2025-12-04T10:58:28.4321695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4321737Z method(*args, **kwargs) 2025-12-04T10:58:28.4321888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4321924Z with policy(): 2025-12-04T10:58:28.4322077Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4322119Z raise RuntimeError(msg) 2025-12-04T10:58:28.4322518Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4322521Z 2025-12-04T10:58:28.4322594Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4322900Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4322905Z 2025-12-04T10:58:28.4322991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4323076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4323132Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4323434Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4323507Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4323546Z graph_break [] 2025-12-04T10:58:28.4323699Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4323760Z Traceback (most recent call last): 2025-12-04T10:58:28.4323914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4323955Z method(*args, **kwargs) 2025-12-04T10:58:28.4324105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T10:58:28.4324145Z method(*args, **kwargs) 2025-12-04T10:58:28.4324295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4324332Z with policy(): 2025-12-04T10:58:28.4324484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4324540Z raise RuntimeError(msg) 2025-12-04T10:58:28.4324948Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4324951Z 2025-12-04T10:58:28.4325026Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4325315Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4325318Z 2025-12-04T10:58:28.4325403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4325477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4325534Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4325807Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4325881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4325918Z graph_break [] 2025-12-04T10:58:28.4325989Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4326045Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4326117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4326389Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4326440Z graph_break [] 2025-12-04T10:58:28.4326493Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4326644Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4326689Z Traceback (most recent call last): 2025-12-04T10:58:28.4326859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4326900Z method(*args, **kwargs) 2025-12-04T10:58:28.4327050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4327090Z method(*args, **kwargs) 2025-12-04T10:58:28.4327240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4327276Z with policy(): 2025-12-04T10:58:28.4327432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4327486Z raise RuntimeError(msg) 2025-12-04T10:58:28.4327891Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4327895Z 2025-12-04T10:58:28.4327967Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4328257Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4328259Z 2025-12-04T10:58:28.4328358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4328432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4328488Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4328764Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4328837Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4328873Z graph_break [] 2025-12-04T10:58:28.4328944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4328999Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4329069Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4329342Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4329379Z graph_break [] 2025-12-04T10:58:28.4329452Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.4329507Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4329578Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4329849Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4329885Z graph_break [] 2025-12-04T10:58:28.4330148Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-45dc23d154c7ddc2.xml - 2025-12-04T10:58:28.4330208Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4330860Z FAILED [0.4656s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4330862Z 2025-12-04T10:58:28.4330935Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4331226Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4331229Z 2025-12-04T10:58:28.4331315Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4331389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4331456Z ================== 1 failed, 42 deselected, 2 rerun in 3.97s =================== 2025-12-04T10:58:28.4331493Z Got exit code 1 2025-12-04T10:58:28.4331532Z Retrying single test... 2025-12-04T10:58:28.4331729Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-66d56322c24c71d7.xml 2025-12-04T10:58:28.4331787Z ============================= test session starts ============================== 2025-12-04T10:58:28.4331896Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4331937Z cachedir: .pytest_cache 2025-12-04T10:58:28.4332107Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4332154Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4332194Z configfile: pytest.ini 2025-12-04T10:58:28.4332357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4332429Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4332716Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4332760Z Running 1 items in this shard 2025-12-04T10:58:28.4332762Z 2025-12-04T10:58:28.4333128Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:49:56.556467400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4333132Z 2025-12-04T10:58:28.4333321Z [W1204 10:49:56.828693671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4333324Z 2025-12-04T10:58:28.4333478Z [W1204 10:49:56.828869279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4333480Z 2025-12-04T10:58:28.4333631Z [W1204 10:49:56.832669565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4333633Z 2025-12-04T10:58:28.4333782Z [W1204 10:49:56.832973621 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4333784Z 2025-12-04T10:58:28.4333949Z [W1204 10:49:56.833038700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4333952Z 2025-12-04T10:58:28.4334100Z [W1204 10:49:56.835176155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4334102Z 2025-12-04T10:58:28.4334264Z [W1204 10:49:56.835442162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4334266Z 2025-12-04T10:58:28.4334415Z [W1204 10:49:56.835501691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4334417Z 2025-12-04T10:58:28.4334465Z ('RERUN', {'yellow': True}) [3.1740s] [100%] 2025-12-04T10:58:28.4334827Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:49:57.438647254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4334844Z 2025-12-04T10:58:28.4334993Z [W1204 10:49:57.438997630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4334995Z 2025-12-04T10:58:28.4335144Z [W1204 10:49:57.439070549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4335146Z 2025-12-04T10:58:28.4335294Z [W1204 10:49:57.440322284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4335296Z 2025-12-04T10:58:28.4335445Z [W1204 10:49:57.440576611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4335447Z 2025-12-04T10:58:28.4335597Z [W1204 10:49:57.440635320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4335613Z 2025-12-04T10:58:28.4335763Z [W1204 10:49:57.442637217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4335765Z 2025-12-04T10:58:28.4335915Z [W1204 10:49:57.442920483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4335917Z 2025-12-04T10:58:28.4336065Z [W1204 10:49:57.442981363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4336067Z 2025-12-04T10:58:28.4336114Z ('RERUN', {'yellow': True}) [0.4689s] [100%] 2025-12-04T10:58:28.4336472Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:49:57.901215817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4336477Z 2025-12-04T10:58:28.4336627Z [W1204 10:49:57.901587332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4336629Z 2025-12-04T10:58:28.4336778Z [W1204 10:49:57.901651491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4336779Z 2025-12-04T10:58:28.4336926Z [W1204 10:49:57.902910537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4336928Z 2025-12-04T10:58:28.4337076Z [W1204 10:49:57.903168894 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4337078Z 2025-12-04T10:58:28.4337237Z [W1204 10:49:57.903234513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4337240Z 2025-12-04T10:58:28.4337390Z [W1204 10:49:57.905220699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4337393Z 2025-12-04T10:58:28.4337554Z [W1204 10:49:57.905479916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4337556Z 2025-12-04T10:58:28.4337706Z [W1204 10:49:57.905539776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4337708Z 2025-12-04T10:58:28.4337747Z FAILED [0.4610s] [100%] 2025-12-04T10:58:28.4337749Z 2025-12-04T10:58:28.4337800Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4337952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4338000Z Traceback (most recent call last): 2025-12-04T10:58:28.4338156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4338209Z method(*args, **kwargs) 2025-12-04T10:58:28.4338363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4338402Z method(*args, **kwargs) 2025-12-04T10:58:28.4338554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4338591Z with policy(): 2025-12-04T10:58:28.4338744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4338785Z raise RuntimeError(msg) 2025-12-04T10:58:28.4339185Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4339199Z 2025-12-04T10:58:28.4339273Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4339563Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4339565Z 2025-12-04T10:58:28.4339653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4339726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4339782Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4340059Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4340134Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4340170Z graph_break [] 2025-12-04T10:58:28.4340323Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4340368Z Traceback (most recent call last): 2025-12-04T10:58:28.4340522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4340561Z method(*args, **kwargs) 2025-12-04T10:58:28.4340712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4340750Z method(*args, **kwargs) 2025-12-04T10:58:28.4340913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4340951Z 
with policy(): 2025-12-04T10:58:28.4341104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4341144Z raise RuntimeError(msg) 2025-12-04T10:58:28.4341561Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4341564Z 2025-12-04T10:58:28.4341638Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4341927Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4341942Z 2025-12-04T10:58:28.4342029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4342102Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4342158Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4342431Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4342505Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4342541Z graph_break [] 2025-12-04T10:58:28.4342614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4342688Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4342760Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4343032Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4343069Z graph_break [] 2025-12-04T10:58:28.4343120Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4343307Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4343354Z Traceback (most recent call last): 2025-12-04T10:58:28.4343509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4343549Z method(*args, **kwargs) 2025-12-04T10:58:28.4343702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4343743Z method(*args, **kwargs) 2025-12-04T10:58:28.4343893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4343930Z with policy(): 2025-12-04T10:58:28.4344083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4344124Z raise RuntimeError(msg) 2025-12-04T10:58:28.4344530Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4344533Z 2025-12-04T10:58:28.4344622Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4344913Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4344917Z 2025-12-04T10:58:28.4345018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4345091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4345147Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4345418Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4345491Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4345529Z graph_break [] 2025-12-04T10:58:28.4345601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4345670Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4345740Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4346012Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4346048Z graph_break [] 2025-12-04T10:58:28.4346120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4346173Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4346245Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4346513Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4346564Z graph_break [] 2025-12-04T10:58:28.4346809Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-66d56322c24c71d7.xml - 2025-12-04T10:58:28.4346869Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4347511Z FAILED [0.4610s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4347515Z 2025-12-04T10:58:28.4347587Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4347877Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4347879Z 2025-12-04T10:58:28.4347965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4348026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4348092Z ================== 1 failed, 57 deselected, 2 rerun in 4.27s =================== 2025-12-04T10:58:28.4348129Z Got exit code 1 2025-12-04T10:58:28.4348168Z Retrying single test... 2025-12-04T10:58:28.4348379Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f39c86ee96ae3ff0.xml 2025-12-04T10:58:28.4348437Z ============================= test session starts ============================== 2025-12-04T10:58:28.4348549Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4348589Z cachedir: .pytest_cache 2025-12-04T10:58:28.4348758Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4348803Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4348844Z configfile: pytest.ini 2025-12-04T10:58:28.4349004Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4349077Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4349365Z stepcurrent: skipping 42 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4349421Z Running 1 items in this shard 2025-12-04T10:58:28.4349424Z 2025-12-04T10:58:28.4349789Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:50:07.370511330 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4349792Z 2025-12-04T10:58:28.4349945Z [W1204 10:50:07.641668021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4349947Z 2025-12-04T10:58:28.4350099Z [W1204 10:50:07.641823529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350101Z 2025-12-04T10:58:28.4350264Z [W1204 10:50:07.645553815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350266Z 2025-12-04T10:58:28.4350416Z [W1204 10:50:07.645870072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350419Z 2025-12-04T10:58:28.4350567Z [W1204 10:50:07.645932361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350570Z 2025-12-04T10:58:28.4350718Z [W1204 10:50:07.648177804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350720Z 2025-12-04T10:58:28.4350868Z [W1204 10:50:07.648449221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4350869Z 2025-12-04T10:58:28.4351017Z [W1204 10:50:07.648509901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4351020Z 2025-12-04T10:58:28.4351069Z ('RERUN', {'yellow': True}) [3.2540s] [100%] 2025-12-04T10:58:28.4351428Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:50:08.273043122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4351430Z 2025-12-04T10:58:28.4351579Z [W1204 10:50:08.273423598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4351581Z 2025-12-04T10:58:28.4351729Z [W1204 10:50:08.273500947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4351731Z 2025-12-04T10:58:28.4351892Z [W1204 10:50:08.274808262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4351895Z 2025-12-04T10:58:28.4352045Z [W1204 10:50:08.275080898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4352047Z 2025-12-04T10:58:28.4352206Z [W1204 10:50:08.275145708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4352208Z 2025-12-04T10:58:28.4352357Z [W1204 10:50:08.277258343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4352359Z 2025-12-04T10:58:28.4352506Z [W1204 10:50:08.277524719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4352509Z 2025-12-04T10:58:28.4352658Z [W1204 10:50:08.277589319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4352660Z 2025-12-04T10:58:28.4352708Z ('RERUN', {'yellow': True}) [0.4811s] [100%] 2025-12-04T10:58:28.4353079Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:50:08.761026013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353081Z 2025-12-04T10:58:28.4353229Z [W1204 10:50:08.761409959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353231Z 2025-12-04T10:58:28.4353410Z [W1204 10:50:08.761486698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353412Z 2025-12-04T10:58:28.4353561Z [W1204 10:50:08.762781822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353578Z 2025-12-04T10:58:28.4353727Z [W1204 10:50:08.763066369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353730Z 2025-12-04T10:58:28.4353879Z [W1204 10:50:08.763131248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4353881Z 2025-12-04T10:58:28.4354031Z [W1204 10:50:08.765240253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4354033Z 2025-12-04T10:58:28.4354180Z [W1204 10:50:08.765506110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4354182Z 2025-12-04T10:58:28.4354331Z [W1204 10:50:08.765567969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4354334Z 2025-12-04T10:58:28.4354372Z FAILED [0.4943s] [100%] 2025-12-04T10:58:28.4354374Z 2025-12-04T10:58:28.4354426Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4354579Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4354626Z Traceback (most recent call last): 2025-12-04T10:58:28.4354782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4354822Z method(*args, **kwargs) 2025-12-04T10:58:28.4354975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4355014Z method(*args, **kwargs) 2025-12-04T10:58:28.4355179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4355217Z with policy(): 2025-12-04T10:58:28.4355371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4355412Z raise RuntimeError(msg) 2025-12-04T10:58:28.4355825Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4355828Z 2025-12-04T10:58:28.4355901Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4356192Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4356196Z 2025-12-04T10:58:28.4356283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4356384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4356440Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4356716Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4356790Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4356826Z graph_break [] 2025-12-04T10:58:28.4356976Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4357020Z Traceback (most recent call last): 2025-12-04T10:58:28.4357187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4357226Z method(*args, **kwargs) 2025-12-04T10:58:28.4357377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4357416Z method(*args, **kwargs) 2025-12-04T10:58:28.4357567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4357603Z 
with policy(): 2025-12-04T10:58:28.4357757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4357797Z raise RuntimeError(msg) 2025-12-04T10:58:28.4358203Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4358207Z 2025-12-04T10:58:28.4358279Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4358569Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4358571Z 2025-12-04T10:58:28.4358658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4358733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4358788Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4359075Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4359150Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4359186Z graph_break [] 2025-12-04T10:58:28.4359259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4359325Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4359397Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4359668Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4359704Z graph_break [] 2025-12-04T10:58:28.4359756Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4359907Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4359953Z Traceback (most recent call last): 2025-12-04T10:58:28.4360122Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4360162Z method(*args, **kwargs) 2025-12-04T10:58:28.4360313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4360352Z method(*args, **kwargs) 2025-12-04T10:58:28.4360503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4360539Z with policy(): 2025-12-04T10:58:28.4360692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4360732Z raise RuntimeError(msg) 2025-12-04T10:58:28.4361157Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4361160Z 2025-12-04T10:58:28.4361234Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4361523Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4361525Z 2025-12-04T10:58:28.4361611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4361683Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4361739Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4362013Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4362086Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4362123Z graph_break [] 2025-12-04T10:58:28.4362196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4362249Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4362321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4362592Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4362629Z graph_break [] 2025-12-04T10:58:28.4362713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4362770Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4362840Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4363124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4363161Z graph_break [] 2025-12-04T10:58:28.4363437Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f39c86ee96ae3ff0.xml - 2025-12-04T10:58:28.4363495Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4364138Z FAILED [0.4943s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4364155Z 2025-12-04T10:58:28.4364229Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4364517Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4364519Z 2025-12-04T10:58:28.4364605Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4364688Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4364754Z ================== 1 failed, 57 deselected, 2 rerun in 4.40s =================== 2025-12-04T10:58:28.4364791Z Got exit code 1 2025-12-04T10:58:28.4365034Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4365162Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4365361Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0658affd0834d40e.xml 2025-12-04T10:58:28.4365419Z ============================= test session starts ============================== 2025-12-04T10:58:28.4365529Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4365571Z cachedir: .pytest_cache 2025-12-04T10:58:28.4365728Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4365775Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4365815Z configfile: pytest.ini 2025-12-04T10:58:28.4365975Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4366049Z collecting ... collected 58 items / 43 deselected / 15 selected 2025-12-04T10:58:28.4366102Z stepcurrent: skipping 43 already run items. 
2025-12-04T10:58:28.4366145Z Running 15 items in this shard 2025-12-04T10:58:28.4366147Z 2025-12-04T10:58:28.4366399Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6089s] [ 6%] 2025-12-04T10:58:28.4366663Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6235s] [ 6%] 2025-12-04T10:58:28.4366890Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.5923s] [ 6%] 2025-12-04T10:58:28.4366907Z 2025-12-04T10:58:28.4366959Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4367110Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4367155Z Traceback (most recent call last): 2025-12-04T10:58:28.4367311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4367351Z method(*args, **kwargs) 2025-12-04T10:58:28.4367505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4367546Z method(*args, **kwargs) 2025-12-04T10:58:28.4367709Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4367746Z with policy(): 2025-12-04T10:58:28.4367900Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4367941Z raise RuntimeError(msg) 2025-12-04T10:58:28.4368340Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4368342Z 2025-12-04T10:58:28.4368415Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4368719Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4368722Z 2025-12-04T10:58:28.4368810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4368883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4368939Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4369117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4369189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4369225Z graph_break [] 2025-12-04T10:58:28.4369378Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4369424Z Traceback (most recent call last): 2025-12-04T10:58:28.4369579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4369618Z method(*args, **kwargs) 2025-12-04T10:58:28.4369772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4369812Z method(*args, **kwargs) 
2025-12-04T10:58:28.4369962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4369999Z with policy(): 2025-12-04T10:58:28.4370150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4370191Z raise RuntimeError(msg) 2025-12-04T10:58:28.4370609Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4370613Z 2025-12-04T10:58:28.4370704Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4370996Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4370998Z 2025-12-04T10:58:28.4371085Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4371158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4371215Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4371393Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4371478Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4371514Z graph_break [] 2025-12-04T10:58:28.4371588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4371642Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4371714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4371889Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4371926Z graph_break [] 2025-12-04T10:58:28.4371979Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4372142Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4372188Z Traceback (most recent call last): 2025-12-04T10:58:28.4372342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4372383Z method(*args, **kwargs) 2025-12-04T10:58:28.4372535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4372575Z method(*args, **kwargs) 2025-12-04T10:58:28.4372724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4372761Z with policy(): 2025-12-04T10:58:28.4372914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4372956Z raise RuntimeError(msg) 2025-12-04T10:58:28.4373401Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4373404Z 2025-12-04T10:58:28.4373478Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4373770Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4373772Z 2025-12-04T10:58:28.4373859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4373932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4374005Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4374181Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4374255Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4374291Z graph_break [] 2025-12-04T10:58:28.4374377Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4374433Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4374503Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4374678Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4374713Z graph_break [] 2025-12-04T10:58:28.4374786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4374842Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4374914Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4375101Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4375139Z graph_break [] 2025-12-04T10:58:28.4375381Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0658affd0834d40e.xml - 2025-12-04T10:58:28.4375441Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4376084Z FAILED [0.5923s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4376099Z 2025-12-04T10:58:28.4376172Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4376464Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4376466Z 2025-12-04T10:58:28.4376551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4376613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4376678Z ================== 1 failed, 43 deselected, 2 rerun in 3.97s =================== 2025-12-04T10:58:28.4376717Z Got exit code 1 2025-12-04T10:58:28.4376756Z Retrying single test... 
2025-12-04T10:58:28.4376953Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-448f5b9446604d04.xml 2025-12-04T10:58:28.4377011Z ============================= test session starts ============================== 2025-12-04T10:58:28.4377122Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4377163Z cachedir: .pytest_cache 2025-12-04T10:58:28.4377321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4377366Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4377406Z configfile: pytest.ini 2025-12-04T10:58:28.4377566Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4377652Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4377941Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4377986Z Running 1 items in this shard 2025-12-04T10:58:28.4377999Z 2025-12-04T10:58:28.4378366Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:28.688431964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4378369Z 2025-12-04T10:58:28.4378523Z [W1204 10:50:28.991394583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4378525Z 2025-12-04T10:58:28.4378679Z [W1204 10:50:28.991532451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4378692Z 2025-12-04T10:58:28.4378842Z [W1204 10:50:28.994902332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4378844Z 2025-12-04T10:58:28.4378994Z [W1204 10:50:28.995234008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4378996Z 2025-12-04T10:58:28.4379144Z [W1204 10:50:28.995298507 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4379146Z 2025-12-04T10:58:28.4379294Z [W1204 10:50:28.997471071 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4379296Z 2025-12-04T10:58:28.4379445Z [W1204 10:50:28.997746968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4379458Z 2025-12-04T10:58:28.4379608Z [W1204 10:50:28.997806177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4379609Z 2025-12-04T10:58:28.4379659Z ('RERUN', {'yellow': True}) [2.9412s] [100%] 2025-12-04T10:58:28.4380021Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:29.014982243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4380023Z 2025-12-04T10:58:28.4380174Z [W1204 10:50:29.015394708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380176Z 2025-12-04T10:58:28.4380324Z [W1204 10:50:29.015474887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380328Z 2025-12-04T10:58:28.4380478Z [W1204 10:50:29.016777772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380480Z 2025-12-04T10:58:28.4380631Z [W1204 10:50:29.017065758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380633Z 2025-12-04T10:58:28.4380781Z [W1204 10:50:29.017131768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380782Z 2025-12-04T10:58:28.4380931Z [W1204 10:50:29.019232503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4380932Z 2025-12-04T10:58:28.4381091Z [W1204 10:50:29.019585198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4381094Z 2025-12-04T10:58:28.4381244Z [W1204 10:50:29.019650438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4381247Z 2025-12-04T10:58:28.4381295Z ('RERUN', {'yellow': True}) [0.5129s] [100%] 2025-12-04T10:58:28.4381666Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:30.514441663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4381668Z 2025-12-04T10:58:28.4381819Z [W1204 10:50:30.514814019 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4381821Z 2025-12-04T10:58:28.4381970Z [W1204 10:50:30.514877918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4381972Z 2025-12-04T10:58:28.4382132Z [W1204 10:50:30.516258342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4382134Z 2025-12-04T10:58:28.4382283Z [W1204 10:50:30.516519938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4382287Z 2025-12-04T10:58:28.4382435Z [W1204 10:50:30.516579188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4382436Z 2025-12-04T10:58:28.4382585Z [W1204 10:50:30.518625603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4382586Z 2025-12-04T10:58:28.4382736Z [W1204 10:50:30.518968049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4382750Z 2025-12-04T10:58:28.4382899Z [W1204 10:50:30.519035769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4382902Z 2025-12-04T10:58:28.4382941Z FAILED [0.4916s] [100%] 2025-12-04T10:58:28.4382942Z 2025-12-04T10:58:28.4382995Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4383147Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4383194Z Traceback (most recent call last): 2025-12-04T10:58:28.4383393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4383435Z method(*args, **kwargs) 2025-12-04T10:58:28.4383588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4383630Z method(*args, **kwargs) 2025-12-04T10:58:28.4383782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4383819Z with policy(): 2025-12-04T10:58:28.4383974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4384014Z raise RuntimeError(msg) 2025-12-04T10:58:28.4384414Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4384416Z 2025-12-04T10:58:28.4384489Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4384801Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4384805Z 2025-12-04T10:58:28.4384892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4384978Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4385034Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4385212Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4385285Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4385322Z graph_break [] 2025-12-04T10:58:28.4385473Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4385520Z Traceback (most recent call last): 2025-12-04T10:58:28.4385673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4385728Z method(*args, **kwargs) 2025-12-04T10:58:28.4385879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4385919Z method(*args, **kwargs) 2025-12-04T10:58:28.4386069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4386105Z with policy(): 2025-12-04T10:58:28.4386258Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4386299Z raise RuntimeError(msg) 2025-12-04T10:58:28.4386707Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4386724Z 2025-12-04T10:58:28.4386796Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4387089Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4387091Z 2025-12-04T10:58:28.4387178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4387252Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4387307Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4387484Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4387558Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4387594Z graph_break [] 2025-12-04T10:58:28.4387667Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4387723Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4387794Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4387969Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4388004Z graph_break [] 2025-12-04T10:58:28.4388056Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4388219Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4388266Z Traceback (most recent call last): 2025-12-04T10:58:28.4388421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4388460Z method(*args, **kwargs) 2025-12-04T10:58:28.4388621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4388661Z method(*args, **kwargs) 2025-12-04T10:58:28.4388811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4388847Z with policy(): 2025-12-04T10:58:28.4388999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4389039Z raise RuntimeError(msg) 2025-12-04T10:58:28.4389451Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4389465Z 2025-12-04T10:58:28.4389538Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4389831Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4389833Z 2025-12-04T10:58:28.4389919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4389992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4390048Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4390235Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4390309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4390344Z graph_break [] 2025-12-04T10:58:28.4390417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4390471Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4390542Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4390716Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4390752Z graph_break [] 2025-12-04T10:58:28.4390824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4390880Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4390951Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4391126Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4391161Z graph_break [] 2025-12-04T10:58:28.4391406Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-448f5b9446604d04.xml - 2025-12-04T10:58:28.4391464Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4392118Z FAILED [0.4916s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4392122Z 2025-12-04T10:58:28.4392194Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4392497Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4392499Z 2025-12-04T10:58:28.4392586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4392647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4392713Z ================== 1 failed, 57 deselected, 2 rerun in 4.11s =================== 2025-12-04T10:58:28.4392751Z Got exit code 1 2025-12-04T10:58:28.4392792Z Retrying single test... 
2025-12-04T10:58:28.4392990Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c56222dc7b3f8027.xml 2025-12-04T10:58:28.4393059Z ============================= test session starts ============================== 2025-12-04T10:58:28.4393170Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4393211Z cachedir: .pytest_cache 2025-12-04T10:58:28.4393401Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4393447Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4393487Z configfile: pytest.ini 2025-12-04T10:58:28.4393651Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4393724Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4394032Z stepcurrent: skipping 43 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4394076Z Running 1 items in this shard 2025-12-04T10:58:28.4394078Z 2025-12-04T10:58:28.4394442Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:39.416744572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4394444Z 2025-12-04T10:58:28.4394599Z [W1204 10:50:39.701505344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4394600Z 2025-12-04T10:58:28.4394752Z [W1204 10:50:39.701658572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4394754Z 2025-12-04T10:58:28.4394905Z [W1204 10:50:39.705846022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4394908Z 2025-12-04T10:58:28.4395057Z [W1204 10:50:39.706169658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4395059Z 2025-12-04T10:58:28.4395209Z [W1204 10:50:39.706233688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4395211Z 2025-12-04T10:58:28.4395358Z [W1204 10:50:39.708643329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4395361Z 2025-12-04T10:58:28.4395521Z [W1204 10:50:39.708926206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4395524Z 2025-12-04T10:58:28.4395674Z [W1204 10:50:39.708987565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4395677Z 2025-12-04T10:58:28.4395725Z ('RERUN', {'yellow': True}) [2.9980s] [100%] 2025-12-04T10:58:28.4396102Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:40.877553119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4396104Z 2025-12-04T10:58:28.4396254Z [W1204 10:50:40.877942785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4396256Z 2025-12-04T10:58:28.4396405Z [W1204 10:50:40.878012614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4396408Z 2025-12-04T10:58:28.4396558Z [W1204 10:50:40.879381368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4396575Z 2025-12-04T10:58:28.4396724Z [W1204 10:50:40.879808433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4396726Z 2025-12-04T10:58:28.4396875Z [W1204 10:50:40.879872162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4396876Z 2025-12-04T10:58:28.4397024Z [W1204 10:50:40.881871828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4397026Z 2025-12-04T10:58:28.4397176Z [W1204 10:50:40.882217914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4397196Z 2025-12-04T10:58:28.4397344Z [W1204 10:50:40.882283973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4397347Z 2025-12-04T10:58:28.4397395Z ('RERUN', {'yellow': True}) [0.6340s] [100%] 2025-12-04T10:58:28.4397755Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:50:41.483635385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4397757Z 2025-12-04T10:58:28.4397907Z [W1204 10:50:41.484048910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4397909Z 2025-12-04T10:58:28.4398060Z [W1204 10:50:41.484128589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398063Z 2025-12-04T10:58:28.4398211Z [W1204 10:50:41.485458183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398213Z 2025-12-04T10:58:28.4398363Z [W1204 10:50:41.485729550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398365Z 2025-12-04T10:58:28.4398514Z [W1204 10:50:41.485790939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398515Z 2025-12-04T10:58:28.4398663Z [W1204 10:50:41.487762575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398665Z 2025-12-04T10:58:28.4398813Z [W1204 10:50:41.488108821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4398826Z 2025-12-04T10:58:28.4398974Z [W1204 10:50:41.488174450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4398977Z 2025-12-04T10:58:28.4399016Z FAILED [0.6090s] [100%] 2025-12-04T10:58:28.4399018Z 2025-12-04T10:58:28.4399081Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4399234Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4399279Z Traceback (most recent call last): 2025-12-04T10:58:28.4399437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4399477Z method(*args, **kwargs) 2025-12-04T10:58:28.4399630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4399672Z method(*args, **kwargs) 2025-12-04T10:58:28.4399823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4399872Z with policy(): 2025-12-04T10:58:28.4400027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4400068Z raise RuntimeError(msg) 2025-12-04T10:58:28.4400470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4400472Z 2025-12-04T10:58:28.4400546Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4400839Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4400854Z 2025-12-04T10:58:28.4400941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4401015Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4401072Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4401248Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4401321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4401357Z graph_break [] 2025-12-04T10:58:28.4401510Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4401557Z Traceback (most recent call last): 2025-12-04T10:58:28.4401712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4401752Z method(*args, **kwargs) 2025-12-04T10:58:28.4401905Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4401944Z method(*args, **kwargs) 2025-12-04T10:58:28.4402094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4402131Z with policy(): 2025-12-04T10:58:28.4402284Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4402325Z raise RuntimeError(msg) 2025-12-04T10:58:28.4402744Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4402748Z 2025-12-04T10:58:28.4402823Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4403124Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4403126Z 2025-12-04T10:58:28.4403214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4403321Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4403377Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4403554Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4403646Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4403681Z graph_break [] 2025-12-04T10:58:28.4403755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4403809Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4403881Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4404057Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4404094Z graph_break [] 2025-12-04T10:58:28.4404145Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4404298Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4404358Z Traceback (most recent call last): 2025-12-04T10:58:28.4404512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4404553Z method(*args, **kwargs) 2025-12-04T10:58:28.4404704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4404744Z method(*args, **kwargs) 2025-12-04T10:58:28.4404893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4404931Z with policy(): 2025-12-04T10:58:28.4405081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4405123Z raise RuntimeError(msg) 2025-12-04T10:58:28.4405529Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4405533Z 2025-12-04T10:58:28.4405606Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4405895Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4405897Z 2025-12-04T10:58:28.4405984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4406056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4406111Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4406301Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4406375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4406412Z graph_break [] 2025-12-04T10:58:28.4406501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4406557Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4406628Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4406803Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4406838Z graph_break [] 2025-12-04T10:58:28.4406910Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4406964Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4407038Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4407211Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4407260Z graph_break [] 2025-12-04T10:58:28.4407505Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c56222dc7b3f8027.xml - 2025-12-04T10:58:28.4407565Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4408208Z FAILED [0.6090s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4408222Z 2025-12-04T10:58:28.4408295Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4408585Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4408588Z 2025-12-04T10:58:28.4408673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4408734Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4408801Z ================== 1 failed, 57 deselected, 2 rerun in 4.39s =================== 2025-12-04T10:58:28.4408838Z Got exit code 1 2025-12-04T10:58:28.4409081Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4409211Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4409410Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b8cd4b08cde296da.xml 2025-12-04T10:58:28.4409467Z ============================= test session starts ============================== 2025-12-04T10:58:28.4409576Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4409618Z cachedir: .pytest_cache 2025-12-04T10:58:28.4409775Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4409821Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4409863Z configfile: pytest.ini 2025-12-04T10:58:28.4410034Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4410110Z collecting ... collected 58 items / 44 deselected / 14 selected 2025-12-04T10:58:28.4410163Z stepcurrent: skipping 44 already run items. 
2025-12-04T10:58:28.4410218Z Running 14 items in this shard 2025-12-04T10:58:28.4410220Z 2025-12-04T10:58:28.4410473Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5939s] [ 7%] 2025-12-04T10:58:28.4410719Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6526s] [ 7%] 2025-12-04T10:58:28.4410943Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.6452s] [ 7%] 2025-12-04T10:58:28.4410946Z 2025-12-04T10:58:28.4411016Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4411167Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4411214Z Traceback (most recent call last): 2025-12-04T10:58:28.4411370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4411410Z method(*args, **kwargs) 2025-12-04T10:58:28.4411561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4411601Z method(*args, **kwargs) 2025-12-04T10:58:28.4411752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4411802Z with policy(): 2025-12-04T10:58:28.4411956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4411998Z raise RuntimeError(msg) 2025-12-04T10:58:28.4412392Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4412394Z 2025-12-04T10:58:28.4412467Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4412757Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4412761Z 2025-12-04T10:58:28.4412847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4412921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4413052Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4413231Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4413339Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4413375Z graph_break [] 2025-12-04T10:58:28.4413525Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4413571Z Traceback (most recent call last): 2025-12-04T10:58:28.4413724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4413783Z method(*args, **kwargs) 2025-12-04T10:58:28.4413934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4413975Z method(*args, **kwargs) 
2025-12-04T10:58:28.4414140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4414178Z with policy(): 2025-12-04T10:58:28.4414330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4414370Z raise RuntimeError(msg) 2025-12-04T10:58:28.4414771Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4414774Z 2025-12-04T10:58:28.4414848Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4415155Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4415158Z 2025-12-04T10:58:28.4415245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4415318Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4415373Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4415550Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4415622Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4415675Z graph_break [] 2025-12-04T10:58:28.4415748Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4415805Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.4415876Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4416052Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4416088Z graph_break [] 2025-12-04T10:58:28.4416141Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4416291Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4416336Z Traceback (most recent call last): 2025-12-04T10:58:28.4416491Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4416535Z method(*args, **kwargs) 2025-12-04T10:58:28.4416686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4416727Z method(*args, **kwargs) 2025-12-04T10:58:28.4416877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4416915Z with policy(): 2025-12-04T10:58:28.4417066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4417107Z raise RuntimeError(msg) 2025-12-04T10:58:28.4417519Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4417524Z 2025-12-04T10:58:28.4417596Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4417901Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4417904Z 2025-12-04T10:58:28.4417991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4418065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4418120Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4418296Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4418370Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4418407Z graph_break [] 2025-12-04T10:58:28.4418478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4418545Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4418616Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4418792Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4418828Z graph_break [] 2025-12-04T10:58:28.4418901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4418956Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4419027Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4419201Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4419250Z graph_break [] 2025-12-04T10:58:28.4419494Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b8cd4b08cde296da.xml - 2025-12-04T10:58:28.4419554Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4420194Z FAILED [0.6452s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4420198Z 2025-12-04T10:58:28.4420271Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4420560Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4420564Z 2025-12-04T10:58:28.4420650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4420713Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4420778Z ================== 1 failed, 44 deselected, 2 rerun in 4.06s =================== 2025-12-04T10:58:28.4420815Z Got exit code 1 2025-12-04T10:58:28.4420855Z Retrying single test... 
2025-12-04T10:58:28.4421054Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d5990fe812f5f47b.xml 2025-12-04T10:58:28.4421123Z ============================= test session starts ============================== 2025-12-04T10:58:28.4421233Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4421275Z cachedir: .pytest_cache 2025-12-04T10:58:28.4421445Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4421490Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4421532Z configfile: pytest.ini 2025-12-04T10:58:28.4421690Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4421763Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4422051Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4422098Z Running 1 items in this shard 2025-12-04T10:58:28.4422100Z 2025-12-04T10:58:28.4422466Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:01.671004205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4422479Z 2025-12-04T10:58:28.4422633Z [W1204 10:51:01.928715568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4422635Z 2025-12-04T10:58:28.4422787Z [W1204 10:51:01.928845257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4422789Z 2025-12-04T10:58:28.4422939Z [W1204 10:51:01.931709412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4422955Z 2025-12-04T10:58:28.4423104Z [W1204 10:51:01.932015488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4423107Z 2025-12-04T10:58:28.4423300Z [W1204 10:51:01.932079618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4423302Z 2025-12-04T10:58:28.4423451Z [W1204 10:51:01.934166102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4423453Z 2025-12-04T10:58:28.4423601Z [W1204 10:51:01.934440329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4423603Z 2025-12-04T10:58:28.4423751Z [W1204 10:51:01.934501058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4423753Z 2025-12-04T10:58:28.4423804Z ('RERUN', {'yellow': True}) [2.7014s] [100%] 2025-12-04T10:58:28.4424168Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:02.867846848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4424173Z 2025-12-04T10:58:28.4424323Z [W1204 10:51:02.868228794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4424325Z 2025-12-04T10:58:28.4424474Z [W1204 10:51:02.868309393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4424476Z 2025-12-04T10:58:28.4424624Z [W1204 10:51:02.869557758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4424626Z 2025-12-04T10:58:28.4424791Z [W1204 10:51:02.869829785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4424794Z 2025-12-04T10:58:28.4424942Z [W1204 10:51:02.869892854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4424967Z 2025-12-04T10:58:28.4425115Z [W1204 10:51:02.871780781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4425117Z 2025-12-04T10:58:28.4425266Z [W1204 10:51:02.872123967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4425268Z 2025-12-04T10:58:28.4425415Z [W1204 10:51:02.872189896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4425418Z 2025-12-04T10:58:28.4425467Z ('RERUN', {'yellow': True}) [0.4278s] [100%] 2025-12-04T10:58:28.4425823Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:03.295713854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4425841Z 2025-12-04T10:58:28.4425991Z [W1204 10:51:03.296091460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4425993Z 2025-12-04T10:58:28.4426142Z [W1204 10:51:03.296168529 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4426144Z 2025-12-04T10:58:28.4426292Z [W1204 10:51:03.297413094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4426308Z 2025-12-04T10:58:28.4426460Z [W1204 10:51:03.297671611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4426462Z 2025-12-04T10:58:28.4426610Z [W1204 10:51:03.297730630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4426612Z 2025-12-04T10:58:28.4426762Z [W1204 10:51:03.299614687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4426764Z 2025-12-04T10:58:28.4426912Z [W1204 10:51:03.299949563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4426915Z 2025-12-04T10:58:28.4427063Z [W1204 10:51:03.300016693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4427064Z 2025-12-04T10:58:28.4427104Z FAILED [0.4401s] [100%] 2025-12-04T10:58:28.4427107Z 2025-12-04T10:58:28.4427158Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4427311Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4427357Z Traceback (most recent call last): 2025-12-04T10:58:28.4427515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4427555Z method(*args, **kwargs) 2025-12-04T10:58:28.4427708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4427748Z method(*args, **kwargs) 2025-12-04T10:58:28.4427901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4427937Z with policy(): 2025-12-04T10:58:28.4428103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4428145Z raise RuntimeError(msg) 2025-12-04T10:58:28.4428556Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4428559Z 2025-12-04T10:58:28.4428633Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4428924Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4428926Z 2025-12-04T10:58:28.4429014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4429088Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4429156Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4429333Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4429406Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4429442Z graph_break [] 2025-12-04T10:58:28.4429594Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4429639Z Traceback (most recent call last): 2025-12-04T10:58:28.4429793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4429832Z method(*args, **kwargs) 2025-12-04T10:58:28.4429997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4430037Z method(*args, **kwargs) 2025-12-04T10:58:28.4430187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4430224Z with policy(): 2025-12-04T10:58:28.4430376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4430417Z raise RuntimeError(msg) 2025-12-04T10:58:28.4430819Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4430821Z 2025-12-04T10:58:28.4430895Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4431186Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4431190Z 2025-12-04T10:58:28.4431277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4431350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4431406Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4431580Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4431652Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4431688Z graph_break [] 2025-12-04T10:58:28.4431774Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4431829Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4431902Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4432088Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4432125Z graph_break [] 2025-12-04T10:58:28.4432177Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4432327Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4432371Z Traceback (most recent call last): 2025-12-04T10:58:28.4432525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4432565Z method(*args, **kwargs) 2025-12-04T10:58:28.4432717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4432768Z method(*args, **kwargs) 2025-12-04T10:58:28.4432919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4432957Z with policy(): 2025-12-04T10:58:28.4433109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4433149Z raise RuntimeError(msg) 2025-12-04T10:58:28.4433597Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4433613Z 2025-12-04T10:58:28.4433688Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4433978Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4433981Z 2025-12-04T10:58:28.4434069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4434142Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4434198Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4434374Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4434448Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4434486Z graph_break [] 2025-12-04T10:58:28.4434560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4434615Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4434687Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4434862Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4434899Z graph_break [] 2025-12-04T10:58:28.4434972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4435026Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4435096Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4435272Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4435322Z graph_break [] 2025-12-04T10:58:28.4435567Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d5990fe812f5f47b.xml - 2025-12-04T10:58:28.4435627Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4436276Z FAILED [0.4401s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4436279Z 2025-12-04T10:58:28.4436352Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4436641Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4436657Z 2025-12-04T10:58:28.4436744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4436806Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4436873Z ================== 1 failed, 57 deselected, 2 rerun in 3.73s =================== 2025-12-04T10:58:28.4436909Z Got exit code 1 2025-12-04T10:58:28.4436950Z Retrying single test... 
2025-12-04T10:58:28.4437149Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-edf28e3b8e96a8d0.xml 2025-12-04T10:58:28.4437206Z ============================= test session starts ============================== 2025-12-04T10:58:28.4437327Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4437369Z cachedir: .pytest_cache 2025-12-04T10:58:28.4437528Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4437574Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4437615Z configfile: pytest.ini 2025-12-04T10:58:28.4437775Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4437847Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4438134Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4438178Z Running 1 items in this shard 2025-12-04T10:58:28.4438181Z 2025-12-04T10:58:28.4438622Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:11.775630947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4438625Z 2025-12-04T10:58:28.4438780Z [W1204 10:51:11.050149237 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4438782Z 2025-12-04T10:58:28.4438932Z [W1204 10:51:11.050274066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4438934Z 2025-12-04T10:58:28.4439084Z [W1204 10:51:11.053267920 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439086Z 2025-12-04T10:58:28.4439255Z [W1204 10:51:11.053584196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439260Z 2025-12-04T10:58:28.4439408Z [W1204 10:51:11.053645305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439410Z 2025-12-04T10:58:28.4439570Z [W1204 10:51:11.055835379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439572Z 2025-12-04T10:58:28.4439721Z [W1204 10:51:11.056108775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439723Z 2025-12-04T10:58:28.4439871Z [W1204 10:51:11.056176994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4439873Z 2025-12-04T10:58:28.4439920Z ('RERUN', {'yellow': True}) [2.7203s] [100%] 2025-12-04T10:58:28.4440284Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:12.011968068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4440298Z 2025-12-04T10:58:28.4440449Z [W1204 10:51:12.012342643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4440451Z 2025-12-04T10:58:28.4440599Z [W1204 10:51:12.012408402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4440601Z 2025-12-04T10:58:28.4440749Z [W1204 10:51:12.013658437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4440751Z 2025-12-04T10:58:28.4440900Z [W1204 10:51:12.013920134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4440915Z 2025-12-04T10:58:28.4441065Z [W1204 10:51:12.013980343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4441067Z 2025-12-04T10:58:28.4441216Z [W1204 10:51:12.015914380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4441219Z 2025-12-04T10:58:28.4441367Z [W1204 10:51:12.016255446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4441369Z 2025-12-04T10:58:28.4441517Z [W1204 10:51:12.016320265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4441519Z 2025-12-04T10:58:28.4441566Z ('RERUN', {'yellow': True}) [0.4607s] [100%] 2025-12-04T10:58:28.4441930Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:51:13.472071887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4441933Z 2025-12-04T10:58:28.4442082Z [W1204 10:51:13.472445153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442084Z 2025-12-04T10:58:28.4442233Z [W1204 10:51:13.472509772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442235Z 2025-12-04T10:58:28.4442383Z [W1204 10:51:13.473757757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442385Z 2025-12-04T10:58:28.4442544Z [W1204 10:51:13.474021584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442546Z 2025-12-04T10:58:28.4442695Z [W1204 10:51:13.474086773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442698Z 2025-12-04T10:58:28.4442857Z [W1204 10:51:13.476018770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4442859Z 2025-12-04T10:58:28.4443008Z [W1204 10:51:13.476358826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4443010Z 2025-12-04T10:58:28.4443158Z [W1204 10:51:13.476421815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4443161Z 2025-12-04T10:58:28.4443199Z FAILED [0.4510s] [100%] 2025-12-04T10:58:28.4443201Z 2025-12-04T10:58:28.4443286Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4443437Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4443498Z Traceback (most recent call last): 2025-12-04T10:58:28.4443656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4443698Z method(*args, **kwargs) 2025-12-04T10:58:28.4443851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4443892Z method(*args, **kwargs) 2025-12-04T10:58:28.4444042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4444079Z with policy(): 2025-12-04T10:58:28.4444232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4444288Z raise RuntimeError(msg) 2025-12-04T10:58:28.4444682Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4444685Z 2025-12-04T10:58:28.4444758Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4445047Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4445050Z 2025-12-04T10:58:28.4445137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4445213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4445270Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4445450Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4445524Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4445561Z graph_break [] 2025-12-04T10:58:28.4445710Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4445756Z Traceback (most recent call last): 2025-12-04T10:58:28.4445909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4445950Z method(*args, **kwargs) 2025-12-04T10:58:28.4446116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4446157Z method(*args, **kwargs) 2025-12-04T10:58:28.4446307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4446345Z with policy(): 2025-12-04T10:58:28.4446511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4446553Z raise RuntimeError(msg) 2025-12-04T10:58:28.4446953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4446955Z 2025-12-04T10:58:28.4447028Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4447317Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4447333Z 2025-12-04T10:58:28.4447420Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4447495Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4447550Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4447728Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4447800Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4447837Z graph_break [] 2025-12-04T10:58:28.4447909Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4447977Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4448048Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4448225Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4448262Z graph_break [] 2025-12-04T10:58:28.4448314Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4448463Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4448508Z Traceback (most recent call last): 2025-12-04T10:58:28.4448661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4448701Z method(*args, **kwargs) 2025-12-04T10:58:28.4448851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4448891Z method(*args, **kwargs) 2025-12-04T10:58:28.4449042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4449079Z with policy(): 2025-12-04T10:58:28.4449232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4449273Z raise RuntimeError(msg) 2025-12-04T10:58:28.4449673Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4449675Z 2025-12-04T10:58:28.4449749Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4450051Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4450054Z 2025-12-04T10:58:28.4450153Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4450227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4450283Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4450461Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4450533Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4450569Z graph_break [] 2025-12-04T10:58:28.4450641Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4450697Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4450768Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4450957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4450992Z graph_break [] 2025-12-04T10:58:28.4451065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4451118Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4451190Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4451364Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4451401Z graph_break [] 2025-12-04T10:58:28.4451664Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-edf28e3b8e96a8d0.xml - 2025-12-04T10:58:28.4451725Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4452358Z FAILED [0.4510s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4452361Z 2025-12-04T10:58:28.4452433Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4452726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4452730Z 2025-12-04T10:58:28.4452815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4452877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4452943Z ================== 1 failed, 57 deselected, 2 rerun in 3.79s =================== 2025-12-04T10:58:28.4452980Z Got exit code 1 2025-12-04T10:58:28.4453220Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4453383Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4453595Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-143ad13a4aa5bafb.xml 2025-12-04T10:58:28.4453655Z ============================= test session starts ============================== 2025-12-04T10:58:28.4453765Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4453807Z cachedir: .pytest_cache 2025-12-04T10:58:28.4453979Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4454026Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4454066Z configfile: pytest.ini 2025-12-04T10:58:28.4454225Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4454299Z collecting ... collected 58 items / 45 deselected / 13 selected 2025-12-04T10:58:28.4454350Z stepcurrent: skipping 45 already run items. 
2025-12-04T10:58:28.4454395Z Running 13 items in this shard 2025-12-04T10:58:28.4454399Z 2025-12-04T10:58:28.4454651Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8746s] [ 7%] 2025-12-04T10:58:28.4454913Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4592s] [ 7%] 2025-12-04T10:58:28.4455136Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4517s] [ 7%] 2025-12-04T10:58:28.4455138Z 2025-12-04T10:58:28.4455189Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4456682Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4456750Z Traceback (most recent call last): 2025-12-04T10:58:28.4456908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4456951Z method(*args, **kwargs) 2025-12-04T10:58:28.4457103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4457144Z method(*args, **kwargs) 2025-12-04T10:58:28.4457294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4457332Z with policy(): 2025-12-04T10:58:28.4457484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4457525Z raise RuntimeError(msg) 2025-12-04T10:58:28.4457922Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4457927Z 2025-12-04T10:58:28.4458000Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4458291Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4458293Z 2025-12-04T10:58:28.4458379Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4458454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4458510Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4458802Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4458878Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4458915Z graph_break [] 2025-12-04T10:58:28.4459077Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4459124Z Traceback (most recent call last): 2025-12-04T10:58:28.4459278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4459319Z method(*args, **kwargs) 2025-12-04T10:58:28.4459470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.4459510Z method(*args, **kwargs) 2025-12-04T10:58:28.4459663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4459700Z with policy(): 2025-12-04T10:58:28.4459865Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4459907Z raise RuntimeError(msg) 2025-12-04T10:58:28.4460308Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4460315Z 2025-12-04T10:58:28.4460388Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4460681Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4460695Z 2025-12-04T10:58:28.4460782Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4460856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4460913Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4461188Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4461261Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4461298Z graph_break [] 2025-12-04T10:58:28.4461370Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.4461424Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4461497Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4461768Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4461805Z graph_break [] 2025-12-04T10:58:28.4461858Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4462010Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4462056Z Traceback (most recent call last): 2025-12-04T10:58:28.4462209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4462249Z method(*args, **kwargs) 2025-12-04T10:58:28.4462412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4462452Z method(*args, **kwargs) 2025-12-04T10:58:28.4462604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4462640Z with policy(): 2025-12-04T10:58:28.4462806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4462847Z raise RuntimeError(msg) 2025-12-04T10:58:28.4463285Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4463287Z 2025-12-04T10:58:28.4463361Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4463652Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4463671Z 2025-12-04T10:58:28.4463759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4463832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4463887Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4464161Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4464234Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4464284Z graph_break [] 2025-12-04T10:58:28.4464362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4464417Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4464490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4464760Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4464797Z graph_break [] 2025-12-04T10:58:28.4464870Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4464925Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.4464996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4465267Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4465304Z graph_break [] 2025-12-04T10:58:28.4465550Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-143ad13a4aa5bafb.xml - 2025-12-04T10:58:28.4465609Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4466259Z FAILED [0.4517s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4466262Z 2025-12-04T10:58:28.4466336Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4466641Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4466644Z 2025-12-04T10:58:28.4466730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4466792Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4466858Z ================== 1 failed, 45 deselected, 2 rerun in 3.95s =================== 2025-12-04T10:58:28.4466894Z Got exit code 1 2025-12-04T10:58:28.4466935Z Retrying single test... 2025-12-04T10:58:28.4467131Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f765b049d77139a4.xml 2025-12-04T10:58:28.4467191Z ============================= test session starts ============================== 2025-12-04T10:58:28.4467321Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4467362Z cachedir: .pytest_cache 2025-12-04T10:58:28.4467522Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4467569Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4467609Z configfile: pytest.ini 2025-12-04T10:58:28.4467769Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4467842Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4468130Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4468186Z Running 1 items in this shard 2025-12-04T10:58:28.4468189Z 2025-12-04T10:58:28.4468555Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:32.220932784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4468558Z 2025-12-04T10:58:28.4468712Z [W1204 10:51:33.489647652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4468714Z 2025-12-04T10:58:28.4468865Z [W1204 10:51:33.489813470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4468867Z 2025-12-04T10:58:28.4469019Z [W1204 10:51:33.493952460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4469021Z 2025-12-04T10:58:28.4469170Z [W1204 10:51:33.494359535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4469173Z 2025-12-04T10:58:28.4469324Z [W1204 10:51:33.494427444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4469326Z 2025-12-04T10:58:28.4469476Z [W1204 10:51:33.496856715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4469478Z 2025-12-04T10:58:28.4469627Z [W1204 10:51:33.497163651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4469628Z 2025-12-04T10:58:28.4469792Z [W1204 10:51:33.497227540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4469795Z 2025-12-04T10:58:28.4469845Z ('RERUN', {'yellow': True}) [3.2504s] [100%] 2025-12-04T10:58:28.4470218Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:34.277366670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470221Z 2025-12-04T10:58:28.4470370Z [W1204 10:51:34.277758065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470373Z 2025-12-04T10:58:28.4470522Z [W1204 10:51:34.277833624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470524Z 2025-12-04T10:58:28.4470673Z [W1204 10:51:34.279109649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470676Z 2025-12-04T10:58:28.4470824Z [W1204 10:51:34.279376555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470837Z 2025-12-04T10:58:28.4470987Z [W1204 10:51:34.279437635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4470989Z 2025-12-04T10:58:28.4471137Z [W1204 10:51:34.281544159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4471139Z 2025-12-04T10:58:28.4471288Z [W1204 10:51:34.281807916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4471290Z 2025-12-04T10:58:28.4471440Z [W1204 10:51:34.281868435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4471455Z 2025-12-04T10:58:28.4471504Z ('RERUN', {'yellow': True}) [0.6733s] [100%] 2025-12-04T10:58:28.4471866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:34.953925275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4471868Z 2025-12-04T10:58:28.4472016Z [W1204 10:51:34.954330970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472018Z 2025-12-04T10:58:28.4472167Z [W1204 10:51:34.954410839 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472169Z 2025-12-04T10:58:28.4472316Z [W1204 10:51:34.955714963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472321Z 2025-12-04T10:58:28.4472469Z [W1204 10:51:34.955991799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472472Z 2025-12-04T10:58:28.4472622Z [W1204 10:51:34.956062059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472624Z 2025-12-04T10:58:28.4472771Z [W1204 10:51:34.958211683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4472773Z 2025-12-04T10:58:28.4472921Z [W1204 10:51:34.958481539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4472923Z 2025-12-04T10:58:28.4473070Z [W1204 10:51:34.958543889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4473073Z 2025-12-04T10:58:28.4473124Z FAILED [0.6318s] [100%] 2025-12-04T10:58:28.4473126Z 2025-12-04T10:58:28.4473177Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4473364Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4473423Z Traceback (most recent call last): 2025-12-04T10:58:28.4473583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4473625Z method(*args, **kwargs) 2025-12-04T10:58:28.4473778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4473819Z method(*args, **kwargs) 2025-12-04T10:58:28.4473968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4474008Z with policy(): 2025-12-04T10:58:28.4474160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4474217Z raise RuntimeError(msg) 2025-12-04T10:58:28.4474614Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4474616Z 2025-12-04T10:58:28.4474691Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4474980Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4474996Z 2025-12-04T10:58:28.4475085Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4475158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4475217Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4475492Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4475566Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4475603Z graph_break [] 2025-12-04T10:58:28.4475754Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4475799Z Traceback (most recent call last): 2025-12-04T10:58:28.4475954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4475995Z method(*args, **kwargs) 2025-12-04T10:58:28.4476145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4476186Z method(*args, **kwargs) 2025-12-04T10:58:28.4476336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4476373Z 
with policy(): 2025-12-04T10:58:28.4476524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4476565Z raise RuntimeError(msg) 2025-12-04T10:58:28.4476978Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4476982Z 2025-12-04T10:58:28.4477056Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4477356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4477358Z 2025-12-04T10:58:28.4477446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4477519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4477575Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4477849Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4477923Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4477972Z graph_break [] 2025-12-04T10:58:28.4478045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4478099Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4478171Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4478441Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4478476Z graph_break [] 2025-12-04T10:58:28.4478528Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4478681Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4478739Z Traceback (most recent call last): 2025-12-04T10:58:28.4478893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4478935Z method(*args, **kwargs) 2025-12-04T10:58:28.4479085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4479125Z method(*args, **kwargs) 2025-12-04T10:58:28.4479274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4479311Z with policy(): 2025-12-04T10:58:28.4479463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4479504Z raise RuntimeError(msg) 2025-12-04T10:58:28.4479907Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4479911Z 2025-12-04T10:58:28.4479985Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4480274Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4480278Z 2025-12-04T10:58:28.4480363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4480437Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4480491Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4480778Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4480851Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4480887Z graph_break [] 2025-12-04T10:58:28.4480985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4481040Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4481110Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4481382Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4481417Z graph_break [] 2025-12-04T10:58:28.4481492Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4481546Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4481629Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4481900Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4481937Z graph_break [] 2025-12-04T10:58:28.4482181Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-f765b049d77139a4.xml - 2025-12-04T10:58:28.4482241Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4482880Z FAILED [0.6318s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4482895Z 2025-12-04T10:58:28.4482968Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4483289Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4483291Z 2025-12-04T10:58:28.4483377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4483439Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4483507Z ================== 1 failed, 57 deselected, 2 rerun in 4.72s =================== 2025-12-04T10:58:28.4483544Z Got exit code 1 2025-12-04T10:58:28.4483585Z Retrying single test... 2025-12-04T10:58:28.4483785Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-975aabb7a71ef7ca.xml 2025-12-04T10:58:28.4483842Z ============================= test session starts ============================== 2025-12-04T10:58:28.4483952Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4483993Z cachedir: .pytest_cache 2025-12-04T10:58:28.4484151Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4484197Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4484237Z configfile: pytest.ini 2025-12-04T10:58:28.4484414Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4484489Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4484800Z stepcurrent: skipping 45 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4484844Z Running 1 items in this shard 2025-12-04T10:58:28.4484846Z 2025-12-04T10:58:28.4485209Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:44.141996349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4485211Z 2025-12-04T10:58:28.4485365Z [W1204 10:51:45.402951060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4485368Z 2025-12-04T10:58:28.4485519Z [W1204 10:51:45.403119938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4485537Z 2025-12-04T10:58:28.4485688Z [W1204 10:51:45.406885942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4485691Z 2025-12-04T10:58:28.4485839Z [W1204 10:51:45.407207078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4485841Z 2025-12-04T10:58:28.4485989Z [W1204 10:51:45.407269418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4485991Z 2025-12-04T10:58:28.4486138Z [W1204 10:51:45.409415402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4486155Z 2025-12-04T10:58:28.4486304Z [W1204 10:51:45.409691078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4486307Z 2025-12-04T10:58:28.4486454Z [W1204 10:51:45.409750388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4486457Z 2025-12-04T10:58:28.4486506Z ('RERUN', {'yellow': True}) [3.3036s] [100%] 2025-12-04T10:58:28.4486866Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:45.152724416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4486868Z 2025-12-04T10:58:28.4487019Z [W1204 10:51:45.153100352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487022Z 2025-12-04T10:58:28.4487171Z [W1204 10:51:45.153168721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487174Z 2025-12-04T10:58:28.4487322Z [W1204 10:51:45.154427876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487324Z 2025-12-04T10:58:28.4487473Z [W1204 10:51:45.154680143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487475Z 2025-12-04T10:58:28.4487623Z [W1204 10:51:45.154738942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487625Z 2025-12-04T10:58:28.4487778Z [W1204 10:51:45.156748577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4487781Z 2025-12-04T10:58:28.4487939Z [W1204 10:51:45.157018174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4487942Z 2025-12-04T10:58:28.4488090Z [W1204 10:51:45.157080463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4488103Z 2025-12-04T10:58:28.4488152Z ('RERUN', {'yellow': True}) [0.6133s] [100%] 2025-12-04T10:58:28.4488509Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:51:46.784121533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4488511Z 2025-12-04T10:58:28.4488660Z [W1204 10:51:46.784478088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4488663Z 2025-12-04T10:58:28.4488812Z [W1204 10:51:46.784542058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4488826Z 2025-12-04T10:58:28.4488974Z [W1204 10:51:46.785793992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4488977Z 2025-12-04T10:58:28.4489127Z [W1204 10:51:46.786053329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4489129Z 2025-12-04T10:58:28.4489276Z [W1204 10:51:46.786113598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4489278Z 2025-12-04T10:58:28.4489426Z [W1204 10:51:46.788112854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4489439Z 2025-12-04T10:58:28.4489589Z [W1204 10:51:46.788376581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4489593Z 2025-12-04T10:58:28.4489740Z [W1204 10:51:46.788435950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4489742Z 2025-12-04T10:58:28.4489782Z FAILED [0.6159s] [100%] 2025-12-04T10:58:28.4489784Z 2025-12-04T10:58:28.4489834Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4489986Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4490032Z Traceback (most recent call last): 2025-12-04T10:58:28.4490189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4490230Z method(*args, **kwargs) 2025-12-04T10:58:28.4490385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4490426Z method(*args, **kwargs) 2025-12-04T10:58:28.4490579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4490615Z with policy(): 2025-12-04T10:58:28.4490769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4490810Z raise RuntimeError(msg) 2025-12-04T10:58:28.4491205Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4491209Z 2025-12-04T10:58:28.4491292Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4491583Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4491586Z 2025-12-04T10:58:28.4491687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4491760Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4491817Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4492092Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4492166Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4492204Z graph_break [] 2025-12-04T10:58:28.4492354Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4492412Z Traceback (most recent call last): 2025-12-04T10:58:28.4492567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4492606Z method(*args, **kwargs) 2025-12-04T10:58:28.4492758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4492797Z method(*args, **kwargs) 2025-12-04T10:58:28.4492947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4492983Z 
with policy(): 2025-12-04T10:58:28.4493136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4493190Z raise RuntimeError(msg) 2025-12-04T10:58:28.4493626Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4493629Z 2025-12-04T10:58:28.4493701Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4493991Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4493993Z 2025-12-04T10:58:28.4494080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4494154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4494211Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4494484Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4494559Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4494595Z graph_break [] 2025-12-04T10:58:28.4494668Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4494721Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4494794Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4495087Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4495125Z graph_break [] 2025-12-04T10:58:28.4495177Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4495329Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4495388Z Traceback (most recent call last): 2025-12-04T10:58:28.4495543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4495583Z method(*args, **kwargs) 2025-12-04T10:58:28.4495735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4495774Z method(*args, **kwargs) 2025-12-04T10:58:28.4495926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4495964Z with policy(): 2025-12-04T10:58:28.4496119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4496173Z raise RuntimeError(msg) 2025-12-04T10:58:28.4496576Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4496578Z 2025-12-04T10:58:28.4496652Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4496938Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4496958Z 2025-12-04T10:58:28.4497045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4497119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4497175Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4497448Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4497521Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4497556Z graph_break [] 2025-12-04T10:58:28.4497629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4497683Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4497754Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4498029Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4498067Z graph_break [] 2025-12-04T10:58:28.4498140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4498195Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4498265Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4498535Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4498572Z graph_break [] 2025-12-04T10:58:28.4498828Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-975aabb7a71ef7ca.xml - 2025-12-04T10:58:28.4498889Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4499532Z FAILED [0.6159s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4499534Z 2025-12-04T10:58:28.4499607Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4499895Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4499911Z 2025-12-04T10:58:28.4499997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4500059Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4500126Z ================== 1 failed, 57 deselected, 2 rerun in 4.68s =================== 2025-12-04T10:58:28.4500164Z Got exit code 1 2025-12-04T10:58:28.4500404Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4500534Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4500733Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c671f56cba5b0340.xml 2025-12-04T10:58:28.4500803Z ============================= test session starts ============================== 2025-12-04T10:58:28.4500914Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4500955Z cachedir: .pytest_cache 2025-12-04T10:58:28.4501113Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4501160Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4501200Z configfile: pytest.ini 2025-12-04T10:58:28.4501359Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4501433Z collecting ... collected 58 items / 46 deselected / 12 selected 2025-12-04T10:58:28.4501486Z stepcurrent: skipping 46 already run items. 
2025-12-04T10:58:28.4501529Z Running 12 items in this shard 2025-12-04T10:58:28.4501532Z 2025-12-04T10:58:28.4501786Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5178s] [ 8%] 2025-12-04T10:58:28.4502035Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4473s] [ 8%] 2025-12-04T10:58:28.4502260Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4354s] [ 8%] 2025-12-04T10:58:28.4502263Z 2025-12-04T10:58:28.4502314Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4502465Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4502512Z Traceback (most recent call last): 2025-12-04T10:58:28.4502678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4502721Z method(*args, **kwargs) 2025-12-04T10:58:28.4502872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4502923Z method(*args, **kwargs) 2025-12-04T10:58:28.4503074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4503112Z with policy(): 2025-12-04T10:58:28.4503295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4503337Z raise RuntimeError(msg) 2025-12-04T10:58:28.4503738Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4503759Z 2025-12-04T10:58:28.4503833Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4504126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4504129Z 2025-12-04T10:58:28.4504215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4504289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4504345Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4504523Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4504609Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4504647Z graph_break [] 2025-12-04T10:58:28.4504798Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4504845Z Traceback (most recent call last): 2025-12-04T10:58:28.4504998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4505038Z method(*args, **kwargs) 2025-12-04T10:58:28.4505189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4505229Z method(*args, **kwargs) 
2025-12-04T10:58:28.4505378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4505417Z with policy(): 2025-12-04T10:58:28.4505568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4505610Z raise RuntimeError(msg) 2025-12-04T10:58:28.4506019Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4506022Z 2025-12-04T10:58:28.4506095Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4506387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4506392Z 2025-12-04T10:58:28.4506492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4506566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4506621Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4506811Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4506884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4506920Z graph_break [] 2025-12-04T10:58:28.4506992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4507047Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4507117Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4507293Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4507341Z graph_break [] 2025-12-04T10:58:28.4507393Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4507546Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4507592Z Traceback (most recent call last): 2025-12-04T10:58:28.4507745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4507785Z method(*args, **kwargs) 2025-12-04T10:58:28.4507935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4507975Z method(*args, **kwargs) 2025-12-04T10:58:28.4508127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4508182Z with policy(): 2025-12-04T10:58:28.4508333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4508376Z raise RuntimeError(msg) 2025-12-04T10:58:28.4508783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4508785Z 2025-12-04T10:58:28.4508858Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4509155Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4509158Z 2025-12-04T10:58:28.4509244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4509319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4509373Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4509549Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4509621Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4509656Z graph_break [] 2025-12-04T10:58:28.4509728Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4509782Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4509853Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4510040Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4510077Z graph_break [] 2025-12-04T10:58:28.4510150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4510203Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4510286Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4510459Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4510496Z graph_break [] 2025-12-04T10:58:28.4510739Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c671f56cba5b0340.xml - 2025-12-04T10:58:28.4510800Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4511446Z FAILED [0.4354s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4511462Z 2025-12-04T10:58:28.4511534Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4511823Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4511825Z 2025-12-04T10:58:28.4511923Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4511985Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4512051Z ================== 1 failed, 46 deselected, 2 rerun in 3.57s =================== 2025-12-04T10:58:28.4512089Z Got exit code 1 2025-12-04T10:58:28.4512128Z Retrying single test... 
2025-12-04T10:58:28.4512328Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2a4a093da105e4f5.xml 2025-12-04T10:58:28.4512385Z ============================= test session starts ============================== 2025-12-04T10:58:28.4512493Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4512534Z cachedir: .pytest_cache 2025-12-04T10:58:28.4512692Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4512738Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4512779Z configfile: pytest.ini 2025-12-04T10:58:28.4512939Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4513012Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4513336Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4513381Z Running 1 items in this shard 2025-12-04T10:58:28.4513383Z 2025-12-04T10:58:28.4513764Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:05.022328757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4513768Z 2025-12-04T10:58:28.4513921Z [W1204 10:52:06.295389941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4513924Z 2025-12-04T10:58:28.4514089Z [W1204 10:52:06.295524549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514091Z 2025-12-04T10:58:28.4514241Z [W1204 10:52:06.299106316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514243Z 2025-12-04T10:58:28.4514392Z [W1204 10:52:06.299422522 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514394Z 2025-12-04T10:58:28.4514542Z [W1204 10:52:06.299484501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514546Z 2025-12-04T10:58:28.4514695Z [W1204 10:52:06.301675544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514711Z 2025-12-04T10:58:28.4514860Z [W1204 10:52:06.301957921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4514863Z 2025-12-04T10:58:28.4515011Z [W1204 10:52:06.302022980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4515013Z 2025-12-04T10:58:28.4515062Z ('RERUN', {'yellow': True}) [2.9379s] [100%] 2025-12-04T10:58:28.4515426Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:07.436570775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4515442Z 2025-12-04T10:58:28.4515592Z [W1204 10:52:07.436982470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4515595Z 2025-12-04T10:58:28.4515744Z [W1204 10:52:07.437062559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4515747Z 2025-12-04T10:58:28.4515894Z [W1204 10:52:07.438377313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4515896Z 2025-12-04T10:58:28.4516044Z [W1204 10:52:07.438650399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4516046Z 2025-12-04T10:58:28.4516193Z [W1204 10:52:07.438711778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4516196Z 2025-12-04T10:58:28.4516346Z [W1204 10:52:07.440674734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4516348Z 2025-12-04T10:58:28.4516497Z [W1204 10:52:07.441023270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4516499Z 2025-12-04T10:58:28.4516647Z [W1204 10:52:07.441089139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4516649Z 2025-12-04T10:58:28.4516698Z ('RERUN', {'yellow': True}) [0.6430s] [100%] 2025-12-04T10:58:28.4517057Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:07.077814164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517060Z 2025-12-04T10:58:28.4517220Z [W1204 10:52:07.078297198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517223Z 2025-12-04T10:58:28.4517371Z [W1204 10:52:07.078370847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517385Z 2025-12-04T10:58:28.4517533Z [W1204 10:52:07.079691631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517535Z 2025-12-04T10:58:28.4517685Z [W1204 10:52:07.079957717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517686Z 2025-12-04T10:58:28.4517834Z [W1204 10:52:07.080022517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517837Z 2025-12-04T10:58:28.4517986Z [W1204 10:52:07.081994252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4517999Z 2025-12-04T10:58:28.4518147Z [W1204 10:52:07.082340918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4518149Z 2025-12-04T10:58:28.4518299Z [W1204 10:52:07.082406127 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4518301Z 2025-12-04T10:58:28.4518340Z FAILED [0.6359s] [100%] 2025-12-04T10:58:28.4518342Z 2025-12-04T10:58:28.4518392Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4518546Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4518591Z Traceback (most recent call last): 2025-12-04T10:58:28.4518764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4518805Z method(*args, **kwargs) 2025-12-04T10:58:28.4518958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4518997Z method(*args, **kwargs) 2025-12-04T10:58:28.4519149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4519185Z with policy(): 2025-12-04T10:58:28.4519338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4519378Z raise RuntimeError(msg) 2025-12-04T10:58:28.4519781Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4519785Z 2025-12-04T10:58:28.4519858Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4520151Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4520153Z 2025-12-04T10:58:28.4520240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4520312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4520369Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4520557Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4520632Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4520669Z graph_break [] 2025-12-04T10:58:28.4520821Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4520865Z Traceback (most recent call last): 2025-12-04T10:58:28.4521031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4521071Z method(*args, **kwargs) 2025-12-04T10:58:28.4521222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4521261Z method(*args, **kwargs) 2025-12-04T10:58:28.4521413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4521450Z with policy(): 2025-12-04T10:58:28.4521602Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4521660Z raise RuntimeError(msg) 2025-12-04T10:58:28.4522073Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4522075Z 2025-12-04T10:58:28.4522147Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4522439Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4522441Z 2025-12-04T10:58:28.4522542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4522615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4522672Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4522850Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4522923Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4522959Z graph_break [] 2025-12-04T10:58:28.4523032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4523086Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4523157Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4523364Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4523402Z graph_break [] 2025-12-04T10:58:28.4523453Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4523606Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4523651Z Traceback (most recent call last): 2025-12-04T10:58:28.4523805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4523845Z method(*args, **kwargs) 2025-12-04T10:58:28.4523996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4524035Z method(*args, **kwargs) 2025-12-04T10:58:28.4524186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4524241Z with policy(): 2025-12-04T10:58:28.4524395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4524437Z raise RuntimeError(msg) 2025-12-04T10:58:28.4524858Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4524861Z 2025-12-04T10:58:28.4524933Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4525223Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4525226Z 2025-12-04T10:58:28.4525315Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4525403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4525458Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4525633Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4525706Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4525742Z graph_break [] 2025-12-04T10:58:28.4525814Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4525867Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4525938Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4526112Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4526164Z graph_break [] 2025-12-04T10:58:28.4526236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4526291Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4526362Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4526536Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4526571Z graph_break [] 2025-12-04T10:58:28.4526815Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2a4a093da105e4f5.xml - 2025-12-04T10:58:28.4526874Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4527518Z FAILED [0.6359s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4527522Z 2025-12-04T10:58:28.4527595Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4527883Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4527885Z 2025-12-04T10:58:28.4527983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4528045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4528112Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.4528149Z Got exit code 1 2025-12-04T10:58:28.4528190Z Retrying single test... 
2025-12-04T10:58:28.4528399Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8d0acf921fd76a3f.xml 2025-12-04T10:58:28.4528456Z ============================= test session starts ============================== 2025-12-04T10:58:28.4528566Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4528607Z cachedir: .pytest_cache 2025-12-04T10:58:28.4528764Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4528811Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4528851Z configfile: pytest.ini 2025-12-04T10:58:28.4529010Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4529095Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4529384Z stepcurrent: skipping 46 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4529427Z Running 1 items in this shard 2025-12-04T10:58:28.4529430Z 2025-12-04T10:58:28.4529794Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:16.215554853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4529808Z 2025-12-04T10:58:28.4529962Z [W1204 10:52:17.471469167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4529965Z 2025-12-04T10:58:28.4530116Z [W1204 10:52:17.471625996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530119Z 2025-12-04T10:58:28.4530269Z [W1204 10:52:17.474630699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530271Z 2025-12-04T10:58:28.4530419Z [W1204 10:52:17.474915065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530422Z 2025-12-04T10:58:28.4530570Z [W1204 10:52:17.474975244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530573Z 2025-12-04T10:58:28.4530724Z [W1204 10:52:17.477144668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530727Z 2025-12-04T10:58:28.4530875Z [W1204 10:52:17.477411464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4530877Z 2025-12-04T10:58:28.4531027Z [W1204 10:52:17.477472964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4531029Z 2025-12-04T10:58:28.4531076Z ('RERUN', {'yellow': True}) [2.7483s] [100%] 2025-12-04T10:58:28.4531437Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:18.426088412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4531440Z 2025-12-04T10:58:28.4531600Z [W1204 10:52:18.426464607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4531603Z 2025-12-04T10:58:28.4531751Z [W1204 10:52:18.426539726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4531753Z 2025-12-04T10:58:28.4531912Z [W1204 10:52:18.427796631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4531914Z 2025-12-04T10:58:28.4532063Z [W1204 10:52:18.428074318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4532065Z 2025-12-04T10:58:28.4532213Z [W1204 10:52:18.428139667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4532215Z 2025-12-04T10:58:28.4532365Z [W1204 10:52:18.430062603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4532379Z 2025-12-04T10:58:28.4532527Z [W1204 10:52:18.430410999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4532529Z 2025-12-04T10:58:28.4532678Z [W1204 10:52:18.430476468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4532680Z 2025-12-04T10:58:28.4532727Z ('RERUN', {'yellow': True}) [0.4484s] [100%] 2025-12-04T10:58:28.4533089Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:52:18.879931152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533092Z 2025-12-04T10:58:28.4533294Z [W1204 10:52:18.880301687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533297Z 2025-12-04T10:58:28.4533446Z [W1204 10:52:18.880369136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533448Z 2025-12-04T10:58:28.4533597Z [W1204 10:52:18.881623441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533599Z 2025-12-04T10:58:28.4533747Z [W1204 10:52:18.881878318 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533749Z 2025-12-04T10:58:28.4533897Z [W1204 10:52:18.881938017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4533899Z 2025-12-04T10:58:28.4534047Z [W1204 10:52:18.883849893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4534049Z 2025-12-04T10:58:28.4534199Z [W1204 10:52:18.884187999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4534201Z 2025-12-04T10:58:28.4534349Z [W1204 10:52:18.884252648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4534352Z 2025-12-04T10:58:28.4534390Z FAILED [0.4473s] [100%] 2025-12-04T10:58:28.4534392Z 2025-12-04T10:58:28.4534444Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4534595Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4534642Z Traceback (most recent call last): 2025-12-04T10:58:28.4534813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4534856Z method(*args, **kwargs) 2025-12-04T10:58:28.4535011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4535051Z method(*args, **kwargs) 2025-12-04T10:58:28.4535222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4535260Z with policy(): 2025-12-04T10:58:28.4535411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4535452Z raise RuntimeError(msg) 2025-12-04T10:58:28.4535851Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4535855Z 2025-12-04T10:58:28.4535943Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4536233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4536237Z 2025-12-04T10:58:28.4536323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4536397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4536452Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4536629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4536718Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4536755Z graph_break [] 2025-12-04T10:58:28.4536907Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4536952Z Traceback (most recent call last): 2025-12-04T10:58:28.4537105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4537146Z method(*args, **kwargs) 2025-12-04T10:58:28.4537298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4537337Z method(*args, **kwargs) 2025-12-04T10:58:28.4537487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4537524Z with policy(): 2025-12-04T10:58:28.4537677Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4537719Z raise RuntimeError(msg) 2025-12-04T10:58:28.4538128Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4538130Z 2025-12-04T10:58:28.4538204Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4538493Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4538496Z 2025-12-04T10:58:28.4538594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4538669Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4538725Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4538914Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4538987Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4539023Z graph_break [] 2025-12-04T10:58:28.4539097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4539151Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4539222Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4539398Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4539436Z graph_break [] 2025-12-04T10:58:28.4539488Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4539652Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4539697Z Traceback (most recent call last): 2025-12-04T10:58:28.4539850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4539891Z method(*args, **kwargs) 2025-12-04T10:58:28.4540040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4540080Z method(*args, **kwargs) 2025-12-04T10:58:28.4540228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4540278Z with policy(): 2025-12-04T10:58:28.4540430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4540471Z raise RuntimeError(msg) 2025-12-04T10:58:28.4540881Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4540883Z 2025-12-04T10:58:28.4540957Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4541247Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4541250Z 2025-12-04T10:58:28.4541338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4541412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4541468Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4541647Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4541719Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4541756Z graph_break [] 2025-12-04T10:58:28.4541828Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4541883Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4541954Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4542139Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4542176Z graph_break [] 2025-12-04T10:58:28.4542250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4542304Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4542376Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4542559Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4542596Z graph_break [] 2025-12-04T10:58:28.4542840Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8d0acf921fd76a3f.xml - 2025-12-04T10:58:28.4542900Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4543575Z FAILED [0.4473s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4543596Z 2025-12-04T10:58:28.4543668Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4543959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4543962Z 2025-12-04T10:58:28.4544047Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4544124Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4544190Z ================== 1 failed, 57 deselected, 2 rerun in 3.81s =================== 2025-12-04T10:58:28.4544229Z Got exit code 1 2025-12-04T10:58:28.4544471Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4544601Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4544799Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a888f6fd2b69dcfa.xml 2025-12-04T10:58:28.4544857Z ============================= test session starts ============================== 2025-12-04T10:58:28.4544966Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4545010Z cachedir: .pytest_cache 2025-12-04T10:58:28.4545168Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4545215Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4545255Z configfile: pytest.ini 2025-12-04T10:58:28.4545415Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4545489Z collecting ... collected 58 items / 47 deselected / 11 selected 2025-12-04T10:58:28.4545541Z stepcurrent: skipping 47 already run items. 
2025-12-04T10:58:28.4545585Z Running 11 items in this shard 2025-12-04T10:58:28.4545587Z 2025-12-04T10:58:28.4545837Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4603s] [ 9%] 2025-12-04T10:58:28.4546099Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4461s] [ 9%] 2025-12-04T10:58:28.4546323Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4707s] [ 9%] 2025-12-04T10:58:28.4546339Z 2025-12-04T10:58:28.4546391Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4546541Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4546586Z Traceback (most recent call last): 2025-12-04T10:58:28.4546742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4546782Z method(*args, **kwargs) 2025-12-04T10:58:28.4546937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4546977Z method(*args, **kwargs) 2025-12-04T10:58:28.4547140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4547177Z with policy(): 2025-12-04T10:58:28.4547331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4547372Z raise RuntimeError(msg) 2025-12-04T10:58:28.4547766Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4547770Z 2025-12-04T10:58:28.4547844Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4548147Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4548150Z 2025-12-04T10:58:28.4548237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4548311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4548366Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4548542Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4548614Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4548650Z graph_break [] 2025-12-04T10:58:28.4548802Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4548848Z Traceback (most recent call last): 2025-12-04T10:58:28.4549003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4549043Z method(*args, **kwargs) 2025-12-04T10:58:28.4549195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4549234Z method(*args, **kwargs) 
2025-12-04T10:58:28.4549383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4549420Z with policy(): 2025-12-04T10:58:28.4549572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4549613Z raise RuntimeError(msg) 2025-12-04T10:58:28.4550031Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4550036Z 2025-12-04T10:58:28.4550119Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4550410Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4550412Z 2025-12-04T10:58:28.4550499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4550572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4550627Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4550805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4550889Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4550925Z graph_break [] 2025-12-04T10:58:28.4550998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4551053Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.4551124Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4551299Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4551335Z graph_break [] 2025-12-04T10:58:28.4551388Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4551550Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4551596Z Traceback (most recent call last): 2025-12-04T10:58:28.4551750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4551790Z method(*args, **kwargs) 2025-12-04T10:58:28.4551941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4551981Z method(*args, **kwargs) 2025-12-04T10:58:28.4552130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4552167Z with policy(): 2025-12-04T10:58:28.4552318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4552359Z raise RuntimeError(msg) 2025-12-04T10:58:28.4552769Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4552772Z 2025-12-04T10:58:28.4552845Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4553133Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4553135Z 2025-12-04T10:58:28.4553220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4553337Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4553407Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4553582Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4553655Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4553691Z graph_break [] 2025-12-04T10:58:28.4553777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4553832Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4553903Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4554077Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4554113Z graph_break [] 2025-12-04T10:58:28.4554185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4554241Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4554312Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4554500Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4554538Z graph_break [] 2025-12-04T10:58:28.4554784Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a888f6fd2b69dcfa.xml - 2025-12-04T10:58:28.4554842Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4555474Z FAILED [0.4707s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4555493Z 2025-12-04T10:58:28.4555565Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4555853Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4555855Z 2025-12-04T10:58:28.4555940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4556002Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4556069Z ================== 1 failed, 47 deselected, 2 rerun in 3.54s =================== 2025-12-04T10:58:28.4556107Z Got exit code 1 2025-12-04T10:58:28.4556147Z Retrying single test... 
2025-12-04T10:58:28.4556345Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11161411cfa84902.xml 2025-12-04T10:58:28.4556403Z ============================= test session starts ============================== 2025-12-04T10:58:28.4556513Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4556554Z cachedir: .pytest_cache 2025-12-04T10:58:28.4556711Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4556757Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4556797Z configfile: pytest.ini 2025-12-04T10:58:28.4556956Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4557042Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4557331Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4557375Z Running 1 items in this shard 2025-12-04T10:58:28.4557377Z 2025-12-04T10:58:28.4557750Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:37.486947000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4557753Z 2025-12-04T10:58:28.4557907Z [W1204 10:52:37.772260464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4557910Z 2025-12-04T10:58:28.4558061Z [W1204 10:52:37.772392943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558064Z 2025-12-04T10:58:28.4558226Z [W1204 10:52:37.775957779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558228Z 2025-12-04T10:58:28.4558378Z [W1204 10:52:37.776279715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558380Z 2025-12-04T10:58:28.4558529Z [W1204 10:52:37.776342994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558531Z 2025-12-04T10:58:28.4558678Z [W1204 10:52:37.778709835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558680Z 2025-12-04T10:58:28.4558830Z [W1204 10:52:37.778985461 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558842Z 2025-12-04T10:58:28.4558991Z [W1204 10:52:37.779050280 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4558994Z 2025-12-04T10:58:28.4559043Z ('RERUN', {'yellow': True}) [2.8771s] [100%] 2025-12-04T10:58:28.4559411Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:38.881122139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4559413Z 2025-12-04T10:58:28.4559561Z [W1204 10:52:38.881529754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4559563Z 2025-12-04T10:58:28.4559711Z [W1204 10:52:38.881593803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4559714Z 2025-12-04T10:58:28.4559863Z [W1204 10:52:38.882858037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4559866Z 2025-12-04T10:58:28.4560014Z [W1204 10:52:38.883177353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4560016Z 2025-12-04T10:58:28.4560164Z [W1204 10:52:38.883242932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4560166Z 2025-12-04T10:58:28.4560313Z [W1204 10:52:38.885322267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4560315Z 2025-12-04T10:58:28.4560475Z [W1204 10:52:38.885658933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4560478Z 2025-12-04T10:58:28.4560627Z [W1204 10:52:38.885721622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4560629Z 2025-12-04T10:58:28.4560678Z ('RERUN', {'yellow': True}) [0.6066s] [100%] 2025-12-04T10:58:28.4561046Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:39.489979481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561049Z 2025-12-04T10:58:28.4561200Z [W1204 10:52:39.490363917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561201Z 2025-12-04T10:58:28.4561352Z [W1204 10:52:39.490431526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561355Z 2025-12-04T10:58:28.4561503Z [W1204 10:52:39.491719890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561516Z 2025-12-04T10:58:28.4561666Z [W1204 10:52:39.491976917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561668Z 2025-12-04T10:58:28.4561815Z [W1204 10:52:39.492057796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561818Z 2025-12-04T10:58:28.4561966Z [W1204 10:52:39.494125630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4561967Z 2025-12-04T10:58:28.4562117Z [W1204 10:52:39.494464566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4562136Z 2025-12-04T10:58:28.4562285Z [W1204 10:52:39.494528045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4562287Z 2025-12-04T10:58:28.4562326Z FAILED [0.5991s] [100%] 2025-12-04T10:58:28.4562328Z 2025-12-04T10:58:28.4562380Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4562533Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4562578Z Traceback (most recent call last): 2025-12-04T10:58:28.4562737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4562776Z method(*args, **kwargs) 2025-12-04T10:58:28.4562932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4562974Z method(*args, **kwargs) 2025-12-04T10:58:28.4563126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4563163Z with policy(): 2025-12-04T10:58:28.4563352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4563393Z raise RuntimeError(msg) 2025-12-04T10:58:28.4563788Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4563790Z 2025-12-04T10:58:28.4563863Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4564167Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4564171Z 2025-12-04T10:58:28.4564259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4564346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4564403Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4564578Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4564652Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4564688Z graph_break [] 2025-12-04T10:58:28.4564840Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4564887Z Traceback (most recent call last): 2025-12-04T10:58:28.4565041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4565095Z method(*args, **kwargs) 2025-12-04T10:58:28.4565249Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4565288Z method(*args, **kwargs) 2025-12-04T10:58:28.4565438Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4565474Z with policy(): 2025-12-04T10:58:28.4565626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4565666Z raise RuntimeError(msg) 2025-12-04T10:58:28.4566069Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4566086Z 2025-12-04T10:58:28.4566160Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4566448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4566450Z 2025-12-04T10:58:28.4566537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4566609Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4566665Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4566841Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4566916Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4566952Z graph_break [] 2025-12-04T10:58:28.4567026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4567080Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4567152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4567328Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4567364Z graph_break [] 2025-12-04T10:58:28.4567415Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4567578Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4567624Z Traceback (most recent call last): 2025-12-04T10:58:28.4567778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4567818Z method(*args, **kwargs) 2025-12-04T10:58:28.4567981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4568021Z method(*args, **kwargs) 2025-12-04T10:58:28.4568171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4568207Z with policy(): 2025-12-04T10:58:28.4568359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4568399Z raise RuntimeError(msg) 2025-12-04T10:58:28.4568801Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4568817Z 2025-12-04T10:58:28.4568890Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4569178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4569180Z 2025-12-04T10:58:28.4569266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4569338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4569394Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4569585Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4569659Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4569695Z graph_break [] 2025-12-04T10:58:28.4569768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4569823Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4569895Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4570069Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4570106Z graph_break [] 2025-12-04T10:58:28.4570178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4570233Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4570306Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4570481Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4570517Z graph_break [] 2025-12-04T10:58:28.4570761Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11161411cfa84902.xml - 2025-12-04T10:58:28.4570820Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4571460Z FAILED [0.5991s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4571464Z 2025-12-04T10:58:28.4571537Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4571839Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4571841Z 2025-12-04T10:58:28.4571928Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4571989Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4572055Z ================== 1 failed, 57 deselected, 2 rerun in 4.25s =================== 2025-12-04T10:58:28.4572092Z Got exit code 1 2025-12-04T10:58:28.4572132Z Retrying single test... 
2025-12-04T10:58:28.4572333Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac300b286205a215.xml 2025-12-04T10:58:28.4572403Z ============================= test session starts ============================== 2025-12-04T10:58:28.4572514Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4572555Z cachedir: .pytest_cache 2025-12-04T10:58:28.4572713Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4572758Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4572798Z configfile: pytest.ini 2025-12-04T10:58:28.4572957Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4573030Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4573366Z stepcurrent: skipping 47 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4573412Z Running 1 items in this shard 2025-12-04T10:58:28.4573414Z 2025-12-04T10:58:28.4573775Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:48.703128966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4573777Z 2025-12-04T10:58:28.4573931Z [W1204 10:52:48.984204202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4573933Z 2025-12-04T10:58:28.4574083Z [W1204 10:52:48.984357291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574088Z 2025-12-04T10:58:28.4574237Z [W1204 10:52:48.987694989 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574240Z 2025-12-04T10:58:28.4574391Z [W1204 10:52:48.988011795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574393Z 2025-12-04T10:58:28.4574541Z [W1204 10:52:48.988074964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574543Z 2025-12-04T10:58:28.4574692Z [W1204 10:52:48.990316937 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574693Z 2025-12-04T10:58:28.4574841Z [W1204 10:52:48.990588113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4574844Z 2025-12-04T10:58:28.4575009Z [W1204 10:52:48.990648953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4575012Z 2025-12-04T10:58:28.4575061Z ('RERUN', {'yellow': True}) [2.8711s] [100%] 2025-12-04T10:58:28.4575432Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:49.108074671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4575435Z 2025-12-04T10:58:28.4575585Z [W1204 10:52:49.108464086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4575587Z 2025-12-04T10:58:28.4575735Z [W1204 10:52:49.108550585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4575738Z 2025-12-04T10:58:28.4575887Z [W1204 10:52:49.109833339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4575909Z 2025-12-04T10:58:28.4576059Z [W1204 10:52:49.110109136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4576062Z 2025-12-04T10:58:28.4576210Z [W1204 10:52:49.110172125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4576212Z 2025-12-04T10:58:28.4576359Z [W1204 10:52:49.112234140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4576361Z 2025-12-04T10:58:28.4576508Z [W1204 10:52:49.112582445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4576523Z 2025-12-04T10:58:28.4576674Z [W1204 10:52:49.112647195 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4576676Z 2025-12-04T10:58:28.4576723Z ('RERUN', {'yellow': True}) [0.6253s] [100%] 2025-12-04T10:58:28.4577082Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:52:50.735206846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577084Z 2025-12-04T10:58:28.4577234Z [W1204 10:52:50.735594021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577236Z 2025-12-04T10:58:28.4577384Z [W1204 10:52:50.735660360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577386Z 2025-12-04T10:58:28.4577535Z [W1204 10:52:50.736950994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577538Z 2025-12-04T10:58:28.4577686Z [W1204 10:52:50.737213231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577688Z 2025-12-04T10:58:28.4577836Z [W1204 10:52:50.737276800 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577838Z 2025-12-04T10:58:28.4577986Z [W1204 10:52:50.739289755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4577989Z 2025-12-04T10:58:28.4578137Z [W1204 10:52:50.739628911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4578139Z 2025-12-04T10:58:28.4578302Z [W1204 10:52:50.739691420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4578305Z 2025-12-04T10:58:28.4578343Z FAILED [0.6205s] [100%] 2025-12-04T10:58:28.4578345Z 2025-12-04T10:58:28.4578396Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4578558Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4578605Z Traceback (most recent call last): 2025-12-04T10:58:28.4578760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4578801Z method(*args, **kwargs) 2025-12-04T10:58:28.4578952Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4578993Z method(*args, **kwargs) 2025-12-04T10:58:28.4579145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4579195Z with policy(): 2025-12-04T10:58:28.4579348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4579389Z raise RuntimeError(msg) 2025-12-04T10:58:28.4579787Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4579791Z 2025-12-04T10:58:28.4579863Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4580154Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4580169Z 2025-12-04T10:58:28.4580256Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4580330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4580387Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4580564Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4580637Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4580673Z graph_break [] 2025-12-04T10:58:28.4580823Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4580869Z Traceback (most recent call last): 2025-12-04T10:58:28.4581024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4581066Z method(*args, **kwargs) 2025-12-04T10:58:28.4581216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4581256Z method(*args, **kwargs) 2025-12-04T10:58:28.4581406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4581444Z with policy(): 2025-12-04T10:58:28.4581595Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4581636Z raise RuntimeError(msg) 2025-12-04T10:58:28.4582045Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4582051Z 2025-12-04T10:58:28.4582123Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4582425Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4582427Z 2025-12-04T10:58:28.4582515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4582589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4582644Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4582820Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4582894Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4582942Z graph_break [] 2025-12-04T10:58:28.4583014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4583068Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4583140Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4583346Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4583382Z graph_break [] 2025-12-04T10:58:28.4583434Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4583583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4583643Z Traceback (most recent call last): 2025-12-04T10:58:28.4583797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4583841Z method(*args, **kwargs) 2025-12-04T10:58:28.4583991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4584033Z method(*args, **kwargs) 2025-12-04T10:58:28.4584182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4584219Z with policy(): 2025-12-04T10:58:28.4584371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4584413Z raise RuntimeError(msg) 2025-12-04T10:58:28.4584817Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4584821Z 2025-12-04T10:58:28.4584894Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4585182Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4585184Z 2025-12-04T10:58:28.4585270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4585344Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4585398Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4585587Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4585661Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4585699Z graph_break [] 2025-12-04T10:58:28.4585771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4585840Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4585911Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4586086Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4586122Z graph_break [] 2025-12-04T10:58:28.4586195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4586249Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4586321Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4586495Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4586549Z graph_break [] 2025-12-04T10:58:28.4586795Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ac300b286205a215.xml - 2025-12-04T10:58:28.4586855Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4587486Z FAILED [0.6205s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4587500Z 2025-12-04T10:58:28.4587573Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4587862Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4587864Z 2025-12-04T10:58:28.4587950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4588011Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4588078Z ================== 1 failed, 57 deselected, 2 rerun in 4.28s =================== 2025-12-04T10:58:28.4588115Z Got exit code 1 2025-12-04T10:58:28.4588354Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4588484Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4588684Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0ea9147f3a3da7ce.xml 2025-12-04T10:58:28.4588742Z ============================= test session starts ============================== 2025-12-04T10:58:28.4588852Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4588893Z cachedir: .pytest_cache 2025-12-04T10:58:28.4589052Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4589097Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4589137Z configfile: pytest.ini 2025-12-04T10:58:28.4589307Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4589384Z collecting ... collected 58 items / 48 deselected / 10 selected 2025-12-04T10:58:28.4589437Z stepcurrent: skipping 48 already run items. 
2025-12-04T10:58:28.4589481Z Running 10 items in this shard 2025-12-04T10:58:28.4589483Z 2025-12-04T10:58:28.4589755Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8820s] [ 10%] 2025-12-04T10:58:28.4590003Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4635s] [ 10%] 2025-12-04T10:58:28.4590225Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4502s] [ 10%] 2025-12-04T10:58:28.4590229Z 2025-12-04T10:58:28.4590280Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4590441Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4590487Z Traceback (most recent call last): 2025-12-04T10:58:28.4590645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4590684Z method(*args, **kwargs) 2025-12-04T10:58:28.4590837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4590876Z method(*args, **kwargs) 2025-12-04T10:58:28.4591027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4591063Z with policy(): 2025-12-04T10:58:28.4591230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4591272Z raise RuntimeError(msg) 2025-12-04T10:58:28.4591668Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4591670Z 2025-12-04T10:58:28.4591742Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4592033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4592035Z 2025-12-04T10:58:28.4592122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4592196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4592253Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4592531Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4592606Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4592641Z graph_break [] 2025-12-04T10:58:28.4592793Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4592837Z Traceback (most recent call last): 2025-12-04T10:58:28.4592990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4593043Z method(*args, **kwargs) 2025-12-04T10:58:28.4593196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.4593236Z method(*args, **kwargs) 2025-12-04T10:58:28.4593422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4593476Z with policy(): 2025-12-04T10:58:28.4593633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4593673Z raise RuntimeError(msg) 2025-12-04T10:58:28.4594075Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4594079Z 2025-12-04T10:58:28.4594152Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4594458Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4594461Z 2025-12-04T10:58:28.4594547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4594620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4594675Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4594950Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4595045Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4595081Z graph_break [] 2025-12-04T10:58:28.4595153Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.4595209Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4595280Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4595551Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4595587Z graph_break [] 2025-12-04T10:58:28.4595639Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4595790Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4595836Z Traceback (most recent call last): 2025-12-04T10:58:28.4595992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4596033Z method(*args, **kwargs) 2025-12-04T10:58:28.4596183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4596222Z method(*args, **kwargs) 2025-12-04T10:58:28.4596373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4596410Z with policy(): 2025-12-04T10:58:28.4596563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4596604Z raise RuntimeError(msg) 2025-12-04T10:58:28.4597022Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4597026Z 2025-12-04T10:58:28.4597099Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4597398Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4597401Z 2025-12-04T10:58:28.4597488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4597560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4597615Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4597892Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4597978Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4598013Z graph_break [] 2025-12-04T10:58:28.4598087Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4598141Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4598212Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4598481Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4598517Z graph_break [] 2025-12-04T10:58:28.4598588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4598656Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.4598727Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4598999Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4599034Z graph_break [] 2025-12-04T10:58:28.4599279Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0ea9147f3a3da7ce.xml - 2025-12-04T10:58:28.4599337Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4599971Z FAILED [0.4502s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4599976Z 2025-12-04T10:58:28.4600049Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4600338Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4600340Z 2025-12-04T10:58:28.4600425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4600486Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4600564Z ================== 1 failed, 48 deselected, 2 rerun in 3.96s =================== 2025-12-04T10:58:28.4600602Z Got exit code 1 2025-12-04T10:58:28.4600642Z Retrying single test... 2025-12-04T10:58:28.4600842Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d7a9bb4dfdbdcd4f.xml 2025-12-04T10:58:28.4600910Z ============================= test session starts ============================== 2025-12-04T10:58:28.4601020Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4601062Z cachedir: .pytest_cache 2025-12-04T10:58:28.4601219Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4601265Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4601305Z configfile: pytest.ini 2025-12-04T10:58:28.4601467Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4601541Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4601838Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4601884Z Running 1 items in this shard 2025-12-04T10:58:28.4601886Z 2025-12-04T10:58:28.4602247Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:10.230041300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4602250Z 2025-12-04T10:58:28.4602404Z [W1204 10:53:11.517038483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4602417Z 2025-12-04T10:58:28.4602570Z [W1204 10:53:11.517206521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4602572Z 2025-12-04T10:58:28.4602722Z [W1204 10:53:11.520546719 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4602724Z 2025-12-04T10:58:28.4602873Z [W1204 10:53:11.520868965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4602875Z 2025-12-04T10:58:28.4603023Z [W1204 10:53:11.520929385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4603025Z 2025-12-04T10:58:28.4603173Z [W1204 10:53:11.523222276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4603175Z 2025-12-04T10:58:28.4603362Z [W1204 10:53:11.523501612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4603365Z 2025-12-04T10:58:28.4603514Z [W1204 10:53:11.523560732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4603516Z 2025-12-04T10:58:28.4603565Z ('RERUN', {'yellow': True}) [3.2782s] [100%] 2025-12-04T10:58:28.4603927Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:12.295064446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4603929Z 2025-12-04T10:58:28.4604079Z [W1204 10:53:12.295484710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604081Z 2025-12-04T10:58:28.4604254Z [W1204 10:53:12.295561559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604257Z 2025-12-04T10:58:28.4604407Z [W1204 10:53:12.296855563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604409Z 2025-12-04T10:58:28.4604573Z [W1204 10:53:12.297124680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604574Z 2025-12-04T10:58:28.4604726Z [W1204 10:53:12.297188769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604728Z 2025-12-04T10:58:28.4604877Z [W1204 10:53:12.299301043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4604879Z 2025-12-04T10:58:28.4605027Z [W1204 10:53:12.299569519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4605029Z 2025-12-04T10:58:28.4605193Z [W1204 10:53:12.299634849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4605194Z 2025-12-04T10:58:28.4605243Z ('RERUN', {'yellow': True}) [0.6388s] [100%] 2025-12-04T10:58:28.4605600Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:12.932364590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4605602Z 2025-12-04T10:58:28.4605750Z [W1204 10:53:12.932744946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4605753Z 2025-12-04T10:58:28.4605916Z [W1204 10:53:12.932816385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4605919Z 2025-12-04T10:58:28.4606068Z [W1204 10:53:12.934092929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4606070Z 2025-12-04T10:58:28.4606219Z [W1204 10:53:12.934359385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4606221Z 2025-12-04T10:58:28.4606370Z [W1204 10:53:12.934422105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4606372Z 2025-12-04T10:58:28.4606520Z [W1204 10:53:12.936527728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4606522Z 2025-12-04T10:58:28.4606672Z [W1204 10:53:12.936798085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4606675Z 2025-12-04T10:58:28.4606825Z [W1204 10:53:12.936860274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4606827Z 2025-12-04T10:58:28.4606865Z FAILED [0.6315s] [100%] 2025-12-04T10:58:28.4606867Z 2025-12-04T10:58:28.4606920Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4607071Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4607117Z Traceback (most recent call last): 2025-12-04T10:58:28.4607273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4607314Z method(*args, **kwargs) 2025-12-04T10:58:28.4607477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4607519Z method(*args, **kwargs) 2025-12-04T10:58:28.4607671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4607708Z with policy(): 2025-12-04T10:58:28.4607873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4607914Z raise RuntimeError(msg) 2025-12-04T10:58:28.4608309Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4608311Z 2025-12-04T10:58:28.4608385Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4608676Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4608690Z 2025-12-04T10:58:28.4608776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4608852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4608908Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4609184Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4609257Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4609293Z graph_break [] 2025-12-04T10:58:28.4609457Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4609503Z Traceback (most recent call last): 2025-12-04T10:58:28.4609656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4609698Z method(*args, **kwargs) 2025-12-04T10:58:28.4609849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4609889Z method(*args, **kwargs) 2025-12-04T10:58:28.4610040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4610077Z 
with policy(): 2025-12-04T10:58:28.4610228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4610271Z raise RuntimeError(msg) 2025-12-04T10:58:28.4610672Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4610676Z 2025-12-04T10:58:28.4610749Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4611042Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4611044Z 2025-12-04T10:58:28.4611131Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4611205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4611273Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4611547Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4611633Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4611670Z graph_break [] 2025-12-04T10:58:28.4611743Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4611797Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4611868Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4612138Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4612176Z graph_break [] 2025-12-04T10:58:28.4612228Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4612391Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4612436Z Traceback (most recent call last): 2025-12-04T10:58:28.4612590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4612631Z method(*args, **kwargs) 2025-12-04T10:58:28.4612781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4612821Z method(*args, **kwargs) 2025-12-04T10:58:28.4612971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4613008Z with policy(): 2025-12-04T10:58:28.4613176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4613219Z raise RuntimeError(msg) 2025-12-04T10:58:28.4613655Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4613658Z 2025-12-04T10:58:28.4613730Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4614018Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4614020Z 2025-12-04T10:58:28.4614109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4614182Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4614238Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4614512Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4614585Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4614621Z graph_break [] 2025-12-04T10:58:28.4614693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4614747Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4614817Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4615101Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4615139Z graph_break [] 2025-12-04T10:58:28.4615212Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4615280Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4615353Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4615623Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4615658Z graph_break [] 2025-12-04T10:58:28.4615906Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d7a9bb4dfdbdcd4f.xml - 2025-12-04T10:58:28.4615966Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4616616Z FAILED [0.6315s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4616619Z 2025-12-04T10:58:28.4616690Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4616980Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4617002Z 2025-12-04T10:58:28.4617089Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4617151Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4617218Z ================== 1 failed, 57 deselected, 2 rerun in 4.71s =================== 2025-12-04T10:58:28.4617254Z Got exit code 1 2025-12-04T10:58:28.4617295Z Retrying single test... 2025-12-04T10:58:28.4617491Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bd880748d82ecddd.xml 2025-12-04T10:58:28.4617549Z ============================= test session starts ============================== 2025-12-04T10:58:28.4617659Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4617700Z cachedir: .pytest_cache 2025-12-04T10:58:28.4617860Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4617906Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4617947Z configfile: pytest.ini 2025-12-04T10:58:28.4618108Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4618181Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4618468Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4618512Z Running 1 items in this shard 2025-12-04T10:58:28.4618514Z 2025-12-04T10:58:28.4618888Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:22.048744210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4618893Z 2025-12-04T10:58:28.4619045Z [W1204 10:53:23.319709054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619049Z 2025-12-04T10:58:28.4619211Z [W1204 10:53:23.319867542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619213Z 2025-12-04T10:58:28.4619365Z [W1204 10:53:23.323618445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619367Z 2025-12-04T10:58:28.4619514Z [W1204 10:53:23.323931861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619516Z 2025-12-04T10:58:28.4619666Z [W1204 10:53:23.323999260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619669Z 2025-12-04T10:58:28.4619829Z [W1204 10:53:23.326147863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619831Z 2025-12-04T10:58:28.4619982Z [W1204 10:53:23.326422440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4619984Z 2025-12-04T10:58:28.4620131Z [W1204 10:53:23.326481369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4620133Z 2025-12-04T10:58:28.4620181Z ('RERUN', {'yellow': True}) [3.2083s] [100%] 2025-12-04T10:58:28.4620540Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:23.929048807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4620554Z 2025-12-04T10:58:28.4620705Z [W1204 10:53:23.929419943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4620707Z 2025-12-04T10:58:28.4620857Z [W1204 10:53:23.929489072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4620859Z 2025-12-04T10:58:28.4621007Z [W1204 10:53:23.930744386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4621010Z 2025-12-04T10:58:28.4621158Z [W1204 10:53:23.930996593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4621159Z 2025-12-04T10:58:28.4621309Z [W1204 10:53:23.931062582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4621311Z 2025-12-04T10:58:28.4621459Z [W1204 10:53:23.933053727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4621462Z 2025-12-04T10:58:28.4621612Z [W1204 10:53:23.933313584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4621614Z 2025-12-04T10:58:28.4621762Z [W1204 10:53:23.933373843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4621764Z 2025-12-04T10:58:28.4621811Z ('RERUN', {'yellow': True}) [0.4690s] [100%] 2025-12-04T10:58:28.4622184Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:53:24.403476137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622187Z 2025-12-04T10:58:28.4622336Z [W1204 10:53:24.403843172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622338Z 2025-12-04T10:58:28.4622498Z [W1204 10:53:24.403913491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622500Z 2025-12-04T10:58:28.4622649Z [W1204 10:53:24.405172815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622651Z 2025-12-04T10:58:28.4622799Z [W1204 10:53:24.405436932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622801Z 2025-12-04T10:58:28.4622950Z [W1204 10:53:24.405498421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4622954Z 2025-12-04T10:58:28.4623102Z [W1204 10:53:24.407501556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4623116Z 2025-12-04T10:58:28.4623296Z [W1204 10:53:24.407765813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4623298Z 2025-12-04T10:58:28.4623446Z [W1204 10:53:24.407827572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4623448Z 2025-12-04T10:58:28.4623487Z FAILED [0.4701s] [100%] 2025-12-04T10:58:28.4623489Z 2025-12-04T10:58:28.4623540Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4623692Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4623753Z Traceback (most recent call last): 2025-12-04T10:58:28.4623912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4623953Z method(*args, **kwargs) 2025-12-04T10:58:28.4624111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4624151Z method(*args, **kwargs) 2025-12-04T10:58:28.4624302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4624338Z with policy(): 2025-12-04T10:58:28.4624492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4624532Z raise RuntimeError(msg) 2025-12-04T10:58:28.4624929Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4624933Z 2025-12-04T10:58:28.4625006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4625296Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4625298Z 2025-12-04T10:58:28.4625385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4625458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4625515Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4625804Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4625879Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4625915Z graph_break [] 2025-12-04T10:58:28.4626077Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4626122Z Traceback (most recent call last): 2025-12-04T10:58:28.4626277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4626316Z method(*args, **kwargs) 2025-12-04T10:58:28.4626468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4626507Z method(*args, **kwargs) 2025-12-04T10:58:28.4626660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4626711Z 
with policy(): 2025-12-04T10:58:28.4626864Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4626906Z raise RuntimeError(msg) 2025-12-04T10:58:28.4627309Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4627311Z 2025-12-04T10:58:28.4628877Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4629171Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4629189Z 2025-12-04T10:58:28.4629278Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4629352Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4629409Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4629684Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4629757Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4629793Z graph_break [] 2025-12-04T10:58:28.4629868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4629921Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4629996Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4630271Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4630310Z graph_break [] 2025-12-04T10:58:28.4630362Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4630515Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4630561Z Traceback (most recent call last): 2025-12-04T10:58:28.4630717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4630758Z method(*args, **kwargs) 2025-12-04T10:58:28.4630922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4630963Z method(*args, **kwargs) 2025-12-04T10:58:28.4631113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4631150Z with policy(): 2025-12-04T10:58:28.4631314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4631356Z raise RuntimeError(msg) 2025-12-04T10:58:28.4631758Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4631761Z 2025-12-04T10:58:28.4631835Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4632127Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4632150Z 2025-12-04T10:58:28.4632238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4632312Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4632368Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4632642Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4632715Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4632764Z graph_break [] 2025-12-04T10:58:28.4632837Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4632893Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4632963Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4633236Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4633310Z graph_break [] 2025-12-04T10:58:28.4633384Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4633437Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4633509Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4633778Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4633816Z graph_break [] 2025-12-04T10:58:28.4634063Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bd880748d82ecddd.xml - 2025-12-04T10:58:28.4634123Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4634779Z FAILED [0.4701s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4634784Z 2025-12-04T10:58:28.4634856Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4635159Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4635161Z 2025-12-04T10:58:28.4635247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4635308Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4635374Z ================== 1 failed, 57 deselected, 2 rerun in 4.32s =================== 2025-12-04T10:58:28.4635411Z Got exit code 1 2025-12-04T10:58:28.4635651Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4635782Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4635993Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2415da58c2773266.xml 2025-12-04T10:58:28.4636052Z ============================= test session starts ============================== 2025-12-04T10:58:28.4636163Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4636205Z cachedir: .pytest_cache 2025-12-04T10:58:28.4636363Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4636409Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4636450Z configfile: pytest.ini 2025-12-04T10:58:28.4636612Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4636699Z collecting ... collected 58 items / 49 deselected / 9 selected 2025-12-04T10:58:28.4636755Z stepcurrent: skipping 49 already run items. 
2025-12-04T10:58:28.4636798Z Running 9 items in this shard 2025-12-04T10:58:28.4636802Z 2025-12-04T10:58:28.4637059Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.6778s] [ 11%] 2025-12-04T10:58:28.4637310Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.6594s] [ 11%] 2025-12-04T10:58:28.4637534Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.5801s] [ 11%] 2025-12-04T10:58:28.4637537Z 2025-12-04T10:58:28.4637590Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4637742Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4637789Z Traceback (most recent call last): 2025-12-04T10:58:28.4637947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4637988Z method(*args, **kwargs) 2025-12-04T10:58:28.4638141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4638181Z method(*args, **kwargs) 2025-12-04T10:58:28.4638332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4638369Z with policy(): 2025-12-04T10:58:28.4638534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4638576Z raise RuntimeError(msg) 2025-12-04T10:58:28.4638990Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4638993Z 2025-12-04T10:58:28.4639066Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4639363Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4639365Z 2025-12-04T10:58:28.4639451Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4639527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4639582Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4639773Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4639846Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4639883Z graph_break [] 2025-12-04T10:58:28.4640036Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4640082Z Traceback (most recent call last): 2025-12-04T10:58:28.4640235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4640276Z method(*args, **kwargs) 2025-12-04T10:58:28.4640426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4640484Z method(*args, **kwargs) 
2025-12-04T10:58:28.4640635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4640673Z with policy(): 2025-12-04T10:58:28.4640826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4640868Z raise RuntimeError(msg) 2025-12-04T10:58:28.4641283Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4641287Z 2025-12-04T10:58:28.4641360Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4641657Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4641660Z 2025-12-04T10:58:28.4641746Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4641821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4641876Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4642053Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4642125Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4642162Z graph_break [] 2025-12-04T10:58:28.4642245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4642302Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4642375Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4642561Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4642597Z graph_break [] 2025-12-04T10:58:28.4642649Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4642801Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4642847Z Traceback (most recent call last): 2025-12-04T10:58:28.4642999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4643040Z method(*args, **kwargs) 2025-12-04T10:58:28.4643192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4643245Z method(*args, **kwargs) 2025-12-04T10:58:28.4643422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4643459Z with policy(): 2025-12-04T10:58:28.4643612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4643653Z raise RuntimeError(msg) 2025-12-04T10:58:28.4644064Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4644081Z 2025-12-04T10:58:28.4644155Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4644448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4644451Z 2025-12-04T10:58:28.4644538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4644612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4644667Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4644843Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4644915Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4644952Z graph_break [] 2025-12-04T10:58:28.4645025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4645079Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4645151Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4645326Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4645362Z graph_break [] 2025-12-04T10:58:28.4645436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4645489Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4645561Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4645735Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4645787Z graph_break [] 2025-12-04T10:58:28.4646030Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2415da58c2773266.xml - 2025-12-04T10:58:28.4646090Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4646760Z FAILED [0.5801s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4646763Z 2025-12-04T10:58:28.4646836Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4647128Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4647146Z 2025-12-04T10:58:28.4647232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4647295Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4647361Z ================== 1 failed, 49 deselected, 2 rerun in 4.08s =================== 2025-12-04T10:58:28.4647399Z Got exit code 1 2025-12-04T10:58:28.4647438Z Retrying single test... 
2025-12-04T10:58:28.4647638Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-81028ec0cdbfd263.xml 2025-12-04T10:58:28.4647696Z ============================= test session starts ============================== 2025-12-04T10:58:28.4647819Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4647860Z cachedir: .pytest_cache 2025-12-04T10:58:28.4648020Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4648066Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4648107Z configfile: pytest.ini 2025-12-04T10:58:28.4648268Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4648340Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4648629Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4648672Z Running 1 items in this shard 2025-12-04T10:58:28.4648675Z 2025-12-04T10:58:28.4649044Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:43.110127908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649047Z 2025-12-04T10:58:28.4649201Z [W1204 10:53:44.380208986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4649203Z 2025-12-04T10:58:28.4649356Z [W1204 10:53:44.380350554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649358Z 2025-12-04T10:58:28.4649508Z [W1204 10:53:44.383858930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649510Z 2025-12-04T10:58:28.4649672Z [W1204 10:53:44.384177986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649675Z 2025-12-04T10:58:28.4649823Z [W1204 10:53:44.384242365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649826Z 2025-12-04T10:58:28.4649985Z [W1204 10:53:44.386501547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4649987Z 2025-12-04T10:58:28.4650137Z [W1204 10:53:44.386774403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4650139Z 2025-12-04T10:58:28.4650287Z [W1204 10:53:44.386836032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4650290Z 2025-12-04T10:58:28.4650338Z ('RERUN', {'yellow': True}) [2.9390s] [100%] 2025-12-04T10:58:28.4650704Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:45.601619563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4650718Z 2025-12-04T10:58:28.4650869Z [W1204 10:53:45.602008099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4650871Z 2025-12-04T10:58:28.4651020Z [W1204 10:53:45.602080568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651022Z 2025-12-04T10:58:28.4651169Z [W1204 10:53:45.603385561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651171Z 2025-12-04T10:58:28.4651320Z [W1204 10:53:45.603655148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651333Z 2025-12-04T10:58:28.4651483Z [W1204 10:53:45.603717027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651485Z 2025-12-04T10:58:28.4651634Z [W1204 10:53:45.605765181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651636Z 2025-12-04T10:58:28.4651784Z [W1204 10:53:45.606113687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4651786Z 2025-12-04T10:58:28.4651933Z [W1204 10:53:45.606181066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4651935Z 2025-12-04T10:58:28.4651984Z ('RERUN', {'yellow': True}) [0.7178s] [100%] 2025-12-04T10:58:28.4652349Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:46.308701290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4652353Z 2025-12-04T10:58:28.4652503Z [W1204 10:53:46.309083066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4652504Z 2025-12-04T10:58:28.4652653Z [W1204 10:53:46.309148895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4652655Z 2025-12-04T10:58:28.4652803Z [W1204 10:53:46.310426679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4652805Z 2025-12-04T10:58:28.4652967Z [W1204 10:53:46.310681755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4652970Z 2025-12-04T10:58:28.4653118Z [W1204 10:53:46.310740385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4653121Z 2025-12-04T10:58:28.4653321Z [W1204 10:53:46.312785869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4653323Z 2025-12-04T10:58:28.4653473Z [W1204 10:53:46.313131815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4653474Z 2025-12-04T10:58:28.4653621Z [W1204 10:53:46.313196654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4653624Z 2025-12-04T10:58:28.4653663Z FAILED [0.7062s] [100%] 2025-12-04T10:58:28.4653664Z 2025-12-04T10:58:28.4653717Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4653872Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4653933Z Traceback (most recent call last): 2025-12-04T10:58:28.4654092Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4654132Z method(*args, **kwargs) 2025-12-04T10:58:28.4654286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4654325Z method(*args, **kwargs) 2025-12-04T10:58:28.4654477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4654513Z with policy(): 2025-12-04T10:58:28.4654668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4654724Z raise RuntimeError(msg) 2025-12-04T10:58:28.4655129Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4655132Z 2025-12-04T10:58:28.4655206Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4655498Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4655500Z 2025-12-04T10:58:28.4655587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4655662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4655719Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4655896Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4655970Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4656006Z graph_break [] 2025-12-04T10:58:28.4656161Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4656206Z Traceback (most recent call last): 2025-12-04T10:58:28.4656362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4656401Z method(*args, **kwargs) 2025-12-04T10:58:28.4656568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4656609Z method(*args, **kwargs) 2025-12-04T10:58:28.4656761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4656797Z with policy(): 2025-12-04T10:58:28.4656960Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4657001Z raise RuntimeError(msg) 2025-12-04T10:58:28.4657415Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4657418Z 2025-12-04T10:58:28.4657491Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4657785Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4657799Z 2025-12-04T10:58:28.4657887Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4657961Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4658017Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4658193Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4658266Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4658302Z graph_break [] 2025-12-04T10:58:28.4658375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4658443Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4658514Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4658690Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4658728Z graph_break [] 2025-12-04T10:58:28.4658780Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4658934Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4658978Z Traceback (most recent call last): 2025-12-04T10:58:28.4659133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4659173Z method(*args, **kwargs) 2025-12-04T10:58:28.4659327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4659367Z method(*args, **kwargs) 2025-12-04T10:58:28.4659518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4659554Z with policy(): 2025-12-04T10:58:28.4659707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4659748Z raise RuntimeError(msg) 2025-12-04T10:58:28.4660161Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4660164Z 2025-12-04T10:58:28.4660255Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4660548Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4660552Z 2025-12-04T10:58:28.4660651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4660724Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4660781Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4660957Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4661029Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4661064Z graph_break [] 2025-12-04T10:58:28.4661139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4661192Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4661276Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4661451Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4661487Z graph_break [] 2025-12-04T10:58:28.4661560Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4661614Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4661684Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4661859Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4661907Z graph_break [] 2025-12-04T10:58:28.4662153Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-81028ec0cdbfd263.xml - 2025-12-04T10:58:28.4662213Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4662860Z FAILED [0.7062s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4662862Z 2025-12-04T10:58:28.4662935Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4663228Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4663231Z 2025-12-04T10:58:28.4663348Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4663410Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4663477Z ================== 1 failed, 57 deselected, 2 rerun in 4.53s =================== 2025-12-04T10:58:28.4663513Z Got exit code 1 2025-12-04T10:58:28.4663554Z Retrying single test... 
2025-12-04T10:58:28.4663749Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d8207232cd8dd97.xml 2025-12-04T10:58:28.4663806Z ============================= test session starts ============================== 2025-12-04T10:58:28.4663933Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4663973Z cachedir: .pytest_cache 2025-12-04T10:58:28.4664132Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4664177Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4664233Z configfile: pytest.ini 2025-12-04T10:58:28.4664392Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4664465Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4664755Z stepcurrent: skipping 49 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4664799Z Running 1 items in this shard 2025-12-04T10:58:28.4664802Z 2025-12-04T10:58:28.4665169Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:55.759290292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4665186Z 2025-12-04T10:58:28.4665342Z [W1204 10:53:55.038070512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4665344Z 2025-12-04T10:58:28.4665495Z [W1204 10:53:55.038226170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4665498Z 2025-12-04T10:58:28.4665648Z [W1204 10:53:55.041299841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4665650Z 2025-12-04T10:58:28.4665800Z [W1204 10:53:55.041609748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4665816Z 2025-12-04T10:58:28.4665966Z [W1204 10:53:55.041671217 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4665968Z 2025-12-04T10:58:28.4666117Z [W1204 10:53:55.043865599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4666119Z 2025-12-04T10:58:28.4666266Z [W1204 10:53:55.044144636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4666268Z 2025-12-04T10:58:28.4666417Z [W1204 10:53:55.044209725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4666420Z 2025-12-04T10:58:28.4666468Z ('RERUN', {'yellow': True}) [2.9885s] [100%] 2025-12-04T10:58:28.4666832Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:56.266462753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4666835Z 2025-12-04T10:58:28.4666986Z [W1204 10:53:56.266847238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4666988Z 2025-12-04T10:58:28.4667136Z [W1204 10:53:56.266911837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667138Z 2025-12-04T10:58:28.4667287Z [W1204 10:53:57.268208861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667288Z 2025-12-04T10:58:28.4667448Z [W1204 10:53:57.268473667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667452Z 2025-12-04T10:58:28.4667600Z [W1204 10:53:57.268534567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667603Z 2025-12-04T10:58:28.4667763Z [W1204 10:53:57.270607370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667765Z 2025-12-04T10:58:28.4667913Z [W1204 10:53:57.270946486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4667914Z 2025-12-04T10:58:28.4668062Z [W1204 10:53:57.271013385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4668064Z 2025-12-04T10:58:28.4668111Z ('RERUN', {'yellow': True}) [0.7092s] [100%] 2025-12-04T10:58:28.4668474Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 10:53:57.967920817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4668487Z 2025-12-04T10:58:28.4668638Z [W1204 10:53:57.968308142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4668640Z 2025-12-04T10:58:28.4668788Z [W1204 10:53:57.968375661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4668790Z 2025-12-04T10:58:28.4668939Z [W1204 10:53:57.969647355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4668940Z 2025-12-04T10:58:28.4669090Z [W1204 10:53:57.969905692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4669103Z 2025-12-04T10:58:28.4669251Z [W1204 10:53:57.969965551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4669254Z 2025-12-04T10:58:28.4669404Z [W1204 10:53:57.972041985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4669406Z 2025-12-04T10:58:28.4669555Z [W1204 10:53:57.972392130 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4669556Z 2025-12-04T10:58:28.4669704Z [W1204 10:53:57.972453780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4669706Z 2025-12-04T10:58:28.4669744Z FAILED [0.6973s] [100%] 2025-12-04T10:58:28.4669746Z 2025-12-04T10:58:28.4669798Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4669952Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4669999Z Traceback (most recent call last): 2025-12-04T10:58:28.4670156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4670197Z method(*args, **kwargs) 2025-12-04T10:58:28.4670350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4670390Z method(*args, **kwargs) 2025-12-04T10:58:28.4670541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4670578Z with policy(): 2025-12-04T10:58:28.4670741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4670784Z raise RuntimeError(msg) 2025-12-04T10:58:28.4671198Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 1048576 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4671202Z 2025-12-04T10:58:28.4671275Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4671570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4671573Z 2025-12-04T10:58:28.4671660Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4671735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4671793Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4671983Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4672056Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4672093Z graph_break [] 2025-12-04T10:58:28.4672246Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4672292Z Traceback (most recent call last): 2025-12-04T10:58:28.4672445Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4672485Z method(*args, **kwargs) 2025-12-04T10:58:28.4672636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4672697Z method(*args, **kwargs) 2025-12-04T10:58:28.4672845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4672883Z with policy(): 2025-12-04T10:58:28.4673035Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4673076Z raise RuntimeError(msg) 2025-12-04T10:58:28.4673528Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 1048576 and is now reported as 2097152 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4673531Z 2025-12-04T10:58:28.4673604Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4673899Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4673901Z 2025-12-04T10:58:28.4673988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4674063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4674119Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4674295Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4674367Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4674404Z graph_break [] 2025-12-04T10:58:28.4674475Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4674546Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4674618Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4674794Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), 
('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4674843Z graph_break [] 2025-12-04T10:58:28.4674896Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4675049Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4675094Z Traceback (most recent call last): 2025-12-04T10:58:28.4675246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4675286Z method(*args, **kwargs) 2025-12-04T10:58:28.4675437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4675478Z method(*args, **kwargs) 2025-12-04T10:58:28.4675642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4675679Z with policy(): 2025-12-04T10:58:28.4675831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4675872Z raise RuntimeError(msg) 2025-12-04T10:58:28.4676288Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4676291Z 2025-12-04T10:58:28.4676364Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4676672Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4676675Z 2025-12-04T10:58:28.4676762Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4676835Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4676890Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4677065Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4677138Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4677174Z graph_break [] 2025-12-04T10:58:28.4677247Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4677302Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4677374Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4677552Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4677588Z graph_break [] 2025-12-04T10:58:28.4677660Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4677714Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4677785Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4677959Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4677996Z graph_break [] 2025-12-04T10:58:28.4678253Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9d8207232cd8dd97.xml - 2025-12-04T10:58:28.4678313Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4678969Z FAILED [0.6973s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 2097152 and is now reported as 3145728 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4678971Z 2025-12-04T10:58:28.4679043Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4679336Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4679351Z 2025-12-04T10:58:28.4679437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4679500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4679567Z ================== 1 failed, 57 deselected, 2 rerun in 4.56s =================== 2025-12-04T10:58:28.4679603Z Got exit code 1 2025-12-04T10:58:28.4679846Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4679974Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4680174Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3de43d04e6f7eee2.xml 2025-12-04T10:58:28.4680242Z ============================= test session starts ============================== 2025-12-04T10:58:28.4680352Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4680394Z cachedir: .pytest_cache 2025-12-04T10:58:28.4680555Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4680600Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4680641Z configfile: pytest.ini 2025-12-04T10:58:28.4680799Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4680872Z collecting ... collected 58 items / 50 deselected / 8 selected 2025-12-04T10:58:28.4680925Z stepcurrent: skipping 50 already run items. 
2025-12-04T10:58:28.4680970Z Running 8 items in this shard 2025-12-04T10:58:28.4680973Z 2025-12-04T10:58:28.4681225Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4947s] [ 12%] 2025-12-04T10:58:28.4681475Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4557s] [ 12%] 2025-12-04T10:58:28.4681698Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 FAILED [0.4614s] [ 12%] 2025-12-04T10:58:28.4681702Z 2025-12-04T10:58:28.4681752Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4681914Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4681960Z Traceback (most recent call last): 2025-12-04T10:58:28.4682117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4682158Z method(*args, **kwargs) 2025-12-04T10:58:28.4682322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4682362Z method(*args, **kwargs) 2025-12-04T10:58:28.4682514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4682550Z with policy(): 2025-12-04T10:58:28.4682704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4682744Z raise RuntimeError(msg) 2025-12-04T10:58:28.4683144Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4683159Z 2025-12-04T10:58:28.4683232Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4683553Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4683556Z 2025-12-04T10:58:28.4683642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4683715Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4683771Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4683963Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4684038Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4684074Z graph_break [] 2025-12-04T10:58:28.4684224Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4684269Z Traceback (most recent call last): 2025-12-04T10:58:28.4684423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4684462Z method(*args, **kwargs) 2025-12-04T10:58:28.4684613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4684652Z method(*args, **kwargs) 
2025-12-04T10:58:28.4684803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4684840Z with policy(): 2025-12-04T10:58:28.4684995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4685035Z raise RuntimeError(msg) 2025-12-04T10:58:28.4685444Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4685447Z 2025-12-04T10:58:28.4685519Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4685826Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4685829Z 2025-12-04T10:58:28.4685916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4685990Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4686046Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4686235Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4686309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4686344Z graph_break [] 2025-12-04T10:58:28.4686417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4686470Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4686541Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4686717Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4686775Z graph_break [] 2025-12-04T10:58:28.4686826Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4686979Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4687023Z Traceback (most recent call last): 2025-12-04T10:58:28.4687179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4687219Z method(*args, **kwargs) 2025-12-04T10:58:28.4687371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4687410Z method(*args, **kwargs) 2025-12-04T10:58:28.4687575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4687612Z with policy(): 2025-12-04T10:58:28.4687765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4687804Z raise RuntimeError(msg) 2025-12-04T10:58:28.4688211Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4688214Z 2025-12-04T10:58:28.4688286Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4688578Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4688581Z 2025-12-04T10:58:28.4688668Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4688740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4688796Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4688971Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4689044Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4689079Z graph_break [] 2025-12-04T10:58:28.4689153Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4689206Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4689289Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4689465Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4689503Z graph_break [] 2025-12-04T10:58:28.4689575Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4689641Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4689712Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4689887Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4689922Z graph_break [] 2025-12-04T10:58:28.4690166Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-3de43d04e6f7eee2.xml - 2025-12-04T10:58:28.4690227Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4690864Z FAILED [0.4614s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4690878Z 2025-12-04T10:58:28.4690951Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4691239Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4691252Z 2025-12-04T10:58:28.4691339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4691401Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4691468Z ================== 1 failed, 50 deselected, 2 rerun in 3.58s =================== 2025-12-04T10:58:28.4691505Z Got exit code 1 2025-12-04T10:58:28.4691546Z Retrying single test... 
2025-12-04T10:58:28.4691744Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bd52b4f4d74f10e5.xml 2025-12-04T10:58:28.4691801Z ============================= test session starts ============================== 2025-12-04T10:58:28.4691910Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4691952Z cachedir: .pytest_cache 2025-12-04T10:58:28.4692110Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4692157Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4692197Z configfile: pytest.ini 2025-12-04T10:58:28.4692357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4692431Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4692718Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4692761Z Running 1 items in this shard 2025-12-04T10:58:28.4692763Z 2025-12-04T10:58:28.4693140Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:17.485448887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4693143Z 2025-12-04T10:58:28.4693322Z [W1204 10:54:17.761491955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4693324Z 2025-12-04T10:58:28.4693491Z [W1204 10:54:17.761664003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4693493Z 2025-12-04T10:58:28.4693646Z [W1204 10:54:17.765018460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4693648Z 2025-12-04T10:58:28.4693798Z [W1204 10:54:17.765320476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4693800Z 2025-12-04T10:58:28.4693952Z [W1204 10:54:17.765387316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4693955Z 2025-12-04T10:58:28.4694104Z [W1204 10:54:17.767635467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4694121Z 2025-12-04T10:58:28.4694270Z [W1204 10:54:17.767913664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4694272Z 2025-12-04T10:58:28.4694421Z [W1204 10:54:17.767975433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4694422Z 2025-12-04T10:58:28.4694471Z ('RERUN', {'yellow': True}) [2.8902s] [100%] 2025-12-04T10:58:28.4694834Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:18.892690973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4694852Z 2025-12-04T10:58:28.4695001Z [W1204 10:54:18.893140057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695005Z 2025-12-04T10:58:28.4695153Z [W1204 10:54:18.893206766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695155Z 2025-12-04T10:58:28.4695303Z [W1204 10:54:18.894475860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695305Z 2025-12-04T10:58:28.4695452Z [W1204 10:54:18.894735897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695454Z 2025-12-04T10:58:28.4695602Z [W1204 10:54:18.894798686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695605Z 2025-12-04T10:58:28.4695753Z [W1204 10:54:18.896869660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695756Z 2025-12-04T10:58:28.4695907Z [W1204 10:54:18.897214756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4695909Z 2025-12-04T10:58:28.4696058Z [W1204 10:54:18.897279985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4696059Z 2025-12-04T10:58:28.4696107Z ('RERUN', {'yellow': True}) [0.6382s] [100%] 2025-12-04T10:58:28.4696480Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:19.531348091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4696484Z 2025-12-04T10:58:28.4696633Z [W1204 10:54:19.531771316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4696635Z 2025-12-04T10:58:28.4696795Z [W1204 10:54:19.531836055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4696796Z 2025-12-04T10:58:28.4696945Z [W1204 10:54:19.533097449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4696948Z 2025-12-04T10:58:28.4697095Z [W1204 10:54:19.533351386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4697097Z 2025-12-04T10:58:28.4697245Z [W1204 10:54:19.533410335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4697248Z 2025-12-04T10:58:28.4697396Z [W1204 10:54:19.535478089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4697410Z 2025-12-04T10:58:28.4697560Z [W1204 10:54:19.535818295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4697561Z 2025-12-04T10:58:28.4697709Z [W1204 10:54:19.535879234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4697711Z 2025-12-04T10:58:28.4697750Z FAILED [0.6245s] [100%] 2025-12-04T10:58:28.4697752Z 2025-12-04T10:58:28.4697803Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4697956Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4698014Z Traceback (most recent call last): 2025-12-04T10:58:28.4698174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4698215Z method(*args, **kwargs) 2025-12-04T10:58:28.4698366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4698408Z method(*args, **kwargs) 2025-12-04T10:58:28.4698558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4698595Z with policy(): 2025-12-04T10:58:28.4698748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4698790Z raise RuntimeError(msg) 2025-12-04T10:58:28.4699192Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4699196Z 2025-12-04T10:58:28.4699270Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4699559Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4699562Z 2025-12-04T10:58:28.4699649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4699723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4699779Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4699968Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4700043Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4700081Z graph_break [] 2025-12-04T10:58:28.4700232Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4700294Z Traceback (most recent call last): 2025-12-04T10:58:28.4700448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4700488Z method(*args, **kwargs) 2025-12-04T10:58:28.4700638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4700677Z method(*args, **kwargs) 2025-12-04T10:58:28.4700826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4700865Z with policy(): 2025-12-04T10:58:28.4701017Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4701070Z raise RuntimeError(msg) 2025-12-04T10:58:28.4701475Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4701477Z 2025-12-04T10:58:28.4701556Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4701844Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4701859Z 2025-12-04T10:58:28.4701947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4702022Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4702078Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4702254Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4702327Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4702364Z graph_break [] 2025-12-04T10:58:28.4702436Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4702492Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4702563Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4702741Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4702779Z graph_break [] 2025-12-04T10:58:28.4702831Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4702981Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4703027Z Traceback (most recent call last): 2025-12-04T10:58:28.4703181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4703221Z method(*args, **kwargs) 2025-12-04T10:58:28.4703435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4703475Z method(*args, **kwargs) 2025-12-04T10:58:28.4703643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4703681Z with policy(): 2025-12-04T10:58:28.4703833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4703875Z raise RuntimeError(msg) 2025-12-04T10:58:28.4704296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4704298Z 2025-12-04T10:58:28.4704372Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4704663Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4704667Z 2025-12-04T10:58:28.4704754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4704842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4704897Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4705073Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4705145Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4705181Z graph_break [] 2025-12-04T10:58:28.4705253Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4705308Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4705379Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4705571Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4705608Z graph_break [] 2025-12-04T10:58:28.4705680Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4705734Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4705807Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4705981Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4706017Z graph_break [] 2025-12-04T10:58:28.4706261Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bd52b4f4d74f10e5.xml - 2025-12-04T10:58:28.4706322Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4706965Z FAILED [0.6245s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4706969Z 2025-12-04T10:58:28.4707041Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4707330Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4707333Z 2025-12-04T10:58:28.4707430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4707493Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4707560Z ================== 1 failed, 57 deselected, 2 rerun in 4.32s =================== 2025-12-04T10:58:28.4707597Z Got exit code 1 2025-12-04T10:58:28.4707637Z Retrying single test... 
2025-12-04T10:58:28.4707846Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3a056bfd310a8aa.xml 2025-12-04T10:58:28.4707904Z ============================= test session starts ============================== 2025-12-04T10:58:28.4708013Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4708053Z cachedir: .pytest_cache 2025-12-04T10:58:28.4708213Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4708260Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4708300Z configfile: pytest.ini 2025-12-04T10:58:28.4708472Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4708544Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4708833Z stepcurrent: skipping 50 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4708877Z Running 1 items in this shard 2025-12-04T10:58:28.4708879Z 2025-12-04T10:58:28.4709244Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:28.783851428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4709260Z 2025-12-04T10:58:28.4709413Z [W1204 10:54:28.055119259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4709416Z 2025-12-04T10:58:28.4709567Z [W1204 10:54:28.055247757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4709569Z 2025-12-04T10:58:28.4709719Z [W1204 10:54:28.058435537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4709721Z 2025-12-04T10:58:28.4709870Z [W1204 10:54:28.058762102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4709872Z 2025-12-04T10:58:28.4710019Z [W1204 10:54:28.058824452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4710024Z 2025-12-04T10:58:28.4710172Z [W1204 10:54:28.061136612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4710174Z 2025-12-04T10:58:28.4710323Z [W1204 10:54:28.061415599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4710326Z 2025-12-04T10:58:28.4710474Z [W1204 10:54:28.061476618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4710476Z 2025-12-04T10:58:28.4710525Z ('RERUN', {'yellow': True}) [2.8852s] [100%] 2025-12-04T10:58:28.4710886Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:29.185971168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4710900Z 2025-12-04T10:58:28.4711054Z [W1204 10:54:29.186377553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711057Z 2025-12-04T10:58:28.4711224Z [W1204 10:54:29.186445622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711226Z 2025-12-04T10:58:28.4711377Z [W1204 10:54:29.187744395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711379Z 2025-12-04T10:58:28.4711527Z [W1204 10:54:29.188017782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711529Z 2025-12-04T10:58:28.4711677Z [W1204 10:54:29.188080511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711680Z 2025-12-04T10:58:28.4711830Z [W1204 10:54:29.190134825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711844Z 2025-12-04T10:58:28.4711992Z [W1204 10:54:29.190479711 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4711996Z 2025-12-04T10:58:28.4712144Z [W1204 10:54:29.190542770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4712145Z 2025-12-04T10:58:28.4712193Z ('RERUN', {'yellow': True}) [0.6483s] [100%] 2025-12-04T10:58:28.4712550Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 [W1204 10:54:30.870115850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4712564Z 2025-12-04T10:58:28.4712714Z [W1204 10:54:30.870514335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4712717Z 2025-12-04T10:58:28.4712866Z [W1204 10:54:30.870585954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4712869Z 2025-12-04T10:58:28.4713017Z [W1204 10:54:30.871885098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4713019Z 2025-12-04T10:58:28.4713167Z [W1204 10:54:30.872160064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4713169Z 2025-12-04T10:58:28.4713360Z [W1204 10:54:30.872222473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4713363Z 2025-12-04T10:58:28.4713514Z [W1204 10:54:30.874259878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4713516Z 2025-12-04T10:58:28.4713664Z [W1204 10:54:30.874601813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4713665Z 2025-12-04T10:58:28.4713815Z [W1204 10:54:30.874664543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4713817Z 2025-12-04T10:58:28.4713855Z FAILED [0.6579s] [100%] 2025-12-04T10:58:28.4713857Z 2025-12-04T10:58:28.4713908Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4714061Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4714106Z Traceback (most recent call last): 2025-12-04T10:58:28.4714287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4714329Z method(*args, **kwargs) 2025-12-04T10:58:28.4714482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4714521Z method(*args, **kwargs) 2025-12-04T10:58:28.4714686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4714723Z with policy(): 2025-12-04T10:58:28.4714875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4714916Z raise RuntimeError(msg) 2025-12-04T10:58:28.4715315Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 65536 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4715332Z 2025-12-04T10:58:28.4715406Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4715701Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4715703Z 2025-12-04T10:58:28.4715790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4715864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4715919Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4716098Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4716187Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4716223Z graph_break [] 2025-12-04T10:58:28.4716374Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4716418Z Traceback (most recent call last): 2025-12-04T10:58:28.4716576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4716615Z method(*args, **kwargs) 2025-12-04T10:58:28.4716766Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4716805Z method(*args, **kwargs) 2025-12-04T10:58:28.4716956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4716992Z with policy(): 2025-12-04T10:58:28.4717147Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4717189Z raise RuntimeError(msg) 2025-12-04T10:58:28.4717595Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 65536 and is now reported as 131072 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4717598Z 2025-12-04T10:58:28.4717669Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4717962Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4717964Z 2025-12-04T10:58:28.4718063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4718137Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4718192Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4718384Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4718457Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4718493Z graph_break [] 2025-12-04T10:58:28.4718564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4718619Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4718690Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4718866Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4718904Z graph_break [] 2025-12-04T10:58:28.4718955Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4719118Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4719163Z Traceback (most recent call last): 2025-12-04T10:58:28.4719318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4719357Z method(*args, **kwargs) 2025-12-04T10:58:28.4719509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4719548Z method(*args, **kwargs) 2025-12-04T10:58:28.4719699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4719749Z with policy(): 2025-12-04T10:58:28.4719902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4719944Z raise RuntimeError(msg) 2025-12-04T10:58:28.4720349Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4720351Z 2025-12-04T10:58:28.4720424Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4720715Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4720718Z 2025-12-04T10:58:28.4720806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4720878Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4720935Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4721111Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4721184Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4721220Z graph_break [] 2025-12-04T10:58:28.4721292Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4721346Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4721417Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4721604Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4721642Z graph_break [] 2025-12-04T10:58:28.4721716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4721770Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4721852Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4722028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4722063Z graph_break [] 2025-12-04T10:58:28.4722307Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c3a056bfd310a8aa.xml - 2025-12-04T10:58:28.4722365Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4723005Z FAILED [0.6579s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 196608 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4723019Z 2025-12-04T10:58:28.4723092Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4723419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4723421Z 2025-12-04T10:58:28.4723507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4723586Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4723653Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.4723689Z Got exit code 1 2025-12-04T10:58:28.4723931Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4724059Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4724260Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7e0adb993e2ba729.xml 2025-12-04T10:58:28.4724317Z ============================= test session starts ============================== 2025-12-04T10:58:28.4724428Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4724471Z cachedir: .pytest_cache 2025-12-04T10:58:28.4724630Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4724677Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4724717Z configfile: pytest.ini 2025-12-04T10:58:28.4724876Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4724949Z collecting ... collected 58 items / 51 deselected / 7 selected 2025-12-04T10:58:28.4725002Z stepcurrent: skipping 51 already run items. 
2025-12-04T10:58:28.4725046Z Running 7 items in this shard 2025-12-04T10:58:28.4725048Z 2025-12-04T10:58:28.4725299Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8816s] [ 14%] 2025-12-04T10:58:28.4725562Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4715s] [ 14%] 2025-12-04T10:58:28.4725801Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.4825s] [ 14%] 2025-12-04T10:58:28.4725804Z 2025-12-04T10:58:28.4725855Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4726004Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4726048Z Traceback (most recent call last): 2025-12-04T10:58:28.4726204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4726244Z method(*args, **kwargs) 2025-12-04T10:58:28.4726399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4726453Z method(*args, **kwargs) 2025-12-04T10:58:28.4726605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4726641Z with policy(): 2025-12-04T10:58:28.4726795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4726835Z raise RuntimeError(msg) 2025-12-04T10:58:28.4727231Z RuntimeError: CUDA driver 
API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4727233Z 2025-12-04T10:58:28.4727326Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4727618Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4727621Z 2025-12-04T10:58:28.4727708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4727780Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4727837Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4728114Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4728188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4728225Z graph_break [] 2025-12-04T10:58:28.4728376Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4728422Z Traceback (most recent call last): 2025-12-04T10:58:28.4728577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4728616Z method(*args, **kwargs) 2025-12-04T10:58:28.4728769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T10:58:28.4728807Z method(*args, **kwargs) 2025-12-04T10:58:28.4728957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4728993Z with policy(): 2025-12-04T10:58:28.4729144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4729198Z raise RuntimeError(msg) 2025-12-04T10:58:28.4729697Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4729701Z 2025-12-04T10:58:28.4729774Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4730063Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4730066Z 2025-12-04T10:58:28.4730152Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4730224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4730281Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4730568Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4730642Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4730677Z graph_break [] 2025-12-04T10:58:28.4730750Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4730804Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4730876Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4731147Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4731195Z graph_break [] 2025-12-04T10:58:28.4731247Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4731398Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4731443Z Traceback (most recent call last): 2025-12-04T10:58:28.4731598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4731637Z method(*args, **kwargs) 2025-12-04T10:58:28.4731789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4731828Z method(*args, **kwargs) 2025-12-04T10:58:28.4731978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4732016Z with policy(): 2025-12-04T10:58:28.4732168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4732211Z raise RuntimeError(msg) 2025-12-04T10:58:28.4732618Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! 
Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4732620Z 2025-12-04T10:58:28.4732693Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4732983Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4732997Z 2025-12-04T10:58:28.4733084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4733158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4733214Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4733610Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4733683Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4733719Z graph_break [] 2025-12-04T10:58:28.4733792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4733845Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4733918Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4734189Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4734240Z graph_break [] 2025-12-04T10:58:28.4734313Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T10:58:28.4734367Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4734439Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4734709Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4734746Z graph_break [] 2025-12-04T10:58:28.4734990Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-7e0adb993e2ba729.xml - 2025-12-04T10:58:28.4735065Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4735705Z FAILED [0.4825s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4735707Z 2025-12-04T10:58:28.4735780Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4736070Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4736073Z 2025-12-04T10:58:28.4736159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4736221Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4736286Z ================== 1 failed, 51 deselected, 2 rerun in 4.00s =================== 2025-12-04T10:58:28.4736324Z Got exit code 1 2025-12-04T10:58:28.4736363Z Retrying single test... 2025-12-04T10:58:28.4736563Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ed111abc081234f.xml 2025-12-04T10:58:28.4736619Z ============================= test session starts ============================== 2025-12-04T10:58:28.4736730Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4736785Z cachedir: .pytest_cache 2025-12-04T10:58:28.4736944Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4736990Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4737030Z configfile: pytest.ini 2025-12-04T10:58:28.4737200Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4737274Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4737559Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4737603Z Running 1 items in this shard 2025-12-04T10:58:28.4737605Z 2025-12-04T10:58:28.4737970Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:54:51.606142705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4737990Z 2025-12-04T10:58:28.4738143Z [W1204 10:54:51.861735710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738145Z 2025-12-04T10:58:28.4738298Z [W1204 10:54:51.861879798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738300Z 2025-12-04T10:58:28.4738450Z [W1204 10:54:51.865296245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738452Z 2025-12-04T10:58:28.4738602Z [W1204 10:54:51.865595851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738616Z 2025-12-04T10:58:28.4738766Z [W1204 10:54:51.865655310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738769Z 2025-12-04T10:58:28.4738918Z [W1204 10:54:51.867771773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4738921Z 2025-12-04T10:58:28.4739069Z [W1204 10:54:51.868050589 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4739071Z 2025-12-04T10:58:28.4739219Z [W1204 10:54:51.868111659 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4739221Z 2025-12-04T10:58:28.4739270Z ('RERUN', {'yellow': True}) [3.2854s] [100%] 2025-12-04T10:58:28.4739630Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:54:52.632600520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4739634Z 2025-12-04T10:58:28.4739785Z [W1204 10:54:52.632965275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4739788Z 2025-12-04T10:58:28.4739937Z [W1204 10:54:52.633035645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4739939Z 2025-12-04T10:58:28.4740087Z [W1204 10:54:52.634326558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4740089Z 2025-12-04T10:58:28.4740237Z [W1204 10:54:52.634585075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4740240Z 2025-12-04T10:58:28.4740401Z [W1204 10:54:52.634644994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4740404Z 2025-12-04T10:58:28.4740552Z [W1204 10:54:52.636671928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4740554Z 2025-12-04T10:58:28.4740715Z [W1204 10:54:52.636934055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4740717Z 2025-12-04T10:58:28.4740866Z [W1204 10:54:52.636994364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4740868Z 2025-12-04T10:58:28.4740917Z ('RERUN', {'yellow': True}) [0.6299s] [100%] 2025-12-04T10:58:28.4741274Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:54:52.238885406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4741295Z 2025-12-04T10:58:28.4741444Z [W1204 10:54:52.239268821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4741447Z 2025-12-04T10:58:28.4741594Z [W1204 10:54:52.239349450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4741596Z 2025-12-04T10:58:28.4741745Z [W1204 10:54:52.241803799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4741747Z 2025-12-04T10:58:28.4741896Z [W1204 10:54:52.242412771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4741911Z 2025-12-04T10:58:28.4742061Z [W1204 10:54:52.242487270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4742064Z 2025-12-04T10:58:28.4742213Z [W1204 10:54:52.244956459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4742215Z 2025-12-04T10:58:28.4742362Z [W1204 10:54:52.245258395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4742364Z 2025-12-04T10:58:28.4742512Z [W1204 10:54:52.245321664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4742513Z 2025-12-04T10:58:28.4742551Z FAILED [0.6024s] [100%] 2025-12-04T10:58:28.4742553Z 2025-12-04T10:58:28.4742605Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4742757Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4742804Z Traceback (most recent call last): 2025-12-04T10:58:28.4742962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4743003Z method(*args, **kwargs) 2025-12-04T10:58:28.4743156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4743197Z method(*args, **kwargs) 2025-12-04T10:58:28.4743386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4743422Z with policy(): 2025-12-04T10:58:28.4743575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4743616Z raise RuntimeError(msg) 2025-12-04T10:58:28.4744030Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4744033Z 2025-12-04T10:58:28.4744121Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4744417Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4744419Z 2025-12-04T10:58:28.4744505Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4744579Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4744635Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4744913Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4745000Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4745038Z graph_break [] 2025-12-04T10:58:28.4745189Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4745235Z Traceback (most recent call last): 2025-12-04T10:58:28.4745387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4745428Z method(*args, **kwargs) 2025-12-04T10:58:28.4745578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4745632Z method(*args, **kwargs) 2025-12-04T10:58:28.4745783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4745820Z 
with policy(): 2025-12-04T10:58:28.4745974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4746015Z raise RuntimeError(msg) 2025-12-04T10:58:28.4746424Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4746426Z 2025-12-04T10:58:28.4746499Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4746794Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4746797Z 2025-12-04T10:58:28.4746883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4746957Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4747012Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4747284Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4747357Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4747392Z graph_break [] 2025-12-04T10:58:28.4747477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4747533Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4747605Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4747886Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4747924Z graph_break [] 2025-12-04T10:58:28.4747975Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4748128Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4748172Z Traceback (most recent call last): 2025-12-04T10:58:28.4748325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4748367Z method(*args, **kwargs) 2025-12-04T10:58:28.4748519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4748570Z method(*args, **kwargs) 2025-12-04T10:58:28.4748720Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4748757Z with policy(): 2025-12-04T10:58:28.4748909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4748949Z raise RuntimeError(msg) 2025-12-04T10:58:28.4749353Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4749368Z 2025-12-04T10:58:28.4749441Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4749731Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4749734Z 2025-12-04T10:58:28.4749821Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4749893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4749948Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4750221Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4750296Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4750331Z graph_break [] 2025-12-04T10:58:28.4750405Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4750460Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4750531Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4750805Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4750842Z graph_break [] 2025-12-04T10:58:28.4750915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4750969Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4751039Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4751321Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4751358Z graph_break [] 2025-12-04T10:58:28.4751614Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ed111abc081234f.xml - 2025-12-04T10:58:28.4751674Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4752307Z FAILED [0.6024s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4752322Z 2025-12-04T10:58:28.4752395Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4752685Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4752687Z 2025-12-04T10:58:28.4752774Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4752836Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4752903Z ================== 1 failed, 57 deselected, 2 rerun in 4.66s =================== 2025-12-04T10:58:28.4752939Z Got exit code 1 2025-12-04T10:58:28.4752979Z Retrying single test... 2025-12-04T10:58:28.4753192Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c49e0f4a8827304a.xml 2025-12-04T10:58:28.4753274Z ============================= test session starts ============================== 2025-12-04T10:58:28.4753385Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4753427Z cachedir: .pytest_cache 2025-12-04T10:58:28.4753585Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4753631Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4753671Z configfile: pytest.ini 2025-12-04T10:58:28.4753831Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4753904Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4754194Z stepcurrent: skipping 51 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4754239Z Running 1 items in this shard 2025-12-04T10:58:28.4754241Z 2025-12-04T10:58:28.4754603Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:55:02.233139306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4754605Z 2025-12-04T10:58:28.4754758Z [W1204 10:55:03.495172431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4754760Z 2025-12-04T10:58:28.4754909Z [W1204 10:55:03.495310119 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4754912Z 2025-12-04T10:58:28.4755079Z [W1204 10:55:03.498862494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4755082Z 2025-12-04T10:58:28.4755233Z [W1204 10:55:03.499205439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4755256Z 2025-12-04T10:58:28.4755405Z [W1204 10:55:03.499272929 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4755407Z 2025-12-04T10:58:28.4755555Z [W1204 10:55:03.501406781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4755557Z 2025-12-04T10:58:28.4755704Z [W1204 10:55:03.501686528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4755707Z 2025-12-04T10:58:28.4755856Z [W1204 10:55:03.501749007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4755873Z 2025-12-04T10:58:28.4755922Z ('RERUN', {'yellow': True}) [3.2761s] [100%] 2025-12-04T10:58:28.4756284Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:55:04.281781409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4756286Z 2025-12-04T10:58:28.4756436Z [W1204 10:55:04.282145184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4756438Z 2025-12-04T10:58:28.4756586Z [W1204 10:55:04.282213593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4756601Z 2025-12-04T10:58:28.4756751Z [W1204 10:55:04.283494587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4756754Z 2025-12-04T10:58:28.4756901Z [W1204 10:55:04.283752493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4756903Z 2025-12-04T10:58:28.4757053Z [W1204 10:55:04.283815352 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4757055Z 2025-12-04T10:58:28.4757204Z [W1204 10:55:04.285827557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4757205Z 2025-12-04T10:58:28.4757353Z [W1204 10:55:04.286095433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4757355Z 2025-12-04T10:58:28.4757505Z [W1204 10:55:04.286159123 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4757508Z 2025-12-04T10:58:28.4757555Z ('RERUN', {'yellow': True}) [0.6494s] [100%] 2025-12-04T10:58:28.4757915Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 [W1204 10:55:04.914937294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4757917Z 2025-12-04T10:58:28.4758066Z [W1204 10:55:04.915303669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758069Z 2025-12-04T10:58:28.4758216Z [W1204 10:55:04.915371618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758218Z 2025-12-04T10:58:28.4758380Z [W1204 10:55:04.916632382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758383Z 2025-12-04T10:58:28.4758531Z [W1204 10:55:04.916888769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758533Z 2025-12-04T10:58:28.4758692Z [W1204 10:55:04.916948878 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758694Z 2025-12-04T10:58:28.4758842Z [W1204 10:55:04.918960812 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4758844Z 2025-12-04T10:58:28.4758993Z [W1204 10:55:04.919231089 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4758995Z 2025-12-04T10:58:28.4759145Z [W1204 10:55:04.919293518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4759147Z 2025-12-04T10:58:28.4759197Z FAILED [0.6250s] [100%] 2025-12-04T10:58:28.4759199Z 2025-12-04T10:58:28.4759251Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4759403Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4759449Z Traceback (most recent call last): 2025-12-04T10:58:28.4759607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4759647Z method(*args, **kwargs) 2025-12-04T10:58:28.4759800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4759840Z method(*args, **kwargs) 2025-12-04T10:58:28.4760004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4760043Z with policy(): 2025-12-04T10:58:28.4760197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4760239Z raise RuntimeError(msg) 2025-12-04T10:58:28.4760637Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 66560 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4760640Z 2025-12-04T10:58:28.4760714Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4761006Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4761011Z 2025-12-04T10:58:28.4761099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4761172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4761229Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4761504Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4761577Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4761614Z graph_break [] 2025-12-04T10:58:28.4761766Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4761812Z Traceback (most recent call last): 2025-12-04T10:58:28.4761976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4762018Z method(*args, **kwargs) 2025-12-04T10:58:28.4762168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4762219Z method(*args, **kwargs) 2025-12-04T10:58:28.4762370Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4762407Z 
with policy(): 2025-12-04T10:58:28.4762559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4762600Z raise RuntimeError(msg) 2025-12-04T10:58:28.4763009Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 66560 and is now reported as 133120 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4763025Z 2025-12-04T10:58:28.4763100Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4763421Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4763424Z 2025-12-04T10:58:28.4763509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4763583Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4763639Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4763915Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4764006Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4764043Z graph_break [] 2025-12-04T10:58:28.4764116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4764171Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4764243Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4764513Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4764549Z graph_break [] 2025-12-04T10:58:28.4764601Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4764753Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4764800Z Traceback (most recent call last): 2025-12-04T10:58:28.4764953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4764995Z method(*args, **kwargs) 2025-12-04T10:58:28.4765145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4765185Z method(*args, **kwargs) 2025-12-04T10:58:28.4765335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4765372Z with policy(): 2025-12-04T10:58:28.4765524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4765582Z raise RuntimeError(msg) 2025-12-04T10:58:28.4765989Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4766007Z 2025-12-04T10:58:28.4766080Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4766372Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4766374Z 2025-12-04T10:58:28.4766459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4766532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4766589Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4766862Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4766953Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4766990Z graph_break [] 2025-12-04T10:58:28.4767062Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4767117Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4767189Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4767460Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4767505Z graph_break [] 2025-12-04T10:58:28.4767578Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4767634Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4767705Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4767975Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4768011Z graph_break [] 2025-12-04T10:58:28.4768258Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c49e0f4a8827304a.xml - 2025-12-04T10:58:28.4768317Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4768958Z FAILED [0.6250s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 133120 and is now reported as 199680 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4768961Z 2025-12-04T10:58:28.4769033Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4769321Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4769323Z 2025-12-04T10:58:28.4769429Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4769492Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4769561Z ================== 1 failed, 57 deselected, 2 rerun in 4.70s =================== 2025-12-04T10:58:28.4769597Z Got exit code 1 2025-12-04T10:58:28.4769850Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4769978Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4770177Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce382956f044f6f8.xml 2025-12-04T10:58:28.4770233Z ============================= test session starts ============================== 2025-12-04T10:58:28.4770345Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4770387Z cachedir: .pytest_cache 2025-12-04T10:58:28.4770546Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4770604Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4770645Z configfile: pytest.ini 2025-12-04T10:58:28.4770805Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4770879Z collecting ... collected 58 items / 52 deselected / 6 selected 2025-12-04T10:58:28.4770931Z stepcurrent: skipping 52 already run items. 
2025-12-04T10:58:28.4770975Z Running 6 items in this shard 2025-12-04T10:58:28.4770977Z 2025-12-04T10:58:28.4771229Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5270s] [ 16%] 2025-12-04T10:58:28.4771492Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4399s] [ 16%] 2025-12-04T10:58:28.4771717Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4383s] [ 16%] 2025-12-04T10:58:28.4771720Z 2025-12-04T10:58:28.4771770Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4771923Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4771969Z Traceback (most recent call last): 2025-12-04T10:58:28.4772126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4772166Z method(*args, **kwargs) 2025-12-04T10:58:28.4772322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4772363Z method(*args, **kwargs) 2025-12-04T10:58:28.4772515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4772552Z with policy(): 2025-12-04T10:58:28.4772706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4772747Z raise RuntimeError(msg) 2025-12-04T10:58:28.4773148Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4773152Z 2025-12-04T10:58:28.4773236Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4773562Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4773566Z 2025-12-04T10:58:28.4773667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4773740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4773796Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4773974Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4774046Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4774083Z graph_break [] 2025-12-04T10:58:28.4774237Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4774296Z Traceback (most recent call last): 2025-12-04T10:58:28.4774450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4774490Z method(*args, **kwargs) 2025-12-04T10:58:28.4774642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4774681Z method(*args, **kwargs) 
2025-12-04T10:58:28.4774831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4774867Z with policy(): 2025-12-04T10:58:28.4775020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4775075Z raise RuntimeError(msg) 2025-12-04T10:58:28.4775483Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4775487Z 2025-12-04T10:58:28.4775559Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4775849Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4775851Z 2025-12-04T10:58:28.4775937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4776010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4776068Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4776244Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4776317Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4776354Z graph_break [] 2025-12-04T10:58:28.4776427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4776480Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4776552Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4776727Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4776764Z graph_break [] 2025-12-04T10:58:28.4776830Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4776984Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4777029Z Traceback (most recent call last): 2025-12-04T10:58:28.4777194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4777234Z method(*args, **kwargs) 2025-12-04T10:58:28.4777385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4777425Z method(*args, **kwargs) 2025-12-04T10:58:28.4777575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4777610Z with policy(): 2025-12-04T10:58:28.4777761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4777804Z raise RuntimeError(msg) 2025-12-04T10:58:28.4778214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4778228Z 2025-12-04T10:58:28.4778300Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4778591Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4778594Z 2025-12-04T10:58:28.4778681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4778754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4778822Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4778997Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4779071Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4779106Z graph_break [] 2025-12-04T10:58:28.4779179Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4779233Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4779303Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4779477Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4779513Z graph_break [] 2025-12-04T10:58:28.4779587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4779642Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4779714Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4779888Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4779923Z graph_break [] 2025-12-04T10:58:28.4780166Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ce382956f044f6f8.xml - 2025-12-04T10:58:28.4780225Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4780878Z FAILED [0.4383s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4780883Z 2025-12-04T10:58:28.4780967Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4781258Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4781260Z 2025-12-04T10:58:28.4781346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4781407Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4781476Z ================== 1 failed, 52 deselected, 2 rerun in 3.57s =================== 2025-12-04T10:58:28.4781512Z Got exit code 1 2025-12-04T10:58:28.4781553Z Retrying single test... 
2025-12-04T10:58:28.4781764Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d3213a35e338fddc.xml 2025-12-04T10:58:28.4781822Z ============================= test session starts ============================== 2025-12-04T10:58:28.4781933Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4781975Z cachedir: .pytest_cache 2025-12-04T10:58:28.4782132Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4782179Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4782219Z configfile: pytest.ini 2025-12-04T10:58:28.4782379Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4782470Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4782760Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4782803Z Running 1 items in this shard 2025-12-04T10:58:28.4782807Z 2025-12-04T10:58:28.4783170Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:24.334785942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4783172Z 2025-12-04T10:58:28.4783354Z [W1204 10:55:24.617206213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4783358Z 2025-12-04T10:58:28.4783510Z [W1204 10:55:24.617342851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4783513Z 2025-12-04T10:58:28.4783665Z [W1204 10:55:24.621395269 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4783667Z 2025-12-04T10:58:28.4783817Z [W1204 10:55:24.621711285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4783820Z 2025-12-04T10:58:28.4783968Z [W1204 10:55:24.621774054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4783969Z 2025-12-04T10:58:28.4784117Z [W1204 10:55:24.624128214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4784120Z 2025-12-04T10:58:28.4784282Z [W1204 10:55:24.624406161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4784285Z 2025-12-04T10:58:28.4784434Z [W1204 10:55:24.624467060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4784435Z 2025-12-04T10:58:28.4784497Z ('RERUN', {'yellow': True}) [2.9315s] [100%] 2025-12-04T10:58:28.4784860Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:25.760220262 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4784862Z 2025-12-04T10:58:28.4785011Z [W1204 10:55:25.760611057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785014Z 2025-12-04T10:58:28.4785163Z [W1204 10:55:25.760676346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785180Z 2025-12-04T10:58:28.4785329Z [W1204 10:55:25.761953500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785330Z 2025-12-04T10:58:28.4785479Z [W1204 10:55:25.762267176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785481Z 2025-12-04T10:58:28.4785629Z [W1204 10:55:25.762330685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785631Z 2025-12-04T10:58:28.4785779Z [W1204 10:55:25.764350179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785782Z 2025-12-04T10:58:28.4785946Z [W1204 10:55:25.764696175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4785948Z 2025-12-04T10:58:28.4786098Z [W1204 10:55:25.764764904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4786099Z 2025-12-04T10:58:28.4786148Z ('RERUN', {'yellow': True}) [0.6340s] [100%] 2025-12-04T10:58:28.4786508Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:26.390526662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4786510Z 2025-12-04T10:58:28.4786658Z [W1204 10:55:26.390916627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4786661Z 2025-12-04T10:58:28.4786810Z [W1204 10:55:26.390985656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4786813Z 2025-12-04T10:58:28.4786962Z [W1204 10:55:26.392321219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4786964Z 2025-12-04T10:58:28.4787113Z [W1204 10:55:26.392592946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4787115Z 2025-12-04T10:58:28.4787263Z [W1204 10:55:26.392653605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4787265Z 2025-12-04T10:58:28.4787412Z [W1204 10:55:26.394673509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4787414Z 2025-12-04T10:58:28.4787575Z [W1204 10:55:26.395022584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4787578Z 2025-12-04T10:58:28.4787726Z [W1204 10:55:26.395087834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4787729Z 2025-12-04T10:58:28.4787781Z FAILED [0.6263s] [100%] 2025-12-04T10:58:28.4787783Z 2025-12-04T10:58:28.4787836Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4787989Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4788034Z Traceback (most recent call last): 2025-12-04T10:58:28.4788189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4788230Z method(*args, **kwargs) 2025-12-04T10:58:28.4788385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4788425Z method(*args, **kwargs) 2025-12-04T10:58:28.4788589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4788626Z with policy(): 2025-12-04T10:58:28.4788780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4788821Z raise RuntimeError(msg) 2025-12-04T10:58:28.4789221Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4789223Z 2025-12-04T10:58:28.4789298Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4789603Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4789606Z 2025-12-04T10:58:28.4789694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4789767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4789823Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4790000Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4790072Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4790109Z graph_break [] 2025-12-04T10:58:28.4790263Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4790309Z Traceback (most recent call last): 2025-12-04T10:58:28.4790464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4790505Z method(*args, **kwargs) 2025-12-04T10:58:28.4790657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4790697Z method(*args, **kwargs) 2025-12-04T10:58:28.4790846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4790883Z with policy(): 2025-12-04T10:58:28.4791035Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4791077Z raise RuntimeError(msg) 2025-12-04T10:58:28.4791496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4791500Z 2025-12-04T10:58:28.4791585Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4791878Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4791881Z 2025-12-04T10:58:28.4791967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4792040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4792098Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4792274Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4792359Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4792396Z graph_break [] 2025-12-04T10:58:28.4792469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4792524Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4792596Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4792771Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4792807Z graph_break [] 2025-12-04T10:58:28.4792859Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4793024Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4793071Z Traceback (most recent call last): 2025-12-04T10:58:28.4793224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4793298Z method(*args, **kwargs) 2025-12-04T10:58:28.4793450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4793490Z method(*args, **kwargs) 2025-12-04T10:58:28.4793639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4793676Z with policy(): 2025-12-04T10:58:28.4793827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4793870Z raise RuntimeError(msg) 2025-12-04T10:58:28.4794277Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4794280Z 2025-12-04T10:58:28.4794355Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4794646Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4794649Z 2025-12-04T10:58:28.4794735Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4794809Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4794883Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4795059Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4795132Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4795182Z graph_break [] 2025-12-04T10:58:28.4795255Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4795310Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4795381Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4795556Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4795592Z graph_break [] 2025-12-04T10:58:28.4795666Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4795721Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4795813Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4795988Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4796025Z graph_break [] 2025-12-04T10:58:28.4796268Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d3213a35e338fddc.xml - 2025-12-04T10:58:28.4796327Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4796972Z FAILED [0.6263s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4796990Z 2025-12-04T10:58:28.4797063Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4797352Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4797354Z 2025-12-04T10:58:28.4797439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4797500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4797565Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.4797685Z Got exit code 1 2025-12-04T10:58:28.4797725Z Retrying single test... 
2025-12-04T10:58:28.4797924Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32a9e9555c685bc8.xml 2025-12-04T10:58:28.4797982Z ============================= test session starts ============================== 2025-12-04T10:58:28.4798093Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4798133Z cachedir: .pytest_cache 2025-12-04T10:58:28.4799609Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4799656Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4799696Z configfile: pytest.ini 2025-12-04T10:58:28.4799858Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4799955Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4800244Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4800303Z Running 1 items in this shard 2025-12-04T10:58:28.4800305Z 2025-12-04T10:58:28.4800674Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:35.942755205 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4800677Z 2025-12-04T10:58:28.4800830Z [W1204 10:55:35.222023909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4800833Z 2025-12-04T10:58:28.4800986Z [W1204 10:55:35.222158677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801001Z 2025-12-04T10:58:28.4801151Z [W1204 10:55:35.225203848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801153Z 2025-12-04T10:58:28.4801304Z [W1204 10:55:35.225506654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801306Z 2025-12-04T10:58:28.4801454Z [W1204 10:55:35.225569513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801457Z 2025-12-04T10:58:28.4801605Z [W1204 10:55:35.227714626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801607Z 2025-12-04T10:58:28.4801769Z [W1204 10:55:35.228011682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801772Z 2025-12-04T10:58:28.4801920Z [W1204 10:55:35.228083111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4801921Z 2025-12-04T10:58:28.4801972Z ('RERUN', {'yellow': True}) [2.8552s] [100%] 2025-12-04T10:58:28.4802337Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:37.268364110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4802339Z 2025-12-04T10:58:28.4802489Z [W1204 10:55:37.268759795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4802491Z 2025-12-04T10:58:28.4802643Z [W1204 10:55:37.268837994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4802645Z 2025-12-04T10:58:28.4802793Z [W1204 10:55:37.270180656 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4802795Z 2025-12-04T10:58:28.4802944Z [W1204 10:55:37.270454973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4802946Z 2025-12-04T10:58:28.4803094Z [W1204 10:55:37.270516102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4803096Z 2025-12-04T10:58:28.4803244Z [W1204 10:55:37.272441568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4803246Z 2025-12-04T10:58:28.4803441Z [W1204 10:55:37.272781753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4803445Z 2025-12-04T10:58:28.4803593Z [W1204 10:55:37.272844432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4803595Z 2025-12-04T10:58:28.4803644Z ('RERUN', {'yellow': True}) [0.5494s] [100%] 2025-12-04T10:58:28.4804017Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:55:37.810488696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804020Z 2025-12-04T10:58:28.4804170Z [W1204 10:55:37.810865531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804172Z 2025-12-04T10:58:28.4804320Z [W1204 10:55:37.810932260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804323Z 2025-12-04T10:58:28.4804487Z [W1204 10:55:37.812221413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804489Z 2025-12-04T10:58:28.4804639Z [W1204 10:55:37.812486290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804641Z 2025-12-04T10:58:28.4804790Z [W1204 10:55:37.812548319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804791Z 2025-12-04T10:58:28.4804940Z [W1204 10:55:37.814469464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4804942Z 2025-12-04T10:58:28.4805090Z [W1204 10:55:37.814811170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4805106Z 2025-12-04T10:58:28.4805255Z [W1204 10:55:37.814875819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4805258Z 2025-12-04T10:58:28.4805297Z FAILED [0.5267s] [100%] 2025-12-04T10:58:28.4805300Z 2025-12-04T10:58:28.4805351Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4805505Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4805550Z Traceback (most recent call last): 2025-12-04T10:58:28.4805707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4805748Z method(*args, **kwargs) 2025-12-04T10:58:28.4805901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4805942Z method(*args, **kwargs) 2025-12-04T10:58:28.4806095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4806131Z with policy(): 2025-12-04T10:58:28.4806284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4806324Z raise RuntimeError(msg) 2025-12-04T10:58:28.4806728Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 131072 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4806731Z 2025-12-04T10:58:28.4806805Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4807114Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4807117Z 2025-12-04T10:58:28.4807205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4807291Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4807348Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4807526Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4807600Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4807636Z graph_break [] 2025-12-04T10:58:28.4807789Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4807835Z Traceback (most recent call last): 2025-12-04T10:58:28.4808001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4808040Z method(*args, **kwargs) 2025-12-04T10:58:28.4808192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4808231Z method(*args, **kwargs) 2025-12-04T10:58:28.4808382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4808418Z with policy(): 2025-12-04T10:58:28.4808571Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4808611Z raise RuntimeError(msg) 2025-12-04T10:58:28.4809021Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 131072 and is now reported as 262144 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4809036Z 2025-12-04T10:58:28.4809110Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4809404Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4809407Z 2025-12-04T10:58:28.4809493Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4809567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4809623Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4809801Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4809876Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4809912Z graph_break [] 2025-12-04T10:58:28.4809986Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4810040Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4810113Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4810288Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4810325Z graph_break [] 2025-12-04T10:58:28.4810377Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4810541Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4810587Z Traceback (most recent call last): 2025-12-04T10:58:28.4810742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4810782Z method(*args, **kwargs) 2025-12-04T10:58:28.4810951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4810991Z method(*args, **kwargs) 2025-12-04T10:58:28.4811142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4811178Z with policy(): 2025-12-04T10:58:28.4811332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4811372Z raise RuntimeError(msg) 2025-12-04T10:58:28.4811782Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4811797Z 2025-12-04T10:58:28.4811871Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4812162Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4812164Z 2025-12-04T10:58:28.4812251Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4812323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4812391Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4812566Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4812639Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4812675Z graph_break [] 2025-12-04T10:58:28.4812749Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4812802Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4812873Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4813048Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4813085Z graph_break [] 2025-12-04T10:58:28.4813156Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4813213Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4813309Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4813486Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4813522Z graph_break [] 2025-12-04T10:58:28.4813767Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-32a9e9555c685bc8.xml - 2025-12-04T10:58:28.4813825Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4814486Z FAILED [0.5267s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 262144 and is now reported as 393216 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4814491Z 2025-12-04T10:58:28.4814564Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4814867Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4814869Z 2025-12-04T10:58:28.4814956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4815018Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4815085Z ================== 1 failed, 57 deselected, 2 rerun in 4.10s =================== 2025-12-04T10:58:28.4815124Z Got exit code 1 2025-12-04T10:58:28.4815365Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4815507Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4815706Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11cf4b05702d2d24.xml 2025-12-04T10:58:28.4815764Z ============================= test session starts ============================== 2025-12-04T10:58:28.4815875Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4815916Z cachedir: .pytest_cache 2025-12-04T10:58:28.4816074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4820791Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4820833Z configfile: pytest.ini 2025-12-04T10:58:28.4820992Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4821068Z collecting ... collected 58 items / 53 deselected / 5 selected 2025-12-04T10:58:28.4821122Z stepcurrent: skipping 53 already run items. 
2025-12-04T10:58:28.4821166Z Running 5 items in this shard 2025-12-04T10:58:28.4821168Z 2025-12-04T10:58:28.4821419Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4288s] [ 20%] 2025-12-04T10:58:28.4821663Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4517s] [ 20%] 2025-12-04T10:58:28.4821889Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4515s] [ 20%] 2025-12-04T10:58:28.4821892Z 2025-12-04T10:58:28.4821943Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4822096Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4822141Z Traceback (most recent call last): 2025-12-04T10:58:28.4822299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4822339Z method(*args, **kwargs) 2025-12-04T10:58:28.4822493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4822533Z method(*args, **kwargs) 2025-12-04T10:58:28.4822699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4822737Z with policy(): 2025-12-04T10:58:28.4822892Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4822932Z raise RuntimeError(msg) 2025-12-04T10:58:28.4823376Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4823379Z 2025-12-04T10:58:28.4823453Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4823745Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4823748Z 2025-12-04T10:58:28.4823835Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4823922Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4823978Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4824153Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4824227Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4824263Z graph_break [] 2025-12-04T10:58:28.4824414Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4824459Z Traceback (most recent call last): 2025-12-04T10:58:28.4824615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4824666Z method(*args, **kwargs) 2025-12-04T10:58:28.4824818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4824858Z method(*args, **kwargs) 
2025-12-04T10:58:28.4825010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4825046Z with policy(): 2025-12-04T10:58:28.4825198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4825239Z raise RuntimeError(msg) 2025-12-04T10:58:28.4825639Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4825643Z 2025-12-04T10:58:28.4825716Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4826006Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4826008Z 2025-12-04T10:58:28.4826097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4826170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4826226Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4826401Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4826489Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4826526Z graph_break [] 2025-12-04T10:58:28.4826600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4826654Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.4826736Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4826911Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4826948Z graph_break [] 2025-12-04T10:58:28.4826999Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4827149Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4827194Z Traceback (most recent call last): 2025-12-04T10:58:28.4827351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4827390Z method(*args, **kwargs) 2025-12-04T10:58:28.4827554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4827593Z method(*args, **kwargs) 2025-12-04T10:58:28.4827745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4827781Z with policy(): 2025-12-04T10:58:28.4827934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4827974Z raise RuntimeError(msg) 2025-12-04T10:58:28.4828378Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4828390Z 2025-12-04T10:58:28.4828464Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4828752Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4828754Z 2025-12-04T10:58:28.4828841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4828914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4828970Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4829145Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4829220Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4829257Z graph_break [] 2025-12-04T10:58:28.4829330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4829383Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4829455Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4829629Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4829665Z graph_break [] 2025-12-04T10:58:28.4829737Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4829791Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4829862Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4830055Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4830093Z graph_break [] 2025-12-04T10:58:28.4830337Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-11cf4b05702d2d24.xml - 2025-12-04T10:58:28.4830408Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4831043Z FAILED [0.4515s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4831047Z 2025-12-04T10:58:28.4831120Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4831422Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4831424Z 2025-12-04T10:58:28.4831511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4831572Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4831638Z ================== 1 failed, 53 deselected, 2 rerun in 3.49s =================== 2025-12-04T10:58:28.4831674Z Got exit code 1 2025-12-04T10:58:28.4831715Z Retrying single test... 
2025-12-04T10:58:28.4831915Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bea7ab66c3fe5197.xml 2025-12-04T10:58:28.4831982Z ============================= test session starts ============================== 2025-12-04T10:58:28.4832092Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4832134Z cachedir: .pytest_cache 2025-12-04T10:58:28.4832293Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4832338Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4832379Z configfile: pytest.ini 2025-12-04T10:58:28.4832538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4832611Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4832896Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4832942Z Running 1 items in this shard 2025-12-04T10:58:28.4832945Z 2025-12-04T10:58:28.4833340Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:55:56.549199489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4833342Z 2025-12-04T10:58:28.4833497Z [W1204 10:55:56.826292288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4833499Z 2025-12-04T10:58:28.4833651Z [W1204 10:55:56.826418666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4833654Z 2025-12-04T10:58:28.4833818Z [W1204 10:55:56.829659155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4833822Z 2025-12-04T10:58:28.4833971Z [W1204 10:55:56.829960011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4833974Z 2025-12-04T10:58:28.4834134Z [W1204 10:55:56.830025660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4834136Z 2025-12-04T10:58:28.4834286Z [W1204 10:55:56.832262851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4834288Z 2025-12-04T10:58:28.4834435Z [W1204 10:55:56.832535118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4834437Z 2025-12-04T10:58:28.4834586Z [W1204 10:55:56.832594517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4834588Z 2025-12-04T10:58:28.4834638Z ('RERUN', {'yellow': True}) [2.8345s] [100%] 2025-12-04T10:58:28.4835011Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:55:57.970856495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4835013Z 2025-12-04T10:58:28.4835163Z [W1204 10:55:57.971473167 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835164Z 2025-12-04T10:58:28.4835312Z [W1204 10:55:57.971564966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835314Z 2025-12-04T10:58:28.4835463Z [W1204 10:55:57.973225594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835477Z 2025-12-04T10:58:28.4835626Z [W1204 10:55:57.973660889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835630Z 2025-12-04T10:58:28.4835779Z [W1204 10:55:57.973736768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835780Z 2025-12-04T10:58:28.4835930Z [W1204 10:55:57.976113977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4835932Z 2025-12-04T10:58:28.4836079Z [W1204 10:55:57.976571181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4836081Z 2025-12-04T10:58:28.4836229Z [W1204 10:55:57.976635870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4836233Z 2025-12-04T10:58:28.4836280Z ('RERUN', {'yellow': True}) [0.6593s] [100%] 2025-12-04T10:58:28.4836641Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:55:58.668202595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4836643Z 2025-12-04T10:58:28.4836793Z [W1204 10:55:58.668581540 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4836795Z 2025-12-04T10:58:28.4836942Z [W1204 10:55:58.668652179 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4836944Z 2025-12-04T10:58:28.4837102Z [W1204 10:55:58.669919143 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4837105Z 2025-12-04T10:58:28.4837253Z [W1204 10:55:58.670186139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4837256Z 2025-12-04T10:58:28.4837413Z [W1204 10:55:58.670250498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4837415Z 2025-12-04T10:58:28.4837564Z [W1204 10:55:58.672255072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4837567Z 2025-12-04T10:58:28.4837715Z [W1204 10:55:58.672594048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4837717Z 2025-12-04T10:58:28.4837866Z [W1204 10:55:58.672656407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4837870Z 2025-12-04T10:58:28.4837908Z FAILED [0.6727s] [100%] 2025-12-04T10:58:28.4837910Z 2025-12-04T10:58:28.4837962Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4838123Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4838171Z Traceback (most recent call last): 2025-12-04T10:58:28.4838328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4838368Z method(*args, **kwargs) 2025-12-04T10:58:28.4838520Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4838561Z method(*args, **kwargs) 2025-12-04T10:58:28.4838712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4838761Z with policy(): 2025-12-04T10:58:28.4838913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4838957Z raise RuntimeError(msg) 2025-12-04T10:58:28.4839352Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4839355Z 2025-12-04T10:58:28.4839429Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4839721Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4839724Z 2025-12-04T10:58:28.4839812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4839887Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4839944Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4840124Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4840197Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4840234Z graph_break [] 2025-12-04T10:58:28.4840384Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4840429Z Traceback (most recent call last): 2025-12-04T10:58:28.4840582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4840633Z method(*args, **kwargs) 2025-12-04T10:58:28.4840785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4840826Z method(*args, **kwargs) 2025-12-04T10:58:28.4840988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4841025Z with policy(): 2025-12-04T10:58:28.4841176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4841218Z raise RuntimeError(msg) 2025-12-04T10:58:28.4841618Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4841623Z 2025-12-04T10:58:28.4841695Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4841997Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4842000Z 2025-12-04T10:58:28.4842086Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4842161Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4842216Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4842393Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4842465Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4842518Z graph_break [] 2025-12-04T10:58:28.4842590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4842646Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4842717Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4842894Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4842930Z graph_break [] 2025-12-04T10:58:28.4842982Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4843132Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4843177Z Traceback (most recent call last): 2025-12-04T10:58:28.4843360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4843403Z method(*args, **kwargs) 2025-12-04T10:58:28.4843555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4843596Z method(*args, **kwargs) 2025-12-04T10:58:28.4843746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4843786Z with policy(): 2025-12-04T10:58:28.4843937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4843979Z raise RuntimeError(msg) 2025-12-04T10:58:28.4844396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4844401Z 2025-12-04T10:58:28.4844474Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4844777Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4844779Z 2025-12-04T10:58:28.4844866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4844939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4844994Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4845170Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4845242Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4845280Z graph_break [] 2025-12-04T10:58:28.4845352Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4845420Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4845490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4845667Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4845702Z graph_break [] 2025-12-04T10:58:28.4845775Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4845830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4845902Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4846076Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4846127Z graph_break [] 2025-12-04T10:58:28.4846374Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bea7ab66c3fe5197.xml - 2025-12-04T10:58:28.4846436Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4847073Z FAILED [0.6727s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4847076Z 2025-12-04T10:58:28.4847149Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4847441Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4847444Z 2025-12-04T10:58:28.4847531Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4847594Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4847659Z ================== 1 failed, 57 deselected, 2 rerun in 4.33s =================== 2025-12-04T10:58:28.4847697Z Got exit code 1 2025-12-04T10:58:28.4847736Z Retrying single test... 
2025-12-04T10:58:28.4847934Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-555a71a85f0c3b5c.xml 2025-12-04T10:58:28.4848001Z ============================= test session starts ============================== 2025-12-04T10:58:28.4848113Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4848154Z cachedir: .pytest_cache 2025-12-04T10:58:28.4848312Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4848370Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4848411Z configfile: pytest.ini 2025-12-04T10:58:28.4848571Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4848643Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4848933Z stepcurrent: skipping 53 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4848979Z Running 1 items in this shard 2025-12-04T10:58:28.4848981Z 2025-12-04T10:58:28.4849342Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:07.900941939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4849356Z 2025-12-04T10:58:28.4849508Z [W1204 10:56:07.173281202 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4849511Z 2025-12-04T10:58:28.4849662Z [W1204 10:56:07.173413601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4849663Z 2025-12-04T10:58:28.4849813Z [W1204 10:56:07.176579520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4849826Z 2025-12-04T10:58:28.4849976Z [W1204 10:56:07.176882116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4849979Z 2025-12-04T10:58:28.4850128Z [W1204 10:56:07.176943565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4850131Z 2025-12-04T10:58:28.4850279Z [W1204 10:56:07.179096627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4850281Z 2025-12-04T10:58:28.4850430Z [W1204 10:56:07.179367654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4850431Z 2025-12-04T10:58:28.4850580Z [W1204 10:56:07.179428503 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4850583Z 2025-12-04T10:58:28.4850633Z ('RERUN', {'yellow': True}) [2.7378s] [100%] 2025-12-04T10:58:28.4850996Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:08.125097631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4851000Z 2025-12-04T10:58:28.4851150Z [W1204 10:56:08.125472386 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851151Z 2025-12-04T10:58:28.4851302Z [W1204 10:56:08.125536245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851304Z 2025-12-04T10:58:28.4851451Z [W1204 10:56:08.126783049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851454Z 2025-12-04T10:58:28.4851613Z [W1204 10:56:08.127049965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851616Z 2025-12-04T10:58:28.4851765Z [W1204 10:56:08.127113164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851767Z 2025-12-04T10:58:28.4851924Z [W1204 10:56:08.129034420 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4851926Z 2025-12-04T10:58:28.4852076Z [W1204 10:56:08.129368325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4852078Z 2025-12-04T10:58:28.4852225Z [W1204 10:56:08.129430095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4852227Z 2025-12-04T10:58:28.4852277Z ('RERUN', {'yellow': True}) [0.4465s] [100%] 2025-12-04T10:58:28.4852633Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:09.587981822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4852651Z 2025-12-04T10:58:28.4852801Z [W1204 10:56:09.588367137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4852803Z 2025-12-04T10:58:28.4852952Z [W1204 10:56:09.588432666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4852954Z 2025-12-04T10:58:28.4853102Z [W1204 10:56:09.589697980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4853104Z 2025-12-04T10:58:28.4853292Z [W1204 10:56:09.589952957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4853294Z 2025-12-04T10:58:28.4853442Z [W1204 10:56:09.590072775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4853444Z 2025-12-04T10:58:28.4853593Z [W1204 10:56:09.592030870 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4853595Z 2025-12-04T10:58:28.4853743Z [W1204 10:56:09.592371285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4853745Z 2025-12-04T10:58:28.4853893Z [W1204 10:56:09.592433904 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4853895Z 2025-12-04T10:58:28.4853934Z FAILED [0.4674s] [100%] 2025-12-04T10:58:28.4853937Z 2025-12-04T10:58:28.4853988Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4854141Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4854186Z Traceback (most recent call last): 2025-12-04T10:58:28.4854343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4854384Z method(*args, **kwargs) 2025-12-04T10:58:28.4854537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4854577Z method(*args, **kwargs) 2025-12-04T10:58:28.4854728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4854765Z with policy(): 2025-12-04T10:58:28.4854933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4854974Z raise RuntimeError(msg) 2025-12-04T10:58:28.4855388Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8192 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4855391Z 2025-12-04T10:58:28.4855464Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4855752Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4855755Z 2025-12-04T10:58:28.4855843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4855918Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4855998Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4856175Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4856248Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4856284Z graph_break [] 2025-12-04T10:58:28.4856434Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4856479Z Traceback (most recent call last): 2025-12-04T10:58:28.4856632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4856671Z method(*args, **kwargs) 2025-12-04T10:58:28.4856838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4856877Z method(*args, **kwargs) 2025-12-04T10:58:28.4857030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4857066Z with policy(): 2025-12-04T10:58:28.4857219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4857259Z raise RuntimeError(msg) 2025-12-04T10:58:28.4857666Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 8192 and is now reported as 16384 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4857668Z 2025-12-04T10:58:28.4857742Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4858033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4858036Z 2025-12-04T10:58:28.4858124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4858198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4858254Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4858430Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4858503Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4858539Z graph_break [] 2025-12-04T10:58:28.4858622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4858677Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4858750Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4858936Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4858973Z graph_break [] 2025-12-04T10:58:28.4859024Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4859175Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4859219Z Traceback (most recent call last): 2025-12-04T10:58:28.4859374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4859413Z method(*args, **kwargs) 2025-12-04T10:58:28.4859568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4859617Z method(*args, **kwargs) 2025-12-04T10:58:28.4859768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4859804Z with policy(): 2025-12-04T10:58:28.4859957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4859997Z raise RuntimeError(msg) 2025-12-04T10:58:28.4860398Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4860411Z 2025-12-04T10:58:28.4860485Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4860775Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4860778Z 2025-12-04T10:58:28.4860866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4860938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4860994Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4861168Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4861240Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4861276Z graph_break [] 2025-12-04T10:58:28.4861351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4861405Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4861477Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4861653Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4861691Z graph_break [] 2025-12-04T10:58:28.4861762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4861817Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4861888Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4862063Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4862099Z graph_break [] 2025-12-04T10:58:28.4862356Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-555a71a85f0c3b5c.xml - 2025-12-04T10:58:28.4862416Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4863059Z FAILED [0.4674s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 16384 and is now reported as 24576 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4863062Z 2025-12-04T10:58:28.4863134Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4863454Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4863469Z 2025-12-04T10:58:28.4863556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4863618Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4863685Z ================== 1 failed, 57 deselected, 2 rerun in 3.82s =================== 2025-12-04T10:58:28.4863721Z Got exit code 1 2025-12-04T10:58:28.4863961Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4864089Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4864299Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-344314f1c039dfe9.xml 2025-12-04T10:58:28.4864358Z ============================= test session starts ============================== 2025-12-04T10:58:28.4864467Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4864510Z cachedir: .pytest_cache 2025-12-04T10:58:28.4864668Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4864714Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4864753Z configfile: pytest.ini 2025-12-04T10:58:28.4864913Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4864986Z collecting ... collected 58 items / 54 deselected / 4 selected 2025-12-04T10:58:28.4865040Z stepcurrent: skipping 54 already run items. 
2025-12-04T10:58:28.4865085Z Running 4 items in this shard 2025-12-04T10:58:28.4865087Z 2025-12-04T10:58:28.4865336Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.8962s] [ 25%] 2025-12-04T10:58:28.4865582Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4700s] [ 25%] 2025-12-04T10:58:28.4865807Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4606s] [ 25%] 2025-12-04T10:58:28.4865809Z 2025-12-04T10:58:28.4865859Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4866024Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4866070Z Traceback (most recent call last): 2025-12-04T10:58:28.4866231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4866271Z method(*args, **kwargs) 2025-12-04T10:58:28.4866435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4866476Z method(*args, **kwargs) 2025-12-04T10:58:28.4866627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4866664Z with policy(): 2025-12-04T10:58:28.4866815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4866857Z raise RuntimeError(msg) 2025-12-04T10:58:28.4867250Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4867266Z 2025-12-04T10:58:28.4867340Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4867629Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4867631Z 2025-12-04T10:58:28.4867718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4867791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4867846Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4868133Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4868208Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4868244Z graph_break [] 2025-12-04T10:58:28.4868397Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4868442Z Traceback (most recent call last): 2025-12-04T10:58:28.4868597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4868637Z method(*args, **kwargs) 2025-12-04T10:58:28.4868789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.4868830Z method(*args, **kwargs) 2025-12-04T10:58:28.4868981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4869019Z with policy(): 2025-12-04T10:58:28.4869170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4869212Z raise RuntimeError(msg) 2025-12-04T10:58:28.4869614Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4869617Z 2025-12-04T10:58:28.4869690Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4869990Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4869994Z 2025-12-04T10:58:28.4870081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4870154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4870226Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4870500Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4870574Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4870609Z graph_break [] 2025-12-04T10:58:28.4870683Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.4870739Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4870811Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4871094Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4871130Z graph_break [] 2025-12-04T10:58:28.4871182Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4871333Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4871378Z Traceback (most recent call last): 2025-12-04T10:58:28.4871531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4871572Z method(*args, **kwargs) 2025-12-04T10:58:28.4871733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4871775Z method(*args, **kwargs) 2025-12-04T10:58:28.4871925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4871963Z with policy(): 2025-12-04T10:58:28.4872115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4872157Z raise RuntimeError(msg) 2025-12-04T10:58:28.4872562Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4872566Z 2025-12-04T10:58:28.4872640Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4872932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4872936Z 2025-12-04T10:58:28.4873023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4873096Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4873151Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4873452Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4873525Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4873575Z graph_break [] 2025-12-04T10:58:28.4873648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4873704Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4873775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4874060Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4874096Z graph_break [] 2025-12-04T10:58:28.4874169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4874222Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.4874293Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4874562Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4874612Z graph_break [] 2025-12-04T10:58:28.4874857Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-344314f1c039dfe9.xml - 2025-12-04T10:58:28.4874916Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4875554Z FAILED [0.4606s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4875569Z 2025-12-04T10:58:28.4875643Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4875931Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4875934Z 2025-12-04T10:58:28.4876019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4876081Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4876147Z ================== 1 failed, 54 deselected, 2 rerun in 3.99s =================== 2025-12-04T10:58:28.4876184Z Got exit code 1 2025-12-04T10:58:28.4876224Z Retrying single test... 2025-12-04T10:58:28.4876422Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41254743d13eb4a5.xml 2025-12-04T10:58:28.4876479Z ============================= test session starts ============================== 2025-12-04T10:58:28.4876591Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4876631Z cachedir: .pytest_cache 2025-12-04T10:58:28.4876789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4876834Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4876876Z configfile: pytest.ini 2025-12-04T10:58:28.4877035Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4877108Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4877404Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4877450Z Running 1 items in this shard 2025-12-04T10:58:28.4877452Z 2025-12-04T10:58:28.4877829Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:29.500603254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4877832Z 2025-12-04T10:58:28.4877985Z [W1204 10:56:29.770986460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4877987Z 2025-12-04T10:58:28.4878139Z [W1204 10:56:29.771170238 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878140Z 2025-12-04T10:58:28.4878292Z [W1204 10:56:29.774649962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878304Z 2025-12-04T10:58:28.4878453Z [W1204 10:56:29.775024698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878455Z 2025-12-04T10:58:28.4878604Z [W1204 10:56:29.775091177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878607Z 2025-12-04T10:58:28.4878754Z [W1204 10:56:29.777480646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878756Z 2025-12-04T10:58:28.4878905Z [W1204 10:56:29.777776192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4878907Z 2025-12-04T10:58:28.4879054Z [W1204 10:56:29.777836511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4879068Z 2025-12-04T10:58:28.4879119Z ('RERUN', {'yellow': True}) [3.2806s] [100%] 2025-12-04T10:58:28.4879483Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:30.582896049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4879487Z 2025-12-04T10:58:28.4879635Z [W1204 10:56:30.583338183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4879637Z 2025-12-04T10:58:28.4879785Z [W1204 10:56:30.583420192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4879787Z 2025-12-04T10:58:28.4879936Z [W1204 10:56:30.584717085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4879938Z 2025-12-04T10:58:28.4880088Z [W1204 10:56:30.584992841 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4880089Z 2025-12-04T10:58:28.4880236Z [W1204 10:56:30.585060371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4880238Z 2025-12-04T10:58:28.4880386Z [W1204 10:56:30.587192343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4880388Z 2025-12-04T10:58:28.4880535Z [W1204 10:56:30.587459329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4880537Z 2025-12-04T10:58:28.4880694Z [W1204 10:56:30.587521279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4880697Z 2025-12-04T10:58:28.4880746Z ('RERUN', {'yellow': True}) [0.6601s] [100%] 2025-12-04T10:58:28.4881115Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:30.247241538 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881118Z 2025-12-04T10:58:28.4881268Z [W1204 10:56:30.247627583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881270Z 2025-12-04T10:58:28.4881418Z [W1204 10:56:30.247701793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881421Z 2025-12-04T10:58:28.4881571Z [W1204 10:56:30.248993186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881574Z 2025-12-04T10:58:28.4881732Z [W1204 10:56:30.249262282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881734Z 2025-12-04T10:58:28.4881882Z [W1204 10:56:30.249324132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4881885Z 2025-12-04T10:58:28.4882033Z [W1204 10:56:30.251446984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4882035Z 2025-12-04T10:58:28.4882182Z [W1204 10:56:30.251715691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4882184Z 2025-12-04T10:58:28.4882333Z [W1204 10:56:30.251775710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4882344Z 2025-12-04T10:58:28.4882383Z FAILED [0.6730s] [100%] 2025-12-04T10:58:28.4882386Z 2025-12-04T10:58:28.4882437Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4882590Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4882635Z Traceback (most recent call last): 2025-12-04T10:58:28.4882793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4882834Z method(*args, **kwargs) 2025-12-04T10:58:28.4882988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4883027Z method(*args, **kwargs) 2025-12-04T10:58:28.4883180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4883218Z with policy(): 2025-12-04T10:58:28.4883397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4883438Z raise RuntimeError(msg) 2025-12-04T10:58:28.4883842Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4883844Z 2025-12-04T10:58:28.4883917Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4884227Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4884231Z 2025-12-04T10:58:28.4884318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4884392Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4884448Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4884737Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4884811Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4884847Z graph_break [] 2025-12-04T10:58:28.4884999Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4885044Z Traceback (most recent call last): 2025-12-04T10:58:28.4885200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4885252Z method(*args, **kwargs) 2025-12-04T10:58:28.4885402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4885441Z method(*args, **kwargs) 2025-12-04T10:58:28.4885592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4885630Z 
with policy(): 2025-12-04T10:58:28.4885782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4885822Z raise RuntimeError(msg) 2025-12-04T10:58:28.4886226Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4886242Z 2025-12-04T10:58:28.4886315Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4886604Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4886606Z 2025-12-04T10:58:28.4886692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4886765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4886821Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4887093Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4887168Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4887205Z graph_break [] 2025-12-04T10:58:28.4887278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4887333Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4887405Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4887675Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4887712Z graph_break [] 2025-12-04T10:58:28.4887763Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4887925Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4887971Z Traceback (most recent call last): 2025-12-04T10:58:28.4888127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4888167Z method(*args, **kwargs) 2025-12-04T10:58:28.4888328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4888368Z method(*args, **kwargs) 2025-12-04T10:58:28.4888518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4888554Z with policy(): 2025-12-04T10:58:28.4888708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4888748Z raise RuntimeError(msg) 2025-12-04T10:58:28.4889154Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4889166Z 2025-12-04T10:58:28.4889241Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4889530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4889532Z 2025-12-04T10:58:28.4889618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4889690Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4889745Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4890028Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4890102Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4890138Z graph_break [] 2025-12-04T10:58:28.4890211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4890264Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4890336Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4890607Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4890644Z graph_break [] 2025-12-04T10:58:28.4890717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4890772Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4890843Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4891115Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4891151Z graph_break [] 2025-12-04T10:58:28.4891395Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-41254743d13eb4a5.xml - 2025-12-04T10:58:28.4891454Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4892104Z FAILED [0.6730s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4892108Z 2025-12-04T10:58:28.4892181Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4892473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4892476Z 2025-12-04T10:58:28.4892561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4892625Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4892691Z ================== 1 failed, 57 deselected, 2 rerun in 4.78s =================== 2025-12-04T10:58:28.4892741Z Got exit code 1 2025-12-04T10:58:28.4892780Z Retrying single test... 2025-12-04T10:58:28.4892979Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9867f87fcec12f62.xml 2025-12-04T10:58:28.4893034Z ============================= test session starts ============================== 2025-12-04T10:58:28.4893144Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4893184Z cachedir: .pytest_cache 2025-12-04T10:58:28.4893369Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4893414Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4893455Z configfile: pytest.ini 2025-12-04T10:58:28.4893631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4893704Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4893989Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4894033Z Running 1 items in this shard 2025-12-04T10:58:28.4894035Z 2025-12-04T10:58:28.4894395Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:41.421368896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4894397Z 2025-12-04T10:58:28.4894552Z [W1204 10:56:41.661631976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4894555Z 2025-12-04T10:58:28.4894708Z [W1204 10:56:41.661773555 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4894710Z 2025-12-04T10:58:28.4894860Z [W1204 10:56:41.665406288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4894862Z 2025-12-04T10:58:28.4895010Z [W1204 10:56:41.665729283 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4895012Z 2025-12-04T10:58:28.4895159Z [W1204 10:56:41.665791603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4895161Z 2025-12-04T10:58:28.4895328Z [W1204 10:56:41.668096033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4895331Z 2025-12-04T10:58:28.4895479Z [W1204 10:56:41.668377399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4895483Z 2025-12-04T10:58:28.4895643Z [W1204 10:56:41.668437928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4895645Z 2025-12-04T10:58:28.4895694Z ('RERUN', {'yellow': True}) [3.2425s] [100%] 2025-12-04T10:58:28.4896052Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:42.456364099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896054Z 2025-12-04T10:58:28.4896205Z [W1204 10:56:42.456846122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896207Z 2025-12-04T10:58:28.4896355Z [W1204 10:56:42.456924561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896372Z 2025-12-04T10:58:28.4896522Z [W1204 10:56:42.458443992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896524Z 2025-12-04T10:58:28.4896674Z [W1204 10:56:42.458717068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896676Z 2025-12-04T10:58:28.4896824Z [W1204 10:56:42.458778897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896826Z 2025-12-04T10:58:28.4896976Z [W1204 10:56:42.461039328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4896993Z 2025-12-04T10:58:28.4897142Z [W1204 10:56:42.461306244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4897145Z 2025-12-04T10:58:28.4897293Z [W1204 10:56:42.461366434 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4897295Z 2025-12-04T10:58:28.4897342Z ('RERUN', {'yellow': True}) [0.6519s] [100%] 2025-12-04T10:58:28.4897700Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 10:56:42.130800562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4897702Z 2025-12-04T10:58:28.4897852Z [W1204 10:56:42.131189447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4897854Z 2025-12-04T10:58:28.4898003Z [W1204 10:56:42.131266746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898006Z 2025-12-04T10:58:28.4898155Z [W1204 10:56:42.132547259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898157Z 2025-12-04T10:58:28.4898305Z [W1204 10:56:42.132813835 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898307Z 2025-12-04T10:58:28.4898455Z [W1204 10:56:42.132880885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898457Z 2025-12-04T10:58:28.4898617Z [W1204 10:56:42.134957608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4898620Z 2025-12-04T10:58:28.4898769Z [W1204 10:56:42.135224044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898772Z 2025-12-04T10:58:28.4898929Z [W1204 10:56:42.135287293 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4898931Z 2025-12-04T10:58:28.4898970Z FAILED [0.6811s] [100%] 2025-12-04T10:58:28.4898972Z 2025-12-04T10:58:28.4899023Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4899174Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4899221Z Traceback (most recent call last): 2025-12-04T10:58:28.4899377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4899421Z method(*args, **kwargs) 2025-12-04T10:58:28.4899573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4899624Z method(*args, **kwargs) 2025-12-04T10:58:28.4899776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4899814Z with policy(): 2025-12-04T10:58:28.4899966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4900007Z raise RuntimeError(msg) 2025-12-04T10:58:28.4900405Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 8704 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.4900419Z 2025-12-04T10:58:28.4900493Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4900784Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4900787Z 2025-12-04T10:58:28.4900873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4900948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4901004Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4901280Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4901355Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4901392Z graph_break [] 2025-12-04T10:58:28.4901542Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4901588Z Traceback (most recent call last): 2025-12-04T10:58:28.4901741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4901781Z method(*args, **kwargs) 2025-12-04T10:58:28.4901931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4901972Z method(*args, **kwargs) 2025-12-04T10:58:28.4902121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4902158Z 
with policy(): 2025-12-04T10:58:28.4902318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4902361Z raise RuntimeError(msg) 2025-12-04T10:58:28.4902772Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 8704 and is now reported as 17408 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.4902775Z 2025-12-04T10:58:28.4902848Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4903136Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4903138Z 2025-12-04T10:58:28.4903224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4903407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4903462Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4903752Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4903825Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4903862Z graph_break [] 2025-12-04T10:58:28.4903934Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4903988Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4904059Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.4904330Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4904380Z graph_break [] 2025-12-04T10:58:28.4904432Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4904583Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.4904629Z Traceback (most recent call last): 2025-12-04T10:58:28.4904784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4904824Z method(*args, **kwargs) 2025-12-04T10:58:28.4904975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4905014Z method(*args, **kwargs) 2025-12-04T10:58:28.4905165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4905202Z with policy(): 2025-12-04T10:58:28.4905355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4905395Z raise RuntimeError(msg) 2025-12-04T10:58:28.4905803Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.4905806Z 2025-12-04T10:58:28.4905878Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4906178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4906181Z 2025-12-04T10:58:28.4906268Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4906342Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4906398Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4906683Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4906756Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4906792Z graph_break [] 2025-12-04T10:58:28.4906864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4906918Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4906991Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4907258Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4907306Z graph_break [] 2025-12-04T10:58:28.4907379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4907434Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4907505Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4907775Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4907810Z graph_break [] 2025-12-04T10:58:28.4908067Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9867f87fcec12f62.xml - 2025-12-04T10:58:28.4908127Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4908764Z FAILED [0.6811s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 17408 and is now reported as 26112 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.4908767Z 2025-12-04T10:58:28.4908839Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4909126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4909130Z 2025-12-04T10:58:28.4909216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4909278Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4909345Z ================== 1 failed, 57 deselected, 2 rerun in 4.75s =================== 2025-12-04T10:58:28.4909381Z Got exit code 1 2025-12-04T10:58:28.4909621Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.4909749Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4909957Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-56894740bd3d553e.xml 2025-12-04T10:58:28.4910016Z ============================= test session starts ============================== 2025-12-04T10:58:28.4910127Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4910168Z cachedir: .pytest_cache 2025-12-04T10:58:28.4910335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4910380Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4910421Z configfile: pytest.ini 2025-12-04T10:58:28.4910580Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4910653Z collecting ... collected 58 items / 55 deselected / 3 selected 2025-12-04T10:58:28.4910706Z stepcurrent: skipping 55 already run items. 
2025-12-04T10:58:28.4910751Z Running 3 items in this shard 2025-12-04T10:58:28.4910753Z 2025-12-04T10:58:28.4911005Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.4973s] [ 33%] 2025-12-04T10:58:28.4911274Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4522s] [ 33%] 2025-12-04T10:58:28.4911498Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4540s] [ 33%] 2025-12-04T10:58:28.4911500Z 2025-12-04T10:58:28.4911550Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4911702Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4911758Z Traceback (most recent call last): 2025-12-04T10:58:28.4911915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4911956Z method(*args, **kwargs) 2025-12-04T10:58:28.4912110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4912150Z method(*args, **kwargs) 2025-12-04T10:58:28.4912301Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4912337Z with policy(): 2025-12-04T10:58:28.4912489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4912530Z raise RuntimeError(msg) 2025-12-04T10:58:28.4912929Z RuntimeError: CUDA 
driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4912933Z 2025-12-04T10:58:28.4913006Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4913335Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4913338Z 2025-12-04T10:58:28.4913426Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4913498Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4913555Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4913744Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4913819Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4913855Z graph_break [] 2025-12-04T10:58:28.4914021Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4914066Z Traceback (most recent call last): 2025-12-04T10:58:28.4914219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4914258Z method(*args, **kwargs) 2025-12-04T10:58:28.4914410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4914449Z method(*args, **kwargs) 
2025-12-04T10:58:28.4914601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4914638Z with policy(): 2025-12-04T10:58:28.4914790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4914845Z raise RuntimeError(msg) 2025-12-04T10:58:28.4915254Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4915257Z 2025-12-04T10:58:28.4915330Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4915620Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4915635Z 2025-12-04T10:58:28.4915723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4915796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4915852Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4916029Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4916101Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4916137Z graph_break [] 2025-12-04T10:58:28.4916210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4916264Z stats [('calls_captured', 3), 
('unique_graphs', 1)] 2025-12-04T10:58:28.4916336Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4916512Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4916550Z graph_break [] 2025-12-04T10:58:28.4916601Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4916755Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4916799Z Traceback (most recent call last): 2025-12-04T10:58:28.4916953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4916992Z method(*args, **kwargs) 2025-12-04T10:58:28.4917144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4917183Z method(*args, **kwargs) 2025-12-04T10:58:28.4917344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4917381Z with policy(): 2025-12-04T10:58:28.4917534Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4917575Z raise RuntimeError(msg) 2025-12-04T10:58:28.4917993Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4917995Z 2025-12-04T10:58:28.4918068Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4918358Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4918361Z 2025-12-04T10:58:28.4918459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4918532Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4918588Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4918764Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4918837Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4918872Z graph_break [] 2025-12-04T10:58:28.4918945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4918999Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4919070Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4919255Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4919292Z graph_break [] 2025-12-04T10:58:28.4919364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4919420Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4919490Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4919664Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4919700Z graph_break [] 2025-12-04T10:58:28.4919942Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-56894740bd3d553e.xml - 2025-12-04T10:58:28.4920004Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4920647Z FAILED [0.4540s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4920650Z 2025-12-04T10:58:28.4920723Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4921016Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4921031Z 2025-12-04T10:58:28.4921119Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4921182Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4921247Z ================== 1 failed, 55 deselected, 2 rerun in 3.57s =================== 2025-12-04T10:58:28.4921284Z Got exit code 1 2025-12-04T10:58:28.4921334Z Retrying single test... 
2025-12-04T10:58:28.4921533Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ef31bd092fb20be.xml 2025-12-04T10:58:28.4921590Z ============================= test session starts ============================== 2025-12-04T10:58:28.4921699Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4921740Z cachedir: .pytest_cache 2025-12-04T10:58:28.4921900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4921946Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4921997Z configfile: pytest.ini 2025-12-04T10:58:28.4922156Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4922229Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4922516Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4922560Z Running 1 items in this shard 2025-12-04T10:58:28.4922562Z 2025-12-04T10:58:28.4922928Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:02.607409734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4922942Z 2025-12-04T10:58:28.4923098Z [W1204 10:57:02.889698018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4923100Z 2025-12-04T10:58:28.4923278Z [W1204 10:57:02.889860536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4923281Z 2025-12-04T10:58:28.4923430Z [W1204 10:57:02.893306571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4923432Z 2025-12-04T10:58:28.4923581Z [W1204 10:57:02.893620787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4923583Z 2025-12-04T10:58:28.4923731Z [W1204 10:57:02.893682316 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4923733Z 2025-12-04T10:58:28.4923882Z [W1204 10:57:02.895885867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4923885Z 2025-12-04T10:58:28.4924032Z [W1204 10:57:02.896163084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4924034Z 2025-12-04T10:58:28.4924183Z [W1204 10:57:02.896224783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4924184Z 2025-12-04T10:58:28.4924234Z ('RERUN', {'yellow': True}) [2.8107s] [100%] 2025-12-04T10:58:28.4924611Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:03.850360356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4924614Z 2025-12-04T10:58:28.4924764Z [W1204 10:57:03.850740591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4924767Z 2025-12-04T10:58:28.4924933Z [W1204 10:57:03.850806390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4924935Z 2025-12-04T10:58:28.4925084Z [W1204 10:57:03.852137942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4925086Z 2025-12-04T10:58:28.4925235Z [W1204 10:57:03.852400519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4925237Z 2025-12-04T10:58:28.4925386Z [W1204 10:57:03.852461388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4925389Z 2025-12-04T10:58:28.4925538Z [W1204 10:57:03.854365873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4925552Z 2025-12-04T10:58:28.4925701Z [W1204 10:57:03.854701199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4925703Z 2025-12-04T10:58:28.4925851Z [W1204 10:57:03.854763988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4925853Z 2025-12-04T10:58:28.4925901Z ('RERUN', {'yellow': True}) [0.4493s] [100%] 2025-12-04T10:58:28.4926262Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:04.315326669 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4926276Z 2025-12-04T10:58:28.4926426Z [W1204 10:57:04.315712584 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4926429Z 2025-12-04T10:58:28.4926577Z [W1204 10:57:04.315788713 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4926579Z 2025-12-04T10:58:28.4926727Z [W1204 10:57:04.317034587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4926729Z 2025-12-04T10:58:28.4926876Z [W1204 10:57:04.317298463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4926878Z 2025-12-04T10:58:28.4927027Z [W1204 10:57:04.317360683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4927030Z 2025-12-04T10:58:28.4927178Z [W1204 10:57:04.319298437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4927182Z 2025-12-04T10:58:28.4927331Z [W1204 10:57:04.319634533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4927333Z 2025-12-04T10:58:28.4927484Z [W1204 10:57:04.319704332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4927486Z 2025-12-04T10:58:28.4927524Z FAILED [0.4781s] [100%] 2025-12-04T10:58:28.4927526Z 2025-12-04T10:58:28.4927577Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4927729Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4927786Z Traceback (most recent call last): 2025-12-04T10:58:28.4927943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4927985Z method(*args, **kwargs) 2025-12-04T10:58:28.4928137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4928187Z method(*args, **kwargs) 2025-12-04T10:58:28.4928339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4928376Z with policy(): 2025-12-04T10:58:28.4928529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4928571Z raise RuntimeError(msg) 2025-12-04T10:58:28.4928975Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4928989Z 2025-12-04T10:58:28.4929063Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4929355Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4929357Z 2025-12-04T10:58:28.4929443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4929518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4929574Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4929755Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4929839Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4929877Z graph_break [] 2025-12-04T10:58:28.4930029Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4930075Z Traceback (most recent call last): 2025-12-04T10:58:28.4930228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4930268Z method(*args, **kwargs) 2025-12-04T10:58:28.4930418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4930458Z method(*args, **kwargs) 2025-12-04T10:58:28.4930608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4930647Z with policy(): 2025-12-04T10:58:28.4930798Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4930841Z raise RuntimeError(msg) 2025-12-04T10:58:28.4931247Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4931251Z 2025-12-04T10:58:28.4931322Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4931615Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4931629Z 2025-12-04T10:58:28.4931715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4931791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4931846Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4932034Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4932107Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4932144Z graph_break [] 2025-12-04T10:58:28.4932216Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4932270Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4932340Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4932518Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4932566Z graph_break [] 2025-12-04T10:58:28.4932619Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4932772Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4932817Z Traceback (most recent call last): 2025-12-04T10:58:28.4932969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4933009Z method(*args, **kwargs) 2025-12-04T10:58:28.4933160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4933200Z method(*args, **kwargs) 2025-12-04T10:58:28.4933384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4933435Z with policy(): 2025-12-04T10:58:28.4933586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4933629Z raise RuntimeError(msg) 2025-12-04T10:58:28.4934039Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4934042Z 2025-12-04T10:58:28.4934114Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4934408Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4934411Z 2025-12-04T10:58:28.4934497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4934571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4934626Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4934803Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4934874Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4934911Z graph_break [] 2025-12-04T10:58:28.4934983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4935037Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4935107Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4935297Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4935334Z graph_break [] 2025-12-04T10:58:28.4935407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4935472Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4935546Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4935718Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4935755Z graph_break [] 2025-12-04T10:58:28.4935999Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9ef31bd092fb20be.xml - 2025-12-04T10:58:28.4936060Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4936707Z FAILED [0.4781s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4936723Z 2025-12-04T10:58:28.4936795Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4937084Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4937098Z 2025-12-04T10:58:28.4937184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4937246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4937312Z ================== 1 failed, 57 deselected, 2 rerun in 3.91s =================== 2025-12-04T10:58:28.4937350Z Got exit code 1 2025-12-04T10:58:28.4937390Z Retrying single test... 
2025-12-04T10:58:28.4937587Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-02cf393e3a4420eb.xml 2025-12-04T10:58:28.4937643Z ============================= test session starts ============================== 2025-12-04T10:58:28.4937753Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4937793Z cachedir: .pytest_cache 2025-12-04T10:58:28.4937951Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4937999Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4938039Z configfile: pytest.ini 2025-12-04T10:58:28.4938201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4938273Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4938564Z stepcurrent: skipping 55 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4938607Z Running 1 items in this shard 2025-12-04T10:58:28.4938609Z 2025-12-04T10:58:28.4938987Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:12.980344734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4938991Z 2025-12-04T10:58:28.4939143Z [W1204 10:57:12.261008634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4939146Z 2025-12-04T10:58:28.4939306Z [W1204 10:57:12.261136222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4939308Z 2025-12-04T10:58:28.4939458Z [W1204 10:57:12.264931263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4939460Z 2025-12-04T10:58:28.4939608Z [W1204 10:57:12.265243508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4939610Z 2025-12-04T10:58:28.4939757Z [W1204 10:57:12.265306038 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4939761Z 2025-12-04T10:58:28.4939908Z [W1204 10:57:13.267568598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4939922Z 2025-12-04T10:58:28.4940073Z [W1204 10:57:13.267843095 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4940075Z 2025-12-04T10:58:28.4940223Z [W1204 10:57:13.267904174 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4940225Z 2025-12-04T10:58:28.4940273Z ('RERUN', {'yellow': True}) [2.9248s] [100%] 2025-12-04T10:58:28.4940637Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:14.431562944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4940652Z 2025-12-04T10:58:28.4940801Z [W1204 10:57:14.431958559 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4940804Z 2025-12-04T10:58:28.4940953Z [W1204 10:57:14.432032988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4940955Z 2025-12-04T10:58:28.4941103Z [W1204 10:57:14.433337571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4941105Z 2025-12-04T10:58:28.4941252Z [W1204 10:57:14.433606258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4941254Z 2025-12-04T10:58:28.4941402Z [W1204 10:57:14.433667957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4941404Z 2025-12-04T10:58:28.4941553Z [W1204 10:57:14.435718510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4941556Z 2025-12-04T10:58:28.4941706Z [W1204 10:57:14.436139875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4941708Z 2025-12-04T10:58:28.4941856Z [W1204 10:57:14.436211784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4941858Z 2025-12-04T10:58:28.4941906Z ('RERUN', {'yellow': True}) [0.6690s] [100%] 2025-12-04T10:58:28.4942265Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 [W1204 10:57:14.097235040 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4942284Z 2025-12-04T10:58:28.4942433Z [W1204 10:57:14.097625225 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4942436Z 2025-12-04T10:58:28.4942584Z [W1204 10:57:14.097696394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4942596Z 2025-12-04T10:58:28.4942744Z [W1204 10:57:14.098973627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4942746Z 2025-12-04T10:58:28.4942894Z [W1204 10:57:14.099240124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4942896Z 2025-12-04T10:58:28.4943044Z [W1204 10:57:14.099301513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4943047Z 2025-12-04T10:58:28.4943196Z [W1204 10:57:14.101359586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4943208Z 2025-12-04T10:58:28.4943395Z [W1204 10:57:14.101698752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4943396Z 2025-12-04T10:58:28.4943545Z [W1204 10:57:14.101760651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4943546Z 2025-12-04T10:58:28.4943585Z FAILED [0.6466s] [100%] 2025-12-04T10:58:28.4943587Z 2025-12-04T10:58:28.4943638Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4943791Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4943835Z Traceback (most recent call last): 2025-12-04T10:58:28.4944009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4944050Z method(*args, **kwargs) 2025-12-04T10:58:28.4944203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4944243Z method(*args, **kwargs) 2025-12-04T10:58:28.4944396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4944433Z with policy(): 2025-12-04T10:58:28.4944586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4944627Z raise RuntimeError(msg) 2025-12-04T10:58:28.4945030Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 147456 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4945034Z 2025-12-04T10:58:28.4945108Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4945402Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4945405Z 2025-12-04T10:58:28.4945492Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4945564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4945621Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4945809Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4945884Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4945921Z graph_break [] 2025-12-04T10:58:28.4946074Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4946130Z Traceback (most recent call last): 2025-12-04T10:58:28.4946285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4946325Z method(*args, **kwargs) 2025-12-04T10:58:28.4946476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4946515Z method(*args, **kwargs) 2025-12-04T10:58:28.4946666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4946703Z with policy(): 2025-12-04T10:58:28.4946857Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4946912Z raise RuntimeError(msg) 2025-12-04T10:58:28.4947322Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 147456 and is now reported as 294912 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4947324Z 2025-12-04T10:58:28.4947398Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4947689Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4947701Z 2025-12-04T10:58:28.4947788Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4947861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4947919Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4948095Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4948168Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4948203Z graph_break [] 2025-12-04T10:58:28.4948276Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4948330Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4948402Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4948577Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 
3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4948615Z graph_break [] 2025-12-04T10:58:28.4948667Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4948820Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 _ 2025-12-04T10:58:28.4948865Z Traceback (most recent call last): 2025-12-04T10:58:28.4949020Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4949059Z method(*args, **kwargs) 2025-12-04T10:58:28.4949212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4949251Z method(*args, **kwargs) 2025-12-04T10:58:28.4949401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4949447Z with policy(): 2025-12-04T10:58:28.4949601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4949642Z raise RuntimeError(msg) 2025-12-04T10:58:28.4950061Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4950063Z 2025-12-04T10:58:28.4950136Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4950426Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4950430Z 2025-12-04T10:58:28.4950516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4950600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4950656Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4950830Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4950904Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4950939Z graph_break [] 2025-12-04T10:58:28.4951013Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4951066Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4951138Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4951314Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4951367Z graph_break [] 2025-12-04T10:58:28.4951439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4951493Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4951565Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4951739Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4951774Z graph_break [] 2025-12-04T10:58:28.4952017Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-02cf393e3a4420eb.xml - 2025-12-04T10:58:28.4952076Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4952717Z FAILED [0.6466s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16! Caching allocator allocated memory was 294912 and is now reported as 442368 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4952720Z 2025-12-04T10:58:28.4952793Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4953082Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4953084Z 2025-12-04T10:58:28.4953181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4953243Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4953343Z ================== 1 failed, 57 deselected, 2 rerun in 4.41s =================== 2025-12-04T10:58:28.4953379Z Got exit code 1 2025-12-04T10:58:28.4953637Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16 2025-12-04T10:58:28.4953766Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4953963Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49446edf3a40311f.xml 2025-12-04T10:58:28.4954020Z ============================= test session starts ============================== 2025-12-04T10:58:28.4954132Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4954175Z cachedir: .pytest_cache 2025-12-04T10:58:28.4954333Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4954392Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4954432Z configfile: pytest.ini 2025-12-04T10:58:28.4954592Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4954664Z collecting ... collected 58 items / 56 deselected / 2 selected 2025-12-04T10:58:28.4954718Z stepcurrent: skipping 56 already run items. 
2025-12-04T10:58:28.4954761Z Running 2 items in this shard 2025-12-04T10:58:28.4954763Z 2025-12-04T10:58:28.4955013Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.5001s] [ 50%] 2025-12-04T10:58:28.4955271Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4608s] [ 50%] 2025-12-04T10:58:28.4955496Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4694s] [ 50%] 2025-12-04T10:58:28.4955498Z 2025-12-04T10:58:28.4955549Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4955701Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4955746Z Traceback (most recent call last): 2025-12-04T10:58:28.4955903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4955945Z method(*args, **kwargs) 2025-12-04T10:58:28.4956098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4956139Z method(*args, **kwargs) 2025-12-04T10:58:28.4956289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4956327Z with policy(): 2025-12-04T10:58:28.4956479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4956521Z raise RuntimeError(msg) 2025-12-04T10:58:28.4956914Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 2025-12-04T10:58:28.4956918Z 2025-12-04T10:58:28.4957004Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4957295Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4957298Z 2025-12-04T10:58:28.4957396Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4957469Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4957526Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4957702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4957775Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4957812Z graph_break [] 2025-12-04T10:58:28.4957965Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4958020Z Traceback (most recent call last): 2025-12-04T10:58:28.4958174Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4958214Z method(*args, **kwargs) 2025-12-04T10:58:28.4958365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4958404Z method(*args, **kwargs) 
2025-12-04T10:58:28.4958554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4958591Z with policy(): 2025-12-04T10:58:28.4958743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4958795Z raise RuntimeError(msg) 2025-12-04T10:58:28.4959198Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4959202Z 2025-12-04T10:58:28.4959275Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4959564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4959566Z 2025-12-04T10:58:28.4959652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4959725Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4959783Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4959958Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4960032Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4960069Z graph_break [] 2025-12-04T10:58:28.4960143Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4960197Z stats [('calls_captured', 3), ('unique_graphs', 
1)] 2025-12-04T10:58:28.4960269Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4960443Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4960480Z graph_break [] 2025-12-04T10:58:28.4960543Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4960694Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4960740Z Traceback (most recent call last): 2025-12-04T10:58:28.4960894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4960945Z method(*args, **kwargs) 2025-12-04T10:58:28.4961096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4961136Z method(*args, **kwargs) 2025-12-04T10:58:28.4961286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4961322Z with policy(): 2025-12-04T10:58:28.4961475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4961518Z raise RuntimeError(msg) 2025-12-04T10:58:28.4961920Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4961932Z 2025-12-04T10:58:28.4962005Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4962295Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4962297Z 2025-12-04T10:58:28.4962385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4962459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4962525Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4962700Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4962774Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4962811Z graph_break [] 2025-12-04T10:58:28.4962883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4962937Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4963007Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4963182Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4963218Z graph_break [] 2025-12-04T10:58:28.4963335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4963390Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4963463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4963637Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4963674Z graph_break [] 2025-12-04T10:58:28.4963917Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-49446edf3a40311f.xml - 2025-12-04T10:58:28.4963977Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4964629Z FAILED [0.4694s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4964633Z 2025-12-04T10:58:28.4964719Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4965009Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4965011Z 2025-12-04T10:58:28.4965096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4965158Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4965224Z ================== 1 failed, 56 deselected, 2 rerun in 3.59s =================== 2025-12-04T10:58:28.4965262Z Got exit code 1 2025-12-04T10:58:28.4965302Z Retrying single test... 
2025-12-04T10:58:28.4965522Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-855219d0320404d9.xml 2025-12-04T10:58:28.4965579Z ============================= test session starts ============================== 2025-12-04T10:58:28.4965689Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4965729Z cachedir: .pytest_cache 2025-12-04T10:58:28.4965888Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4965933Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4965974Z configfile: pytest.ini 2025-12-04T10:58:28.4966133Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4966218Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4966504Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4966549Z Running 1 items in this shard 2025-12-04T10:58:28.4966551Z 2025-12-04T10:58:28.4966916Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:34.550630547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4966920Z 2025-12-04T10:58:28.4967073Z [W1204 10:57:34.824281757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4967076Z 2025-12-04T10:58:28.4967227Z [W1204 10:57:34.824416456 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4967230Z 2025-12-04T10:58:28.4967379Z [W1204 10:57:34.827635043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4967381Z 2025-12-04T10:58:28.4967530Z [W1204 10:57:34.827938889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4967532Z 2025-12-04T10:58:28.4967681Z [W1204 10:57:34.828005069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4967683Z 2025-12-04T10:58:28.4967831Z [W1204 10:57:34.830280809 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4967833Z 2025-12-04T10:58:28.4967998Z [W1204 10:57:34.830559435 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4968001Z 2025-12-04T10:58:28.4968149Z [W1204 10:57:34.830619794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4968151Z 2025-12-04T10:58:28.4968209Z ('RERUN', {'yellow': True}) [2.8791s] [100%] 2025-12-04T10:58:28.4968568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:35.954148664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4968571Z 2025-12-04T10:58:28.4968719Z [W1204 10:57:35.954556219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4968721Z 2025-12-04T10:58:28.4968871Z [W1204 10:57:35.954626818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4968885Z 2025-12-04T10:58:28.4969034Z [W1204 10:57:35.955967270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4969036Z 2025-12-04T10:58:28.4969185Z [W1204 10:57:35.956249287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4969187Z 2025-12-04T10:58:28.4969334Z [W1204 10:57:35.956315686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4969336Z 2025-12-04T10:58:28.4969483Z [W1204 10:57:35.958337249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4969485Z 2025-12-04T10:58:28.4969632Z [W1204 10:57:35.958681795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4969646Z 2025-12-04T10:58:28.4969795Z [W1204 10:57:35.958744184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4969797Z 2025-12-04T10:58:28.4969846Z ('RERUN', {'yellow': True}) [0.6336s] [100%] 2025-12-04T10:58:28.4970202Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:36.646256758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970204Z 2025-12-04T10:58:28.4970352Z [W1204 10:57:36.646642083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970354Z 2025-12-04T10:58:28.4970504Z [W1204 10:57:36.646708562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970507Z 2025-12-04T10:58:28.4970656Z [W1204 10:57:36.647991315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970658Z 2025-12-04T10:58:28.4970808Z [W1204 10:57:36.648254301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970811Z 2025-12-04T10:58:28.4970960Z [W1204 10:57:36.648322351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4970961Z 2025-12-04T10:58:28.4971111Z [W1204 10:57:36.650382364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4971113Z 2025-12-04T10:58:28.4971270Z [W1204 10:57:36.650730939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4971273Z 2025-12-04T10:58:28.4971422Z [W1204 10:57:36.650796268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4971424Z 2025-12-04T10:58:28.4971462Z FAILED [0.6827s] [100%] 2025-12-04T10:58:28.4971464Z 2025-12-04T10:58:28.4971527Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4971680Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4971726Z Traceback (most recent call last): 2025-12-04T10:58:28.4971884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4971926Z method(*args, **kwargs) 2025-12-04T10:58:28.4972078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4972120Z method(*args, **kwargs) 2025-12-04T10:58:28.4972284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4972320Z with policy(): 2025-12-04T10:58:28.4972474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4972514Z raise RuntimeError(msg) 2025-12-04T10:58:28.4972914Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4972917Z 2025-12-04T10:58:28.4972990Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4973333Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4973336Z 2025-12-04T10:58:28.4973422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4973497Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4973553Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4973730Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4973803Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4975143Z graph_break [] 2025-12-04T10:58:28.4975296Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4975343Z Traceback (most recent call last): 2025-12-04T10:58:28.4975499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4975540Z method(*args, **kwargs) 2025-12-04T10:58:28.4975691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4975732Z method(*args, **kwargs) 2025-12-04T10:58:28.4975882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4975919Z with policy(): 2025-12-04T10:58:28.4976072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4976113Z raise RuntimeError(msg) 2025-12-04T10:58:28.4976535Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4976539Z 2025-12-04T10:58:28.4976627Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4976919Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4976922Z 2025-12-04T10:58:28.4977007Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4977081Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4977137Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4977316Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4977406Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4977443Z graph_break [] 2025-12-04T10:58:28.4977517Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4977572Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4977644Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4977820Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4977856Z graph_break [] 2025-12-04T10:58:28.4977908Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4978060Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4978120Z Traceback (most recent call last): 2025-12-04T10:58:28.4978276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4978316Z method(*args, **kwargs) 2025-12-04T10:58:28.4978468Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4978507Z method(*args, **kwargs) 2025-12-04T10:58:28.4978659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4978695Z with policy(): 2025-12-04T10:58:28.4978849Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4978888Z raise RuntimeError(msg) 2025-12-04T10:58:28.4979296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4979299Z 2025-12-04T10:58:28.4979372Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4979660Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4979662Z 2025-12-04T10:58:28.4979748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4979821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4979876Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4980065Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4980138Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4980174Z graph_break [] 2025-12-04T10:58:28.4980264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4980320Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4980390Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4980565Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4980601Z graph_break [] 2025-12-04T10:58:28.4980673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4980730Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4980801Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4980987Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4981023Z graph_break [] 2025-12-04T10:58:28.4981266Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-855219d0320404d9.xml - 2025-12-04T10:58:28.4981325Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4981965Z FAILED [0.6827s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4981979Z 2025-12-04T10:58:28.4982051Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4982340Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4982342Z 2025-12-04T10:58:28.4982428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4982489Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T10:58:28.4982555Z ================== 1 failed, 57 deselected, 2 rerun in 4.36s =================== 2025-12-04T10:58:28.4982593Z Got exit code 1 2025-12-04T10:58:28.4982634Z Retrying single test... 
2025-12-04T10:58:28.4982832Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d87002f690f4a6fe.xml 2025-12-04T10:58:28.4982891Z ============================= test session starts ============================== 2025-12-04T10:58:28.4983002Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4983044Z cachedir: .pytest_cache 2025-12-04T10:58:28.4983203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4983286Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4983326Z configfile: pytest.ini 2025-12-04T10:58:28.4983489Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4983577Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4983866Z stepcurrent: skipping 56 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4983910Z Running 1 items in this shard 2025-12-04T10:58:28.4983912Z 2025-12-04T10:58:28.4984289Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:45.038569180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4984292Z 2025-12-04T10:58:28.4984445Z [W1204 10:57:46.318245405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4984448Z 2025-12-04T10:58:28.4984600Z [W1204 10:57:46.318388673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4984603Z 2025-12-04T10:58:28.4984753Z [W1204 10:57:46.321729100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4984769Z 2025-12-04T10:58:28.4984918Z [W1204 10:57:46.322047045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4984920Z 2025-12-04T10:58:28.4985070Z [W1204 10:57:46.322112504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4985072Z 2025-12-04T10:58:28.4985220Z [W1204 10:57:46.324372015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4985222Z 2025-12-04T10:58:28.4985371Z [W1204 10:57:46.324641431 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4985385Z 2025-12-04T10:58:28.4985536Z [W1204 10:57:46.324702921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4985539Z 2025-12-04T10:58:28.4985587Z ('RERUN', {'yellow': True}) [2.8745s] [100%] 2025-12-04T10:58:28.4985947Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:47.459638908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4985950Z 2025-12-04T10:58:28.4986098Z [W1204 10:57:47.460032643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986100Z 2025-12-04T10:58:28.4986251Z [W1204 10:57:47.460114122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986254Z 2025-12-04T10:58:28.4986403Z [W1204 10:57:47.461409175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986406Z 2025-12-04T10:58:28.4986555Z [W1204 10:57:47.461688771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986557Z 2025-12-04T10:58:28.4986705Z [W1204 10:57:47.461755290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986707Z 2025-12-04T10:58:28.4986854Z [W1204 10:57:47.463803853 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4986856Z 2025-12-04T10:58:28.4987017Z [W1204 10:57:47.464217978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4987020Z 2025-12-04T10:58:28.4987169Z [W1204 10:57:47.464287847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4987172Z 2025-12-04T10:58:28.4987220Z ('RERUN', {'yellow': True}) [0.6316s] [100%] 2025-12-04T10:58:28.4987593Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:47.114739892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4987596Z 2025-12-04T10:58:28.4987746Z [W1204 10:57:47.115138067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4987747Z 2025-12-04T10:58:28.4987897Z [W1204 10:57:47.115217716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4987900Z 2025-12-04T10:58:28.4988048Z [W1204 10:57:47.116512079 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4988060Z 2025-12-04T10:58:28.4988210Z [W1204 10:57:47.116781545 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4988212Z 2025-12-04T10:58:28.4988361Z [W1204 10:57:47.116843935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4988363Z 2025-12-04T10:58:28.4988510Z [W1204 10:57:47.118898468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4988512Z 2025-12-04T10:58:28.4988660Z [W1204 10:57:47.119254153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.4988673Z 2025-12-04T10:58:28.4988821Z [W1204 10:57:47.119321342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.4988824Z 2025-12-04T10:58:28.4988863Z FAILED [0.6493s] [100%] 2025-12-04T10:58:28.4988865Z 2025-12-04T10:58:28.4988917Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.4989070Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4989115Z Traceback (most recent call last): 2025-12-04T10:58:28.4989273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4989314Z method(*args, **kwargs) 2025-12-04T10:58:28.4989467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4989509Z method(*args, **kwargs) 2025-12-04T10:58:28.4989661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4989699Z with policy(): 2025-12-04T10:58:28.4989852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4989893Z raise RuntimeError(msg) 2025-12-04T10:58:28.4990293Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9216 on device 0. CUDA driver allocated memory was 807403520 and is now 1207959552. 
2025-12-04T10:58:28.4990295Z 2025-12-04T10:58:28.4990370Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4990671Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4990675Z 2025-12-04T10:58:28.4990763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4990846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4990904Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4991081Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4991154Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4991189Z graph_break [] 2025-12-04T10:58:28.4991340Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4991387Z Traceback (most recent call last): 2025-12-04T10:58:28.4991542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4991592Z method(*args, **kwargs) 2025-12-04T10:58:28.4991743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4991783Z method(*args, **kwargs) 2025-12-04T10:58:28.4991935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4991972Z with policy(): 2025-12-04T10:58:28.4992126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T10:58:28.4992166Z raise RuntimeError(msg) 2025-12-04T10:58:28.4992570Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 9216 and is now reported as 18432 on device 0. CUDA driver allocated memory was 1207959552 and is now 1222639616. 2025-12-04T10:58:28.4992584Z 2025-12-04T10:58:28.4992658Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4992946Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4992948Z 2025-12-04T10:58:28.4993035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4993107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4993162Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4993375Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4993449Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4993486Z graph_break [] 2025-12-04T10:58:28.4993558Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4993614Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4993686Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4993860Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 
2025-12-04T10:58:28.4993896Z graph_break [] 2025-12-04T10:58:28.4993948Z =================================== FAILURES =================================== 2025-12-04T10:58:28.4994099Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _ 2025-12-04T10:58:28.4994164Z Traceback (most recent call last): 2025-12-04T10:58:28.4994319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4994360Z method(*args, **kwargs) 2025-12-04T10:58:28.4994525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.4994565Z method(*args, **kwargs) 2025-12-04T10:58:28.4994715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.4994752Z with policy(): 2025-12-04T10:58:28.4994902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.4994943Z raise RuntimeError(msg) 2025-12-04T10:58:28.4995345Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 
2025-12-04T10:58:28.4995360Z 2025-12-04T10:58:28.4995434Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4995722Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4995724Z 2025-12-04T10:58:28.4995811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4995884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4995941Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4996117Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4996206Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4996242Z graph_break [] 2025-12-04T10:58:28.4996315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4996370Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4996442Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4996617Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4996654Z graph_break [] 2025-12-04T10:58:28.4996726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.4996780Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.4996854Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.4997027Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), 
('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.4997064Z graph_break [] 2025-12-04T10:58:28.4997308Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d87002f690f4a6fe.xml - 2025-12-04T10:58:28.4997368Z =========================== short test summary info ============================ 2025-12-04T10:58:28.4998007Z FAILED [0.6493s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16! Caching allocator allocated memory was 18432 and is now reported as 27648 on device 0. CUDA driver allocated memory was 1222639616 and is now 1237319680. 2025-12-04T10:58:28.4998011Z 2025-12-04T10:58:28.4998084Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.4998382Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4998385Z 2025-12-04T10:58:28.4998472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.4998535Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.4998601Z ================== 1 failed, 57 deselected, 2 rerun in 4.32s =================== 2025-12-04T10:58:28.4998638Z Got exit code 1 2025-12-04T10:58:28.4998878Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 2025-12-04T10:58:28.4999018Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.4999215Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b502a4fd6fd5cae.xml 2025-12-04T10:58:28.4999273Z ============================= test session starts ============================== 2025-12-04T10:58:28.4999382Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.4999424Z cachedir: .pytest_cache 2025-12-04T10:58:28.4999584Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.4999630Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.4999670Z configfile: pytest.ini 2025-12-04T10:58:28.4999842Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.4999916Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.4999970Z stepcurrent: skipping 57 already run items. 
2025-12-04T10:58:28.5000013Z Running 1 items in this shard 2025-12-04T10:58:28.5000015Z 2025-12-04T10:58:28.5000267Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [2.9054s] [100%] 2025-12-04T10:58:28.5000514Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4920s] [100%] 2025-12-04T10:58:28.5000740Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4608s] [100%] 2025-12-04T10:58:28.5000744Z 2025-12-04T10:58:28.5000795Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.5000945Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5000992Z Traceback (most recent call last): 2025-12-04T10:58:28.5001148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5001189Z method(*args, **kwargs) 2025-12-04T10:58:28.5001340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5001380Z method(*args, **kwargs) 2025-12-04T10:58:28.5001530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5001568Z with policy(): 2025-12-04T10:58:28.5001729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5001772Z raise RuntimeError(msg) 2025-12-04T10:58:28.5002174Z RuntimeError: CUDA driver API 
confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.5002177Z 2025-12-04T10:58:28.5002251Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5002540Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5002544Z 2025-12-04T10:58:28.5002630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5002703Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5002771Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5003052Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5003124Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5003160Z graph_break [] 2025-12-04T10:58:28.5003350Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5003395Z Traceback (most recent call last): 2025-12-04T10:58:28.5003549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5003607Z method(*args, **kwargs) 2025-12-04T10:58:28.5003757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T10:58:28.5003799Z method(*args, **kwargs) 2025-12-04T10:58:28.5003949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5003986Z with policy(): 2025-12-04T10:58:28.5004138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5004178Z raise RuntimeError(msg) 2025-12-04T10:58:28.5004578Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.5004581Z 2025-12-04T10:58:28.5004655Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5004944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5004948Z 2025-12-04T10:58:28.5005033Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5005107Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5005162Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5006369Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5006456Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5006495Z graph_break [] 2025-12-04T10:58:28.5006567Z ----------------------------- 
Captured stdout call ----------------------------- 2025-12-04T10:58:28.5006622Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5006706Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5006977Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5007013Z graph_break [] 2025-12-04T10:58:28.5007074Z =================================== FAILURES =================================== 2025-12-04T10:58:28.5007226Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5007272Z Traceback (most recent call last): 2025-12-04T10:58:28.5007428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5007488Z method(*args, **kwargs) 2025-12-04T10:58:28.5007641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5007681Z method(*args, **kwargs) 2025-12-04T10:58:28.5007832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5007868Z with policy(): 2025-12-04T10:58:28.5008021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5008063Z raise RuntimeError(msg) 2025-12-04T10:58:28.5008470Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. 
CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.5008473Z 2025-12-04T10:58:28.5008547Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5008836Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5008838Z 2025-12-04T10:58:28.5008926Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5008999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5009056Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5009329Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5009403Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5009439Z graph_break [] 2025-12-04T10:58:28.5009512Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5009567Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5009638Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5009907Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5009976Z graph_break [] 2025-12-04T10:58:28.5010061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5010115Z stats [('calls_captured', 3), ('unique_graphs', 1)] 
2025-12-04T10:58:28.5010188Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5010469Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5010507Z graph_break [] 2025-12-04T10:58:28.5010752Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-2b502a4fd6fd5cae.xml - 2025-12-04T10:58:28.5010812Z =========================== short test summary info ============================ 2025-12-04T10:58:28.5011450Z FAILED [0.4608s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.5011468Z 2025-12-04T10:58:28.5011541Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5011828Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5011831Z 2025-12-04T10:58:28.5011916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5011978Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.5012044Z ================== 1 failed, 57 deselected, 2 rerun in 4.02s =================== 2025-12-04T10:58:28.5012082Z Got exit code 1 2025-12-04T10:58:28.5012122Z Retrying single test... 2025-12-04T10:58:28.5012319Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c062e6f7d4d7491a.xml 2025-12-04T10:58:28.5012871Z ============================= test session starts ============================== 2025-12-04T10:58:28.5013412Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.5013459Z cachedir: .pytest_cache 2025-12-04T10:58:28.5013640Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.5013697Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.5013747Z configfile: pytest.ini 2025-12-04T10:58:28.5013927Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.5014009Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.5014344Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5014394Z Running 1 items in this shard 2025-12-04T10:58:28.5014398Z 2025-12-04T10:58:28.5014767Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:08.677669568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5015296Z 2025-12-04T10:58:28.5015461Z [W1204 10:58:08.956271297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5015537Z 2025-12-04T10:58:28.5015692Z [W1204 10:58:08.956395675 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5015696Z 2025-12-04T10:58:28.5015893Z [W1204 10:58:08.959996838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5015895Z 2025-12-04T10:58:28.5016047Z [W1204 10:58:08.960312794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5016049Z 2025-12-04T10:58:28.5016197Z [W1204 10:58:08.960393093 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5016201Z 2025-12-04T10:58:28.5016353Z [W1204 10:58:08.962500295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5016355Z 2025-12-04T10:58:28.5016507Z [W1204 10:58:08.962771351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5016569Z 2025-12-04T10:58:28.5016719Z [W1204 10:58:08.962830541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5016725Z 2025-12-04T10:58:28.5016781Z ('RERUN', {'yellow': True}) [3.1851s] [100%] 2025-12-04T10:58:28.5017144Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:09.576575554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017148Z 2025-12-04T10:58:28.5017301Z [W1204 10:58:09.576944400 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017304Z 2025-12-04T10:58:28.5017452Z [W1204 10:58:09.577020049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017458Z 2025-12-04T10:58:28.5017606Z [W1204 10:58:09.578316552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017609Z 2025-12-04T10:58:28.5017760Z [W1204 10:58:09.578581338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017762Z 2025-12-04T10:58:28.5017910Z [W1204 10:58:09.578642047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5017913Z 2025-12-04T10:58:28.5018064Z [W1204 10:58:09.580669961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5018066Z 2025-12-04T10:58:28.5018216Z [W1204 10:58:09.580936797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5018218Z 2025-12-04T10:58:28.5018370Z [W1204 10:58:09.580997346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5018372Z 2025-12-04T10:58:28.5018422Z ('RERUN', {'yellow': True}) [0.4811s] [100%] 2025-12-04T10:58:28.5018784Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:09.042676568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5018801Z 2025-12-04T10:58:28.5018955Z [W1204 10:58:09.043055213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5018957Z 2025-12-04T10:58:28.5019119Z [W1204 10:58:09.043128822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5019122Z 2025-12-04T10:58:28.5019273Z [W1204 10:58:09.044387096 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5019275Z 2025-12-04T10:58:28.5019435Z [W1204 10:58:09.044642502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5019440Z 2025-12-04T10:58:28.5019589Z [W1204 10:58:09.044701971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5019592Z 2025-12-04T10:58:28.5019743Z [W1204 10:58:09.046703725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5019745Z 2025-12-04T10:58:28.5019894Z [W1204 10:58:09.046965162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5019912Z 2025-12-04T10:58:28.5020064Z [W1204 10:58:09.047029311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5020066Z 2025-12-04T10:58:28.5020140Z FAILED [0.4573s] [100%] 2025-12-04T10:58:28.5020142Z 2025-12-04T10:58:28.5020199Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.5020357Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5020406Z Traceback (most recent call last): 2025-12-04T10:58:28.5020581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5020625Z method(*args, **kwargs) 2025-12-04T10:58:28.5020782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5020824Z method(*args, **kwargs) 2025-12-04T10:58:28.5020977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5021016Z with policy(): 2025-12-04T10:58:28.5021175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5021216Z raise RuntimeError(msg) 2025-12-04T10:58:28.5021620Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.5021623Z 2025-12-04T10:58:28.5021703Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5022005Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5022008Z 2025-12-04T10:58:28.5022101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5022184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5022248Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5022528Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5022622Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5022676Z graph_break [] 2025-12-04T10:58:28.5022849Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5022898Z Traceback (most recent call last): 2025-12-04T10:58:28.5023067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5023109Z method(*args, **kwargs) 2025-12-04T10:58:28.5023299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5023339Z method(*args, **kwargs) 2025-12-04T10:58:28.5023494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5023534Z 
with policy(): 2025-12-04T10:58:28.5023689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5023731Z raise RuntimeError(msg) 2025-12-04T10:58:28.5024143Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.5024164Z 2025-12-04T10:58:28.5024242Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5024537Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5024541Z 2025-12-04T10:58:28.5024631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5024709Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5024772Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5025049Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5025129Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5025168Z graph_break [] 2025-12-04T10:58:28.5025247Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5025303Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5025380Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.5025654Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5025695Z graph_break [] 2025-12-04T10:58:28.5025750Z =================================== FAILURES =================================== 2025-12-04T10:58:28.5025906Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5025954Z Traceback (most recent call last): 2025-12-04T10:58:28.5026112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5026153Z method(*args, **kwargs) 2025-12-04T10:58:28.5026308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5026366Z method(*args, **kwargs) 2025-12-04T10:58:28.5026519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5026571Z with policy(): 2025-12-04T10:58:28.5026727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5026770Z raise RuntimeError(msg) 2025-12-04T10:58:28.5027214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.5027217Z 2025-12-04T10:58:28.5027294Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5027584Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5027587Z 2025-12-04T10:58:28.5027678Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5027770Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5027830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5028107Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5028185Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5028222Z graph_break [] 2025-12-04T10:58:28.5028299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5028356Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5028431Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5028702Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5028744Z graph_break [] 2025-12-04T10:58:28.5028818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5028877Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5028949Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5029222Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5029260Z graph_break [] 2025-12-04T10:58:28.5029515Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c062e6f7d4d7491a.xml - 2025-12-04T10:58:28.5029578Z =========================== short test summary info ============================ 2025-12-04T10:58:28.5030222Z FAILED [0.4573s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.5030225Z 2025-12-04T10:58:28.5030315Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5030620Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5030622Z 2025-12-04T10:58:28.5030715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5030777Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.5030861Z ================== 1 failed, 57 deselected, 2 rerun in 4.29s =================== 2025-12-04T10:58:28.5030900Z Got exit code 1 2025-12-04T10:58:28.5030945Z Retrying single test... 2025-12-04T10:58:28.5031144Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-20346e84e00bf2bc.xml 2025-12-04T10:58:28.5031206Z ============================= test session starts ============================== 2025-12-04T10:58:28.5031325Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.5031367Z cachedir: .pytest_cache 2025-12-04T10:58:28.5031530Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.5031593Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.5031638Z configfile: pytest.ini 2025-12-04T10:58:28.5031803Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.5031882Z collecting ... collected 58 items / 57 deselected / 1 selected 2025-12-04T10:58:28.5032169Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5032219Z Running 1 items in this shard 2025-12-04T10:58:28.5032221Z 2025-12-04T10:58:28.5032588Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:19.504829001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5032591Z 2025-12-04T10:58:28.5032749Z [W1204 10:58:19.780976377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5032752Z 2025-12-04T10:58:28.5032904Z [W1204 10:58:19.781134845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5032910Z 2025-12-04T10:58:28.5033060Z [W1204 10:58:19.784768207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5033063Z 2025-12-04T10:58:28.5033215Z [W1204 10:58:19.785075813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5033217Z 2025-12-04T10:58:28.5033398Z [W1204 10:58:19.785139852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5033401Z 2025-12-04T10:58:28.5033553Z [W1204 10:58:19.787268554 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5033554Z 2025-12-04T10:58:28.5033704Z [W1204 10:58:19.787538211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5033706Z 2025-12-04T10:58:28.5033858Z [W1204 10:58:19.787603070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5033860Z 2025-12-04T10:58:28.5033931Z ('RERUN', {'yellow': True}) [3.1773s] [100%] 2025-12-04T10:58:28.5034306Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:20.401450311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5034309Z 2025-12-04T10:58:28.5034463Z [W1204 10:58:20.401826346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5034465Z 2025-12-04T10:58:28.5034631Z [W1204 10:58:20.401899565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5034633Z 2025-12-04T10:58:28.5034787Z [W1204 10:58:20.403168689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5034789Z 2025-12-04T10:58:28.5034939Z [W1204 10:58:20.403425455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5034945Z 2025-12-04T10:58:28.5035095Z [W1204 10:58:20.403485275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5035113Z 2025-12-04T10:58:28.5035266Z [W1204 10:58:20.405486738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5035268Z 2025-12-04T10:58:28.5035417Z [W1204 10:58:20.405751165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5035419Z 2025-12-04T10:58:28.5035571Z [W1204 10:58:20.405811664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5035572Z 2025-12-04T10:58:28.5035621Z ('RERUN', {'yellow': True}) [0.4759s] [100%] 2025-12-04T10:58:28.5035982Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 [W1204 10:58:20.874683623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5035986Z 2025-12-04T10:58:28.5036140Z [W1204 10:58:20.875062918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5036141Z 2025-12-04T10:58:28.5036290Z [W1204 10:58:20.875137857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5036292Z 2025-12-04T10:58:28.5036443Z [W1204 10:58:20.876407210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5036445Z 2025-12-04T10:58:28.5036594Z [W1204 10:58:20.876681766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5036596Z 2025-12-04T10:58:28.5036749Z [W1204 10:58:20.876742636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5036752Z 2025-12-04T10:58:28.5036904Z [W1204 10:58:20.878756219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T10:58:28.5036909Z 2025-12-04T10:58:28.5037062Z [W1204 10:58:20.879031735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5037064Z 2025-12-04T10:58:28.5037216Z [W1204 10:58:20.879096235 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T10:58:28.5037217Z 2025-12-04T10:58:28.5037257Z FAILED [0.4733s] [100%] 2025-12-04T10:58:28.5037271Z 2025-12-04T10:58:28.5037328Z ==================================== RERUNS ==================================== 2025-12-04T10:58:28.5037493Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5037543Z Traceback (most recent call last): 2025-12-04T10:58:28.5037704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5037749Z method(*args, **kwargs) 2025-12-04T10:58:28.5037914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5037959Z method(*args, **kwargs) 2025-12-04T10:58:28.5038111Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5038152Z with policy(): 2025-12-04T10:58:28.5038304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5038351Z raise RuntimeError(msg) 2025-12-04T10:58:28.5038747Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 0 and is now reported as 9728 on device 0. 
CUDA driver allocated memory was 807403520 and is now 1298137088. 2025-12-04T10:58:28.5038767Z 2025-12-04T10:58:28.5038843Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5039139Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5039141Z 2025-12-04T10:58:28.5039230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5039309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5039367Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5039648Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5039724Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5039765Z graph_break [] 2025-12-04T10:58:28.5039918Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5039967Z Traceback (most recent call last): 2025-12-04T10:58:28.5040120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5040165Z method(*args, **kwargs) 2025-12-04T10:58:28.5040316Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5040359Z method(*args, **kwargs) 2025-12-04T10:58:28.5040511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5040553Z 
with policy(): 2025-12-04T10:58:28.5040705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5040751Z raise RuntimeError(msg) 2025-12-04T10:58:28.5041153Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 9728 and is now reported as 19456 on device 0. CUDA driver allocated memory was 1298137088 and is now 1312817152. 2025-12-04T10:58:28.5041172Z 2025-12-04T10:58:28.5041247Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5041560Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5041564Z 2025-12-04T10:58:28.5041651Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5041729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5041800Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5042077Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5042152Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5042193Z graph_break [] 2025-12-04T10:58:28.5042267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5042328Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5042399Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 
2025-12-04T10:58:28.5042686Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5042723Z graph_break [] 2025-12-04T10:58:28.5042780Z =================================== FAILURES =================================== 2025-12-04T10:58:28.5042933Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 _ 2025-12-04T10:58:28.5042983Z Traceback (most recent call last): 2025-12-04T10:58:28.5043137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5043181Z method(*args, **kwargs) 2025-12-04T10:58:28.5043382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T10:58:28.5043425Z method(*args, **kwargs) 2025-12-04T10:58:28.5043579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T10:58:28.5043619Z with policy(): 2025-12-04T10:58:28.5043776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T10:58:28.5043819Z raise RuntimeError(msg) 2025-12-04T10:58:28.5044231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 
2025-12-04T10:58:28.5044235Z 2025-12-04T10:58:28.5044310Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5044605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5044608Z 2025-12-04T10:58:28.5044695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5044773Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5044830Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5045106Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5045199Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5045257Z graph_break [] 2025-12-04T10:58:28.5045331Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5045391Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5045463Z aot_autograd [('total', 1), ('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5045752Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5045792Z graph_break [] 2025-12-04T10:58:28.5045866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T10:58:28.5045927Z stats [('calls_captured', 3), ('unique_graphs', 1)] 2025-12-04T10:58:28.5045999Z aot_autograd [('total', 1), 
('autograd_cache_bypass', 1), ('ok', 1)] 2025-12-04T10:58:28.5046274Z inductor [('pattern_matcher_nodes', 8), ('woq_matcher_nodes', 6), ('pattern_matcher_count', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('pad_mm_bench', 1), ('fxgraph_cache_miss', 1), ('woq_matcher_count', 1), ('extern_calls', 1)] 2025-12-04T10:58:28.5046334Z graph_break [] 2025-12-04T10:58:28.5046583Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-20346e84e00bf2bc.xml - 2025-12-04T10:58:28.5046644Z =========================== short test summary info ============================ 2025-12-04T10:58:28.5047280Z FAILED [0.4733s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16! Caching allocator allocated memory was 19456 and is now reported as 29184 on device 0. CUDA driver allocated memory was 1312817152 and is now 1327497216. 2025-12-04T10:58:28.5047284Z 2025-12-04T10:58:28.5047361Z To execute this test, run the following from the base repo dir: 2025-12-04T10:58:28.5047650Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5047652Z 2025-12-04T10:58:28.5047742Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T10:58:28.5047804Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T10:58:28.5047875Z ================== 1 failed, 57 deselected, 2 rerun in 4.29s =================== 2025-12-04T10:58:28.5047914Z Got exit code 1 2025-12-04T10:58:28.5048160Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16 2025-12-04T10:58:28.5048293Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T10:58:28.5048496Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-abb30614e4fe38ab.xml 2025-12-04T10:58:28.5048555Z ============================= test session starts ============================== 2025-12-04T10:58:28.5048671Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T10:58:28.5048713Z cachedir: .pytest_cache 2025-12-04T10:58:28.5048876Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T10:58:28.5048936Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T10:58:28.5048981Z configfile: pytest.ini 2025-12-04T10:58:28.5049154Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T10:58:28.5049232Z collecting ... collected 58 items / 58 deselected / 0 selected 2025-12-04T10:58:28.5049288Z stepcurrent: skipping 58 already run items. 
2025-12-04T10:58:28.5049336Z Running 0 items in this shard 2025-12-04T10:58:28.5049338Z 2025-12-04T10:58:28.5049590Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-abb30614e4fe38ab.xml - 2025-12-04T10:58:28.5049653Z ============================ 58 deselected in 0.01s ============================ 2025-12-04T10:58:28.5061897Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_concat_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 
'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_144_out_features_65_cuda_bfloat16'] 2025-12-04T10:58:28.5061999Z 2025-12-04T10:58:28.5062191Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 1/1 (test/test-reports/inductor.test_cuda_select_algorithm_1.1_2c02839777e739fe_.log) 2025-12-04T10:58:28.5062194Z 2025-12-04T10:58:28.5062326Z Finished inductor/test_cuda_select_algorithm 1/1 ... [2025-12-04 10:58:28.199173][2249892.465835059], took 31.38min 2025-12-04T10:58:28.5062556Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:58:28.5062648Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:58:28.5062745Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T10:58:28.5062797Z Uploading artifacts took 0.00 seconds 2025-12-04T10:58:28.5062857Z inductor/test_cuda_select_algorithm 1/1 failed! 2025-12-04T10:58:28.5062970Z Running functorch/test_eager_transforms 1/1 ... 
[2025-12-04 10:58:28.205266][2249892.471938189] 2025-12-04T10:58:28.5063020Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:58:28.5063377Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'functorch/test_eager_transforms.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:58:28.205455] 2025-12-04T10:58:43.1443888Z 2025-12-04T10:58:43.1444881Z functorch/test_eager_transforms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_eager_transforms_1.1_afc7b31dc0abb405_.log 2025-12-04T10:58:43.1502507Z Running 360 items in this shard: test/functorch/test_eager_transforms.py::TestSliceArgnums::test_argnums_reorders, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_duplicate_argnums, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_negative_int_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_positive_int_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_flat_args_with_tuple_argnum, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_invalid_argnum_type, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_not_enough_argnums, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_out_of_bounds_argnum_values, test/functorch/test_eager_transforms.py::TestSliceArgnums::test_pytree_args, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_buffer_tying, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_combine_state_for_ensemble_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_combine_state_for_ensemble_smoke, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_correctness_mnist_mechanism_functional_call, 
test/functorch/test_eager_transforms.py::TestMakeFunctional::test_correctness_mnist_mechanism_make_functional, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_disable_autograd_tracking_disable_autograd_tracking_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_disable_autograd_tracking_disable_autograd_tracking_True, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_make_functional_state_correctly_returned_after_forward_mechanism_functional_call, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_make_functional_state_correctly_returned_after_forward_mechanism_make_functional, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying_ensemble, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_parameter_tying_grad, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_leaf, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_mismatch_error, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_stack_module_state_smoke, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_using_detach_functional_call_detach_params_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_using_detach_functional_call_detach_params_True, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_with_buffers_disable_autograd_tracking_disable_autograd_tracking_False, test/functorch/test_eager_transforms.py::TestMakeFunctional::test_with_buffers_disable_autograd_tracking_disable_autograd_tracking_True, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_advanced_indexing_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_argnums_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composed_with_autograd_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_complicated_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_composite_two_ops_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_conj_bit_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_dtype_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_escaped_wrappers_are_ignored_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_escaped_wrappers_are_marked_as_dead_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_fn_with_kwargs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_functional_init_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_functional_init_with_buffers_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_of_vjp_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_of_vjp_of_grad_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_grad_pytree_inputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_captures_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_view_base_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_inplace_on_view_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_invalid_argnums_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_is_cuda_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_layout_sparse_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_manual_seed_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_negative_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_nesting_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_inside_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_mixed_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_nested_complicated_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_nested_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_fn_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_outside_vjp_only_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_no_grad_value_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_numel_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_out_of_order_argnums_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_primitive_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_print_captured_tensor_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_shape_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_ctor_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_grad_cuda, 
test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_grad_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_tensor_print_vmap_vmap_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_hessian_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_unrelated_vjp_multiple_inputs_outputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_view_inplace_simple_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_views_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_of_grad_composition_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_outputs_can_any_pytree_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_error_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_input_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_pytree_output_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_vjp_two_outputs_cuda, test/functorch/test_eager_transforms.py::TestGradTransformCUDA::test_zero_grad_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_log_softmax_cuda, 
test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_new_empty_materializes_tensor_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_new_zeros_materializes_tensor_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_embeddingnet_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_embeddingnet_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_inplace_view_cuda, test/functorch/test_eager_transforms.py::TestVmapOfGradCUDA::test_per_sample_grads_simple_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_correctness_different_devices_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_correctness_different_devices_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_default_arg_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_default_arg_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_multi_output_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_multi_input_multi_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_unrelated_outputs_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_unrelated_outputs_jacrev_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_zero_dim_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_against_reference_zero_dim_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_defaults_to_zero_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_defaults_to_zero_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_effect_on_return_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_effect_on_return_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_tuple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_argnums_tuple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_tensor_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_aux_tensor_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_chunksize_one__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_chunksize_one__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_composition__preallocate_and_copy_False_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_chunk_jacrev_composition__preallocate_and_copy_True_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_complex_error_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_diff_numel_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_diff_numel_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_dimensionality_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_dimensionality_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_output_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_empty_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_float_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_float_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_hessian_simple_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_inplace_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_inplace_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_jac_with_non_tensor_args_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_jac_with_non_tensor_args_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_args_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_args_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_multidim_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_outputs_pytree_multidim_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_inputs_pytree_jacrev_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_multiple_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_multiple_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_single_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_multiple_outputs_single_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_negative_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_negative_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_nested_jac_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_nested_jac_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_out_of_bounds_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_out_of_bounds_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_outputs_can_any_pytree_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_outputs_can_any_pytree_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_repeated_argnums_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_repeated_argnums_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_not_flat_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_simple_not_flat_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_take_jacfwd_cuda, 
test/functorch/test_eager_transforms.py::TestJacCUDA::test_take_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_input_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_input_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_output_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_unrelated_output_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_vmap_on_jac_simple_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestJacCUDA::test_vmap_on_jac_simple_jacrev_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_autograd_function_disables_fwd_grad_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_aux_pytree_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_aux_tensor_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_inside_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_mixed_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_disable_fwd_grad_outside_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_inplace_on_captures_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_inputs_are_tuples_of_tensors_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_jvp_inside_autograd_function_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_jvp_new_tensor_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_inputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_inputs_outputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_multiple_outputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_nonempty_primals_and_tangents_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_outputs_can_any_pytree_cuda, 
test/functorch/test_eager_transforms.py::TestJvpCUDA::test_primals_tangents_length_mismatch_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_pytree_inputs_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_pytree_inputs_error_cases_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_simple_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_strict_mode_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_unrelated_input_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_unrelated_output_cuda, test/functorch/test_eager_transforms.py::TestJvpCUDA::test_zerotensor_vmapjvp_interaction_cuda, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_basic_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_composition_grad_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_composition_vmap_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_errors_cuda, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_nested_input_nested_output_cuda_float32, test/functorch/test_eager_transforms.py::TestLinearizeCUDA::test_linearize_return_cuda_float32, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_base_inplace_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_base_view_inplace_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_all_dual_no_view_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_right_dual_base_prop_cuda, test/functorch/test_eager_transforms.py::TestVmapJvpInplaceViewCUDA::test_right_dual_view_prop_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_multi_input_cuda, 
test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_simple_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_hessian_vectorize_correctness_unrelated_outputs_cuda, test/functorch/test_eager_transforms.py::TestHessianCUDA::test_jacfwd_different_levels_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_functionalize_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_grad_and_value_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_hessian_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_function_no_setup_context_transform_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jacfwd_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jacrev_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_jvp_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_autograd_functional_vjp_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_functionalize_when_key_is_excluded_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_grad_when_key_is_excluded_cuda, 
test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_can_use_vmap_when_key_is_excluded_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_functionalize_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_grad_and_value_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_hessian_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_transforms_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_deprecation_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_grad_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_jvp_supports_saved_tensor_hooks_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_make_fx_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_no_warning_on_import_functorch_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_requires_grad_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_retain_grad_inside_transform_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_grad_and_value_cuda, 
test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_hessian_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_transforms_dont_support_saved_tensor_hooks_transform_jacrev_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_doesnt_support_saved_tensor_hooks_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vjp_vmap_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_grad_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_vjp_cuda, test/functorch/test_eager_transforms.py::TestComposabilityCUDA::test_vmap_vmap_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_ensemble_regression_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_ensemble_regression_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_AlphaDropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_AlphaDropout_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_Dropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_Dropout_mechanism_make_functional_cuda, 
test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_FeatureAlphaDropout_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_find_learning_rate_ensembling_FeatureAlphaDropout_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_lennard_jones_batched_jac_jac_jacfwd_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_lennard_jones_batched_jac_jac_jacrev_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_omniglot_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_omniglot_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_regression_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_maml_regression_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_resnet18_per_sample_grads_mechanism_functional_call_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_resnet18_per_sample_grads_mechanism_make_functional_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_functional_call_originally_track_running_stats_False_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_functional_call_originally_track_running_stats_True_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_make_functional_originally_track_running_stats_False_cuda, test/functorch/test_eager_transforms.py::TestExamplesCorrectnessCUDA::test_update_batch_norm_mechanism_make_functional_originally_track_running_stats_True_cuda, 
test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_basic_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_functional_call_multiple_dicts_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_grad_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_name_wrapping_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_grad_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_no_grad_inside_grad_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_no_grad_outside_grad_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_vmap_grad_sum_cuda, test/functorch/test_eager_transforms.py::TestHigherOrderOperatorInteractionCUDA::test_vmap_sum_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fake_tensors_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_multi_out_op_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_out_op_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_reapply_views_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_fx_transpose_simple_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_grad_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_nonfunctional_output_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_opt_tensor_list_cuda, 
test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_optional_tensorlist1_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_functionalize_optional_tensorlist2_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_inplace_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_linear_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_multioutput_inplace_slice_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_multioutput_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_resize_program_inputs_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_simple_view_cuda, test/functorch/test_eager_transforms.py::TestFunctionalizeCUDA::test_vmap_functionalize_jvp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_jvp_save_tensors_output_mark_dirty_True_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_False_save_for_vjp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_output_mark_dirty_False_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_jvp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_input_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_input_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_neither_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_neither_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_False_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_function_returns_input_inner_requires_grad_True_save_for_vjp_save_tensors_output_mark_dirty_True_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_grad_fn_name_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_needs_input_grads_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_once_differentiable_autograd_vjp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_once_differentiable_grad_vjp_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionCUDA::test_set_materialize_grads_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_has_vmap_staticmethod_and_has_generate_vmap_rule_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_in_dims_multiple_inputs_cuda, 
test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_in_dims_single_input_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_incompatible_out_dims_error_msg_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_info_object_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_kwarg_only_tensors_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_no_vmap_staticmethod_and_no_generate_vmap_rule_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_none_returns_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_should_have_two_returns_cuda, test/functorch/test_eager_transforms.py::TestAutogradFunctionVmapAPICUDA::test_skips_empty_layer_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_error_if_name_collision_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_nesting_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_overrides_saved_tensors_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_CtxWithSavedTensors_passthrough_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_debug_unwrap_cuda, test/functorch/test_eager_transforms.py::TestHelpersCUDA::test_reductify_leaf_cuda, test/functorch/test_eager_transforms.py::TestCompileTransformsCUDA::test_compile_vmap_hessian_cuda, test/functorch/test_eager_transforms.py::TestCompileTransformsCUDA::test_grad_deprecated_api_cuda, test/functorch/test_eager_transforms.py::TestGradTrackingTensorToListCUDA::test_tolist_conj_neg_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTrackingTensorToListCUDA::test_tolist_multidimensional_grad_cuda, test/functorch/test_eager_transforms.py::TestGradTrackingTensorToListCUDA::test_tolist_nested_grad_cuda, 
test/functorch/test_eager_transforms.py::TestGradTrackingTensorToListCUDA::test_tolist_with_grad_cuda
2025-12-04T10:58:43.1554032Z
2025-12-04T10:58:43.1554169Z Finished functorch/test_eager_transforms 1/1 ... [2025-12-04 10:58:43.144489][2249907.411158463], took 0.25min
2025-12-04T10:58:43.1554592Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T10:58:43.1554950Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T10:58:43.1555178Z Running test_sparse_semi_structured 1/1 ... [2025-12-04 10:58:43.150466][2249907.417140054]
2025-12-04T10:58:43.1555368Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T10:58:43.1555762Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_sparse_semi_structured.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:58:43.150660]
2025-12-04T10:58:48.5229583Z
2025-12-04T10:58:48.5230409Z test_sparse_semi_structured 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_semi_structured_1.1_90c8a517329f38ed_.log
2025-12-04T10:58:48.5237012Z Running 42 items in this shard: test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cusparselt, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cutlass, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_sp24_compile, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_indices, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_linear, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_min_sparse_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mlp, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_TN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_to_sparse_semi_structured, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dim, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_values, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_gemm, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_edge_case1, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_id, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_meta_correctness, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_prune_dense_static_sort, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pruning_algo_largest_abs_values_greedy, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply_dense, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_bmm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_mat_vec, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions_all_patterns, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_linear_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_sparse_semi_structured_ops_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_compile_autotune, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_search, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_csrc_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cusparselt_backend, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_fp8fp8_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm_fp8 2025-12-04T10:58:48.5242844Z 2025-12-04T10:58:48.5242966Z Finished test_sparse_semi_structured 1/1 ... [2025-12-04 10:58:48.522700][2249912.789368296], took 0.09min 2025-12-04T10:58:48.5243402Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T10:58:48.5289654Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T10:58:48.5292281Z Running inductor/test_aot_inductor_arrayref 1/2 ... [2025-12-04 10:58:48.528975][2249912.795649153] 2025-12-04T10:58:48.5292831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T10:58:48.5293682Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_aot_inductor_arrayref.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 10:58:48.529160] 2025-12-04T11:02:57.6771373Z 2025-12-04T11:02:57.6772732Z inductor/test_aot_inductor_arrayref 1/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_aot_inductor_arrayref_1.2_b89ddd02971beda4_.log 2025-12-04T11:02:57.6820076Z Running 159 items in this shard: test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_m_32_n_64_q_group_64_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_32_num_groups_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_amp_fallback_random_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_constant_tensor_name_collision_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_fp8_dtype_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_sym_inputs_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printer_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_debug_printing_model_inputs_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_profiler_enable_kernel_profile_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_aoti_runtime_asserts_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_autotune_int64_user_defined_triton_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_backward_no_op_logging_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_bool_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_mutation_3_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_buffer_reuse_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_clamp_decomposition_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_codegen_int_array_var_fix_memory_leak_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_composed_dynamic_size_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_cpu_predicate_cuda_operands_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_mismatched_branch_output_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_nested_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_non_tensor_predicates_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_share_predicate_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_symint_input_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_use_buffers_from_outer_scope_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_cond_with_reinterpret_view_inputs_outputs_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_consecutive_compiles_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_constant_folding_with_update_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_conv3d_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_convolution_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_custom_op_in_subgraph_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_d2h_copy_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_deconv_freezing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_device_moved_constant_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_duplicated_params_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_scalar_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_dynamic_smem_above_default_limit_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_cat_dtype_promotion_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_empty_constant_folding_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fake_tensor_device_validation_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fallback_mem_leak_fix_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fill__fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_foreach_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_fqn_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_free_inactive_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_index_put_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_dynamic_dim_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_grid_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_large_mmaped_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_libtorch_free_so_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_linear_dynamic_maxautotune_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_cubin_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_missing_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_mixed_device_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_model_modified_weights_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_multi_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nan_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_nested_tensor_from_jagged_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_non_default_gpu_device_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_none_args_aot_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_normal_functional_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_on_gpu_device1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_misaligned_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_1_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_output_path_2_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_poi_multiple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_profile_benchmark_harness_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_proxy_executor_permute_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quanatized_int8_linear_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_quantized_linear_bias_none_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_repeated_user_defined_triton_kernel_embed_kernel_binary_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replace_unbacked_symbol_with_backed_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_replicate_on_devices_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_return_view_constant_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_reuse_kernel_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_complex_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_dtype_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_fp8_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_large_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_runtime_checks_shape_failed_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_same_backing_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scaled_grouped_mm_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_scatter_reduce_fallback_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_2_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sdpa_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_seq_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_dynamic_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_False_max_autotune_False_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_simple_embed_kernel_binary_True_max_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_from_multi_output_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_size_with_unbacked_add_expr_transitive_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_stride_with_unbacked_expr_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_subclasses_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sym_i64_input_codegen_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_symint_item_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_sympy_cpp_printer_min_max_minmax0_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_autotuning_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_bool_param_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_dynamic_grid_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_equal_to_1_float_arg_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_extern_kernel_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_1_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_on_device_tma_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_reinterpret_view_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_sympy_expr_arg_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_new_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_new_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_kernel_with_none_inputs_and_equal_to_1_arg_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_triton_mutated_autotuning_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_0_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbacked_expr_replacements_shift_k_3_use_static_size_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_unbounded_expr_substitutions_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_update_inactive_constant_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_using_model_name_for_files_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_weight_on_disk_legacy_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_conv_dynamic_True_cpu_with_stack_allocation, 
test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_mixed_device_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_outer_buffers_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_pytree_inputs_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_False_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_sym_expr_cond_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_while_loop_with_unbacked_symint_closure_dynamic_True_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_grid_with_unbacked_symbols_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_buffer_cpu_with_stack_allocation, test/inductor/test_aot_inductor_arrayref.py::AOTInductorTestABICompatibleCpuWithStackAllocation::test_zero_size_weight_cpu_with_stack_allocation 2025-12-04T11:02:57.6858251Z 2025-12-04T11:02:57.6858403Z Finished inductor/test_aot_inductor_arrayref 1/2 ... 
[2025-12-04 11:02:57.677062][2250161.943731671], took 4.15min 2025-12-04T11:02:57.6858840Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:02:57.6859241Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:02:57.6859480Z Running inductor/test_compile_subprocess 1/3 ... [2025-12-04 11:02:57.683313][2250161.949985847] 2025-12-04T11:02:57.6859681Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:02:57.6860085Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_compile_subprocess.py', '--shard-id=1', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:02:57.683487] 2025-12-04T11:06:25.8171069Z 2025-12-04T11:06:25.8171940Z inductor/test_compile_subprocess 1/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_subprocess_1.3_435a134dac1837b6_.log 2025-12-04T11:06:25.8208034Z Running 284 items in this shard: test/inductor/test_compile_subprocess.py::TestSubprocess::test_async, test/inductor/test_compile_subprocess.py::TestSubprocess::test_progressive, test/inductor/test_compile_subprocess.py::GPUTests::test__dyn_quant_matmul_4bit_fp32_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool1d_argmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_avg_pool_with_output_size_0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_adaptive_max_pool2d2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex6_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_add_complex_strided_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_add_const_float_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_addmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_angle_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_aoti_eager_cache_hit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_arange4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_argmax_argmin2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_as_strided_on_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_alignment_op_name_fail_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_alignment_op_name_pass_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_assert_size_stride_op_name_fail_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_async, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool2d_backward3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_avg_pool3d_backward4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bmm2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_add_autotune_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int32_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int64_int8_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_int8_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_int_uint8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_bucketize_nd_tiling_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_buffer_copied_in_graph_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_builtins_round_int_ndigits_pos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_empty_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_extern_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cat_unbacked_empty_1d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_check_stack_no_cycles_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_type_promotion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_clamp_type_promotion_non_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_complex_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_complex_memory_overlap_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_cudagraphs_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_config_option_dont_assume_alignment_recompiles_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_consecutive_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_constant_pad_3d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv3d_channels_last_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_backward_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_bn_fuse_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_conv_shape_check_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_convolution5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_copy_non_blocking_is_pinned_use_cat_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cos_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cudnn_rnn_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_cumsum_zero_dim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_op_default_layout_constraint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_custom_scan_op_compiled_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_deterministic_codegen_with_suffix_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_device_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_diagonal_copy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dist_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div3_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_div5_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_presicion_accuracy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_prim_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_div_softmax_symfloat_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_bfloat16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float16_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_float64_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int16_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int32_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int64_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_int8_int8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_dtypeview_uint8_bfloat16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_bag_byte_unpack_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_embedding_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_empty1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_empty_strided_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_emulate_precision_triton_fp_fusion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fallback_mutable_op_list_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fft_real_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float16_to_int16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_float32_to_int32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fmin_fmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fractional_max_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_full_like_transposed_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_fuse_tiled_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gelu_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_generated_code_has_size_stride_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_getitem_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_cpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_gpu_scalar_with_gpu_tensor_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_arange1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_arange2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_argmax_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_constant_tensor1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_graph_partition_unbacked_symint_as_output_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_grid_sampler_expand_preserves_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardsigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardswish_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_hardtanh_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_horizonal_fusion1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_horizonal_fusion2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_device_assert_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_propagation_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_fallback2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_index_put_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inductor_assert_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_activations_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_flip_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_mixed_dtype_ops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_inplace_resize_as_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_input_mutation1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_invalid_operand_issue1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_isin_tensor_scalar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_grid_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_strided_reduction_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_large_tensor_reduction_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_lerp_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_like_rands_sliced_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_dynamic_maxautotune_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linear_float64_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_linspace3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_dynamic_shape_assertion_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_mode_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_lite_mode_not_decompose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_log1p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_1_dim_3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_low_memory_max_pool_dilation_2_dim_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mark_unbacked_with_hint_override_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_fill_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_masked_scatter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d6_dilation_1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d6_dilation_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_max_pool2d_with_indices_backward6_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mixed_mm_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_mm_views_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multi_gpu_recompile_on_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_prime_size_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_multilayer_var_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_mutable_custom_op_fixed_layout_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_assert_inside_triton_kernel_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_sort_stable_True_descending_False_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_nan_to_num_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_narrow_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_no_specization_over_symbolic_value_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_output_strides_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_cast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pad_view_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pattern_matcher_multi_user_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_permute2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_philox_rand_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pixel_shuffle_channels_last_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_airy_ai_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_bessel_j1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_chebyshev_polynomial_u_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erf_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_erfinv_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_exp2_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_expit_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_i1e_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_legendre_polynomial_p_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_modified_bessel_k0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_polygamma_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_round_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_t_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_shifted_chebyshev_polynomial_v_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_spherical_bessel_j0_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pointwise_xlogy_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_polar_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_by_natural_log2_dynamic_shapes_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_pow_int_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_profiler_mark_wrapper_call_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randint_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_randn_with_dtype_and_device_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reduction4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_reinterpret_dtypeview_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_clone_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_copy_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_slice1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_view_default_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_remove_noop_view_dtype_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_repeat_interleave_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_replication_pad_errors_with_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_round_correctness_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_cpu_tensor_arg_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_input_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scalar_output_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_scatter_bf16_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_searchsorted_broadcast_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_searchsorted_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_setitem_with_int_parameter_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_shape_prop_torch_ones_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_should_pad_bench_for_bmm_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sigmoid_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_simplify_loops_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_single_elem_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_size_asserts_for_multi_output_fallback_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_mutation3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter4_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_scatter_reinplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_slice_view_with_graph_break_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_softmax_one_kernel_persist_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sort_bool_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_cumsum_index_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_failed_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_reduction_dynamic_shape_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_split_with_sizes_with_unbacked_symints_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_sum_int_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tensor3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_tmp_not_defined_issue2_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_topk_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transpose_add_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transpose_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_transposed_propagates_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unbacked_float_item_cuda, 
test/inductor/test_compile_subprocess.py::GPUTests::test_unsigned_constant_tensors_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_float32_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unspec_inputs_uint8_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_unsqueeze_inplace_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_bilinear2d_a_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_upsample_nearest2d_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_vectorized_ops_masked_var_novec_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_as_real_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_detach_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_view_uint8_through_differing_bitwidths_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views1_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views3_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_views7_cuda, test/inductor/test_compile_subprocess.py::GPUTests::test_zeros_cuda 2025-12-04T11:06:25.8238981Z 2025-12-04T11:06:25.8239115Z Finished inductor/test_compile_subprocess 1/3 ... [2025-12-04 11:06:25.818190][2250370.084858799], took 3.47min 2025-12-04T11:06:25.8239557Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:06:25.8239938Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:06:25.8242352Z Running inductor/test_multi_kernel 1/1 ... 
[2025-12-04 11:06:25.824063][2250370.090736398] 2025-12-04T11:06:25.8242789Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:06:25.8244462Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_multi_kernel.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:06:25.824250] 2025-12-04T11:06:44.8156185Z 2025-12-04T11:06:44.8157226Z inductor/test_multi_kernel 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_multi_kernel_1.1_7ab509f30d9101a7_.log 2025-12-04T11:06:44.8161960Z Running 19 items in this shard: test/inductor/test_multi_kernel.py::MultiKernelTest::test_batchnorm_training, test/inductor/test_multi_kernel.py::MultiKernelTest::test_inplace_update, test/inductor/test_multi_kernel.py::MultiKernelTest::test_layernorm, test/inductor/test_multi_kernel.py::MultiKernelTest::test_pass_same_arg_multi_times, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper_non_persistent_reduction, test/inductor/test_multi_kernel.py::MultiKernelTest::test_reduction_scratch_buffer_cpp_wrapper_persistent_reduction, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_cpp_wrapper, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_force_non_persistent_reduction_force_kernel_0, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_force_non_persistent_reduction_force_kernel_1, test/inductor/test_multi_kernel.py::MultiKernelTest::test_softmax_warn_mixed_layout, test/inductor/test_multi_kernel.py::MultiKernelTest::test_sort_disables_multi_kernel, 
test/inductor/test_multi_kernel.py::MultiKernelTest::test_split_scan, test/inductor/test_multi_kernel.py::MultiKernelTest::test_transformer_snippet, test/inductor/test_multi_kernel.py::MultiKernelTest::test_transformer_snippet_with_fallback_random, test/inductor/test_multi_kernel.py::MultiKernelTest::test_triton_gemm, test/inductor/test_multi_kernel.py::MultiKernelTest::test_triton_relu_fused_gemm 2025-12-04T11:06:44.8166942Z 2025-12-04T11:06:44.8167221Z Finished inductor/test_multi_kernel 1/1 ... [2025-12-04 11:06:44.815213][2250389.081882465], took 0.32min 2025-12-04T11:06:44.8167804Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:06:44.8211607Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:06:44.8212575Z Running inductor/test_loop_ordering 1/1 ... [2025-12-04 11:06:44.821066][2250389.087739215] 2025-12-04T11:06:44.8213019Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:06:44.8214434Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_loop_ordering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:06:44.821260] 2025-12-04T11:07:17.2303441Z 2025-12-04T11:07:17.2304208Z inductor/test_loop_ordering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_loop_ordering_1.1_19de1510c534f434_.log 2025-12-04T11:07:17.2311226Z Running 53 items in this shard: test/inductor/test_loop_ordering.py::ImplDetailTest::test_merge_loops_invalidate_pw_dep_cache, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_and_merge_loops, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_modular_indexing, test/inductor/test_loop_ordering.py::ImplDetailTest::test_reorder_twice, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_3dred_pw_2d_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_apbt_realize, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_broadcast_shapes, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_different_reduction_order, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_for_reordering_reindex, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_cast_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fp8_pattern_2, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_reduction_with_tiled_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_fuse_with_scalar_shared_memory, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_multi_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_interaction_with_triton_template, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_keep_fake_dep, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_softmax, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_outer_dimension_sum_fuse_with_pw, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_pw_outer_red_2, 
test/inductor/test_loop_ordering.py::LoopOrderingTest::test_sum_and_t, test/inductor/test_loop_ordering.py::LoopOrderingTest::test_view, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_coalescing, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_induced_fused_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps0, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps1, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps2, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_inferred_splits_inps3, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_no_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_reduction_pointwise, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_remapped_reads_split, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_tiling, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_solve_for_zero, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_False, test/inductor/test_loop_ordering.py::MemoryCoalescingTest::test_tiled_coalesce_analysis_downcast_transposed_v_True, test/inductor/test_loop_ordering.py::TestTiling::test_3d_pointwise, test/inductor/test_loop_ordering.py::TestTiling::test_cat, test/inductor/test_loop_ordering.py::TestTiling::test_find_broadcast_var, test/inductor/test_loop_ordering.py::TestTiling::test_mutation_deps, test/inductor/test_loop_ordering.py::TestTiling::test_penalized_small_dim, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_NHWC_b_cont, 
test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_T_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_NHWC, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_T, test/inductor/test_loop_ordering.py::TestTiling::test_pointwise_a_cont_b_cont, test/inductor/test_loop_ordering.py::TestTiling::test_tiled_reduction, test/inductor/test_loop_ordering.py::TestIndexInversion::test_inversion_cases, test/inductor/test_loop_ordering.py::TestIndexInversion::test_original_complex_expression 2025-12-04T11:07:17.2317349Z 2025-12-04T11:07:17.2317477Z Finished inductor/test_loop_ordering 1/1 ... [2025-12-04 11:07:17.230044][2250421.496711674], took 0.54min 2025-12-04T11:07:17.2317886Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:07:17.2364907Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:07:17.2367097Z Running dynamo/test_functions 1/1 ... [2025-12-04 11:07:17.236467][2250421.503140256] 2025-12-04T11:07:17.2367304Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:07:17.2368314Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_functions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:07:17.236661]
2025-12-04T11:07:46.4920752Z
2025-12-04T11:07:46.4921960Z dynamo/test_functions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_functions_1.1_cc6dd756285bf551_.log
2025-12-04T11:07:46.4978567Z Running 478 items in this shard: test/dynamo/test_functions.py::FunctionTests::test_T, test/dynamo/test_functions.py::FunctionTests::test_add, test/dynamo/test_functions.py::FunctionTests::test_add_, test/dynamo/test_functions.py::FunctionTests::test_addcdiv, test/dynamo/test_functions.py::FunctionTests::test_addcdiv_, test/dynamo/test_functions.py::FunctionTests::test_addcmul_, test/dynamo/test_functions.py::FunctionTests::test_are_functorch_transforms_active, test/dynamo/test_functions.py::FunctionTests::test_attrgetter, test/dynamo/test_functions.py::FunctionTests::test_broadcast_foreach_pow, test/dynamo/test_functions.py::FunctionTests::test_build_list_unpack, test/dynamo/test_functions.py::FunctionTests::test_call_dict1, test/dynamo/test_functions.py::FunctionTests::test_call_dict2, test/dynamo/test_functions.py::FunctionTests::test_call_dict3, test/dynamo/test_functions.py::FunctionTests::test_call_dict4, test/dynamo/test_functions.py::FunctionTests::test_call_dict5, test/dynamo/test_functions.py::FunctionTests::test_callable_builtin, test/dynamo/test_functions.py::FunctionTests::test_callable_class, test/dynamo/test_functions.py::FunctionTests::test_callable_lambda, test/dynamo/test_functions.py::FunctionTests::test_callable_list, test/dynamo/test_functions.py::FunctionTests::test_callable_torch, test/dynamo/test_functions.py::FunctionTests::test_chunks1, test/dynamo/test_functions.py::FunctionTests::test_class_dict, test/dynamo/test_functions.py::FunctionTests::test_cls_eq, test/dynamo/test_functions.py::FunctionTests::test_cls_hasattr, test/dynamo/test_functions.py::FunctionTests::test_cls_is, test/dynamo/test_functions.py::FunctionTests::test_compare_constant_and_tensor,
test/dynamo/test_functions.py::FunctionTests::test_complex_closure, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add1, test/dynamo/test_functions.py::FunctionTests::test_const_tuple_add2, test/dynamo/test_functions.py::FunctionTests::test_constant1, test/dynamo/test_functions.py::FunctionTests::test_constant2, test/dynamo/test_functions.py::FunctionTests::test_constant3, test/dynamo/test_functions.py::FunctionTests::test_constant4, test/dynamo/test_functions.py::FunctionTests::test_constant_set, test/dynamo/test_functions.py::FunctionTests::test_context_wrapping_nested_functions_no_closure, test/dynamo/test_functions.py::FunctionTests::test_cublas_allow_tf32, test/dynamo/test_functions.py::FunctionTests::test_custom_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_default_dict_closure, test/dynamo/test_functions.py::FunctionTests::test_default_dict_constr, test/dynamo/test_functions.py::FunctionTests::test_default_dict_dict, test/dynamo/test_functions.py::FunctionTests::test_default_dict_lambda, test/dynamo/test_functions.py::FunctionTests::test_default_dict_list, test/dynamo/test_functions.py::FunctionTests::test_default_dict_set, test/dynamo/test_functions.py::FunctionTests::test_default_dict_tuple, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_defaultdict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_del, test/dynamo/test_functions.py::FunctionTests::test_deque, test/dynamo/test_functions.py::FunctionTests::test_device, test/dynamo/test_functions.py::FunctionTests::test_device_constant, test/dynamo/test_functions.py::FunctionTests::test_dict_copy, test/dynamo/test_functions.py::FunctionTests::test_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_dict_hasattr, test/dynamo/test_functions.py::FunctionTests::test_dict_id_guard, 
test/dynamo/test_functions.py::FunctionTests::test_dict_items_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set1, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set2, test/dynamo/test_functions.py::FunctionTests::test_dict_key_set3, test/dynamo/test_functions.py::FunctionTests::test_dict_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_dict_mutable_map, test/dynamo/test_functions.py::FunctionTests::test_dict_ops, test/dynamo/test_functions.py::FunctionTests::test_dict_param_keys, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault1, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault2, test/dynamo/test_functions.py::FunctionTests::test_dict_setdefault3, test/dynamo/test_functions.py::FunctionTests::test_dict_sorted, test/dynamo/test_functions.py::FunctionTests::test_dict_tuple_lazy_guard, test/dynamo/test_functions.py::FunctionTests::test_dict_update, test/dynamo/test_functions.py::FunctionTests::test_dict_update_kwargs, test/dynamo/test_functions.py::FunctionTests::test_dict_values, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_available, test/dynamo/test_functions.py::FunctionTests::test_distributed_is_initialized, test/dynamo/test_functions.py::FunctionTests::test_dtype, test/dynamo/test_functions.py::FunctionTests::test_dtype_compare, test/dynamo/test_functions.py::FunctionTests::test_elipsis, test/dynamo/test_functions.py::FunctionTests::test_enumerate, test/dynamo/test_functions.py::FunctionTests::test_enumerate_custom, test/dynamo/test_functions.py::FunctionTests::test_enumerate_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter, test/dynamo/test_functions.py::FunctionTests::test_filter_fallback, test/dynamo/test_functions.py::FunctionTests::test_filter_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_infinite_iterator, 
test/dynamo/test_functions.py::FunctionTests::test_filter_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_filter_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_finfo, test/dynamo/test_functions.py::FunctionTests::test_flat_param_same_storage_size, test/dynamo/test_functions.py::FunctionTests::test_float, test/dynamo/test_functions.py::FunctionTests::test_fn_with_self_set, test/dynamo/test_functions.py::FunctionTests::test_foreach_lerp_, test/dynamo/test_functions.py::FunctionTests::test_fstrings1, test/dynamo/test_functions.py::FunctionTests::test_fstrings2, test/dynamo/test_functions.py::FunctionTests::test_fstrings3, test/dynamo/test_functions.py::FunctionTests::test_fstrings4, test/dynamo/test_functions.py::FunctionTests::test_fstrings5, test/dynamo/test_functions.py::FunctionTests::test_fstrings6, test/dynamo/test_functions.py::FunctionTests::test_funcdef_closure, test/dynamo/test_functions.py::FunctionTests::test_functools_cache_guard, test/dynamo/test_functions.py::FunctionTests::test_functools_partial, test/dynamo/test_functions.py::FunctionTests::test_functools_partial_binding, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_generic_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_get_autocast_gpu_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_calculate_correct_fan, test/dynamo/test_functions.py::FunctionTests::test_get_default_dtype, test/dynamo/test_functions.py::FunctionTests::test_get_device_properties_tensor_device, test/dynamo/test_functions.py::FunctionTests::test_get_privateuse1_name, test/dynamo/test_functions.py::FunctionTests::test_getattr, test/dynamo/test_functions.py::FunctionTests::test_getattr_metaclass, test/dynamo/test_functions.py::FunctionTests::test_globalfn, 
test/dynamo/test_functions.py::FunctionTests::test_globalmodule, test/dynamo/test_functions.py::FunctionTests::test_globalvar, test/dynamo/test_functions.py::FunctionTests::test_import1, test/dynamo/test_functions.py::FunctionTests::test_in_not_in, test/dynamo/test_functions.py::FunctionTests::test_index, test/dynamo/test_functions.py::FunctionTests::test_indexed_range, test/dynamo/test_functions.py::FunctionTests::test_indirect1, test/dynamo/test_functions.py::FunctionTests::test_indirect2, test/dynamo/test_functions.py::FunctionTests::test_indirect3, test/dynamo/test_functions.py::FunctionTests::test_inline_jit__unwrap_optional, test/dynamo/test_functions.py::FunctionTests::test_inline_jit_annotations, test/dynamo/test_functions.py::FunctionTests::test_inline_lru_cache_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_script_if_tracing_fn_with_default_args, test/dynamo/test_functions.py::FunctionTests::test_inline_softmax, test/dynamo/test_functions.py::FunctionTests::test_inline_with_default, test/dynamo/test_functions.py::FunctionTests::test_inner_function, test/dynamo/test_functions.py::FunctionTests::test_is, test/dynamo/test_functions.py::FunctionTests::test_is_any_autocast_enabled, test/dynamo/test_functions.py::FunctionTests::test_is_checkpoint_valid, test/dynamo/test_functions.py::FunctionTests::test_is_complex, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_frame_counts, test/dynamo/test_functions.py::FunctionTests::test_is_contiguous_memory_format, test/dynamo/test_functions.py::FunctionTests::test_is_floating_point, test/dynamo/test_functions.py::FunctionTests::test_is_fx_tracing, test/dynamo/test_functions.py::FunctionTests::test_is_in_onnx_export, test/dynamo/test_functions.py::FunctionTests::test_is_inference_mode_global_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_inference_recompilation, test/dynamo/test_functions.py::FunctionTests::test_is_integer, 
test/dynamo/test_functions.py::FunctionTests::test_is_not, test/dynamo/test_functions.py::FunctionTests::test_is_not_null, test/dynamo/test_functions.py::FunctionTests::test_is_quantized, test/dynamo/test_functions.py::FunctionTests::test_is_sparse, test/dynamo/test_functions.py::FunctionTests::test_isinstance, test/dynamo/test_functions.py::FunctionTests::test_islice_chain, test/dynamo/test_functions.py::FunctionTests::test_itemgetter, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain, test/dynamo/test_functions.py::FunctionTests::test_itertools_chain_from_iterable, test/dynamo/test_functions.py::FunctionTests::test_itertools_combinations, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress, test/dynamo/test_functions.py::FunctionTests::test_itertools_compress_tensors, test/dynamo/test_functions.py::FunctionTests::test_itertools_filterfalse_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_pairwise, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_args, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_basic, test/dynamo/test_functions.py::FunctionTests::test_itertools_permutations_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_product, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_args, test/dynamo/test_functions.py::FunctionTests::test_itertools_product_various_iterators, test/dynamo/test_functions.py::FunctionTests::test_itertools_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_jit_annotate, test/dynamo/test_functions.py::FunctionTests::test_len_constant_dict, test/dynamo/test_functions.py::FunctionTests::test_len_constant_list, test/dynamo/test_functions.py::FunctionTests::test_len_constant_misc_iterables, test/dynamo/test_functions.py::FunctionTests::test_len_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_add, 
test/dynamo/test_functions.py::FunctionTests::test_list_add_then_mutate, test/dynamo/test_functions.py::FunctionTests::test_list_clear, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill, test/dynamo/test_functions.py::FunctionTests::test_list_compare_polyfill_non_lists, test/dynamo/test_functions.py::FunctionTests::test_list_convert, test/dynamo/test_functions.py::FunctionTests::test_list_expand_lhs, test/dynamo/test_functions.py::FunctionTests::test_list_index_with_constant_tensor, test/dynamo/test_functions.py::FunctionTests::test_list_reversed, test/dynamo/test_functions.py::FunctionTests::test_list_setitem, test/dynamo/test_functions.py::FunctionTests::test_list_setitem_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice, test/dynamo/test_functions.py::FunctionTests::test_list_slice_assignment, test/dynamo/test_functions.py::FunctionTests::test_list_sorted1, test/dynamo/test_functions.py::FunctionTests::test_list_sorted2, test/dynamo/test_functions.py::FunctionTests::test_list_truth, test/dynamo/test_functions.py::FunctionTests::test_listarg1, test/dynamo/test_functions.py::FunctionTests::test_listarg2, test/dynamo/test_functions.py::FunctionTests::test_listarg3, test/dynamo/test_functions.py::FunctionTests::test_listarg4, test/dynamo/test_functions.py::FunctionTests::test_listarg5, test/dynamo/test_functions.py::FunctionTests::test_load_global_bool, test/dynamo/test_functions.py::FunctionTests::test_lru_cache_warning_issued_during_tracing, test/dynamo/test_functions.py::FunctionTests::test_mT, test/dynamo/test_functions.py::FunctionTests::test_manual_seed, test/dynamo/test_functions.py::FunctionTests::test_map_call_function_ex, test/dynamo/test_functions.py::FunctionTests::test_map_deque_extendleft, test/dynamo/test_functions.py::FunctionTests::test_map_dict_fromkeys, test/dynamo/test_functions.py::FunctionTests::test_map_enumerate, test/dynamo/test_functions.py::FunctionTests::test_map_infinite, 
test/dynamo/test_functions.py::FunctionTests::test_map_iter, test/dynamo/test_functions.py::FunctionTests::test_map_list, test/dynamo/test_functions.py::FunctionTests::test_map_list_extend, test/dynamo/test_functions.py::FunctionTests::test_map_list_slice_assign, test/dynamo/test_functions.py::FunctionTests::test_map_max, test/dynamo/test_functions.py::FunctionTests::test_map_max_const, test/dynamo/test_functions.py::FunctionTests::test_map_partial_unpack, test/dynamo/test_functions.py::FunctionTests::test_map_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_map_reduce, test/dynamo/test_functions.py::FunctionTests::test_map_return, test/dynamo/test_functions.py::FunctionTests::test_map_set, test/dynamo/test_functions.py::FunctionTests::test_map_sorted, test/dynamo/test_functions.py::FunctionTests::test_map_str_join, test/dynamo/test_functions.py::FunctionTests::test_map_sum, test/dynamo/test_functions.py::FunctionTests::test_map_tuple, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_twice, test/dynamo/test_functions.py::FunctionTests::test_map_unpack_vars, test/dynamo/test_functions.py::FunctionTests::test_map_with_graph_break, test/dynamo/test_functions.py::FunctionTests::test_map_zip_dict, test/dynamo/test_functions.py::FunctionTests::test_match_mapping_and_match_keys, test/dynamo/test_functions.py::FunctionTests::test_match_sequence, test/dynamo/test_functions.py::FunctionTests::test_math_fma, test/dynamo/test_functions.py::FunctionTests::test_math_radians, test/dynamo/test_functions.py::FunctionTests::test_mean_sum_np, test/dynamo/test_functions.py::FunctionTests::test_methodcall1, test/dynamo/test_functions.py::FunctionTests::test_methodcall2, test/dynamo/test_functions.py::FunctionTests::test_methodcall3, test/dynamo/test_functions.py::FunctionTests::test_methodcaller, test/dynamo/test_functions.py::FunctionTests::test_min_max, test/dynamo/test_functions.py::FunctionTests::test_module_constant, 
test/dynamo/test_functions.py::FunctionTests::test_namedtuple, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_defaults, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_fields, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_hasattr, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_replace, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_subclass, test/dynamo/test_functions.py::FunctionTests::test_namedtuple_user_methods, test/dynamo/test_functions.py::FunctionTests::test_ndarray_builtin_functions, test/dynamo/test_functions.py::FunctionTests::test_ndarray_method, test/dynamo/test_functions.py::FunctionTests::test_ndarray_methods_returning_scalar, test/dynamo/test_functions.py::FunctionTests::test_ndarray_reshape, test/dynamo/test_functions.py::FunctionTests::test_ndarray_transpose, test/dynamo/test_functions.py::FunctionTests::test_ndim, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_function, test/dynamo/test_functions.py::FunctionTests::test_no_recompile_inner_lambda, test/dynamo/test_functions.py::FunctionTests::test_non_inlined_closure, test/dynamo/test_functions.py::FunctionTests::test_not_list, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_float, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_as_input_int_or_float_int, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_float, test/dynamo/test_functions.py::FunctionTests::test_np_constant_collections_guards_int, test/dynamo/test_functions.py::FunctionTests::test_np_finfo, test/dynamo/test_functions.py::FunctionTests::test_np_iinfo, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type0, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_as_integer_ratio_num_type3, 
test/dynamo/test_functions.py::FunctionTests::test_number_method_method_bit_length_num_type1, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type2, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_conjugate_num_type4, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_hex_num_type5, test/dynamo/test_functions.py::FunctionTests::test_number_method_method_is_integer_num_type6, test/dynamo/test_functions.py::FunctionTests::test_numpy_attributes, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_argument_to_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_dtype_call_in_function, test/dynamo/test_functions.py::FunctionTests::test_numpy_fft, test/dynamo/test_functions.py::FunctionTests::test_numpy_linalg, test/dynamo/test_functions.py::FunctionTests::test_numpy_meshgrid, test/dynamo/test_functions.py::FunctionTests::test_numpy_random, test/dynamo/test_functions.py::FunctionTests::test_numpy_size, test/dynamo/test_functions.py::FunctionTests::test_obj_eq, test/dynamo/test_functions.py::FunctionTests::test_obj_is, test/dynamo/test_functions.py::FunctionTests::test_ordered_dict_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partial_across_graph_break_uninvoked, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_UDF, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_lambda, test/dynamo/test_functions.py::FunctionTests::test_partials_as_input_partials_mod, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_args_and_kwargs, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix, test/dynamo/test_functions.py::FunctionTests::test_partials_graph_break_reconstruct_mix_no_source, 
test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___annotations__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___builtins__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___call__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___class__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___closure__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___code__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___defaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___delattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dict__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___dir__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___doc__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___eq__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___format__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ge__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___get__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___getattribute__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___globals__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___gt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___hash__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___init_subclass__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___kwdefaults__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___le__, 
test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___lt__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___module__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___name__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___ne__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___new__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___qualname__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___reduce_ex__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___repr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___setattr__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___sizeof__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___str__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr___subclasshook__, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_args, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_func, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_attr_keywords, test/dynamo/test_functions.py::FunctionTests::test_partials_hasattr_set_attr, test/dynamo/test_functions.py::FunctionTests::test_partials_lambda, test/dynamo/test_functions.py::FunctionTests::test_partials_recompilation, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_torch_op_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_arg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg, test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_method, 
test/dynamo/test_functions.py::FunctionTests::test_partials_udf_kwarg_module, test/dynamo/test_functions.py::FunctionTests::test_pop, test/dynamo/test_functions.py::FunctionTests::test_pos, test/dynamo/test_functions.py::FunctionTests::test_pos_only_args_with_same_name_in_star_kwargs, test/dynamo/test_functions.py::FunctionTests::test_pow_int, test/dynamo/test_functions.py::FunctionTests::test_promote_types, test/dynamo/test_functions.py::FunctionTests::test_rand_inlined, test/dynamo/test_functions.py::FunctionTests::test_rand_tensor_partial, test/dynamo/test_functions.py::FunctionTests::test_range1, test/dynamo/test_functions.py::FunctionTests::test_range2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_2, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break, test/dynamo/test_functions.py::FunctionTests::test_range_iterator_graph_break_2, test/dynamo/test_functions.py::FunctionTests::test_range_length, test/dynamo/test_functions.py::FunctionTests::test_range_with_index, test/dynamo/test_functions.py::FunctionTests::test_range_with_slice_index, test/dynamo/test_functions.py::FunctionTests::test_reduce, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_none_initial, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single, test/dynamo/test_functions.py::FunctionTests::test_reduce_with_single_with_initial, test/dynamo/test_functions.py::FunctionTests::test_return_dict, test/dynamo/test_functions.py::FunctionTests::test_return_dict2, test/dynamo/test_functions.py::FunctionTests::test_return_multiple_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_numpy_ndarray, test/dynamo/test_functions.py::FunctionTests::test_return_tuple1, test/dynamo/test_functions.py::FunctionTests::test_return_tuple2, 
test/dynamo/test_functions.py::FunctionTests::test_returning_recursive_func, test/dynamo/test_functions.py::FunctionTests::test_round, test/dynamo/test_functions.py::FunctionTests::test_set_add, test/dynamo/test_functions.py::FunctionTests::test_set_in_frozenset, test/dynamo/test_functions.py::FunctionTests::test_set_keys_view, test/dynamo/test_functions.py::FunctionTests::test_set_update_bytecode, test/dynamo/test_functions.py::FunctionTests::test_set_update_list_with_duplicated_items, test/dynamo/test_functions.py::FunctionTests::test_shape1, test/dynamo/test_functions.py::FunctionTests::test_shape2, test/dynamo/test_functions.py::FunctionTests::test_size_tuple_add, test/dynamo/test_functions.py::FunctionTests::test_slice1, test/dynamo/test_functions.py::FunctionTests::test_slice2, test/dynamo/test_functions.py::FunctionTests::test_slice3, test/dynamo/test_functions.py::FunctionTests::test_slice4, test/dynamo/test_functions.py::FunctionTests::test_slice5, test/dynamo/test_functions.py::FunctionTests::test_slice6, test/dynamo/test_functions.py::FunctionTests::test_slice_eq, test/dynamo/test_functions.py::FunctionTests::test_sliced_range, test/dynamo/test_functions.py::FunctionTests::test_sorted_const_key_non_const_items, test/dynamo/test_functions.py::FunctionTests::test_sourceless_build_method_type, test/dynamo/test_functions.py::FunctionTests::test_startswith, test/dynamo/test_functions.py::FunctionTests::test_sum, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_arg, test/dynamo/test_functions.py::FunctionTests::test_sum_shortcut_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_arg, test/dynamo/test_functions.py::FunctionTests::test_sum_with_start_kwarg, test/dynamo/test_functions.py::FunctionTests::test_symbool_to_int, test/dynamo/test_functions.py::FunctionTests::test_tensor_dim, 
test/dynamo/test_functions.py::FunctionTests::test_tensor_element_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_is_complex, test/dynamo/test_functions.py::FunctionTests::test_tensor_len, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_shape, test/dynamo/test_functions.py::FunctionTests::test_tensor_new_with_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size, test/dynamo/test_functions.py::FunctionTests::test_tensor_size_indexed_by_symint, test/dynamo/test_functions.py::FunctionTests::test_tensor_type, test/dynamo/test_functions.py::FunctionTests::test_tensor_type2, test/dynamo/test_functions.py::FunctionTests::test_tensor_type3, test/dynamo/test_functions.py::FunctionTests::test_tensor_type4, test/dynamo/test_functions.py::FunctionTests::test_tensor_type5, test/dynamo/test_functions.py::FunctionTests::test_to, test/dynamo/test_functions.py::FunctionTests::test_torch_distributions_functions, test/dynamo/test_functions.py::FunctionTests::test_torch_from_numpy, test/dynamo/test_functions.py::FunctionTests::test_torch_get_device_module, test/dynamo/test_functions.py::FunctionTests::test_torch_size_as_dict_key, test/dynamo/test_functions.py::FunctionTests::test_torch_size_hasattr, test/dynamo/test_functions.py::FunctionTests::test_torch_source, test/dynamo/test_functions.py::FunctionTests::test_transpose_for_scores, test/dynamo/test_functions.py::FunctionTests::test_truth, test/dynamo/test_functions.py::FunctionTests::test_tuple1, test/dynamo/test_functions.py::FunctionTests::test_tuple2, test/dynamo/test_functions.py::FunctionTests::test_tuple_contains, test/dynamo/test_functions.py::FunctionTests::test_tuple_iadd, test/dynamo/test_functions.py::FunctionTests::test_tuple_map, test/dynamo/test_functions.py::FunctionTests::test_tuple_sorted, test/dynamo/test_functions.py::FunctionTests::test_two_point_iter, test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op, 
test/dynamo/test_functions.py::FunctionTests::test_unary_fold_op_seq, test/dynamo/test_functions.py::FunctionTests::test_unpack1, test/dynamo/test_functions.py::FunctionTests::test_unpack2, test/dynamo/test_functions.py::FunctionTests::test_unpack3, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex1, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex2, test/dynamo/test_functions.py::FunctionTests::test_unpack_ex3, test/dynamo/test_functions.py::FunctionTests::test_unpack_mutable_map, test/dynamo/test_functions.py::FunctionTests::test_unsqueeze_inplace, test/dynamo/test_functions.py::FunctionTests::test_viamethod, test/dynamo/test_functions.py::FunctionTests::test_viatorch, test/dynamo/test_functions.py::FunctionTests::test_zip_longest, test/dynamo/test_functions.py::FunctionTests::test_zip_reconstruct, test/dynamo/test_functions.py::DefaultsTests::test_cast_tensor_single_elem, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_factory, test/dynamo/test_functions.py::DefaultsTests::test_dataclass_nested, test/dynamo/test_functions.py::DefaultsTests::test_fn_with_attr, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_construction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_illegal_call_method, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_copy, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_difference, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_intersection, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_symmetric_difference, test/dynamo/test_functions.py::DefaultsTests::test_frozenset_return_type_method_name_union, test/dynamo/test_functions.py::DefaultsTests::test_full_with_tensor_fill_value, test/dynamo/test_functions.py::DefaultsTests::test_func_attrs, 
test/dynamo/test_functions.py::DefaultsTests::test_func_default_tensor_args, test/dynamo/test_functions.py::DefaultsTests::test_func_default_torch_args, test/dynamo/test_functions.py::DefaultsTests::test_functional_compile, test/dynamo/test_functions.py::DefaultsTests::test_functools_partial_id, test/dynamo/test_functions.py::DefaultsTests::test_fx_immutable_list_mutation_not_allowed, test/dynamo/test_functions.py::DefaultsTests::test_fx_map_aggregate, test/dynamo/test_functions.py::DefaultsTests::test_gpu_current_device, test/dynamo/test_functions.py::DefaultsTests::test_in_set_inplace, test/dynamo/test_functions.py::DefaultsTests::test_in_set_would_fail_broadcast, test/dynamo/test_functions.py::DefaultsTests::test_inspect_method_source, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_init_in_compile_vmapped_mutated_tensor_tensor_multi_arg, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_mutated_tensor_tensor_across_graph_break, test/dynamo/test_functions.py::DefaultsTests::test_is_not_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_is_vmapped_mutated_tensor_tensor, test/dynamo/test_functions.py::DefaultsTests::test_keyword, test/dynamo/test_functions.py::DefaultsTests::test_listlike_of_tensors_contains_constant, test/dynamo/test_functions.py::DefaultsTests::test_map_strict, test/dynamo/test_functions.py::DefaultsTests::test_map_strict_with_graph_break, test/dynamo/test_functions.py::DefaultsTests::test_meth_default_tensor_args, test/dynamo/test_functions.py::DefaultsTests::test_property_class_transmute, test/dynamo/test_functions.py::DefaultsTests::test_property_functools_partial, 
test/dynamo/test_functions.py::DefaultsTests::test_pybind_object, test/dynamo/test_functions.py::DefaultsTests::test_reconstructed_name, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___frozenset, test/dynamo/test_functions.py::DefaultsTests::test_set_call___init___set, test/dynamo/test_functions.py::DefaultsTests::test_set_construction, test/dynamo/test_functions.py::DefaultsTests::test_skip_function_call_very_weird_value, test/dynamo/test_functions.py::DefaultsTests::test_str_handler_for_user_defined_object, test/dynamo/test_functions.py::DefaultsTests::test_sys_recursionlimit, test/dynamo/test_functions.py::DefaultsTests::test_tree_map, test/dynamo/test_functions.py::DefaultsTests::test_udf_list, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_udf_list_slice, test/dynamo/test_functions.py::DefaultsTests::test_udf_namedtuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_construction_custom_new, test/dynamo/test_functions.py::DefaultsTests::test_udf_tuple_reconstruction, test/dynamo/test_functions.py::DefaultsTests::test_zip_strict 2025-12-04T11:07:46.5024227Z 2025-12-04T11:07:46.5024346Z Finished dynamo/test_functions 1/1 ... [2025-12-04 11:07:46.492195][2250450.758865], took 0.49min 2025-12-04T11:07:46.5024744Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:07:46.5025124Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:07:46.5025365Z Running dynamo/test_regional_inductor 1/1 ... 
[2025-12-04 11:07:46.497998][2250450.76467098] 2025-12-04T11:07:46.5025565Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:07:46.5025973Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_regional_inductor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:07:46.498188] 2025-12-04T11:08:11.1998663Z 2025-12-04T11:08:11.1999511Z dynamo/test_regional_inductor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_regional_inductor_1.1_c522314514d97697_.log 2025-12-04T11:08:11.2005108Z Running 20 items in this shard: test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_annotation_inductor_configs, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_boxed_calling_convention, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_flex_attention_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_flex_attention_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_invalid_inductor_config, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_invoke_subgraph_inner_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_invoke_subgraph_inner_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_invoke_subgraph_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_invoke_subgraph_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_max_autotune_no_cudagraphs_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_max_autotune_no_cudagraphs_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_repeated_blocks_serialize_False, 
test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_repeated_blocks_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_selective_ac_flex_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_selective_ac_flex_serialize_True, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_simple_serialize_False, test/dynamo/test_regional_inductor.py::RegionalInductorTests::test_simple_serialize_True, test/dynamo/test_regional_inductor.py::TestRegionalOutputCode::test_regional_compiled_forward_backward, test/dynamo/test_regional_inductor.py::TestRegionalOutputCode::test_regional_output_code_serialization, test/dynamo/test_regional_inductor.py::TestRegionalOutputCode::test_regional_output_code_with_backward 2025-12-04T11:08:11.2009794Z 2025-12-04T11:08:11.2009997Z Finished dynamo/test_regional_inductor 1/1 ... [2025-12-04 11:08:11.199542][2250475.46621197], took 0.41min 2025-12-04T11:08:11.2010533Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:08:11.2056496Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:08:11.2058145Z Running inductor/test_inplace_padding 1/1 ... [2025-12-04 11:08:11.205690][2250475.472363535] 2025-12-04T11:08:11.2058529Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:08:11.2060074Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:08:11.205879] 2025-12-04T11:08:27.1446720Z 2025-12-04T11:08:27.1447805Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_6b6e64d6b0ef4ebf_.log 2025-12-04T11:08:27.1450690Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input 2025-12-04T11:08:27.1451948Z 2025-12-04T11:08:27.1452083Z Finished inductor/test_inplace_padding 1/1 ... [2025-12-04 11:08:27.144366][2250491.411035454], took 0.27min 2025-12-04T11:08:27.1454841Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:08:27.1506003Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:08:27.1507994Z Running inductor/test_fp8 1/1 ... [2025-12-04 11:08:27.150707][2250491.417380407] 2025-12-04T11:08:27.1508186Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:08:27.1509942Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_fp8.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:08:27.150903] 2025-12-04T11:45:24.2413593Z 2025-12-04T11:45:24.2414570Z PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_6df9ffee87f5c527_.log) 2025-12-04T11:45:24.2419256Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a83d7a3c7bb0c63.xml 2025-12-04T11:45:24.2419963Z ============================= test session starts ============================== 2025-12-04T11:45:24.2420568Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.2421090Z cachedir: .pytest_cache 2025-12-04T11:45:24.2422395Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.2422809Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.2423037Z configfile: pytest.ini 2025-12-04T11:45:24.2423470Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.2423870Z collecting ... 
collected 188 items 2025-12-04T11:45:24.2424103Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T11:45:24.2470185Z Running 188 items in this shard: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda, 
test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda, test/inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32, 
test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda, test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda 2025-12-04T11:45:24.2501676Z 2025-12-04T11:45:24.2501848Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [1.0970s] [ 0%] 2025-12-04T11:45:24.2503946Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.1592s] [ 1%] 2025-12-04T11:45:24.2504322Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.4038s] [ 1%] 2025-12-04T11:45:24.2504669Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.1300s] [ 2%] 2025-12-04T11:45:24.2505035Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.2876s] [ 2%] 2025-12-04T11:45:24.2505371Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.2130s] [ 3%] 2025-12-04T11:45:24.2505700Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2192s] [ 3%] 2025-12-04T11:45:24.2506031Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.3208s] [ 4%] 2025-12-04T11:45:24.2506366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2330s] [ 4%] 2025-12-04T11:45:24.2506701Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_along_with_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.2522s] [ 5%] 2025-12-04T11:45:24.2507087Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,1,15_cuda PASSED [0.2013s] [ 5%] 2025-12-04T11:45:24.2507400Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,15_cuda PASSED [0.2046s] [ 6%] 2025-12-04T11:45:24.2507711Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,4096_cuda PASSED [0.2271s] [ 6%] 2025-12-04T11:45:24.2508024Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_1,10,512_cuda PASSED [0.2191s] [ 7%] 2025-12-04T11:45:24.2508336Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.2381s] [ 7%] 2025-12-04T11:45:24.2508648Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,1,15_cuda PASSED [0.2064s] [ 8%] 2025-12-04T11:45:24.2508949Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,15_cuda PASSED [0.2068s] [ 9%] 2025-12-04T11:45:24.2509251Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,4096_cuda PASSED [0.4217s] [ 9%] 2025-12-04T11:45:24.2509558Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_1,10,512_cuda PASSED [0.2335s] [ 10%] 2025-12-04T11:45:24.2509865Z inductor/test_fp8.py::TestFP8TypesCUDA::test_amax_fp8_quant_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.2578s] [ 10%] 2025-12-04T11:45:24.2510415Z inductor/test_fp8.py::TestFP8TypesCUDA::test_bad_cast_cuda C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', torch.float8_e5m2, size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise( 2025-12-04T11:45:24.2510931Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] 'cuda', 2025-12-04T11:45:24.2511192Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] torch.float8_e5m2, 2025-12-04T11:45:24.2511468Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] def inner_fn(index): 2025-12-04T11:45:24.2511747Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] i0, i1, i2 = index 2025-12-04T11:45:24.2512057Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 
* s77) 2025-12-04T11:45:24.2512427Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] tmp1 = ops.to_dtype(tmp0, torch.float8_e5m2, src_dtype=torch.float8_e4m3fn) 2025-12-04T11:45:24.2512774Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] return tmp1 2025-12-04T11:45:24.2513037Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] , 2025-12-04T11:45:24.2513361Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] ranges=[s77, s77, s77], 2025-12-04T11:45:24.2513657Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] origin_node=convert_element_type, 2025-12-04T11:45:24.2513996Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] origins=OrderedSet([convert_element_type]), 2025-12-04T11:45:24.2514291Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] stack_traces = {, 2025-12-04T11:45:24.2514644Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] File "/var/lib/jenkins/pytorch/test/inductor/test_fp8.py", line 164, in fp8_cast, 2025-12-04T11:45:24.2515011Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] return x.to(dtype=dtype), 2025-12-04T11:45:24.2515276Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] , 2025-12-04T11:45:24.2515499Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] } 2025-12-04T11:45:24.2515857Z C1204 11:08:37.906000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/0] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None) 2025-12-04T11:45:24.2516388Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] Error in codegen for ComputedBuffer(name='buf0', layout=FixedLayout('cuda:0', torch.float8_e4m3fn, 
size=[s77, s77, s77], stride=[s77**2, s77, 1]), data=Pointwise( 2025-12-04T11:45:24.2516808Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] 'cuda', 2025-12-04T11:45:24.2517064Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] torch.float8_e4m3fn, 2025-12-04T11:45:24.2517339Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] def inner_fn(index): 2025-12-04T11:45:24.2517613Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] i0, i1, i2 = index 2025-12-04T11:45:24.2517920Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] tmp0 = ops.load(arg1_1, i2 + i0 * s77**2 + i1 * s77) 2025-12-04T11:45:24.2518288Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] tmp1 = ops.to_dtype(tmp0, torch.float8_e4m3fn, src_dtype=torch.float8_e5m2) 2025-12-04T11:45:24.2518615Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] return tmp1 2025-12-04T11:45:24.2518852Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] , 2025-12-04T11:45:24.2519101Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] ranges=[s77, s77, s77], 2025-12-04T11:45:24.2519395Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] origin_node=convert_element_type, 2025-12-04T11:45:24.2519713Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] origins=OrderedSet([convert_element_type]), 2025-12-04T11:45:24.2520011Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] stack_traces = {, 2025-12-04T11:45:24.2520359Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] File "/var/lib/jenkins/pytorch/test/inductor/test_fp8.py", line 164, in fp8_cast, 2025-12-04T11:45:24.2520723Z C1204 11:08:37.945000 
662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] return x.to(dtype=dtype), 2025-12-04T11:45:24.2520983Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] , 2025-12-04T11:45:24.2521225Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] } 2025-12-04T11:45:24.2521581Z C1204 11:08:37.945000 662548 site-packages/torch/_inductor/scheduler.py:1683] [0/1] ), _split_size=None, _original_inner_fn=None, _original_ranges=None, _original_reduction_ranges=None) 2025-12-04T11:45:24.2521869Z PASSED [0.1540s] [ 11%] 2025-12-04T11:45:24.2522065Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_bfloat16_cuda_bfloat16 PASSED [1.6487s] [ 11%] 2025-12-04T11:45:24.2522392Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [1.5435s] [ 12%] 2025-12-04T11:45:24.2522725Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 ('RERUN', {'yellow': True}) [1.0968s] [ 12%] 2025-12-04T11:45:24.2523032Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 FAILED [1.0730s] [ 12%] 2025-12-04T11:45:24.2523193Z 2025-12-04T11:45:24.2523287Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.2523471Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2523649Z Traceback (most recent call last): 2025-12-04T11:45:24.2523898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2524161Z method(*args, **kwargs) 2025-12-04T11:45:24.2524386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2524617Z method(*args, **kwargs) 2025-12-04T11:45:24.2524832Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2525058Z with policy(): 2025-12-04T11:45:24.2525268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2525497Z raise RuntimeError(msg) 2025-12-04T11:45:24.2525911Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1197473792 and is now 1304428544. 2025-12-04T11:45:24.2526277Z 2025-12-04T11:45:24.2526352Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2526656Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2526882Z 2025-12-04T11:45:24.2526970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2527170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2527329Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2527462Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2527658Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2528241Z inductor [('triton_bundler_save_kernel', 128), ('benchmarking.InductorBenchmarker.benchmark', 14), ('benchmarking.InductorBenchmarker.benchmark_gpu', 14), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2528754Z graph_break [] 2025-12-04T11:45:24.2528924Z aten_mm_info 
[('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2529160Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2529344Z Traceback (most recent call last): 2025-12-04T11:45:24.2529598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2529856Z method(*args, **kwargs) 2025-12-04T11:45:24.2530093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2530329Z method(*args, **kwargs) 2025-12-04T11:45:24.2530555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2530788Z with policy(): 2025-12-04T11:45:24.2531025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2531270Z raise RuntimeError(msg) 2025-12-04T11:45:24.2531681Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1302331392 and is now 1344274432. 
2025-12-04T11:45:24.2532100Z 2025-12-04T11:45:24.2532178Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2532490Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2532774Z 2025-12-04T11:45:24.2532865Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2533082Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2533313Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2533453Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2533715Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2534352Z inductor [('triton_bundler_save_kernel', 128), ('benchmarking.InductorBenchmarker.benchmark', 14), ('benchmarking.InductorBenchmarker.benchmark_gpu', 14), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2534942Z graph_break [] 2025-12-04T11:45:24.2535157Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2535430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2535635Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2535786Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2535977Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2536554Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 17), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('async_compile_cache_miss', 12), 
('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2537063Z graph_break [] 2025-12-04T11:45:24.2537278Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2537502Z =================================== FAILURES =================================== 2025-12-04T11:45:24.2537699Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2537884Z Traceback (most recent call last): 2025-12-04T11:45:24.2538157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2538415Z method(*args, **kwargs) 2025-12-04T11:45:24.2538699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2538984Z method(*args, **kwargs) 2025-12-04T11:45:24.2539278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2539568Z with policy(): 2025-12-04T11:45:24.2539843Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2540102Z raise RuntimeError(msg) 2025-12-04T11:45:24.2540553Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1342177280 and is now 1375731712. 
2025-12-04T11:45:24.2540911Z 2025-12-04T11:45:24.2540989Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2541287Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2541508Z 2025-12-04T11:45:24.2541598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2541795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2541953Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2542085Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2542296Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2542872Z inductor [('triton_bundler_save_kernel', 128), ('benchmarking.InductorBenchmarker.benchmark', 14), ('benchmarking.InductorBenchmarker.benchmark_gpu', 14), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2543444Z graph_break [] 2025-12-04T11:45:24.2543677Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2543964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2544138Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2544316Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2544565Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2545173Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 17), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('async_compile_cache_miss', 12), 
('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2545688Z graph_break [] 2025-12-04T11:45:24.2545847Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2546070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2546229Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2546365Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2546559Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2547138Z inductor [('triton_bundler_save_kernel', 112), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark', 11), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2547645Z graph_break [] 2025-12-04T11:45:24.2547801Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2548148Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a83d7a3c7bb0c63.xml - 2025-12-04T11:45:24.2548476Z =========================== short test summary info ============================ 2025-12-04T11:45:24.2549061Z FAILED [1.0730s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1342177280 and is now 1375731712. 
2025-12-04T11:45:24.2549557Z 2025-12-04T11:45:24.2549657Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2549962Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2550199Z 2025-12-04T11:45:24.2550288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2550482Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.2550681Z ==================== 1 failed, 22 passed, 2 rerun in 11.29s ==================== 2025-12-04T11:45:24.2550835Z Got exit code 1 2025-12-04T11:45:24.2550943Z Retrying single test... 2025-12-04T11:45:24.2551159Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-788c969b172c46f1.xml 2025-12-04T11:45:24.2551425Z ============================= test session starts ============================== 2025-12-04T11:45:24.2551646Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.2551844Z cachedir: .pytest_cache 2025-12-04T11:45:24.2552076Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.2552329Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.2552454Z configfile: pytest.ini 2025-12-04T11:45:24.2552728Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.2553007Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.2553359Z stepcurrent: skipping 22 already run items. 
Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2553616Z Running 1 items in this shard 2025-12-04T11:45:24.2553696Z 2025-12-04T11:45:24.2553966Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:08:50.939071546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2554279Z 2025-12-04T11:45:24.2554432Z [W1204 11:08:50.164654582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2554624Z 2025-12-04T11:45:24.2554782Z [W1204 11:08:50.164782821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2554997Z 2025-12-04T11:45:24.2555148Z [W1204 11:08:50.166624335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2555343Z 2025-12-04T11:45:24.2555497Z [W1204 11:08:50.166690754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2555688Z 2025-12-04T11:45:24.2555840Z [W1204 11:08:50.167172188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2556032Z 2025-12-04T11:45:24.2556187Z [W1204 11:08:50.167281596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2556379Z 2025-12-04T11:45:24.2556538Z [W1204 11:08:50.167331606 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2556762Z 2025-12-04T11:45:24.2556943Z [W1204 11:08:50.167500433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2557146Z 2025-12-04T11:45:24.2557304Z [W1204 11:08:50.167551022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2557492Z 2025-12-04T11:45:24.2557663Z [W1204 11:08:50.167806219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2557862Z 2025-12-04T11:45:24.2558011Z [W1204 11:08:50.167857758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2558207Z 2025-12-04T11:45:24.2558362Z [W1204 11:08:50.168020476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2558555Z 2025-12-04T11:45:24.2558706Z [W1204 11:08:50.168073005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2558901Z 2025-12-04T11:45:24.2559050Z [W1204 11:08:50.168194634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2559264Z 2025-12-04T11:45:24.2559419Z [W1204 11:08:50.168241313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2559617Z 2025-12-04T11:45:24.2559771Z [W1204 11:08:50.168340702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2559963Z 2025-12-04T11:45:24.2560112Z [W1204 11:08:50.168388141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2560307Z 2025-12-04T11:45:24.2560470Z [W1204 11:08:52.724831407 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2560662Z 2025-12-04T11:45:24.2560813Z [W1204 11:08:52.725035704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2561008Z 2025-12-04T11:45:24.2561184Z [W1204 11:08:52.725095023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2561377Z 2025-12-04T11:45:24.2561530Z [W1204 11:08:52.725351480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2561718Z 2025-12-04T11:45:24.2561877Z [W1204 11:08:52.725405789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2562082Z 2025-12-04T11:45:24.2562240Z [W1204 11:08:52.725533277 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2562438Z 2025-12-04T11:45:24.2562588Z [W1204 11:08:52.725606296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2562787Z 2025-12-04T11:45:24.2562936Z [W1204 11:08:52.725651406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2563133Z 2025-12-04T11:45:24.2563321Z [W1204 11:08:52.725783504 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2563514Z 2025-12-04T11:45:24.2563669Z [W1204 11:08:52.725831493 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2563890Z 2025-12-04T11:45:24.2564040Z [W1204 11:08:52.725978981 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2564238Z 2025-12-04T11:45:24.2564407Z [W1204 11:08:52.726032891 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2564603Z 2025-12-04T11:45:24.2564759Z [W1204 11:08:52.726139379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2564949Z 2025-12-04T11:45:24.2565129Z [W1204 11:08:52.726185738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2565320Z 2025-12-04T11:45:24.2565473Z [W1204 11:08:52.726278337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2565662Z 2025-12-04T11:45:24.2565819Z [W1204 11:08:52.726323737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2566010Z 2025-12-04T11:45:24.2566173Z [W1204 11:08:52.726413515 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2566397Z 2025-12-04T11:45:24.2566555Z [W1204 11:08:52.726458885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2566744Z 2025-12-04T11:45:24.2566803Z ('RERUN', {'yellow': True}) [3.1581s] [100%] 2025-12-04T11:45:24.2567154Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:08:53.584978632 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2567460Z 2025-12-04T11:45:24.2567613Z [W1204 11:08:53.585191680 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2567810Z 2025-12-04T11:45:24.2567964Z [W1204 11:08:53.585261579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2568157Z 2025-12-04T11:45:24.2568306Z [W1204 11:08:53.585519765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2568503Z 2025-12-04T11:45:24.2568660Z [W1204 11:08:53.585572984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2568884Z 2025-12-04T11:45:24.2569035Z [W1204 11:08:53.585697573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2569230Z 2025-12-04T11:45:24.2569383Z [W1204 11:08:53.585770652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2569575Z 2025-12-04T11:45:24.2569758Z [W1204 11:08:53.585815421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2569956Z 2025-12-04T11:45:24.2570111Z [W1204 11:08:53.585946709 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2570306Z 2025-12-04T11:45:24.2570463Z [W1204 11:08:53.585993869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2570653Z 2025-12-04T11:45:24.2570810Z [W1204 11:08:53.586143886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2570997Z 2025-12-04T11:45:24.2571153Z [W1204 11:08:53.586193736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2571361Z 2025-12-04T11:45:24.2571516Z [W1204 11:08:53.586297134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2571720Z 2025-12-04T11:45:24.2571876Z [W1204 11:08:53.586342614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2572074Z 2025-12-04T11:45:24.2572246Z [W1204 11:08:53.586434192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2572443Z 2025-12-04T11:45:24.2572592Z [W1204 11:08:53.586478672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2572789Z 2025-12-04T11:45:24.2572942Z [W1204 11:08:53.586568921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2573137Z 2025-12-04T11:45:24.2573350Z [W1204 11:08:53.586613000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2573543Z 2025-12-04T11:45:24.2573698Z [W1204 11:08:54.285597343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2573922Z 2025-12-04T11:45:24.2574075Z [W1204 11:08:54.285784410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2574268Z 2025-12-04T11:45:24.2574423Z [W1204 11:08:54.285842779 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2574610Z 2025-12-04T11:45:24.2574784Z [W1204 11:08:54.286100216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2574981Z 2025-12-04T11:45:24.2575145Z [W1204 11:08:54.286158425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2575342Z 2025-12-04T11:45:24.2575496Z [W1204 11:08:54.286287393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2575697Z 2025-12-04T11:45:24.2575855Z [W1204 11:08:54.286362832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2576047Z 2025-12-04T11:45:24.2576199Z [W1204 11:08:54.286408772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2576388Z 2025-12-04T11:45:24.2576543Z [W1204 11:08:54.286547240 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2576736Z 2025-12-04T11:45:24.2576894Z [W1204 11:08:54.286594969 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2577091Z 2025-12-04T11:45:24.2577247Z [W1204 11:08:54.286742377 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2577448Z 2025-12-04T11:45:24.2577604Z [W1204 11:08:54.286789246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2577793Z 2025-12-04T11:45:24.2577946Z [W1204 11:08:54.286894845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2578141Z 2025-12-04T11:45:24.2578297Z [W1204 11:08:54.286946814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2578491Z 2025-12-04T11:45:24.2578644Z [W1204 11:08:54.287058823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2578873Z 2025-12-04T11:45:24.2579053Z [W1204 11:08:54.287106572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2579241Z 2025-12-04T11:45:24.2579399Z [W1204 11:08:54.287197951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2579591Z 2025-12-04T11:45:24.2579761Z [W1204 11:08:54.287242370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2579952Z 2025-12-04T11:45:24.2580007Z ('RERUN', {'yellow': True}) [1.5274s] [100%] 2025-12-04T11:45:24.2580366Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:08:54.084832735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2630063Z 2025-12-04T11:45:24.2630399Z [W1204 11:08:54.085045162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2630981Z 2025-12-04T11:45:24.2631347Z [W1204 11:08:54.085109662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2631658Z 2025-12-04T11:45:24.2631856Z [W1204 11:08:54.085367478 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2632161Z 2025-12-04T11:45:24.2632380Z [W1204 11:08:54.085424657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2632628Z 2025-12-04T11:45:24.2632835Z [W1204 11:08:54.085559695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2633119Z 2025-12-04T11:45:24.2633391Z [W1204 11:08:54.085637984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2633748Z 2025-12-04T11:45:24.2634011Z [W1204 11:08:54.085684254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2634244Z 2025-12-04T11:45:24.2634454Z [W1204 11:08:54.085824682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2634697Z 2025-12-04T11:45:24.2634962Z [W1204 11:08:54.085873551 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2635203Z 2025-12-04T11:45:24.2635478Z [W1204 11:08:54.086028349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2635742Z 2025-12-04T11:45:24.2636021Z [W1204 11:08:54.086080328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2636229Z 2025-12-04T11:45:24.2636496Z [W1204 11:08:54.086191257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2636770Z 2025-12-04T11:45:24.2636955Z [W1204 11:08:54.086238336 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2637179Z 2025-12-04T11:45:24.2637349Z [W1204 11:08:54.086336935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2637567Z 2025-12-04T11:45:24.2637725Z [W1204 11:08:54.086382704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2637978Z 2025-12-04T11:45:24.2638169Z [W1204 11:08:54.086476003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2638498Z 2025-12-04T11:45:24.2638687Z [W1204 11:08:54.086521172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2638966Z 2025-12-04T11:45:24.2639179Z [W1204 11:08:55.799341474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2639455Z 2025-12-04T11:45:24.2639742Z [W1204 11:08:55.799519362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2639983Z 2025-12-04T11:45:24.2640253Z [W1204 11:08:55.799575971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2640481Z 2025-12-04T11:45:24.2640714Z [W1204 11:08:55.799831057 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2640943Z 2025-12-04T11:45:24.2641217Z [W1204 11:08:55.799885317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2641456Z 2025-12-04T11:45:24.2641712Z [W1204 11:08:55.800023165 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2641989Z 2025-12-04T11:45:24.2642156Z [W1204 11:08:55.800102544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2642401Z 2025-12-04T11:45:24.2642561Z [W1204 11:08:55.800148513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2642817Z 2025-12-04T11:45:24.2643009Z [W1204 11:08:55.800288881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2643363Z 2025-12-04T11:45:24.2643568Z [W1204 11:08:55.800338500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2643886Z 2025-12-04T11:45:24.2644088Z [W1204 11:08:55.800483968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2644371Z 2025-12-04T11:45:24.2644648Z [W1204 11:08:55.800532578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2644915Z 2025-12-04T11:45:24.2645206Z [W1204 11:08:55.800638116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2645492Z 2025-12-04T11:45:24.2645725Z [W1204 11:08:55.800685215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2645990Z 2025-12-04T11:45:24.2646277Z [W1204 11:08:55.800778004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2646537Z 2025-12-04T11:45:24.2646805Z [W1204 11:08:55.800823253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2647059Z 2025-12-04T11:45:24.2647298Z [W1204 11:08:55.800913192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2647569Z 2025-12-04T11:45:24.2647760Z [W1204 11:08:55.800957872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2648043Z 2025-12-04T11:45:24.2648139Z FAILED [1.5009s] [100%] 2025-12-04T11:45:24.2648303Z 2025-12-04T11:45:24.2648454Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.2648758Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2649023Z Traceback (most recent call last): 2025-12-04T11:45:24.2649405Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2649793Z method(*args, **kwargs) 2025-12-04T11:45:24.2650134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2650562Z method(*args, **kwargs) 2025-12-04T11:45:24.2650935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2651303Z with policy(): 2025-12-04T11:45:24.2651686Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2652057Z raise RuntimeError(msg) 2025-12-04T11:45:24.2652629Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 807403520 and is now 975175680. 
2025-12-04T11:45:24.2653128Z 2025-12-04T11:45:24.2653238Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2653711Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2654023Z 2025-12-04T11:45:24.2654204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2654545Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2654828Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2655106Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2655849Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2656579Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2656946Z graph_break [] 2025-12-04T11:45:24.2657302Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2657707Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2658056Z Traceback (most recent call last): 2025-12-04T11:45:24.2658466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2658829Z method(*args, **kwargs) 2025-12-04T11:45:24.2659202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2659590Z method(*args, **kwargs) 
2025-12-04T11:45:24.2659913Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2660274Z with policy(): 2025-12-04T11:45:24.2660567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2661068Z raise RuntimeError(msg) 2025-12-04T11:45:24.2661550Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 973078528 and is now 1015021568. 2025-12-04T11:45:24.2662001Z 2025-12-04T11:45:24.2662135Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2662519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2662791Z 2025-12-04T11:45:24.2662958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2663347Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2663567Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2663860Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2664473Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2665150Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2665404Z graph_break [] 2025-12-04T11:45:24.2665652Z aten_mm_info 
[('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2665979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2666215Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2666437Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2666733Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2667375Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 17), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2668047Z graph_break [] 2025-12-04T11:45:24.2668323Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2668567Z =================================== FAILURES =================================== 2025-12-04T11:45:24.2668863Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2669112Z Traceback (most recent call last): 2025-12-04T11:45:24.2669418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2669740Z method(*args, **kwargs) 2025-12-04T11:45:24.2670003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2670346Z method(*args, **kwargs) 2025-12-04T11:45:24.2670630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2671029Z with policy(): 2025-12-04T11:45:24.2671333Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2671804Z raise RuntimeError(msg) 2025-12-04T11:45:24.2672277Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1012924416 and is now 1046478848. 2025-12-04T11:45:24.2672664Z 2025-12-04T11:45:24.2672795Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2673179Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2673496Z 2025-12-04T11:45:24.2673633Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2673914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2674128Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2674375Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2674990Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2675653Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2675947Z graph_break [] 2025-12-04T11:45:24.2676188Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2676518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2676811Z 
frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2677016Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2677391Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2678142Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 17), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2678763Z graph_break [] 2025-12-04T11:45:24.2679005Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2679434Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2679666Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2679900Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2680225Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2680917Z inductor [('triton_bundler_save_kernel', 112), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark', 11), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2681466Z graph_break [] 2025-12-04T11:45:24.2681897Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2682322Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-788c969b172c46f1.xml - 2025-12-04T11:45:24.2682726Z =========================== 
short test summary info ============================ 2025-12-04T11:45:24.2683456Z FAILED [1.5009s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1012924416 and is now 1046478848. 2025-12-04T11:45:24.2683995Z 2025-12-04T11:45:24.2684096Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2684606Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2684851Z 2025-12-04T11:45:24.2685014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2685283Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.2685632Z ================== 1 failed, 187 deselected, 2 rerun in 6.21s ================== 2025-12-04T11:45:24.2685911Z Got exit code 1 2025-12-04T11:45:24.2686051Z Retrying single test... 
2025-12-04T11:45:24.2686334Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0f7995b2e77e2bbb.xml 2025-12-04T11:45:24.2686626Z ============================= test session starts ============================== 2025-12-04T11:45:24.2686995Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.2687255Z cachedir: .pytest_cache 2025-12-04T11:45:24.2687690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.2688049Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.2688250Z configfile: pytest.ini 2025-12-04T11:45:24.2688567Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.2688906Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.2689269Z stepcurrent: skipping 22 already run items. Running only test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2689584Z Running 1 items in this shard 2025-12-04T11:45:24.2689723Z 2025-12-04T11:45:24.2690053Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:09:03.118704643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2690470Z 2025-12-04T11:45:24.2690664Z [W1204 11:09:04.343970442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2690936Z 2025-12-04T11:45:24.2691144Z [W1204 11:09:04.344130170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2691360Z 2025-12-04T11:45:24.2691723Z [W1204 11:09:04.346146542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2691924Z 2025-12-04T11:45:24.2692178Z [W1204 11:09:04.346216401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2692400Z 2025-12-04T11:45:24.2692605Z [W1204 11:09:04.346668365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2692839Z 2025-12-04T11:45:24.2693068Z [W1204 11:09:04.346780074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2693348Z 2025-12-04T11:45:24.2693579Z [W1204 11:09:04.346829413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2693813Z 2025-12-04T11:45:24.2693975Z [W1204 11:09:04.347013550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2694250Z 2025-12-04T11:45:24.2694441Z [W1204 11:09:04.347067160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2694712Z 2025-12-04T11:45:24.2694886Z [W1204 11:09:04.347338526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2695118Z 2025-12-04T11:45:24.2695303Z [W1204 11:09:04.347387995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2695501Z 2025-12-04T11:45:24.2695716Z [W1204 11:09:04.347551803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2695950Z 2025-12-04T11:45:24.2696315Z [W1204 11:09:04.347605142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2696553Z 2025-12-04T11:45:24.2696811Z [W1204 11:09:04.347717620 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2697024Z 2025-12-04T11:45:24.2697231Z [W1204 11:09:04.347771320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2697435Z 2025-12-04T11:45:24.2697651Z [W1204 11:09:04.347891508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2697891Z 2025-12-04T11:45:24.2698115Z [W1204 11:09:04.347939107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2698349Z 2025-12-04T11:45:24.2698558Z [W1204 11:09:05.851342464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2698781Z 2025-12-04T11:45:24.2698964Z [W1204 11:09:05.851547851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2699224Z 2025-12-04T11:45:24.2699390Z [W1204 11:09:05.851606880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2699623Z 2025-12-04T11:45:24.2699815Z [W1204 11:09:05.851860707 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2700078Z 2025-12-04T11:45:24.2700363Z [W1204 11:09:05.851916886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2700572Z 2025-12-04T11:45:24.2700816Z [W1204 11:09:05.852054004 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2701047Z 2025-12-04T11:45:24.2701233Z [W1204 11:09:05.852135063 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2701457Z 2025-12-04T11:45:24.2701629Z [W1204 11:09:05.852181412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2701827Z 2025-12-04T11:45:24.2702082Z [W1204 11:09:05.852318600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2702319Z 2025-12-04T11:45:24.2702561Z [W1204 11:09:05.852368160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2702817Z 2025-12-04T11:45:24.2703004Z [W1204 11:09:05.852516958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2703225Z 2025-12-04T11:45:24.2703423Z [W1204 11:09:05.852565257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2703709Z 2025-12-04T11:45:24.2703878Z [W1204 11:09:05.852670346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2704171Z 2025-12-04T11:45:24.2704427Z [W1204 11:09:05.852716645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2704644Z 2025-12-04T11:45:24.2704841Z [W1204 11:09:05.852810294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2705196Z 2025-12-04T11:45:24.2705387Z [W1204 11:09:05.852856453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2705658Z 2025-12-04T11:45:24.2705857Z [W1204 11:09:05.852951272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2706070Z 2025-12-04T11:45:24.2706242Z [W1204 11:09:05.852997301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2706526Z 2025-12-04T11:45:24.2706630Z ('RERUN', {'yellow': True}) [3.0674s] [100%] 2025-12-04T11:45:24.2707072Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:09:06.624352423 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2707464Z 2025-12-04T11:45:24.2707638Z [W1204 11:09:06.624548391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2707904Z 2025-12-04T11:45:24.2708102Z [W1204 11:09:06.624610510 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2708317Z 2025-12-04T11:45:24.2708550Z [W1204 11:09:06.624864196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2708763Z 2025-12-04T11:45:24.2708934Z [W1204 11:09:06.624921436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2709199Z 2025-12-04T11:45:24.2709542Z [W1204 11:09:06.625057184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2709773Z 2025-12-04T11:45:24.2710003Z [W1204 11:09:06.625139373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2710223Z 2025-12-04T11:45:24.2710381Z [W1204 11:09:06.625185042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2710635Z 2025-12-04T11:45:24.2710841Z [W1204 11:09:06.625322840 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2711064Z 2025-12-04T11:45:24.2711244Z [W1204 11:09:06.625370289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2711467Z 2025-12-04T11:45:24.2711683Z [W1204 11:09:06.625513957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2712008Z 2025-12-04T11:45:24.2712181Z [W1204 11:09:06.625561417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2712391Z 2025-12-04T11:45:24.2712548Z [W1204 11:09:06.625664825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2712833Z 2025-12-04T11:45:24.2713031Z [W1204 11:09:06.625709735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2713231Z 2025-12-04T11:45:24.2713445Z [W1204 11:09:06.625803323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2713666Z 2025-12-04T11:45:24.2713863Z [W1204 11:09:06.625848253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2714100Z 2025-12-04T11:45:24.2714397Z [W1204 11:09:06.625941092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2714641Z 2025-12-04T11:45:24.2714874Z [W1204 11:09:06.625985861 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2715130Z 2025-12-04T11:45:24.2715312Z [W1204 11:09:07.323157744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2715552Z 2025-12-04T11:45:24.2715730Z [W1204 11:09:07.323336282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2716002Z 2025-12-04T11:45:24.2716203Z [W1204 11:09:07.323394911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2716472Z 2025-12-04T11:45:24.2716646Z [W1204 11:09:07.323633758 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2716905Z 2025-12-04T11:45:24.2717066Z [W1204 11:09:07.323686887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2717284Z 2025-12-04T11:45:24.2717458Z [W1204 11:09:07.323809845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2717744Z 2025-12-04T11:45:24.2717945Z [W1204 11:09:07.323881864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2718178Z 2025-12-04T11:45:24.2718416Z [W1204 11:09:07.323926564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2718637Z 2025-12-04T11:45:24.2718943Z [W1204 11:09:07.324066942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2719177Z 2025-12-04T11:45:24.2719367Z [W1204 11:09:07.324117801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2719605Z 2025-12-04T11:45:24.2719822Z [W1204 11:09:07.324260159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2720102Z 2025-12-04T11:45:24.2720327Z [W1204 11:09:07.324308138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2720566Z 2025-12-04T11:45:24.2720765Z [W1204 11:09:07.324410727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2721029Z 2025-12-04T11:45:24.2721197Z [W1204 11:09:07.324456726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2721407Z 2025-12-04T11:45:24.2721599Z [W1204 11:09:07.324547705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2721856Z 2025-12-04T11:45:24.2722024Z [W1204 11:09:07.324592784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2722266Z 2025-12-04T11:45:24.2722502Z [W1204 11:09:07.324686483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2722732Z 2025-12-04T11:45:24.2722941Z [W1204 11:09:07.324730973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2723164Z 2025-12-04T11:45:24.2723238Z ('RERUN', {'yellow': True}) [1.4613s] [100%] 2025-12-04T11:45:24.2723965Z inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 [W1204 11:09:07.124778849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2724294Z 2025-12-04T11:45:24.2724495Z [W1204 11:09:07.124998416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2724736Z 2025-12-04T11:45:24.2724930Z [W1204 11:09:07.125062946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2725178Z 2025-12-04T11:45:24.2725382Z [W1204 11:09:07.125322732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2725671Z 2025-12-04T11:45:24.2725847Z [W1204 11:09:07.125376531 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2726056Z 2025-12-04T11:45:24.2726213Z [W1204 11:09:07.125509369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2726504Z 2025-12-04T11:45:24.2726680Z [W1204 11:09:07.125585608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2726986Z 2025-12-04T11:45:24.2727159Z [W1204 11:09:07.125630188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2727471Z 2025-12-04T11:45:24.2727671Z [W1204 11:09:07.125765936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2727900Z 2025-12-04T11:45:24.2728086Z [W1204 11:09:07.125812425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2728326Z 2025-12-04T11:45:24.2728525Z [W1204 11:09:07.125954613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2728739Z 2025-12-04T11:45:24.2728913Z [W1204 11:09:07.126005923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2729135Z 2025-12-04T11:45:24.2729408Z [W1204 11:09:07.126131821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2729652Z 2025-12-04T11:45:24.2729900Z [W1204 11:09:07.126178030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2730162Z 2025-12-04T11:45:24.2730323Z [W1204 11:09:07.126271739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2730618Z 2025-12-04T11:45:24.2730807Z [W1204 11:09:07.126315778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2731026Z 2025-12-04T11:45:24.2731193Z [W1204 11:09:07.126410577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2731498Z 2025-12-04T11:45:24.2731699Z [W1204 11:09:07.126454436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2731950Z 2025-12-04T11:45:24.2732115Z [W1204 11:09:08.828683309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2732508Z 2025-12-04T11:45:24.2732677Z [W1204 11:09:08.828861497 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2732922Z 2025-12-04T11:45:24.2733106Z [W1204 11:09:08.828916776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2733392Z 2025-12-04T11:45:24.2733603Z [W1204 11:09:08.829168403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2733822Z 2025-12-04T11:45:24.2734023Z [W1204 11:09:08.829224042 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2734221Z 2025-12-04T11:45:24.2734389Z [W1204 11:09:08.829348760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2734615Z 2025-12-04T11:45:24.2734805Z [W1204 11:09:08.829422149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2735054Z 2025-12-04T11:45:24.2735244Z [W1204 11:09:08.829466939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2735554Z 2025-12-04T11:45:24.2735745Z [W1204 11:09:08.829596697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2736235Z 2025-12-04T11:45:24.2916374Z [W1204 11:09:08.829644126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2916727Z 2025-12-04T11:45:24.2916894Z [W1204 11:09:08.829782264 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2917246Z 2025-12-04T11:45:24.2917410Z [W1204 11:09:08.829828754 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2917608Z 2025-12-04T11:45:24.2917762Z [W1204 11:09:08.829931072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2917971Z 2025-12-04T11:45:24.2918128Z [W1204 11:09:08.829976442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2918346Z 2025-12-04T11:45:24.2918499Z [W1204 11:09:08.830072190 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2918690Z 2025-12-04T11:45:24.2918847Z [W1204 11:09:08.830118940 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2919038Z 2025-12-04T11:45:24.2919200Z [W1204 11:09:08.830211118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.2919390Z 2025-12-04T11:45:24.2919547Z [W1204 11:09:08.830255368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.2919736Z 2025-12-04T11:45:24.2919785Z FAILED [1.5345s] [100%] 2025-12-04T11:45:24.2919854Z 2025-12-04T11:45:24.2919928Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.2920125Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2920313Z Traceback (most recent call last): 2025-12-04T11:45:24.2920571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2920827Z method(*args, **kwargs) 2025-12-04T11:45:24.2921078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2921322Z method(*args, **kwargs) 2025-12-04T11:45:24.2921547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2921782Z with policy(): 2025-12-04T11:45:24.2922002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2922342Z raise RuntimeError(msg) 2025-12-04T11:45:24.2922784Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 807403520 and is now 975175680. 
2025-12-04T11:45:24.2923157Z 2025-12-04T11:45:24.2923246Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2923670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2924098Z 2025-12-04T11:45:24.2924190Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2924393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2924552Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2924685Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2925239Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2925860Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2926281Z graph_break [] 2025-12-04T11:45:24.2926499Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2926740Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2926929Z Traceback (most recent call last): 2025-12-04T11:45:24.2927173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2927416Z method(*args, **kwargs) 2025-12-04T11:45:24.2927658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2927913Z method(*args, **kwargs) 
2025-12-04T11:45:24.2928140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2928392Z with policy(): 2025-12-04T11:45:24.2928611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2928851Z raise RuntimeError(msg) 2025-12-04T11:45:24.2929262Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 973078528 and is now 1019215872. 2025-12-04T11:45:24.2929629Z 2025-12-04T11:45:24.2929732Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2930062Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2930297Z 2025-12-04T11:45:24.2930391Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2930595Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2930754Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2930890Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2931425Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2932037Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2932262Z graph_break [] 2025-12-04T11:45:24.2932451Z aten_mm_info 
[('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2932681Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2932847Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2932986Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2933244Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2933893Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark', 19), ('benchmarking.InductorBenchmarker.benchmark_gpu', 19), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2934431Z graph_break [] 2025-12-04T11:45:24.2934609Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2934841Z =================================== FAILURES =================================== 2025-12-04T11:45:24.2935061Z __________ TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 ___________ 2025-12-04T11:45:24.2935243Z Traceback (most recent call last): 2025-12-04T11:45:24.2935488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2935730Z method(*args, **kwargs) 2025-12-04T11:45:24.2935959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2936215Z method(*args, **kwargs) 2025-12-04T11:45:24.2936447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2936684Z with policy(): 2025-12-04T11:45:24.2936953Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2937203Z raise RuntimeError(msg) 2025-12-04T11:45:24.2937609Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1017118720 and is now 1063256064. 2025-12-04T11:45:24.2937973Z 2025-12-04T11:45:24.2938048Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2938349Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2938588Z 2025-12-04T11:45:24.2938679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2938882Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2939047Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2939184Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2939723Z inductor [('triton_bundler_save_kernel', 144), ('benchmarking.InductorBenchmarker.benchmark', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 16), ('async_compile_cache_miss', 9), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2940303Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2940477Z graph_break [] 2025-12-04T11:45:24.2940675Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2940920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2941079Z 
frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2941239Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2941450Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2942067Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark', 19), ('benchmarking.InductorBenchmarker.benchmark_gpu', 19), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2942579Z graph_break [] 2025-12-04T11:45:24.2942732Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2942949Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2943106Z frames [('total', 2), ('ok', 2)] 2025-12-04T11:45:24.2943315Z stats [('calls_captured', 22), ('unique_graphs', 2)] 2025-12-04T11:45:24.2943524Z aot_autograd [('total', 2), ('autograd_cache_miss', 2), ('autograd_cache_saved', 2), ('ok', 2)] 2025-12-04T11:45:24.2944105Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark', 19), ('benchmarking.InductorBenchmarker.benchmark_gpu', 19), ('async_compile_cache_miss', 12), ('async_compile_cache_hit', 6), ('pattern_matcher_count', 4), ('pattern_matcher_nodes', 4), ('extern_calls', 4), ('fxgraph_cache_miss', 2), ('triton_bundler_save_static_autotuner', 2)] 2025-12-04T11:45:24.2944643Z graph_break [] 2025-12-04T11:45:24.2944805Z aten_mm_info [('aten._scaled_mm.default_s77_s0_s77', 1), ('aten._scaled_mm.default_s77_s0_s27', 1)] 2025-12-04T11:45:24.2945151Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0f7995b2e77e2bbb.xml - 2025-12-04T11:45:24.2945445Z =========================== 
short test summary info ============================ 2025-12-04T11:45:24.2946026Z FAILED [1.5345s] inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16! Caching allocator allocated memory was 0 and is now reported as 4096 on device 0. CUDA driver allocated memory was 1017118720 and is now 1063256064. 2025-12-04T11:45:24.2946524Z 2025-12-04T11:45:24.2946607Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2946913Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8TypesCUDA.test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2947139Z 2025-12-04T11:45:24.2947232Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2947425Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.2947601Z ================== 1 failed, 187 deselected, 2 rerun in 6.08s ================== 2025-12-04T11:45:24.2947750Z Got exit code 1 2025-12-04T11:45:24.2947944Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16 2025-12-04T11:45:24.2948243Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.2948572Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-be96b4a034715c09.xml 2025-12-04T11:45:24.2948817Z ============================= test session starts ============================== 2025-12-04T11:45:24.2949032Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.2949228Z cachedir: .pytest_cache 2025-12-04T11:45:24.2949465Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, 
suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.2949726Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.2949859Z configfile: pytest.ini 2025-12-04T11:45:24.2950117Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.2950396Z collecting ... collected 188 items / 23 deselected / 165 selected 2025-12-04T11:45:24.2950576Z stepcurrent: skipping 23 already run items. 2025-12-04T11:45:24.2950712Z Running 165 items in this shard 2025-12-04T11:45:24.2950784Z 2025-12-04T11:45:24.2951134Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_False_cuda [W1204 11:09:17.664142481 collection.cpp:1148] Warning: ROCTracer produced duplicate flow start: 1 (function operator()) 2025-12-04T11:45:24.2951526Z PASSED [2.8066s] [ 0%] 2025-12-04T11:45:24.2951776Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e4m3fn_shape_4,2048,4096_keepdim_True_cuda PASSED [2.0000s] [ 1%] 2025-12-04T11:45:24.2952182Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_False_cuda PASSED [1.6927s] [ 1%] 2025-12-04T11:45:24.2952586Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_benchmark_float8_e5m2_shape_4,2048,4096_keepdim_True_cuda PASSED [2.3476s] [ 2%] 2025-12-04T11:45:24.2953001Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.3154s] [ 3%] 2025-12-04T11:45:24.2953443Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.3880s] [ 3%] 2025-12-04T11:45:24.2953836Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.3732s] [ 4%] 2025-12-04T11:45:24.2954222Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.3038s] [ 4%] 2025-12-04T11:45:24.2954613Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED [0.3702s] [ 5%] 2025-12-04T11:45:24.2955002Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.2449s] [ 6%] 2025-12-04T11:45:24.2955386Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.3928s] [ 6%] 2025-12-04T11:45:24.2955771Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.3827s] [ 7%] 2025-12-04T11:45:24.2956154Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.2857s] [ 7%] 2025-12-04T11:45:24.2956542Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e4m3fn_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.3942s] [ 8%] 2025-12-04T11:45:24.2956924Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,1,15_cuda PASSED [0.2452s] [ 9%] 2025-12-04T11:45:24.2957300Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,15_cuda PASSED [0.3726s] [ 9%] 2025-12-04T11:45:24.2957679Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,4096_cuda PASSED [0.3685s] [ 10%] 2025-12-04T11:45:24.2958058Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_1,10,512_cuda PASSED [0.5360s] [ 10%] 2025-12-04T11:45:24.2958440Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_False_shape_4,2048,4096_cuda PASSED 
[0.2554s] [ 11%] 2025-12-04T11:45:24.2958817Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,1,15_cuda PASSED [0.1332s] [ 12%] 2025-12-04T11:45:24.2959225Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,15_cuda PASSED [0.2666s] [ 12%] 2025-12-04T11:45:24.2959596Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,4096_cuda PASSED [0.2543s] [ 13%] 2025-12-04T11:45:24.2959972Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_1,10,512_cuda PASSED [0.1718s] [ 13%] 2025-12-04T11:45:24.2960384Z inductor/test_fp8.py::TestFP8TypesCUDA::test_layernorm_fp8_quant_float8_e5m2_amax_keep_dim_True_shape_4,2048,4096_cuda PASSED [0.2569s] [ 14%] 2025-12-04T11:45:24.2960756Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.4664s] [ 15%] 2025-12-04T11:45:24.2961125Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.4291s] [ 15%] 2025-12-04T11:45:24.2961499Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_16,16,16_cuda PASSED [0.3563s] [ 16%] 2025-12-04T11:45:24.2961864Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_bfloat16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.3878s] [ 16%] 2025-12-04T11:45:24.2962228Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.3460s] [ 17%] 2025-12-04T11:45:24.2962581Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.3765s] [ 18%] 2025-12-04T11:45:24.2962928Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_16,16,16_cuda PASSED [0.3746s] [ 18%] 2025-12-04T11:45:24.2963343Z 
inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float16_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.3770s] [ 19%] 2025-12-04T11:45:24.2963692Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_16,16,16_cuda PASSED [0.3294s] [ 20%] 2025-12-04T11:45:24.2964043Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e4m3fn_shape_4,2048,4096_cuda PASSED [0.3657s] [ 20%] 2025-12-04T11:45:24.2964398Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_16,16,16_cuda PASSED [0.3342s] [ 21%] 2025-12-04T11:45:24.2964743Z inductor/test_fp8.py::TestFP8TypesCUDA::test_to_fp8_saturated_float32_float8_e5m2_shape_4,2048,4096_cuda PASSED [0.3650s] [ 21%] 2025-12-04T11:45:24.2965091Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_15,3,13_dst_types0_cuda_bfloat16 PASSED [0.5043s] [ 22%] 2025-12-04T11:45:24.2965458Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_bfloat16_shape_4,2048,4096_dst_types0_cuda_bfloat16 PASSED [0.3802s] [ 23%] 2025-12-04T11:45:24.2965809Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_15,3,13_dst_types0_cuda_float16 PASSED [0.2300s] [ 23%] 2025-12-04T11:45:24.2966147Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float16_shape_4,2048,4096_dst_types0_cuda_float16 PASSED [0.3309s] [ 24%] 2025-12-04T11:45:24.2966493Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_15,3,13_dst_types0_cuda_float32 PASSED [0.2230s] [ 24%] 2025-12-04T11:45:24.2966829Z inductor/test_fp8.py::TestFP8TypesCUDA::test_valid_cast_float32_shape_4,2048,4096_dst_types0_cuda_float32 PASSED [0.3409s] [ 25%] 2025-12-04T11:45:24.2967161Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e4m3fn_cuda PASSED [0.1105s] [ 26%] 2025-12-04T11:45:24.2967467Z inductor/test_fp8.py::TestFP8TypesCUDA::test_xblock_for_small_numel_float8_e5m2_cuda PASSED [0.1121s] [ 26%] 
2025-12-04T11:45:24.2967895Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0002s] (Need device-side TMA support in Triton) [ 27%] 2025-12-04T11:45:24.2968471Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 27%] 2025-12-04T11:45:24.2970424Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 28%] 2025-12-04T11:45:24.2970949Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape0_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 29%] 2025-12-04T11:45:24.2971441Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes0_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 29%] 2025-12-04T11:45:24.2971939Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_False_scaling_block_sizes1_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 30%] 2025-12-04T11:45:24.2972434Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes0_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 30%] 2025-12-04T11:45:24.2972921Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_main_loop_scaling_shape1_use_fast_accum_True_scaling_block_sizes1_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 31%] 2025-12-04T11:45:24.2973414Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fp8_max_autotune_cuda SKIPPED [0.0001s] (Not supported on non B200) [ 32%] 2025-12-04T11:45:24.2973714Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_mx_fusion_cuda PASSED [3.5887s] [ 32%] 
2025-12-04T11:45:24.2974074Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6877s] [ 33%] 2025-12-04T11:45:24.2974540Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.1971s] [ 33%] 2025-12-04T11:45:24.2974990Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda FAILED [1.1722s] [ 33%] 2025-12-04T11:45:24.2975219Z 2025-12-04T11:45:24.2975281Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.2975521Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.2975761Z Traceback (most recent call last): 2025-12-04T11:45:24.2976009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2976250Z method(*args, **kwargs) 2025-12-04T11:45:24.2976477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2976714Z method(*args, **kwargs) 2025-12-04T11:45:24.2976938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2977171Z with policy(): 2025-12-04T11:45:24.2977390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2977662Z raise RuntimeError(msg) 2025-12-04T11:45:24.2978150Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 1811939328 and is now 2004877312. 2025-12-04T11:45:24.2978601Z 2025-12-04T11:45:24.2978675Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2979053Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.2979383Z 2025-12-04T11:45:24.2979490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2979692Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2979851Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.2980008Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.2980237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.2980895Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.2981469Z graph_break [] 2025-12-04T11:45:24.2981602Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.2981786Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.2982420Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.2982997Z current_size = base.storage().size() 2025-12-04T11:45:24.2983128Z Autotune Choices Stats: 2025-12-04T11:45:24.2983648Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.2984147Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.2984300Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.2984515Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.2984944Z triton_mm_17 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.2985474Z triton_mm_8 0.0068 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.2985989Z triton_mm_14 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.2986494Z triton_mm_16 0.0069 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.2987000Z triton_mm_18 0.0076 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.2987507Z triton_mm_12 0.0080 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.2988042Z triton_mm_9 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.2988567Z triton_mm_13 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.2989090Z triton_mm_15 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.2989588Z triton_mm_11 0.0088 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.2990007Z SingleProcess AUTOTUNE benchmarking takes 0.1216 seconds and 0.1447 seconds precompiling for 20 choices 2025-12-04T11:45:24.2990344Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.2990583Z Traceback (most recent call last): 2025-12-04T11:45:24.2990850Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2991093Z method(*args, **kwargs) 2025-12-04T11:45:24.2991325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.2991562Z method(*args, **kwargs) 2025-12-04T11:45:24.2991786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.2992022Z with policy(): 2025-12-04T11:45:24.2992243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.2992484Z raise RuntimeError(msg) 2025-12-04T11:45:24.2992986Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 2000683008 and is now 2057306112. 
2025-12-04T11:45:24.2993722Z 2025-12-04T11:45:24.2993814Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.2994195Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.2994497Z 2025-12-04T11:45:24.2994596Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.2994806Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.2994973Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.2995114Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.2995315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.2995949Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.2996517Z graph_break [] 2025-12-04T11:45:24.2996647Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.2996834Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.2997487Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.2998087Z current_size = base.storage().size() 2025-12-04T11:45:24.2998220Z Autotune Choices Stats: 2025-12-04T11:45:24.2998684Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.2999160Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.2999314Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.2999532Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.2999938Z triton_mm_17 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3000465Z triton_mm_8 0.0068 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3000965Z triton_mm_14 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3001501Z triton_mm_16 0.0069 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3002014Z triton_mm_18 0.0076 ms 80.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3002525Z triton_mm_12 0.0080 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3003032Z triton_mm_9 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3003594Z triton_mm_13 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3004096Z triton_mm_15 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3004593Z triton_mm_11 0.0088 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3004997Z SingleProcess AUTOTUNE benchmarking takes 0.1216 seconds and 0.1447 seconds precompiling for 20 choices 2025-12-04T11:45:24.3005248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3005434Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3005571Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3005786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3006421Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3006975Z graph_break [] 2025-12-04T11:45:24.3007101Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3007279Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3007437Z Autotune Choices Stats: 2025-12-04T11:45:24.3007874Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3008359Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3008511Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3008723Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3009123Z triton_mm_36 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3009627Z triton_mm_33 0.0068 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3104779Z triton_mm_27 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3105583Z triton_mm_35 0.0070 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3106416Z triton_mm_37 0.0077 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3107236Z triton_mm_31 0.0081 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3107989Z triton_mm_28 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3108735Z triton_mm_34 0.0082 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3109417Z triton_mm_32 0.0086 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3110009Z triton_mm_25 0.0097 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3110558Z SingleProcess 
AUTOTUNE benchmarking takes 0.1212 seconds and 0.1363 seconds precompiling for 20 choices 2025-12-04T11:45:24.3110838Z =================================== FAILURES =================================== 2025-12-04T11:45:24.3111142Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3111441Z Traceback (most recent call last): 2025-12-04T11:45:24.3111754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3112077Z method(*args, **kwargs) 2025-12-04T11:45:24.3112338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3112621Z method(*args, **kwargs) 2025-12-04T11:45:24.3112898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3113182Z with policy(): 2025-12-04T11:45:24.3113494Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3113840Z raise RuntimeError(msg) 2025-12-04T11:45:24.3114421Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 2053111808 and is now 2109734912. 
2025-12-04T11:45:24.3114921Z 2025-12-04T11:45:24.3115034Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3115473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3115800Z 2025-12-04T11:45:24.3115919Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3116187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3116404Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3116583Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3116868Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3117583Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3118238Z graph_break [] 2025-12-04T11:45:24.3118424Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3118653Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3119326Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3119952Z current_size = base.storage().size() 2025-12-04T11:45:24.3120115Z Autotune Choices Stats: 2025-12-04T11:45:24.3120634Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.3121183Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3121385Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3121640Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3122099Z triton_mm_17 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3122682Z triton_mm_8 0.0068 ms 89.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3123237Z triton_mm_14 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3123865Z triton_mm_16 0.0069 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3124451Z triton_mm_18 0.0076 ms 80.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3124990Z triton_mm_12 0.0080 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3125575Z triton_mm_9 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3126140Z triton_mm_13 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3126693Z triton_mm_15 0.0083 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3127256Z triton_mm_11 0.0088 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3127697Z SingleProcess AUTOTUNE benchmarking takes 0.1216 seconds and 0.1447 seconds precompiling for 20 choices 2025-12-04T11:45:24.3128031Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3128323Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3159616Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3159934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3161007Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3161755Z graph_break [] 2025-12-04T11:45:24.3162029Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3162419Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3162770Z Autotune Choices Stats: 2025-12-04T11:45:24.3163410Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3164051Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3164247Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3164519Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3164956Z triton_mm_36 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3165501Z triton_mm_33 0.0068 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3166057Z triton_mm_27 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3166580Z triton_mm_35 0.0070 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3167129Z triton_mm_37 0.0077 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3167658Z triton_mm_31 0.0081 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3168183Z triton_mm_28 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3168758Z triton_mm_34 0.0082 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3169295Z triton_mm_32 0.0086 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3169847Z triton_mm_25 0.0097 ms 65.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3170282Z SingleProcess 
AUTOTUNE benchmarking takes 0.1212 seconds and 0.1363 seconds precompiling for 20 choices 2025-12-04T11:45:24.3170550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3170771Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3170930Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3171371Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3172056Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3172666Z graph_break [] 2025-12-04T11:45:24.3172819Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3173024Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3173221Z Autotune Choices Stats: 2025-12-04T11:45:24.3173796Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-12-04T11:45:24.3174355Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3174550Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3174777Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3175195Z triton_mm_55 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3175733Z triton_mm_54 0.0066 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3176267Z triton_mm_52 0.0067 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3176819Z triton_mm_46 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3177170Z _scaled_mm 0.0068 ms 93.6% 2025-12-04T11:45:24.3177491Z triton_mm_56 0.0074 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3178025Z triton_mm_50 0.0080 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3178547Z triton_mm_47 0.0080 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3179061Z triton_mm_53 0.0083 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.3179567Z triton_mm_49 0.0084 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3179984Z SingleProcess AUTOTUNE benchmarking takes 0.1219 seconds and 0.1460 seconds precompiling for 20 choices 2025-12-04T11:45:24.3180361Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-be96b4a034715c09.xml - 2025-12-04T11:45:24.3180664Z =========================== short test summary info ============================ 2025-12-04T11:45:24.3202050Z FAILED [1.1722s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 276824064 and is now reported as 276825088 on device 0. CUDA driver allocated memory was 2053111808 and is now 2109734912. 2025-12-04T11:45:24.3202721Z 2025-12-04T11:45:24.3202856Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3203318Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3203635Z 2025-12-04T11:45:24.3203733Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3203983Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.3204196Z ======= 1 failed, 45 passed, 9 skipped, 23 deselected, 2 rerun in 29.62s ======= 2025-12-04T11:45:24.3204392Z Got exit code 1 2025-12-04T11:45:24.3204821Z Retrying single test... 
2025-12-04T11:45:24.3205062Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-828742ca81f15891.xml 2025-12-04T11:45:24.3205375Z ============================= test session starts ============================== 2025-12-04T11:45:24.3205614Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.3205825Z cachedir: .pytest_cache 2025-12-04T11:45:24.3206101Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.3206375Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.3206538Z configfile: pytest.ini 2025-12-04T11:45:24.3206799Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.3207109Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.3207521Z stepcurrent: skipping 77 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3207883Z Running 1 items in this shard 2025-12-04T11:45:24.3207982Z 2025-12-04T11:45:24.3208204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4583s] [100%] 2025-12-04T11:45:24.3208712Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9987s] [100%] 2025-12-04T11:45:24.3209187Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.9407s] [100%] 2025-12-04T11:45:24.3209429Z 2025-12-04T11:45:24.3209506Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.3209776Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3210042Z Traceback (most recent call last): 2025-12-04T11:45:24.3210333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3210636Z method(*args, **kwargs) 2025-12-04T11:45:24.3211060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3211945Z method(*args, **kwargs) 2025-12-04T11:45:24.3213136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3213582Z with policy(): 2025-12-04T11:45:24.3213933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3214260Z raise 
RuntimeError(msg) 2025-12-04T11:45:24.3214757Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.3215240Z 2025-12-04T11:45:24.3215340Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3215750Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3216072Z 2025-12-04T11:45:24.3216180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3216432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3216624Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3216793Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3217439Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3218089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3218308Z graph_break [] 2025-12-04T11:45:24.3218466Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3218667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3219331Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3219928Z current_size = base.storage().size() 2025-12-04T11:45:24.3220093Z Autotune Choices Stats: 2025-12-04T11:45:24.3220572Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3221070Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3221265Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3221503Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3221926Z triton_mm_17 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3222478Z triton_mm_8 0.0068 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3223000Z triton_mm_16 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3223638Z triton_mm_14 0.0070 ms 
91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3224171Z triton_mm_18 0.0076 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3224719Z triton_mm_12 0.0076 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3225262Z triton_mm_11 0.0081 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3225788Z triton_mm_9 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3226326Z triton_mm_15 0.0082 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3226866Z triton_mm_13 0.0084 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3227297Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3852 seconds precompiling for 20 choices 2025-12-04T11:45:24.3227671Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3227965Z Traceback (most recent call last): 2025-12-04T11:45:24.3228233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3228526Z method(*args, **kwargs) 2025-12-04T11:45:24.3228790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3229316Z method(*args, **kwargs) 2025-12-04T11:45:24.3229579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3229833Z with policy(): 2025-12-04T11:45:24.3230097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3230366Z raise RuntimeError(msg) 2025-12-04T11:45:24.3230854Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.3231320Z 2025-12-04T11:45:24.3231412Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3231814Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3232127Z 2025-12-04T11:45:24.3232240Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3253065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3253335Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3253514Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3254195Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3254905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3255136Z graph_break [] 2025-12-04T11:45:24.3255312Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3255522Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3256151Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3256774Z current_size = base.storage().size() 2025-12-04T11:45:24.3256927Z Autotune Choices Stats: 2025-12-04T11:45:24.3257846Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3258355Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3258532Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3258788Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3259216Z triton_mm_17 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3259739Z triton_mm_8 0.0068 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3260316Z triton_mm_16 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3260952Z triton_mm_14 0.0070 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3261588Z triton_mm_18 0.0076 ms 84.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3262272Z triton_mm_12 0.0076 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3262896Z triton_mm_11 0.0081 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3263578Z triton_mm_9 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3264308Z triton_mm_15 0.0082 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3264938Z triton_mm_13 0.0084 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3265511Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3852 seconds precompiling for 20 choices 2025-12-04T11:45:24.3265915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3266272Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3266533Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3266837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3267609Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3268294Z graph_break [] 2025-12-04T11:45:24.3268572Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3268857Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3269110Z Autotune Choices Stats: 2025-12-04T11:45:24.3269690Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:24.3270328Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3270558Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3270874Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3271449Z triton_mm_36 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3272068Z triton_mm_33 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3272677Z triton_mm_35 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3273317Z triton_mm_27 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3273949Z triton_mm_37 0.0077 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3274634Z triton_mm_31 0.0081 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3275313Z triton_mm_34 0.0082 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3275996Z triton_mm_30 0.0082 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3276656Z triton_mm_32 0.0084 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3277312Z triton_mm_28 0.0092 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3277846Z SingleProcess AUTOTUNE 
benchmarking takes 0.1224 seconds and 0.2869 seconds precompiling for 20 choices 2025-12-04T11:45:24.3278272Z =================================== FAILURES =================================== 2025-12-04T11:45:24.3278671Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3279018Z Traceback (most recent call last): 2025-12-04T11:45:24.3279399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3279774Z method(*args, **kwargs) 2025-12-04T11:45:24.3280255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3280652Z method(*args, **kwargs) 2025-12-04T11:45:24.3280974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3281310Z with policy(): 2025-12-04T11:45:24.3281650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3282075Z raise RuntimeError(msg) 2025-12-04T11:45:24.3282770Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.3283242Z 2025-12-04T11:45:24.3283442Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3283921Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3284278Z 2025-12-04T11:45:24.3284449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3284708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3284953Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3285106Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3285726Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3286401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3286608Z graph_break [] 2025-12-04T11:45:24.3286767Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3287088Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3287747Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3288363Z current_size = base.storage().size() 2025-12-04T11:45:24.3288514Z Autotune Choices Stats: 2025-12-04T11:45:24.3288991Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3289090Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3289169Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3289300Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3289559Z triton_mm_17 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3289827Z triton_mm_8 0.0068 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3290080Z triton_mm_16 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3290327Z triton_mm_14 0.0070 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3290568Z triton_mm_18 0.0076 ms 84.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3290814Z triton_mm_12 0.0076 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3291068Z triton_mm_11 0.0081 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3291514Z triton_mm_9 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3291766Z triton_mm_15 0.0082 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3292004Z triton_mm_13 0.0084 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3292154Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3852 seconds precompiling for 20 choices 2025-12-04T11:45:24.3292248Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3292320Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3292404Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3292527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3293058Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3293106Z graph_break [] 2025-12-04T11:45:24.3293203Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3293326Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3293392Z Autotune Choices Stats: 2025-12-04T11:45:24.3293769Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:24.3293855Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3293928Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3294093Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3294344Z triton_mm_36 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3294594Z triton_mm_33 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3294842Z triton_mm_35 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3295077Z triton_mm_27 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3295345Z triton_mm_37 0.0077 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3295584Z triton_mm_31 0.0081 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3295832Z triton_mm_34 0.0082 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3296079Z triton_mm_30 0.0082 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3296308Z triton_mm_32 0.0084 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3296581Z triton_mm_28 0.0092 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3296748Z SingleProcess AUTOTUNE 
benchmarking takes 0.1224 seconds and 0.2869 seconds precompiling for 20 choices 2025-12-04T11:45:24.3296844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3296899Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3296972Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3468758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3469627Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3469687Z graph_break [] 2025-12-04T11:45:24.3469761Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3469851Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3469895Z Autotune Choices Stats: 2025-12-04T11:45:24.3470322Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.3470394Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3470449Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3470579Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3470823Z triton_mm_55 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3471057Z triton_mm_52 0.0068 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3471290Z triton_mm_46 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3471516Z triton_mm_54 0.0069 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3471750Z triton_mm_56 0.0076 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3471977Z triton_mm_47 0.0080 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3472210Z triton_mm_53 0.0083 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3472434Z triton_mm_51 0.0084 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3472715Z triton_mm_50 0.0088 ms 71.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3472941Z triton_mm_49 0.0090 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3473092Z SingleProcess AUTOTUNE benchmarking takes 0.1459 seconds and 0.2714 seconds precompiling for 20 choices 2025-12-04T11:45:24.3473365Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-828742ca81f15891.xml - 2025-12-04T11:45:24.3473428Z =========================== short test summary info ============================ 2025-12-04T11:45:24.3474038Z FAILED [0.9407s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.3474060Z 2025-12-04T11:45:24.3474138Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3474404Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3474406Z 2025-12-04T11:45:24.3474498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3474561Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.3474632Z ================== 1 failed, 187 deselected, 2 rerun in 4.42s ================== 2025-12-04T11:45:24.3474670Z Got exit code 1 2025-12-04T11:45:24.3474712Z Retrying single test... 2025-12-04T11:45:24.3474861Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1005e8d3171a5679.xml 2025-12-04T11:45:24.3474923Z ============================= test session starts ============================== 2025-12-04T11:45:24.3475036Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.3475080Z cachedir: .pytest_cache 2025-12-04T11:45:24.3475241Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.3475290Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.3475331Z configfile: pytest.ini 2025-12-04T11:45:24.3475500Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.3475578Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.3475833Z stepcurrent: skipping 77 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3475878Z Running 1 items in this shard 2025-12-04T11:45:24.3475881Z 2025-12-04T11:45:24.3476098Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3405s] [100%] 2025-12-04T11:45:24.3476313Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8891s] [100%] 2025-12-04T11:45:24.3476504Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.9442s] [100%] 2025-12-04T11:45:24.3476520Z 2025-12-04T11:45:24.3476572Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.3476732Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3476780Z Traceback (most recent call last): 2025-12-04T11:45:24.3476944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3476988Z method(*args, **kwargs) 2025-12-04T11:45:24.3477154Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3477196Z method(*args, **kwargs) 2025-12-04T11:45:24.3477346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3477384Z with policy(): 2025-12-04T11:45:24.3477539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3477581Z raise 
RuntimeError(msg) 2025-12-04T11:45:24.3477975Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.3477991Z 2025-12-04T11:45:24.3478068Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3478331Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3478333Z 2025-12-04T11:45:24.3478424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3478503Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3478552Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3478612Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3479109Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3479211Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3479248Z graph_break [] 2025-12-04T11:45:24.3479316Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3479391Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3479888Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3479940Z current_size = base.storage().size() 2025-12-04T11:45:24.3479985Z Autotune Choices Stats: 2025-12-04T11:45:24.3480363Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:24.3480429Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3480492Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3480616Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3480865Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3481107Z triton_mm_14 0.0069 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3481338Z triton_mm_8 0.0071 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3481564Z triton_mm_16 0.0073 ms 
89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3481798Z triton_mm_18 0.0081 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3482039Z triton_mm_13 0.0082 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3482266Z triton_mm_11 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3482491Z triton_mm_12 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3482719Z triton_mm_9 0.0084 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3482947Z triton_mm_15 0.0086 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3483079Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3825 seconds precompiling for 20 choices 2025-12-04T11:45:24.3483228Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3483898Z Traceback (most recent call last): 2025-12-04T11:45:24.3484062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3484106Z method(*args, **kwargs) 2025-12-04T11:45:24.3484262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3484303Z method(*args, **kwargs) 2025-12-04T11:45:24.3484458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3484496Z with policy(): 2025-12-04T11:45:24.3484654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3484695Z raise RuntimeError(msg) 2025-12-04T11:45:24.3485092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.3485143Z 2025-12-04T11:45:24.3485221Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3485482Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3485484Z 2025-12-04T11:45:24.3485597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3485673Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3485719Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3485776Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3486267Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3486384Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3486424Z graph_break [] 2025-12-04T11:45:24.3486491Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3486565Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3487055Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3487106Z current_size = base.storage().size() 2025-12-04T11:45:24.3487151Z Autotune Choices Stats: 2025-12-04T11:45:24.3487523Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:24.3487590Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3487640Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3487764Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3488000Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3488230Z triton_mm_14 0.0069 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3488461Z triton_mm_8 0.0071 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3488687Z triton_mm_16 0.0073 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3488920Z triton_mm_18 0.0081 ms 80.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3489169Z triton_mm_13 0.0082 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3489407Z triton_mm_11 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3489631Z triton_mm_12 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3489864Z triton_mm_9 0.0084 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3490094Z triton_mm_15 0.0086 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3490240Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3825 seconds precompiling for 20 choices 2025-12-04T11:45:24.3490319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3490364Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3490423Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3490523Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3491014Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3491052Z graph_break [] 2025-12-04T11:45:24.3491119Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3491192Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3491234Z Autotune Choices Stats: 2025-12-04T11:45:24.3491596Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006479999981820583, "best_triton_pos": 0} 2025-12-04T11:45:24.3491662Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3491713Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3491835Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3492067Z triton_mm_33 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3492300Z triton_mm_36 0.0068 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3492525Z triton_mm_35 0.0071 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3492772Z triton_mm_27 0.0071 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3493005Z triton_mm_37 0.0074 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3493241Z triton_mm_31 0.0079 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3493495Z triton_mm_30 0.0085 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3493722Z triton_mm_34 0.0086 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3493946Z triton_mm_32 0.0086 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3494193Z triton_mm_28 0.0088 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3494322Z SingleProcess AUTOTUNE 
benchmarking takes 0.1248 seconds and 0.2868 seconds precompiling for 20 choices 2025-12-04T11:45:24.3494379Z =================================== FAILURES =================================== 2025-12-04T11:45:24.3494523Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3494576Z Traceback (most recent call last): 2025-12-04T11:45:24.3494735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3494778Z method(*args, **kwargs) 2025-12-04T11:45:24.3494932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3494976Z method(*args, **kwargs) 2025-12-04T11:45:24.3495128Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3495168Z with policy(): 2025-12-04T11:45:24.3495322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3495367Z raise RuntimeError(msg) 2025-12-04T11:45:24.3495766Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.3495769Z 2025-12-04T11:45:24.3495843Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3496108Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3496110Z 2025-12-04T11:45:24.3496198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3496275Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3496334Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3496395Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3496892Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3497010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3497047Z graph_break [] 2025-12-04T11:45:24.3497110Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3497184Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3497677Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3497752Z current_size = base.storage().size() 2025-12-04T11:45:24.3497793Z Autotune Choices Stats: 2025-12-04T11:45:24.3498164Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:24.3498227Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3498281Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3498401Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3498639Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3498867Z triton_mm_14 0.0069 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3499095Z triton_mm_8 0.0071 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3499323Z triton_mm_16 0.0073 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3499557Z triton_mm_18 0.0081 ms 80.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3499783Z triton_mm_13 0.0082 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3500003Z triton_mm_11 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3500226Z triton_mm_12 0.0083 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3500475Z triton_mm_9 0.0084 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3500713Z triton_mm_15 0.0086 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3500844Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3825 seconds precompiling for 20 choices 2025-12-04T11:45:24.3500916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3500962Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3501018Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3501121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3501606Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3501658Z graph_break [] 2025-12-04T11:45:24.3501722Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3501796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3501836Z Autotune Choices Stats: 2025-12-04T11:45:24.3502203Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006479999981820583, "best_triton_pos": 0} 2025-12-04T11:45:24.3502267Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3502318Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3502446Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3502686Z triton_mm_33 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3502923Z triton_mm_36 0.0068 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3503159Z triton_mm_35 0.0071 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3503426Z triton_mm_27 0.0071 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3503662Z triton_mm_37 0.0074 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3503896Z triton_mm_31 0.0079 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3504155Z triton_mm_30 0.0085 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3504381Z triton_mm_34 0.0086 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3504625Z triton_mm_32 0.0086 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3504861Z triton_mm_28 0.0088 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3504995Z SingleProcess AUTOTUNE 
benchmarking takes 0.1248 seconds and 0.2868 seconds precompiling for 20 choices 2025-12-04T11:45:24.3505070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3505116Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3505184Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3505288Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3505781Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.3505820Z graph_break [] 2025-12-04T11:45:24.3505892Z aten_mm_info [('aten._scaled_mm.default_1024_16_1024', 1)] 2025-12-04T11:45:24.3505964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3506009Z Autotune Choices Stats: 2025-12-04T11:45:24.3506536Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.3506603Z AUTOTUNE scaled_mm(1024x1024, 1024x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.3506654Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3506781Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3507016Z triton_mm_55 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3507251Z triton_mm_52 0.0069 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3507487Z triton_mm_54 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3507716Z triton_mm_46 0.0071 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3507952Z triton_mm_56 0.0073 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.3508212Z triton_mm_47 0.0079 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3508448Z triton_mm_53 0.0081 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3508673Z triton_mm_51 0.0084 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.3508902Z triton_mm_49 0.0085 ms 74.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3509135Z triton_mm_50 0.0086 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3509282Z SingleProcess AUTOTUNE benchmarking takes 0.1390 seconds and 0.2732 seconds precompiling for 20 choices 2025-12-04T11:45:24.3509477Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1005e8d3171a5679.xml - 2025-12-04T11:45:24.3509537Z =========================== short test summary info ============================ 2025-12-04T11:45:24.3510140Z FAILED [0.9442s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.3510144Z 2025-12-04T11:45:24.3510218Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3510481Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3510483Z 2025-12-04T11:45:24.3510577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3510639Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.3510720Z ================== 1 failed, 187 deselected, 2 rerun in 4.19s ================== 2025-12-04T11:45:24.3510760Z Got exit code 1 2025-12-04T11:45:24.3510978Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.3511109Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.3511259Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9683e4fcb44a6fa2.xml 2025-12-04T11:45:24.3511319Z ============================= test session starts ============================== 2025-12-04T11:45:24.3511434Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.3511474Z cachedir: .pytest_cache 2025-12-04T11:45:24.3511637Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.3511686Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.3511741Z configfile: pytest.ini 2025-12-04T11:45:24.3511906Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.3512000Z collecting ... collected 188 items / 78 deselected / 110 selected 2025-12-04T11:45:24.3512057Z stepcurrent: skipping 78 already run items. 2025-12-04T11:45:24.3512107Z Running 110 items in this shard 2025-12-04T11:45:24.3512109Z 2025-12-04T11:45:24.3513046Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/uz/cuzq7nx4gx44cgfloyneynqstyfmidhbbrumvx6n53zx4hnm7g6n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3513203Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3513478Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3513641Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3513790Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3514083Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3514223Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3514483Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3514633Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3514898Z E1204 11:10:17.843000 690429 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3515059Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3515338Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3515477Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3515763Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3515959Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3516289Z E1204 11:10:17.843000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3517069Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/h3/ch35ivh6nzwg4ldmmaauflzxygnmbhlnwlcqzovcmzixouxjadp5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3517219Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3517438Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3517595Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3517743Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3518043Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3518178Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3518439Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3518584Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3518841Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3519003Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3519279Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3519416Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3519702Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3519894Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3520210Z E1204 11:10:17.853000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3521003Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/27/c27cbelxf3hztcd2igdqgeoxsc4hsx5wy6lmieiie7fe2xl7bmeh.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3521162Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3521389Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3521546Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3521690Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3521977Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3522108Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3522378Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3522515Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3522774Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3522930Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3523197Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3523375Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3523649Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3523847Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3524170Z E1204 11:10:17.856000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3524909Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/dc/cdceiiqymbavtugfidsansgslilml2hf2hfzmjtbl4ieixlso6ci.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3525056Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3525295Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3525452Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3525610Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3525895Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3526023Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3526279Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3526413Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3526689Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3526843Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3527110Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3527247Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3527521Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3527715Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3528029Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3528763Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/pg/cpgolbbjtm5k3einq3jw2k3hplh4p7dots34butwhiblxt5muabr.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3528914Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3529127Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3529293Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3529451Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3529761Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3529897Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3530172Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3530315Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3530568Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3530729Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3531009Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3531148Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3531423Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3531621Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3531942Z E1204 11:10:17.857000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3532672Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsv2slys0/qg/cqgobuc7iba2k2x75stljq77bhgirhnfput6br7r4abeo5sqggud.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3532824Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3533038Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3533200Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3533375Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3533661Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3533810Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3534082Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3534225Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3534492Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3534653Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3534921Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3535060Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3535354Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3535549Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3535866Z E1204 11:10:17.859000 690429 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3535920Z ('RERUN', {'yellow': True}) [3.2808s] [ 0%] 2025-12-04T11:45:24.3536245Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:10:19.759000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3536542Z E1204 11:10:19.759000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3536671Z E1204 11:10:19.759000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3536815Z E1204 11:10:19.762000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3537107Z E1204 11:10:19.762000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3537235Z E1204 11:10:19.762000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3537377Z E1204 11:10:19.764000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3537669Z E1204 11:10:19.764000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3537795Z E1204 11:10:19.764000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3537938Z E1204 11:10:19.823000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3538248Z E1204 11:10:19.823000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3538376Z E1204 11:10:19.823000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.3538532Z E1204 11:10:19.825000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3538827Z E1204 11:10:19.825000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3538954Z E1204 11:10:19.825000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3539095Z E1204 11:10:19.827000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3539389Z E1204 11:10:19.827000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3539526Z E1204 11:10:19.827000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3539576Z ('RERUN', {'yellow': True}) [1.5288s] [ 0%] 2025-12-04T11:45:24.3539893Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:10:21.104000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3540188Z E1204 11:10:21.104000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3540312Z E1204 11:10:21.104000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.3540456Z E1204 11:10:21.106000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3540748Z E1204 11:10:21.106000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3540873Z E1204 11:10:21.106000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3541016Z E1204 11:10:21.108000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3541307Z E1204 11:10:21.108000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3541433Z E1204 11:10:21.108000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3541574Z E1204 11:10:21.150000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3541866Z E1204 11:10:21.150000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3541990Z E1204 11:10:21.150000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.3542142Z E1204 11:10:21.152000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3542444Z E1204 11:10:21.152000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3542578Z E1204 11:10:21.152000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3542721Z E1204 11:10:21.154000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3543014Z E1204 11:10:21.154000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.3543145Z E1204 11:10:21.154000 690429 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.3543187Z FAILED [1.3460s] [ 0%] 2025-12-04T11:45:24.3543189Z 2025-12-04T11:45:24.3543305Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.3543474Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3543525Z Traceback (most recent call last): 2025-12-04T11:45:24.3543685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3543732Z method(*args, **kwargs) 2025-12-04T11:45:24.3543886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3543929Z method(*args, **kwargs) 2025-12-04T11:45:24.3544083Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3544125Z with policy(): 2025-12-04T11:45:24.3544279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3544325Z raise RuntimeError(msg) 2025-12-04T11:45:24.3544724Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 2051014656. 
2025-12-04T11:45:24.3544727Z 2025-12-04T11:45:24.3544804Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3545071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.3545074Z 2025-12-04T11:45:24.3545164Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3545244Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3545289Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3545349Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3545841Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3545943Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3546007Z graph_break [] 2025-12-04T11:45:24.3546076Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3546162Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3546661Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3546712Z current_size = base.storage().size() 2025-12-04T11:45:24.3546752Z Autotune Choices Stats: 2025-12-04T11:45:24.3547230Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01351999957114458, "best_triton_pos": 1, "best_triton_time": 0.01583999954164028, "best_triton_kernel": "triton_mm_35", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3547303Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3547356Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3547489Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3547534Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.3547774Z triton_mm_35 0.0158 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3548005Z triton_mm_15 0.0163 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3548237Z triton_mm_13 0.0174 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3548464Z triton_mm_34 0.0176 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3548692Z triton_mm_14 0.0177 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3548917Z triton_mm_33 0.0185 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3549149Z triton_mm_31 0.0187 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3549374Z triton_mm_16 0.0192 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3549601Z triton_mm_32 0.0196 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3549734Z SingleProcess AUTOTUNE benchmarking takes 0.1949 seconds and 1.0401 seconds precompiling for 33 choices 2025-12-04T11:45:24.3549892Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3549939Z Traceback (most recent call last): 2025-12-04T11:45:24.3550105Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3550149Z method(*args, **kwargs) 2025-12-04T11:45:24.3550300Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3550341Z method(*args, **kwargs) 2025-12-04T11:45:24.3550500Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3550538Z with policy(): 2025-12-04T11:45:24.3550689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3550730Z raise RuntimeError(msg) 2025-12-04T11:45:24.3551129Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2051014656 and is now 3015704576. 2025-12-04T11:45:24.3551141Z 2025-12-04T11:45:24.3551217Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3551481Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.3551485Z 2025-12-04T11:45:24.3551574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3551650Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3551692Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3551751Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3552241Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), 
('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3552343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3552379Z graph_break [] 2025-12-04T11:45:24.3552447Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3552519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3553007Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3553056Z current_size = base.storage().size() 2025-12-04T11:45:24.3553099Z Autotune Choices Stats: 2025-12-04T11:45:24.3553623Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01351999957114458, "best_triton_pos": 1, "best_triton_time": 0.01583999954164028, "best_triton_kernel": "triton_mm_35", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3553696Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3553748Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3553886Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3553935Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.3554189Z triton_mm_35 0.0158 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3554435Z triton_mm_15 0.0163 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3554663Z triton_mm_13 0.0174 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3554895Z triton_mm_34 0.0176 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3555126Z triton_mm_14 0.0177 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3555368Z triton_mm_33 0.0185 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3555597Z triton_mm_31 0.0187 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3555823Z triton_mm_16 0.0192 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3556055Z triton_mm_32 0.0196 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3556188Z SingleProcess AUTOTUNE benchmarking takes 0.1949 seconds and 1.0401 seconds precompiling for 33 choices 2025-12-04T11:45:24.3556267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3556309Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3556369Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3556471Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3556927Z inductor [('triton_bundler_save_kernel', 304), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('async_compile_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3556969Z graph_break [] 2025-12-04T11:45:24.3557035Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3557112Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3557153Z Autotune Choices Stats: 2025-12-04T11:45:24.3557622Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "_scaled_mm", "best_time": 0.013559999875724316, "best_triton_pos": 1, "best_triton_time": 0.016039999201893806, "best_triton_kernel": "triton_mm_53", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3557702Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3557768Z strides: [1024, 1], [1, 1024], [1, 1], 
[1, 1], [1] 2025-12-04T11:45:24.3557890Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3557936Z _scaled_mm 0.0136 ms 100.0% 2025-12-04T11:45:24.3558178Z triton_mm_53 0.0160 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3558413Z triton_mm_73 0.0161 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3558641Z triton_mm_52 0.0174 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3558875Z triton_mm_72 0.0174 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3559135Z triton_mm_51 0.0176 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3559361Z triton_mm_71 0.0186 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3559589Z triton_mm_69 0.0187 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.3559813Z triton_mm_70 0.0194 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3560041Z triton_mm_54 0.0196 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3560174Z SingleProcess AUTOTUNE benchmarking takes 0.2682 seconds and 0.7990 seconds precompiling for 39 choices 2025-12-04T11:45:24.3560229Z =================================== FAILURES =================================== 2025-12-04T11:45:24.3560378Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.3560426Z Traceback (most recent call last): 2025-12-04T11:45:24.3560584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3560625Z method(*args, **kwargs) 2025-12-04T11:45:24.3560784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.3560823Z method(*args, **kwargs) 2025-12-04T11:45:24.3560978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.3561014Z with policy(): 2025-12-04T11:45:24.3561167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.3561207Z raise RuntimeError(msg) 2025-12-04T11:45:24.3561623Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 3015704576 and is now 3980394496. 2025-12-04T11:45:24.3561627Z 2025-12-04T11:45:24.3561701Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3561972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.3561975Z 2025-12-04T11:45:24.3562063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3562140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3562182Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3562241Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3562727Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3562836Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3562874Z graph_break [] 2025-12-04T11:45:24.3562938Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3563013Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3563536Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.3563585Z current_size = base.storage().size() 2025-12-04T11:45:24.3563625Z Autotune Choices Stats: 2025-12-04T11:45:24.3564092Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01351999957114458, "best_triton_pos": 1, "best_triton_time": 0.01583999954164028, "best_triton_kernel": "triton_mm_35", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3564160Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3564210Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3564330Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3564373Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.3564610Z triton_mm_35 0.0158 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3564839Z triton_mm_15 0.0163 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3565067Z triton_mm_13 0.0174 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3565299Z triton_mm_34 0.0176 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3565552Z triton_mm_14 0.0177 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3565791Z triton_mm_33 0.0185 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3566017Z triton_mm_31 0.0187 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3566240Z triton_mm_16 0.0192 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3566468Z triton_mm_32 0.0196 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3566612Z SingleProcess AUTOTUNE benchmarking takes 0.1949 seconds and 1.0401 seconds precompiling for 33 choices 2025-12-04T11:45:24.3566686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.3566730Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3566785Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3566886Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3567342Z inductor [('triton_bundler_save_kernel', 304), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('async_compile_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3567382Z graph_break [] 2025-12-04T11:45:24.3567445Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3567522Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3567562Z Autotune Choices Stats: 2025-12-04T11:45:24.3568028Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "_scaled_mm", "best_time": 0.013559999875724316, "best_triton_pos": 1, "best_triton_time": 0.016039999201893806, "best_triton_kernel": "triton_mm_53", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3568098Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3568148Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3568271Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3568313Z _scaled_mm 0.0136 ms 100.0% 2025-12-04T11:45:24.3568545Z triton_mm_53 0.0160 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3568778Z triton_mm_73 0.0161 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.3569005Z triton_mm_52 0.0174 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3569254Z triton_mm_72 0.0174 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3569491Z triton_mm_51 0.0176 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3569720Z triton_mm_71 0.0186 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3569944Z triton_mm_69 0.0187 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3570170Z triton_mm_70 0.0194 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3570405Z triton_mm_54 0.0196 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3570537Z SingleProcess AUTOTUNE benchmarking takes 0.2682 seconds and 0.7990 seconds precompiling for 39 choices 2025-12-04T11:45:24.3570610Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:45:24.3570653Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.3570710Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.3570809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.3571264Z inductor [('triton_bundler_save_kernel', 304), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('async_compile_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.3571301Z graph_break [] 2025-12-04T11:45:24.3571366Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.3571437Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.3571479Z Autotune Choices Stats: 2025-12-04T11:45:24.3571939Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "_scaled_mm", "best_time": 0.014519000425934792, "best_triton_pos": 1, "best_triton_time": 0.01607999950647354, "best_triton_kernel": "triton_mm_91", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.3572009Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.3572058Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.3572182Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.3572223Z _scaled_mm 0.0145 ms 100.0% 2025-12-04T11:45:24.3572452Z triton_mm_91 0.0161 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3572684Z triton_mm_111 0.0162 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3572936Z triton_mm_89 0.0174 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3573180Z triton_mm_90 0.0175 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3573803Z triton_mm_110 0.0175 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3574032Z triton_mm_107 0.0186 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3574259Z triton_mm_109 0.0188 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3574505Z triton_mm_108 0.0192 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.3574731Z triton_mm_92 0.0198 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.3574862Z SingleProcess AUTOTUNE benchmarking takes 0.2693 seconds and 0.6474 seconds precompiling for 39 choices 2025-12-04T11:45:24.3575059Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9683e4fcb44a6fa2.xml - 2025-12-04T11:45:24.3575119Z =========================== short test summary info ============================ 2025-12-04T11:45:24.3575715Z FAILED [1.3460s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 3015704576 and is now 3980394496. 2025-12-04T11:45:24.3575718Z 2025-12-04T11:45:24.3575791Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.3576057Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.3576060Z 2025-12-04T11:45:24.3576151Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.3576218Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.3576290Z ================== 1 failed, 78 deselected, 2 rerun in 6.17s =================== 2025-12-04T11:45:24.3576332Z Got exit code 1 2025-12-04T11:45:24.3576377Z Retrying single test... 
2025-12-04T11:45:24.3576522Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7e488c55ed2ff5cb.xml 2025-12-04T11:45:24.3576583Z ============================= test session starts ============================== 2025-12-04T11:45:24.3576697Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.3576756Z cachedir: .pytest_cache 2025-12-04T11:45:24.3576928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.3576978Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.3577019Z configfile: pytest.ini 2025-12-04T11:45:24.3577186Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.3577276Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.3577535Z stepcurrent: skipping 78 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.3577578Z Running 1 items in this shard 2025-12-04T11:45:24.3577580Z 2025-12-04T11:45:24.3577917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:30.953978678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3577921Z 2025-12-04T11:45:24.3578077Z [W1204 11:10:31.315553837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3578095Z 2025-12-04T11:45:24.3578249Z [W1204 11:10:31.323472748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.3578251Z 2025-12-04T11:45:24.3578573Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3578874Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3579013Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3579523Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3579901Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3580135Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3580347Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3580559Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3580857Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3581098Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3581420Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3581658Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3581963Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3582197Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3582499Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3582741Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3583038Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3583327Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3583620Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3583855Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3584148Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3584348Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3584580Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3584880Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3585081Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.3585310Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3585604Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3585865Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3586157Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3586386Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3586597Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3586802Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3587017Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3587201Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.3587381Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3587912Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/27/c27cbelxf3hztcd2igdqgeoxsc4hsx5wy6lmieiie7fe2xl7bmeh.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3588061Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3588286Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3588445Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3588592Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3588881Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3589016Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3589275Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3589421Z E1204 
11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3589682Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3589840Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3590150Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3590295Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3590585Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3590786Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3591103Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.3591406Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3591549Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3592035Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3592299Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3592529Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3592745Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3592946Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3593245Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3593522Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3593818Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3594058Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3594350Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3594628Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3594925Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3595177Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3595474Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3595710Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3596007Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3596256Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3596551Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3596750Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3596989Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3597293Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3597492Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3597733Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3598027Z E1204 11:10:31.026000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3598264Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3598562Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3598783Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3599021Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3599224Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3599456Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3599627Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3599816Z E1204 11:10:31.026000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3599920Z E1204 11:10:31.026000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3600085Z [W1204 11:10:31.346454251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3600087Z 2025-12-04T11:45:24.3600260Z [W1204 11:10:31.350399356 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3600262Z 2025-12-04T11:45:24.3600576Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3600876Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3601008Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3601495Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3601755Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3601985Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3602202Z E1204 11:10:31.082000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3602404Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3602705Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3602940Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3603304Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3603543Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3603851Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3604089Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3604385Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3604626Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3604934Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3605171Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3605468Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3605702Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3606001Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3606199Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3606439Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3606734Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3606938Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3607176Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3607469Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3607729Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3608031Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3608269Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3608482Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3608684Z E1204 11:10:31.082000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3608903Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3609072Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3609271Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3609797Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/uz/cuzq7nx4gx44cgfloyneynqstyfmidhbbrumvx6n53zx4hnm7g6n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3609953Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3610175Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3610335Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3610486Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3610774Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3610912Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3611171Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3611313Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3611569Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3611729Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3612013Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3612157Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3612449Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3612643Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3612961Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3613287Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3613436Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3613919Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3614174Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3614405Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3614611Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3614816Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3615107Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3615346Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3615641Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3615875Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3616169Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3616415Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3616721Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3616968Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3617257Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3617493Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3617783Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3618028Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3618319Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3618520Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3618760Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3619052Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3619253Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.3619484Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3619780Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3620010Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3620306Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3620529Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3620746Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3620963Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3621176Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3621358Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.3621536Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3621641Z E1204 11:10:31.082000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3621950Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3622264Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3622397Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3622873Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3623131Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3623394Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3623603Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3623804Z E1204 11:10:31.084000 
696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3624096Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3624333Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3624624Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3624860Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3625178Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3625412Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3625719Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3625951Z E1204 11:10:31.084000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3626244Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3626477Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3626785Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3627021Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3627311Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3627512Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3627744Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3628039Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3628236Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3628472Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3628765Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3628996Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3629289Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3629518Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3629739Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3629952Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3630164Z E1204 11:10:31.084000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3630331Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3630512Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3631046Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/qg/cqgobuc7iba2k2x75stljq77bhgirhnfput6br7r4abeo5sqggud.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3631203Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3631421Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3631577Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3631726Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3632012Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3632147Z E1204 11:10:31.084000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3632405Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3632543Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3632804Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3632961Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3633238Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3633403Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3633686Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3633915Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3634228Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3634544Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3634676Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3635165Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3635434Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3635666Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3635879Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3636083Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3636380Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3636617Z E1204 
11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3636918Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3637158Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3637453Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3637695Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3637987Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3638238Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3638540Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3638797Z E1204 11:10:31.084000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3639096Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3639329Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3639627Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3639840Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3640085Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3640377Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3640583Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3640820Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3641114Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3641351Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3641644Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3641872Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3642082Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3642292Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3642510Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3642698Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3642884Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3642989Z E1204 11:10:31.084000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3643363Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3643658Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3643795Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3644278Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3644546Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3644782Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3644991Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3645539Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3645834Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3646074Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3646373Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3646607Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3646905Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3647137Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3647462Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3647701Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3648005Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3648242Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3648533Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3648773Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3649076Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3649279Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3649516Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3649812Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3650016Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3650251Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3650550Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3650786Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3651083Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3651311Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3651519Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3651726Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3651964Z E1204 11:10:31.085000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3652141Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3652333Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3652861Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/h3/ch35ivh6nzwg4ldmmaauflzxygnmbhlnwlcqzovcmzixouxjadp5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3653018Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3653236Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3653445Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3653592Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3653884Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3654019Z E1204 11:10:31.085000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3654284Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3654424Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3654688Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3654850Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3655122Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3655263Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3655540Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3655739Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3656055Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3656385Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3656522Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3657018Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3657277Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3657505Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3657731Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3657937Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3658231Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3658474Z E1204 
11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3658766Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3659006Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3659303Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3659543Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3659839Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3660075Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3660372Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3660620Z E1204 11:10:31.085000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3660933Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3661178Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3661474Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3661678Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3661914Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3662222Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3662419Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3662657Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3662951Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3663190Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3663524Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3663746Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3663961Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3664167Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3664384Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3664554Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3664737Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3664844Z E1204 11:10:31.085000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3665032Z [W1204 11:10:31.355590795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3665034Z 2025-12-04T11:45:24.3665349Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3665659Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3665796Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3666271Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3666545Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3666775Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3666980Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3667189Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3667480Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3667721Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3668016Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3668254Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3668550Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3668785Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3669084Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3669316Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3669637Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3669886Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3670177Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3670414Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3670710Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3670922Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3671158Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3671462Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3671665Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3671898Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3672198Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3672429Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3672728Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3672952Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3673166Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3673397Z E1204 11:10:31.086000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3673612Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3673809Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3674007Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3674554Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/dc/cdceiiqymbavtugfidsansgslilml2hf2hfzmjtbl4ieixlso6ci.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3674703Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3674924Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3675088Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3675249Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3675541Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3675673Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3675937Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3676080Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3676340Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3676499Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3676776Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3676916Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3677193Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3677392Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3677708Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3678006Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3678148Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3678655Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3678914Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3679142Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3679358Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3679560Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3679871Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3680110Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3680404Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3680643Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3680939Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3681179Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3681471Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3681712Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3682009Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3682242Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3682539Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3682793Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3683105Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3683353Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3683587Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3683885Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3684100Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.3684340Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3684631Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3684867Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3685164Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3685387Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3685602Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3685806Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3686025Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3686195Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.3686383Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3686485Z E1204 11:10:31.086000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3686798Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3687124Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3687258Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3687751Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3688006Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3688240Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3688468Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3688670Z E1204 11:10:31.089000 
696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3688965Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3689202Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3689500Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3689736Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3690032Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3690272Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3690566Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3690804Z E1204 11:10:31.089000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3691095Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3691333Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3691646Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3691898Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3692194Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3692392Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3692634Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3692927Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3693143Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3693408Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3693706Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3693944Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3694239Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3694464Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3694672Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3694881Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3695098Z E1204 11:10:31.089000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3695271Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3695456Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3695998Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpdun8s796/pg/cpgolbbjtm5k3einq3jw2k3hplh4p7dots34butwhiblxt5muabr.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.3696167Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.3696397Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.3696558Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.3696704Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.3696996Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.3697137Z E1204 11:10:31.089000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.3697414Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.3697557Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.3697813Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.3697975Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.3698248Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.3698389Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.3698666Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.3698865Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.3699185Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.3699479Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3699618Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3700098Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3700377Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3700621Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3700831Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3701038Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3701334Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3701574Z E1204 
11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3701877Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3702118Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3702416Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3702649Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3702949Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3703183Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3703512Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3703753Z E1204 11:10:31.089000 696578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3704045Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3704281Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3704577Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3704812Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3705048Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3705364Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3705562Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3705796Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3706094Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3706342Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3706639Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3706863Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3707075Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3707281Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.3707495Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.3707667Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.3707847Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.3707956Z E1204 11:10:31.089000 696578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.3708009Z ('RERUN', {'yellow': True}) [3.5242s] [100%] 2025-12-04T11:45:24.3708351Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:32.972878583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3708354Z 2025-12-04T11:45:24.3708502Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3708800Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.3709125Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3709261Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3709752Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3710005Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3710239Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3710463Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3710664Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3710963Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3711199Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3711497Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3711734Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3712031Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3712270Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3712563Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3712801Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3713092Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3713374Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3713600Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3713805Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3714037Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3714238Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3714481Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3714775Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3714990Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3715221Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3715518Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3715744Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3715942Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3716167Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3716376Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3716581Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3716777Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3717003Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3717211Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3717414Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.3717629Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3717872Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3718180Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3718413Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3718712Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3718936Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3719157Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3719361Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3719571Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3719774Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3720008Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3720309Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3720542Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3720841Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3721082Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3721378Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3721615Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3721907Z 
E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3722175Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3722474Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3722717Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3723016Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3723293Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3723591Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3723840Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3724137Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3724375Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3724669Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3724908Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3725200Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3725439Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3725733Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3725967Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3726176Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3726372Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.3726695Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3726929Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3727242Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3727474Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3727770Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3728006Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3728312Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3728549Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3728842Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3729078Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3729375Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3729574Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3729778Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3729978Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3730192Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3730394Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3730632Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3730926Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3731151Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3731354Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3731562Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3731764Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3731997Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3732297Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3732545Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3732841Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3733045Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3733286Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3733494Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3733734Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3734036Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3734260Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3734468Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3734675Z 
E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3734878Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3735179Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3735443Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3735746Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3735992Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3736294Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3736537Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3736835Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3737089Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3737383Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3737588Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3737788Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3738017Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3738228Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3738431Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3738642Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3738940Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3739184Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3739479Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3739721Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3740045Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3740294Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3740596Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3740831Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3741134Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3741376Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3741581Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3741784Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3741978Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3742195Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3742395Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3742695Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3742919Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3743122Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3743360Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3743561Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3743860Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3744092Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3744415Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3744667Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3744962Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3745199Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3745494Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3745749Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3746044Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3746282Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3746578Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3746812Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3747110Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3747344Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3747645Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3747878Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3748176Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3748413Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3748727Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3748930Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3749138Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3749376Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3749672Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3749916Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3750227Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3750462Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3750763Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3751001Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3846375Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3846857Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3847167Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3847390Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3847630Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3847939Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3848175Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3848600Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3848845Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3849073Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3849278Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3849482Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3849782Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3849995Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3850237Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3850441Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3850641Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3850938Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3851161Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3851368Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3851568Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3851764Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3851915Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.3852118Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3852348Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3852555Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3852754Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3853000Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3853212Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3853449Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3853674Z E1204 
11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3853883Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3854081Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3854322Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3854528Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3854731Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3854929Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3855148Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3855351Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3855553Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3855758Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3856054Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3856272Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3856474Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3856677Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3856869Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3857098Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3857319Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3857533Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.3857734Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3857936Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3858235Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3858447Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3858667Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3858864Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3859069Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3859370Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3859583Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.3859790Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3859988Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3860191Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3860486Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3860686Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.3860893Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.3861082Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.3861306Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.3861521Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.3861750Z E1204 11:10:32.712000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.3861947Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.3862141Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.3862328Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.3862502Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.3862648Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.3862755Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.3862886Z E1204 11:10:32.712000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3863045Z [W1204 11:10:32.981717541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3863049Z 2025-12-04T11:45:24.3863200Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3863531Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.3863837Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3863973Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3864471Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3864732Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3864961Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3865172Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3865370Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3865696Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3865936Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3866240Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3866477Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3866769Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3867019Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3867311Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3867548Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3867842Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3868061Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3868273Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3868470Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3868682Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3868882Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3869117Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3869415Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3869612Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3869860Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3870162Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3870398Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3870596Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3870817Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3871029Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3871225Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3871436Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3871653Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3871860Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3872058Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.3872254Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3872488Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3872780Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3873013Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3873333Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3873555Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3873757Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3873957Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3874204Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3874403Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3874650Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3874941Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3875177Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3875471Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3875725Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3876020Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3876250Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3876547Z 
E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3876779Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3877074Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3877306Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3877603Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3877841Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3878132Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3878370Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3878689Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3878925Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3879227Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3879464Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3879763Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3879994Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3880300Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3880519Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3880725Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3880922Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.3881217Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3881453Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3881745Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3881982Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3882273Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3882511Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3882806Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3883062Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3883407Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3883655Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3883948Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3884145Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3884344Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3884557Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3884766Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3884968Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3885201Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3885494Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3885690Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3885889Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3886088Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3886284Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3886518Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3886811Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3887051Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3887345Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3887570Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3887795Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3887998Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3888236Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3888534Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3888757Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3888970Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3889175Z 
E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3889378Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3889679Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3889915Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3890340Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3890579Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3890872Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3891114Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3891410Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3891643Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3891983Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3892181Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3892391Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3892611Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3892817Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3893022Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3893222Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3893565Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3893798Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3894095Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3894330Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3894629Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3894865Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3895158Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3895399Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3895694Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3895917Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3896119Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3896360Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3896555Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3896781Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3896985Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3897279Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3897503Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3897721Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3897923Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3898128Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3898422Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3898660Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3898955Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3899194Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3899489Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3899726Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3900022Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3900254Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3900548Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3900804Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3901111Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3901345Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3901643Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3901883Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3902187Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3902423Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3902714Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3902954Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3903282Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3903482Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3903684Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3903919Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3904214Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3904449Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3904745Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3904983Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3905306Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3905556Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3905850Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3906087Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3906384Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3906598Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3906836Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3907129Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3907368Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3907659Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3907878Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3908079Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3908282Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3908490Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3908788Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3909005Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3909207Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3909424Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3909636Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3909951Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3910176Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3910378Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3910585Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3910776Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3910952Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.3911149Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3911373Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3911585Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3911780Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3912007Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3912214Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3912414Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3912636Z E1204 
11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3912847Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.3913045Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3913299Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.3913509Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3913733Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3913934Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3914160Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3914366Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3914565Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3914773Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3915071Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3915299Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3915509Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3915710Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3915906Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3916103Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3916318Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3916519Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.3916721Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3916924Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3917219Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3917436Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.3917637Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3917862Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3918063Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3918374Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3918591Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.3918793Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3918998Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3919197Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3919507Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3919702Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.3919908Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.3920105Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.3920303Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.3920523Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.3920729Z E1204 11:10:32.715000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.3920929Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.3921120Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.3921306Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.3921481Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.3921612Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.3921717Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.3921857Z E1204 11:10:32.715000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.3922030Z [W1204 11:10:32.983934191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.3922032Z 2025-12-04T11:45:24.3922178Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.3922495Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.3922792Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.3922929Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.3923449Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.3923721Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.3923951Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.3924159Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.3924362Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3924660Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3924902Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3925200Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3925434Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3925730Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3925961Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3926255Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3926513Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3926825Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3927051Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3927264Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3927751Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3928214Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3928700Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3929177Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3929742Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3930309Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3930777Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3931355Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3931915Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3932383Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3932846Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3933390Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3933934Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3934390Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3934871Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3935356Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3935808Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.3936236Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3936699Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3937264Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3937830Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3938419Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3938969Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3939434Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.3939874Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3940319Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3940767Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3941232Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3941797Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3942361Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3942922Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3943528Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3944087Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3944683Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3945257Z 
E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3945818Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3946380Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3946941Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3947518Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3948076Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3948634Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3949200Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3949760Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3950318Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3950878Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3951440Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3951995Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3952561Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3953120Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3953733Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3954188Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3954635Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.3955158Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3955716Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3956276Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3956850Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3957408Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3957973Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3958534Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3959092Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3959652Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3960212Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3960774Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3961300Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3961731Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3962162Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3962603Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3963083Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3963863Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3964443Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3964971Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3965399Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3965830Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3966272Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3966738Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3967301Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3967868Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3968428Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3968953Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3969399Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3969845Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3970321Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3970888Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3971441Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3971902Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3972358Z 
E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3972810Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3973395Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3973961Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3974525Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3975095Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3975674Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3976240Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3976805Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3977370Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3977938Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3978467Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.3978899Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.3979362Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3979828Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3980269Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3980705Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3981237Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3981826Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3982389Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3982967Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3983566Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3984133Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3984694Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3985280Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3985844Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3986400Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3986861Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3987302Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3987733Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.3988173Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.3988625Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3989160Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3989714Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.3990176Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.3990615Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.3991078Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.3991611Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3992195Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3992761Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3993365Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3993928Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3994510Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3995073Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3995638Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3996203Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3996767Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3997335Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3997899Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3998467Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.3999033Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.3999605Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4000169Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4000775Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4001355Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4001922Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4002453Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4002891Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4003407Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4003995Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4004558Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4005123Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4005688Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4006254Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4006816Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4007383Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4007950Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4008515Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4009041Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4009503Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4011920Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4012481Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4013053Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4013625Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4014084Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4014518Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4014969Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4015500Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4016041Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4016494Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4016932Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4017365Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4017891Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4018442Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4018897Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4019334Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4019763Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4020143Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4020524Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4021009Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4021474Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4021922Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4022376Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4022845Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4023328Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4023796Z E1204 
11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4024257Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4024696Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4025149Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4025611Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4026048Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4026478Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4026922Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4027373Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4027807Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4028240Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4028768Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4029309Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4029788Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4030223Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4030662Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4031084Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4031526Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4031974Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4032407Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4032858Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4033416Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4033955Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4034407Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4034850Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4035283Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4035812Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4036359Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4036811Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4037249Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4037685Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4038219Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4038774Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4039209Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4039652Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4040072Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4040514Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4040971Z E1204 11:10:32.717000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4041412Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4041849Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4042255Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4042643Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4042975Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4043244Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4043581Z E1204 11:10:32.717000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4043903Z [W1204 11:10:32.028254309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4044098Z 2025-12-04T11:45:24.4044241Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4044722Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4045348Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4045808Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4046455Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4047215Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4047756Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4048225Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4048675Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4049204Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4049765Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4050323Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4050893Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4051449Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4052004Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4052559Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4053118Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4053716Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4054262Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4054724Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4055165Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4055605Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4056050Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4056517Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4057118Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4057656Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4058115Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4058670Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4059217Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4059669Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4060134Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4060591Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4061024Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4061449Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4061894Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4062353Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4062787Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4063215Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4063712Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4064272Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4064827Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4065381Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4065958Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4066417Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4066871Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4067311Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4067753Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4068222Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4068784Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4069359Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4069918Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4070478Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4071034Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4071599Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4072156Z 
E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4072716Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4073313Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4073874Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4074433Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4075017Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4075587Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4076159Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4076717Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4077276Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4077835Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4078408Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4078965Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4079525Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4080088Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4080639Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4081098Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4081530Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4082052Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4082609Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4083168Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4083763Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4084328Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4084916Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4085490Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4086051Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4086609Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4087175Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4087747Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4088271Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4088698Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4089130Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4089574Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4090021Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4090492Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4091053Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4091579Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4092005Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4092430Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4092854Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4093345Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4093929Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4094490Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4095061Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4095583Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4096024Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4096468Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4096950Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4097510Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4098060Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4098520Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4098960Z 
E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4099396Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4099929Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4100494Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4101062Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4101629Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4102194Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4102765Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4103382Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4103959Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4104524Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4105053Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4105489Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4105957Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4106417Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4106855Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4107292Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4107824Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4108393Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4108958Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4109521Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4110086Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4110652Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4111216Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4111781Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4112367Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4112920Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4113419Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4113858Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4114288Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4114735Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4115199Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4115736Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4116287Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4116754Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4117193Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4117635Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4118165Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4118729Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4119296Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4119866Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4120430Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4120992Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4121578Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4122146Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4122722Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4123322Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4123888Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4124456Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4125032Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4125597Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4126160Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4126726Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4127293Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4127855Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4128421Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4128954Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4129390Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4129861Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4130428Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4131015Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4131579Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4132153Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4132715Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4133311Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4133875Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4134459Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4135024Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4135555Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4136025Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4136590Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4137155Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4137723Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4138269Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4138723Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4139166Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4139603Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4140168Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4140711Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4141176Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4141613Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4142051Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4142589Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4143141Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4143642Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4144079Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4144506Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4144887Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4145269Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4145728Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4146192Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4146636Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4147092Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4147558Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4148002Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4148457Z E1204 
11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4148921Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4149392Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4149848Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4150332Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4150777Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4151209Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4151663Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4152130Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4152570Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4153005Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4153575Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4154124Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4154580Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4155017Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4155443Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4155868Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4156319Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4156773Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4157210Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4157650Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4158210Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4158757Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4159224Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4159666Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4160099Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4160631Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4161185Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4161638Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4162078Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4162514Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4163049Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4163622Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4164058Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4164489Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4164915Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4165366Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4165831Z E1204 11:10:32.761000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4166276Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4166698Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4167118Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4167519Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4167854Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4168123Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4180694Z E1204 11:10:32.761000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4181029Z [W1204 11:10:32.030433869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4181222Z 2025-12-04T11:45:24.4181374Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4181870Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4182542Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4183007Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4183706Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4184473Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4184989Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4185458Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4185902Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4186440Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4187011Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4187571Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4188131Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4188722Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4189277Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4189846Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4190405Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4190963Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4191508Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4191986Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4192421Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4192860Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4193338Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4193806Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4194364Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4194884Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4195346Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4195905Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4196458Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4196656Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4196874Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4197097Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4197306Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4197518Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4197738Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4197941Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4198139Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4198334Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4198580Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4198875Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4199107Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4199399Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4199619Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4199825Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4200019Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4200229Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4200427Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4200660Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4200953Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4201187Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4201501Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4201731Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4202036Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4202266Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4202561Z 
E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4202792Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4203093Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4203371Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4203665Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4203898Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4204190Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4204422Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4204718Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4204950Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4205243Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4205473Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4205767Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4206022Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4206334Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4206554Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4206755Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4206953Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4207244Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4207497Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4207788Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4208024Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4208317Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4208550Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4208841Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4209073Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4209368Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4209602Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4209892Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4210089Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4210306Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4210506Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4210724Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4210926Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4211156Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4211450Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4211660Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4211853Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4212050Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4212243Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4212478Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4212771Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4213008Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4213340Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4213538Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4213747Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4213948Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4214183Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4214499Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4214726Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4214944Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4215146Z 
E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4215350Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4215644Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4215879Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4216185Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4216420Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4216715Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4216946Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4217242Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4217474Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4217772Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4217972Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4218170Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4218392Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4218593Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4218816Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4219016Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4219319Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4219552Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4219850Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4220088Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4220392Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4220627Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4220918Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4221152Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4221447Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4221666Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4221870Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4222070Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4222265Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4222477Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4222679Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4222973Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4223219Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4223460Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4223669Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4223871Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4224163Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4224398Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4224711Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4224944Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4225237Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4225470Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4225767Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4225998Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4226292Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4226528Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4226819Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4227056Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4227349Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4227610Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4227900Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4228145Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4228438Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4228672Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4228966Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4229181Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4229382Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4229615Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4229911Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4230146Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4230439Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4230674Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4230969Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4231204Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4231499Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4231732Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4232049Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4232246Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4232493Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4232785Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4233019Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4233344Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4233573Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4233778Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4233977Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4234183Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4234475Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4234693Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4234898Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4235095Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4235300Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4235593Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4235815Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4236017Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4236232Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4236438Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4236590Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4236803Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4237023Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4237232Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4237428Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4237660Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4237866Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4238063Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4238286Z E1204 
11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4238492Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4238694Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4238914Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4239121Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4239319Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4239516Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4239730Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4239934Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4240134Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4240362Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4240661Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4240884Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4241090Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4241288Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4241483Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4241680Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4241904Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4242107Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4242304Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4242509Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4242802Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4243018Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4243222Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4243467Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4243671Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4243965Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4244180Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4244381Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4244610Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4244809Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4245120Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4245318Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4245521Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4245714Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4245910Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4246139Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4246348Z E1204 11:10:32.763000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4246548Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4246741Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4246925Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4247099Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4247226Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4247333Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4247459Z E1204 11:10:32.763000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4247618Z [W1204 11:10:32.032560260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4247622Z 2025-12-04T11:45:24.4247770Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4248072Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4248369Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4248500Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4249007Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4249273Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4249501Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4249708Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4249911Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4250219Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4250455Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4250750Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4250984Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4251277Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4251509Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4251801Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4252037Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4252329Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4252555Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4252761Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4252974Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4253188Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4253422Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4253670Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4253963Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4254161Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4254393Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4254706Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4254926Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4255125Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4255349Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4255554Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4255752Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4255944Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4256164Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4256370Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4256568Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4256764Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4256994Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4257289Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4257546Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4257852Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4258070Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4258276Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4258474Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4258680Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4258892Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4259123Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4259419Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4259654Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4259949Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4260183Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4260472Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4260706Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4260998Z 
E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4261231Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4261521Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4261779Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4262077Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4262321Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4262612Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4262843Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4263137Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4263417Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4263709Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4263945Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4264237Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4264475Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4264764Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4264985Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4265188Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4265383Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4265677Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4265907Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4266229Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4266459Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4266764Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4266995Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4267285Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4267515Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4267818Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4268052Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4268342Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4268538Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4268735Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4268930Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4269138Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4269338Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4269569Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4269860Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4270055Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4270249Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4270461Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4270656Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4270903Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4271197Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4271431Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4271723Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4271929Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4272135Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4272337Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4272571Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4272862Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4273084Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4273315Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4273516Z 
E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4273717Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4274011Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4274242Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4274534Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4274793Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4275088Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4275335Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4275627Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4275865Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4276160Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4276373Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4276570Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4276792Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4276998Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4277196Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4277398Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4277689Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4277925Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4278218Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4278458Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4278750Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4278995Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4279300Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4279547Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4279840Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4280059Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4280262Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4280471Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4280670Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4280881Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4281080Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4281374Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4281595Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4281796Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4281995Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4282197Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4282490Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4282723Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4283018Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4283295Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4283604Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4283849Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4284139Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4284373Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4284668Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4284914Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4285206Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4285439Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4285734Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4285967Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4286259Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4286491Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4286785Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4287019Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4287309Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4287507Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4287718Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4287969Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4288272Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4288507Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4288798Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4289031Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4289334Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4289566Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4289858Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4290093Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4290388Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4290585Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4290817Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4291111Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4291342Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4291637Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4291851Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4292064Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4292273Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4292476Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4292781Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4292994Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4293200Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4293445Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4293661Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4293956Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4294175Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4294505Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4294703Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4294899Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4295050Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4295248Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4295472Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4295678Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4295877Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4296096Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4296302Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4296530Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4296751Z E1204 
11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4296971Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4297169Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4297391Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4297597Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4297807Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4298002Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4298215Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4298415Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4298615Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4298819Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4299112Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4299328Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4299533Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4299736Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4299928Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4300125Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4300337Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4300551Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4300762Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4300972Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4301268Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4301480Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4301684Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4301881Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4302093Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4302387Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4302598Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4302800Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4302997Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4303197Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4303531Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4303729Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4303931Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4304123Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4304318Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4304530Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4304753Z E1204 11:10:32.766000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4304963Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4305167Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4305345Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4305516Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4305643Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4305746Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4305872Z E1204 11:10:32.766000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4305938Z ('RERUN', {'yellow': True}) [1.7304s] [100%] 2025-12-04T11:45:24.4306283Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:34.476148795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.4306286Z 2025-12-04T11:45:24.4306430Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4306723Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4307018Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4307149Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4307629Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4307883Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4308109Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4308315Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4308516Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4308808Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4309063Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4309365Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4309597Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4309889Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4310121Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4310422Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4310654Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4310946Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4311167Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4311376Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4311574Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4311780Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4311979Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4312213Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4312502Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4312700Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4312930Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4313244Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4313503Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4313713Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4313931Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4314134Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4314331Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4314523Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4314759Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4314960Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4315155Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4315350Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4315583Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4315876Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4316106Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4316395Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4316614Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4316821Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4317015Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4317221Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4317435Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4317716Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4318020Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4318250Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4318540Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4318771Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4319081Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4319311Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4319599Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4319831Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4320124Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4320357Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4320647Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4320878Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4321168Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4321399Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4321690Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4321940Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4322233Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4322477Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4322769Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4323004Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4323328Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4323562Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4323762Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4323959Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4324253Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4324485Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4324776Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4325008Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4325302Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4325534Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4325823Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4326056Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4326371Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4326603Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4326906Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4327104Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4327303Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4327501Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4327721Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4327920Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4328151Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4328443Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4328641Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4328836Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4329032Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4329228Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4329460Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4329757Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4329988Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4330279Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4330483Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4330704Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4330911Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4331154Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4331448Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4331669Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4331875Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4332086Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4332290Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4332586Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4332820Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4333114Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4333379Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4333673Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4333907Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4334202Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4334441Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4334733Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4334956Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4335153Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4335398Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4335599Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4335798Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4336005Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4336296Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4336547Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4336844Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4337083Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4337378Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4337611Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4337906Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4338138Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4338433Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4338655Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4338859Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4339060Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4339273Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4339486Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4339699Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4339993Z E1204 11:10:34.209000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4340212Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4340418Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4340628Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4340835Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4341134Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4341371Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4341667Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4341903Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4342204Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4342447Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4342739Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4342975Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4343308Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4343546Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4343873Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4344125Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4344421Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4344655Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4344955Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4345205Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4345503Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4345736Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4346034Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4348794Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4349009Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4349245Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4349541Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4349780Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4350099Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4350332Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4350633Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4350898Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4351196Z E1204 
11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4351436Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4351728Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4351934Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4352184Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4352481Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4352718Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4353012Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4353235Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4353554Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4353762Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4353967Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4354263Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4354484Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4354685Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4354889Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4355116Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4355431Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4355655Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4355863Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4356068Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4356264Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4356422Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4356636Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4356861Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4357072Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4357276Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4357505Z E1204 
11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4357731Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4357930Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4358150Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4358363Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4358560Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4358785Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4358991Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4359190Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4359411Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4359624Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4359830Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4360028Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4360232Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4360529Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4360762Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4360964Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4361161Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4361355Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4361551Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4361764Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4361977Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4362175Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4362372Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4362667Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4362884Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4363084Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4363319Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4363531Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4363838Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4364052Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4364253Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4364451Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4364651Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4364945Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4365158Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4365361Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4365549Z E1204 11:10:34.209000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4365746Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4365961Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4366180Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4366378Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4366567Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4366749Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4366918Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4367048Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4367150Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4367277Z E1204 11:10:34.209000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4367435Z [W1204 11:10:34.478477442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4367452Z 2025-12-04T11:45:24.4367595Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4367899Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4368198Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4368328Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4368810Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4369078Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4369302Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4369506Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.4369707Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4369998Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4370254Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4370544Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4370778Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4371070Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4371301Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4371591Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4371819Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4372135Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4372356Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4372561Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4372758Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4372964Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4373164Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4373452Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4373743Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4373937Z E1204 
11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4374171Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4374478Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4374699Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4374894Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4375111Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4375318Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4375514Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4375707Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4375925Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4376141Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4376350Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4376548Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4376782Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4377073Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4377308Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4377597Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4377829Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4378034Z 
E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4378228Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4378436Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4378634Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4378881Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4379171Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4379403Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4379693Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4379923Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.4380213Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4380456Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4380760Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4380994Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4381286Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4381520Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4381812Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4382057Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4382346Z E1204 
11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4382578Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4382872Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4383117Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4383443Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4383676Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4383973Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4384207Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4384499Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4384722Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4384952Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4385150Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4385445Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4385677Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4385968Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4386206Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4386517Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4386748Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4387040Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4387275Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4387582Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4387817Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4388106Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4388306Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4388502Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4388700Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4388907Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4389106Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4389360Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4389654Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4389853Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4390047Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4390247Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4390444Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4390696Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4390985Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4391218Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4391512Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4391706Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4391934Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4392134Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4392370Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4392664Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4392888Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4393096Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4393331Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4393549Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4393855Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4394093Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4394383Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4394617Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4394913Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4395159Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4395454Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4395687Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4395982Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4396198Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4396394Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4396617Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4396818Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4397018Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4397219Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4397513Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4397745Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4398067Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4398303Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4398594Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4398828Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4399121Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4399356Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4399664Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4399888Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4400093Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4400291Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4400501Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4400711Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4400913Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4401208Z E1204 11:10:34.211000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4401428Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4401634Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4401831Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4402033Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4402355Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4402590Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4402884Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4403116Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4403450Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4403681Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4403991Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4404222Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4404520Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4404754Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4405061Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4405294Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4405584Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4405824Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4406118Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4406351Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4406647Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4406906Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4407204Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4407402Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4407600Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4407836Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4408133Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4408378Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4408669Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4408904Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4409197Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4409446Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4409739Z E1204 
11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4409971Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4410267Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4410464Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4410698Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4410989Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4411246Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4411544Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4411759Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4411962Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4412160Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4412362Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4412665Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4412879Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4413083Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4413416Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4413620Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4413929Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4414157Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4414360Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4414562Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4414759Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4414906Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4415104Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4415327Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4415564Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4415761Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4415984Z E1204 
11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4416190Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4416390Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4416613Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4416836Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4417030Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4417251Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4417458Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4417658Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4417865Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4418082Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4418286Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4418486Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4418692Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4418987Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4419203Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4419404Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4419632Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4419825Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4420026Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4420242Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4420442Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4420646Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4420847Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4421156Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4421368Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4421571Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4421774Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4421975Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4422293Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4422506Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4422711Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4422910Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4423115Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4423447Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4423642Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4423873Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4424062Z E1204 11:10:34.211000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4424263Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4424475Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4424686Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4424885Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4425079Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4425275Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4425446Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4425577Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4425680Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4425809Z E1204 11:10:34.211000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4425966Z [W1204 11:10:34.480616593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4425969Z 2025-12-04T11:45:24.4426116Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4426423Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4426721Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4426856Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4427337Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4427595Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4427823Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4428052Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.4428253Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4428547Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4428786Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4429079Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4429316Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4429619Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4429852Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4430149Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4430383Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4430686Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4430905Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4431114Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4431313Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4431524Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4431725Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4431957Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4432250Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4432468Z E1204 
11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4432704Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4432993Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4433216Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4433439Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4433657Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4433883Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4434078Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4434277Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4434498Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4434706Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4434925Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4435122Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4435355Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4435649Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4435887Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4436177Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4436397Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4436628Z 
E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4436826Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4437038Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4437238Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4437473Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4437764Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4438009Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4438301Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4438533Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.4438828Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4439058Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4439364Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4439596Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4439889Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4440120Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4440414Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4440649Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4440938Z E1204 
11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4441192Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4441484Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4441721Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4442016Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4442247Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4442551Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4442782Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4443076Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4443331Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4443534Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4443743Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4444034Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4444270Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4444560Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4444796Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4445087Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4445321Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4445640Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4445872Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4446167Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4446399Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4446700Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4446910Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4447108Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4447306Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4447514Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4447717Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4447961Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4448256Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4448452Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4448651Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4448853Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4449050Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4449286Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4449578Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4449834Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4450126Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4450325Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4450534Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4450738Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4450975Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4451293Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4451516Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4451719Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4451924Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4452128Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4452433Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4452669Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4452965Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4453201Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4453538Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4453775Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4454070Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4454335Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4454631Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4454831Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4455030Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4455252Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4455458Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4455672Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4455874Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4456172Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4456409Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4456717Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4456950Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4457247Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4457487Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4457783Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4458020Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4458315Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4458563Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4458766Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4458969Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4459164Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4459373Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4459578Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4459872Z E1204 11:10:34.214000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4460109Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4460311Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4460515Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4460722Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4461027Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4461262Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4461554Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4461794Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4462088Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4462323Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4462616Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4462871Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4463169Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4463443Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4463741Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4463980Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4464272Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4464527Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4464822Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4465063Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4465359Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4465612Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4465909Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4466108Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4466307Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4466542Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4466838Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4467069Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4467405Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4467644Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4467936Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4468171Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4468464Z E1204 
11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4468700Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4469008Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4469205Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4469445Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4469740Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4469992Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4470285Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4470502Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4470708Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4470907Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4471111Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4471403Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4471629Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4471842Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4472047Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4472248Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4472545Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4472770Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4472971Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4473184Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4473407Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4473559Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4473758Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4473982Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4474204Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4474402Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4474631Z E1204 
11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4474840Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4475037Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4475258Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4475467Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4475663Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4475916Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4476125Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4476324Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4476523Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4476737Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4476944Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4477142Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4477358Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4477654Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4477865Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4478072Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4478283Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4478479Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4478674Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4478891Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4479097Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4479300Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4479504Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4479799Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4480035Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4480236Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4480438Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4480638Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4480933Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4481151Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4481357Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4481571Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4481770Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4482067Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4482264Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4482479Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4482671Z E1204 11:10:34.214000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4482867Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4483084Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4483330Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4483533Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4483724Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4483910Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4484080Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4484245Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4484353Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4484480Z E1204 11:10:34.214000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4484643Z [W1204 11:10:34.524570916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4484645Z 2025-12-04T11:45:24.4484788Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4485086Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4485381Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4485528Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4486011Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4486268Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4486498Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4486719Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.4486920Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4487213Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4487451Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4487746Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4487979Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4488281Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4488535Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4488830Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4489062Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4489355Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4489579Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4489786Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4489997Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4490205Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4490407Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4490642Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4490935Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4491148Z E1204 
11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4491380Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4491675Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4491897Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4492096Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4492315Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4492522Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4492732Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4492938Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4495950Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4496181Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4496384Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4496581Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4496819Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4497158Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4497400Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4497695Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4497918Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4498148Z 
E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4498346Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4498555Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4498756Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4498990Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4499287Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4499518Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4499811Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4500070Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.4500368Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4500601Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4500895Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4501131Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4501422Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4501667Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4501958Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4502193Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4502486Z E1204 
11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4502743Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4503038Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4503306Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4503602Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4503836Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4504129Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4504376Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4504679Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4504903Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4505106Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4505308Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4505602Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4505835Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4506143Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4506374Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4506668Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4506900Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4507211Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4507445Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4507741Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4507974Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4508266Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4508465Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4508662Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4508881Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4509088Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4509290Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4509523Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4509813Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4510015Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4510221Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4510418Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4510612Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4510846Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4511139Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4511383Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4511675Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4511871Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4512084Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4512287Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4512524Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4512820Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4513052Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4513301Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4513502Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4513704Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4513996Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4514232Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4514529Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4514779Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4515072Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4515306Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4515599Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4515845Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4516140Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4516340Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4516537Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4516761Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4516964Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4517168Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4517382Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4517692Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4517929Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4518222Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4518458Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4518752Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4519004Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4519301Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4519536Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4519833Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4520065Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4520268Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4520466Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4520663Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4520874Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4521078Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4521371Z E1204 11:10:34.258000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4521595Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4521822Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4522020Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4522225Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4522517Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4522756Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4523052Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4523327Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4523622Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4523853Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4524151Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4524399Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4524690Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4524923Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4525218Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4525455Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4525747Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4525982Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4526302Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4526537Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4526834Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4527066Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4527365Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4527562Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4527774Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4528009Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4528300Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4528538Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4528844Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4529085Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4529379Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4529614Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4529910Z E1204 
11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4530142Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4530437Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4530655Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4530889Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4531188Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4531422Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4531719Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4531936Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4532151Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4532350Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4532555Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4532851Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4533066Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4533355Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4533555Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4533759Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4534053Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4534278Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4534483Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4534682Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4534890Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4535059Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4535261Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4535481Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4535689Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4535888Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4536115Z E1204 
11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4536342Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4536536Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4536759Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4536966Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4537164Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4537396Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4537606Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4537805Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4538001Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4538221Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4538425Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4538626Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4538825Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4539148Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4539364Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4539569Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4539773Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4539964Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4540164Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4540378Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4540596Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4540795Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4540997Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4541294Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4541509Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4541725Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4541923Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4542128Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4542421Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4542637Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4542839Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4543039Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4543290Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4543596Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4543797Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4543999Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4544191Z E1204 11:10:34.258000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4544389Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4544607Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4544831Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4545029Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4545221Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4545404Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4545576Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4545716Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4545824Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4545952Z E1204 11:10:34.258000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4546113Z [W1204 11:10:34.526774826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4546117Z 2025-12-04T11:45:24.4546263Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4546560Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4546860Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4546993Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4547492Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4547760Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4547992Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4548203Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.4548402Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4548699Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4548946Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4549240Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4549472Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4549770Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4550016Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4550306Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4550540Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4550834Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4551057Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4551265Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4551462Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4551674Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4551905Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4552147Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4552438Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4552638Z E1204 
11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4552872Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4553165Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4553426Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4553621Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4553842Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4554050Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4554250Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4554464Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4554685Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4554893Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4555090Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4555288Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4555521Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4555814Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4556058Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4556364Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4556588Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4556794Z 
E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4556994Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4557202Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4557402Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4557649Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4557946Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4558181Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4558474Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4558721Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.4559015Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4559250Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4559544Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4559779Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4560071Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4560301Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4560617Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4560851Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4561145Z E1204 
11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4561378Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4561677Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4561922Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4562212Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4562447Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4562738Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4562971Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4563322Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4563541Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4563748Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4563947Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4564243Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4564473Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4564767Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4565027Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4565320Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4565553Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4565844Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4566079Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4566377Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4566630Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4566922Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4567119Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4567320Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4567536Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4567744Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4567942Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4568176Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4568469Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4568668Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4568865Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4569062Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4569281Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4569511Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4569805Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4570037Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4570332Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4570530Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4570748Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4570954Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4571187Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4571485Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4571708Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4571919Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4572122Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4572323Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4572620Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4572854Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4573152Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4573422Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4573745Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4573982Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4574276Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4574511Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4574805Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4575019Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4575219Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4575441Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4575647Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4575848Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4576052Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4576357Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4576593Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4576889Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4577122Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4577417Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4577649Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4577966Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4578201Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4578497Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4578721Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4578923Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4579129Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4579331Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4579546Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4579747Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4580041Z E1204 11:10:34.260000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4580268Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4580485Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4580687Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4580889Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4581187Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4581420Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4581717Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4581954Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4582257Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4582505Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4582801Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4583036Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4583370Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4583606Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4583922Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4584154Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4584448Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4584681Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4584990Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4585228Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4585521Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4585760Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4586057Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4586259Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4586457Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4586708Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4587016Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4587251Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4587550Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4587783Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4588081Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4588326Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4588623Z E1204 
11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4588860Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4589155Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4589368Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4589600Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4589897Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4590135Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4590428Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4590647Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4590850Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4591066Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4591278Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4591575Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4591788Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4591992Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4592197Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4592399Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4592713Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4592933Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4593136Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4593363Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4593557Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4593727Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4593924Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4594146Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4594356Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4594554Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4594775Z E1204 
11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4594983Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4595178Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4595429Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4595635Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4595830Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4596052Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4596256Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4596455Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4596648Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4596878Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4597079Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4597275Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4597476Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4597780Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4597993Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4598193Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4598394Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4598586Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4598782Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4598996Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4599196Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4599406Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4599631Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4599925Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4600136Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4600339Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4600540Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4600738Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4601043Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4601255Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4601458Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4601656Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4601857Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4602163Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4602358Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4602560Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4602748Z E1204 11:10:34.260000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4602945Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4603159Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4603401Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4603613Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4603819Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4604002Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4604170Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4604296Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4604398Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4604526Z E1204 11:10:34.260000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4604682Z [W1204 11:10:34.528903506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4604685Z 2025-12-04T11:45:24.4604843Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4605135Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4605431Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4605563Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4606057Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4606311Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4606537Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4606745Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.4606945Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4607236Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4607473Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4607762Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4608016Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4608310Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4608541Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4608832Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4609063Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4609364Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4609583Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4609790Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4609987Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4610193Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4610406Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4610637Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4610927Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4611122Z E1204 
11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4611354Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4611645Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4611863Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4612070Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4612298Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4612504Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4612700Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4612895Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4613113Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4613353Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4613561Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4613755Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4613985Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4614278Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4614508Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4614811Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4615030Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4615237Z 
E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4615435Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4615645Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4615842Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4616074Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4616397Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4616630Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4616923Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4617156Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.4617452Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4617682Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4617987Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4618217Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4618508Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4618741Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4619041Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4619274Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4619564Z E1204 
11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4619800Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4620091Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4620323Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4620614Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4620866Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4621159Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4621389Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4621682Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4621902Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4622107Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4622317Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4622606Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4622840Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4623130Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4623412Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4623701Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4623935Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4624228Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4624461Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4624756Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4624987Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4625308Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4625507Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4625702Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4625899Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4626106Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4626307Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4626555Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4626851Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4627049Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4627246Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4627441Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4627650Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4627883Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4628173Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4628408Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4628698Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4628895Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4629106Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4629308Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4629563Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4629856Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4630078Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4630279Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4630480Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4630684Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4630986Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4631221Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4631516Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4631752Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4632072Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4632307Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4632602Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4632835Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4633131Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4633365Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4633564Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4633814Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4634019Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4634222Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4634422Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4634716Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4634950Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4635263Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4635495Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4635793Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4636030Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4636339Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4636577Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4636870Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4637094Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4637295Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4637494Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4637688Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4637900Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4638123Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4638414Z E1204 11:10:34.262000 
696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4638641Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4638842Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4639042Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4639244Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4639550Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4639783Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4640075Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4640310Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4640605Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4640850Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4641150Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4641475Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4641770Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4642003Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4642297Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4642543Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4642844Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4643081Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4643404Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4643641Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4643932Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4644183Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4644479Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4644675Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4644876Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4645108Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4645415Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4645649Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4645948Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4646182Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4646475Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4646707Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4647026Z E1204 
11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4647259Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4647554Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4647751Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4647985Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4648281Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4648533Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4648825Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4649041Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4649246Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4649444Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4649658Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4649951Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4650165Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4650367Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4650570Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4650773Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4651064Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4651298Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4651508Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4651710Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4651901Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4652051Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4652250Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4652474Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4652695Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4652897Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4653121Z E1204 
11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4653370Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4653568Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4653803Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4654010Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4654207Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4654427Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4654636Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4654833Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4655032Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4655244Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4655476Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4655673Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4655878Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4656172Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4656384Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4656589Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4656786Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4656992Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.4657187Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4657403Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4657606Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4657803Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4658017Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4658308Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4658522Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4658722Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4658922Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4659122Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4659413Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4659649Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4659850Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4660050Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4660249Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4660542Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4660739Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4660940Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4661143Z E1204 11:10:34.262000 696578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4661337Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4661549Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4661754Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4661953Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4662152Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4662333Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4662502Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4662628Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4662733Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4662858Z E1204 11:10:34.262000 696578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.4662902Z FAILED [1.4937s] [100%] 2025-12-04T11:45:24.4662905Z 2025-12-04T11:45:24.4662963Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.4663113Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.4663161Z Traceback (most recent call last): 2025-12-04T11:45:24.4663355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4663412Z method(*args, **kwargs) 2025-12-04T11:45:24.4663567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4663626Z method(*args, **kwargs) 2025-12-04T11:45:24.4663778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.4663817Z with policy(): 2025-12-04T11:45:24.4663973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.4664014Z raise RuntimeError(msg) 2025-12-04T11:45:24.4664414Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 2051014656. 
2025-12-04T11:45:24.4664417Z 2025-12-04T11:45:24.4664497Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.4664765Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.4664781Z 2025-12-04T11:45:24.4664874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.4664955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4665002Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4665062Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4665563Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.4665667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.4665707Z graph_break [] 2025-12-04T11:45:24.4665775Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4665852Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4666357Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.4666410Z current_size = base.storage().size() 2025-12-04T11:45:24.4666452Z Autotune Choices Stats: 2025-12-04T11:45:24.4666936Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01348000019788742, "best_triton_pos": 1, "best_triton_time": 0.016078999266028404, "best_triton_kernel": "triton_mm_15", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.4667012Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4667063Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4667187Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4667229Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.4667468Z triton_mm_15 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4667722Z triton_mm_35 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4667954Z triton_mm_13 0.0173 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4668179Z triton_mm_34 0.0174 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4668404Z triton_mm_14 0.0178 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4668635Z triton_mm_31 0.0183 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4668873Z triton_mm_33 0.0188 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4669099Z triton_mm_32 0.0194 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4669322Z triton_mm_16 0.0196 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4669457Z SingleProcess AUTOTUNE benchmarking takes 0.1901 seconds and 1.3779 seconds precompiling for 33 choices 2025-12-04T11:45:24.4669604Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.4669652Z Traceback (most recent call last): 2025-12-04T11:45:24.4669820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4669863Z method(*args, **kwargs) 2025-12-04T11:45:24.4670015Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4670056Z method(*args, **kwargs) 2025-12-04T11:45:24.4670207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.4670244Z with policy(): 2025-12-04T11:45:24.4670399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.4670440Z raise RuntimeError(msg) 2025-12-04T11:45:24.4670843Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2051014656 and is now 2938109952. 2025-12-04T11:45:24.4670847Z 2025-12-04T11:45:24.4670921Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.4671188Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.4671203Z 2025-12-04T11:45:24.4671291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.4671366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4671423Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4671482Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4671969Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), 
('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.4672073Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.4672111Z graph_break [] 2025-12-04T11:45:24.4672176Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4672251Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4672738Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.4672798Z current_size = base.storage().size() 2025-12-04T11:45:24.4672839Z Autotune Choices Stats: 2025-12-04T11:45:24.4673343Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01348000019788742, "best_triton_pos": 1, "best_triton_time": 0.016078999266028404, "best_triton_kernel": "triton_mm_15", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.4673414Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4673468Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4673590Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4673635Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.4673882Z triton_mm_15 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4674114Z triton_mm_35 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4674342Z triton_mm_13 0.0173 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4674571Z triton_mm_34 0.0174 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4674800Z triton_mm_14 0.0178 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4675025Z triton_mm_31 0.0183 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4675271Z triton_mm_33 0.0188 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4675511Z triton_mm_32 0.0194 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4675736Z triton_mm_16 0.0196 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4675866Z SingleProcess AUTOTUNE benchmarking takes 0.1901 seconds and 1.3779 seconds precompiling for 33 choices 2025-12-04T11:45:24.4675940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4675984Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4676041Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4676144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.4676633Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.4676690Z graph_break [] 2025-12-04T11:45:24.4676753Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4676831Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4676871Z Autotune Choices Stats: 2025-12-04T11:45:24.4677251Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015879999846220016, "best_triton_pos": 0} 2025-12-04T11:45:24.4677322Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4677373Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4677505Z dtypes: 
torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4677743Z triton_mm_73 0.0159 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4677971Z triton_mm_53 0.0160 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4678199Z triton_mm_72 0.0174 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4678428Z triton_mm_51 0.0176 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4678652Z triton_mm_52 0.0176 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4678875Z triton_mm_69 0.0182 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4679132Z triton_mm_71 0.0188 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4679362Z triton_mm_70 0.0193 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4679588Z triton_mm_54 0.0195 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4679811Z triton_mm_68 0.0216 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4679943Z SingleProcess AUTOTUNE benchmarking takes 0.2716 seconds and 0.8154 seconds precompiling for 39 choices 2025-12-04T11:45:24.4679998Z =================================== FAILURES =================================== 2025-12-04T11:45:24.4680158Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.4680206Z Traceback (most recent call last): 2025-12-04T11:45:24.4680366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4680407Z method(*args, **kwargs) 2025-12-04T11:45:24.4680561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.4680602Z method(*args, **kwargs) 2025-12-04T11:45:24.4680756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.4680795Z with policy(): 2025-12-04T11:45:24.4680950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.4680994Z raise RuntimeError(msg) 2025-12-04T11:45:24.4681399Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2938109952 and is now 3904897024. 2025-12-04T11:45:24.4681402Z 2025-12-04T11:45:24.4681479Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.4681742Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.4681745Z 2025-12-04T11:45:24.4681836Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.4681910Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4681955Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4682011Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4682498Z inductor [('triton_bundler_save_kernel', 304), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.4682598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.4682647Z graph_break [] 2025-12-04T11:45:24.4682712Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4682795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4683312Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.4683359Z current_size = base.storage().size() 2025-12-04T11:45:24.4683400Z Autotune Choices Stats: 2025-12-04T11:45:24.4683870Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "_scaled_mm", "best_time": 0.01348000019788742, "best_triton_pos": 1, "best_triton_time": 0.016078999266028404, "best_triton_kernel": "triton_mm_15", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.4683941Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4683991Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4684128Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4684170Z _scaled_mm 0.0135 ms 100.0% 2025-12-04T11:45:24.4684404Z triton_mm_15 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4684634Z triton_mm_35 0.0161 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4684862Z triton_mm_13 0.0173 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4685103Z triton_mm_34 
0.0174 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4685328Z triton_mm_14 0.0178 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4685555Z triton_mm_31 0.0183 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4685783Z triton_mm_33 0.0188 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4686010Z triton_mm_32 0.0194 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4686234Z triton_mm_16 0.0196 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4686363Z SingleProcess AUTOTUNE benchmarking takes 0.1901 seconds and 1.3779 seconds precompiling for 33 choices 2025-12-04T11:45:24.4686450Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4686492Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4686549Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4686660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 
1)] 2025-12-04T11:45:24.4687148Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.4687184Z graph_break [] 2025-12-04T11:45:24.4687249Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4687321Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4687363Z Autotune Choices Stats: 2025-12-04T11:45:24.4687731Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015879999846220016, "best_triton_pos": 0} 2025-12-04T11:45:24.4687811Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4687863Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4687982Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4688218Z triton_mm_73 0.0159 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4688445Z triton_mm_53 0.0160 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4688672Z triton_mm_72 0.0174 ms 91.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4688910Z triton_mm_51 0.0176 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4689136Z triton_mm_52 0.0176 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4689362Z triton_mm_69 0.0182 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4689590Z triton_mm_71 0.0188 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4689818Z triton_mm_70 0.0193 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4690040Z triton_mm_54 0.0195 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4690287Z triton_mm_68 0.0216 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.4690415Z SingleProcess AUTOTUNE benchmarking takes 0.2716 seconds and 0.8154 seconds precompiling for 39 choices 2025-12-04T11:45:24.4690491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.4690533Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.4690592Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.4690691Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.4691174Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.4691213Z graph_break [] 2025-12-04T11:45:24.4691278Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.4691351Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.4691406Z Autotune Choices Stats: 2025-12-04T11:45:24.4691773Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_111", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015720000490546227, "best_triton_pos": 0} 2025-12-04T11:45:24.4691841Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.4691892Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.4692011Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.4692249Z triton_mm_111 0.0157 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4692493Z triton_mm_91 0.0159 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4692721Z triton_mm_90 0.0175 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4692949Z triton_mm_110 0.0175 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4693176Z triton_mm_89 0.0177 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4693462Z triton_mm_109 0.0186 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4693692Z triton_mm_107 0.0188 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4693735Z _scaled_mm 0.0193 ms 81.4% 2025-12-04T11:45:24.4693961Z triton_mm_92 0.0196 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.4694215Z triton_mm_108 0.0198 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.4694347Z SingleProcess AUTOTUNE benchmarking takes 0.2795 seconds and 0.6329 seconds precompiling for 39 choices 2025-12-04T11:45:24.4694540Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7e488c55ed2ff5cb.xml - 2025-12-04T11:45:24.4694601Z =========================== short test summary info ============================ 2025-12-04T11:45:24.4695194Z FAILED [1.4937s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2938109952 and is now 3904897024. 2025-12-04T11:45:24.4695209Z 2025-12-04T11:45:24.4695284Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.4695548Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.4695552Z 2025-12-04T11:45:24.4695639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.4695703Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.4695771Z ================== 1 failed, 187 deselected, 2 rerun in 6.77s ================== 2025-12-04T11:45:24.4695813Z Got exit code 1 2025-12-04T11:45:24.4695852Z Retrying single test... 2025-12-04T11:45:24.4695999Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ce64d83753e4eff6.xml 2025-12-04T11:45:24.4696056Z ============================= test session starts ============================== 2025-12-04T11:45:24.4696171Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.4696226Z cachedir: .pytest_cache 2025-12-04T11:45:24.4696387Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.4696435Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.4696478Z configfile: pytest.ini 2025-12-04T11:45:24.4696644Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.4696724Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.4696978Z stepcurrent: skipping 78 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.4697026Z Running 1 items in this shard 2025-12-04T11:45:24.4697028Z 2025-12-04T11:45:24.4697362Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:43.775065207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4697364Z 2025-12-04T11:45:24.4697685Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4698006Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4698141Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4698631Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4698886Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4699115Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4699334Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4699537Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4699833Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4700069Z E1204 
11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4700365Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4700613Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4700909Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4701141Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4701437Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4701672Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4701961Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4702193Z E1204 11:10:43.838000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4702505Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4702741Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4703034Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4703232Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4703495Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4703787Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4703999Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4704230Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4704523Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4704758Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4705063Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4705286Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4705493Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4705698Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4705909Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4706080Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4706262Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4706802Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/uz/cuzq7nx4gx44cgfloyneynqstyfmidhbbrumvx6n53zx4hnm7g6n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4706964Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4707183Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4707341Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4707487Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4707780Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4707914Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4708194Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4708334Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4708588Z 
E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4708746Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4709016Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4709164Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4709440Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4709635Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4709954Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4710248Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4710382Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4710864Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4711140Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4711370Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4711577Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4711778Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4712072Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4712308Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4712618Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4712855Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4713148Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4713416Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4713723Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4713955Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4714248Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4714481Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4714780Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4715016Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4715306Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4715533Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4715764Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4716058Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4716253Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4716486Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4716779Z E1204 11:10:43.838000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4717026Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4717324Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4717544Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4717754Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4717965Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4718178Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4718345Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4718522Z E1204 11:10:43.838000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4718627Z E1204 11:10:43.838000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4718784Z [W1204 11:10:43.217296892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4718787Z 2025-12-04T11:45:24.4719096Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4719388Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4719531Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4720018Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4720269Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4720493Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4720698Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4720900Z E1204 11:10:43.954000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4721201Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4721435Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4721726Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4721959Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4722260Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4722490Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4722781Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4723013Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4723331Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4723563Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4723851Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4724123Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4724416Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4724614Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4724844Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4725135Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4725330Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4725574Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4725864Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4726095Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4726389Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4726626Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4726832Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4727035Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4727245Z E1204 11:10:43.954000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4727413Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4727592Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4728117Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/h3/ch35ivh6nzwg4ldmmaauflzxygnmbhlnwlcqzovcmzixouxjadp5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4728276Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4728502Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4728660Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4728808Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4729097Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4729229Z E1204 11:10:43.954000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4729489Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4729629Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4729894Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4730051Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4730318Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4730457Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4730731Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4730938Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4731257Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4731553Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4731685Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4732166Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4732422Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4732668Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4732874Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4733078Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4733395Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4733635Z E1204 
11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4733933Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4734182Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4734473Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4734705Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4735000Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4735246Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4735540Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4735770Z E1204 11:10:43.954000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4736068Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4736304Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4736594Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4736792Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4737048Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4737341Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4737537Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4737770Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4738061Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4738295Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4738599Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4738818Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4739024Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4739226Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4739439Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4739618Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4739798Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4739902Z E1204 11:10:43.954000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4740058Z [W1204 11:10:43.235124906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4740061Z 2025-12-04T11:45:24.4740216Z [W1204 11:10:43.237280586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4740218Z 2025-12-04T11:45:24.4740529Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4740825Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4740954Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4741459Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4741715Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4741939Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4742145Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4742347Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4742650Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4742887Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4743178Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4743448Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4743737Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4743988Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4744279Z E1204 11:10:43.970000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4744514Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4744806Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4745038Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4745336Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4745580Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4745884Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4746085Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4746315Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4746607Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4746804Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4747052Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4747342Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4747575Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4747877Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4748095Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4748315Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:24.4748515Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4748726Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4748896Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4749078Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4749608Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/pg/cpgolbbjtm5k3einq3jw2k3hplh4p7dots34butwhiblxt5muabr.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4749754Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4749996Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4750153Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4750304Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4750590Z E1204 11:10:43.970000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4750723Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4750981Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4751122Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4751388Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4751543Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4751814Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4751948Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4752228Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4752434Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4752749Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4753044Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4753176Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4753692Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4753945Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4754185Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4754402Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4754607Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4754901Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4755134Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4755428Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4755671Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4755966Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4756200Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4756492Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4756725Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4757037Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4757272Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4757564Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4757799Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4758092Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4758288Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4758520Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4758831Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4759031Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4759263Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4759558Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4759794Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4760085Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4760316Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4760520Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4760723Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4760933Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4761103Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.4761296Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4761397Z E1204 11:10:43.970000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4761711Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4762004Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4762137Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4762612Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4762879Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4763115Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4763351Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4763553Z E1204 11:10:43.972000 
702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4763844Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4764083Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4764374Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4764621Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4764914Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4765147Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4765441Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4765687Z E1204 11:10:43.972000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4765979Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4766213Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4766507Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4766740Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4767030Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4767242Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4767487Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4767781Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4767981Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4768211Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4768505Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4768737Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4769042Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4769260Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4769469Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4769672Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4769892Z E1204 11:10:43.972000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4770060Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4770236Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4770760Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/27/c27cbelxf3hztcd2igdqgeoxsc4hsx5wy6lmieiie7fe2xl7bmeh.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4770907Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4771125Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4771282Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4771426Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4771734Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4771866Z E1204 11:10:43.972000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4772125Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4772262Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4772518Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4772675Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4772944Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4773098Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4773416Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4773613Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4773927Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4774235Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4774366Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4774846Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4775103Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4775330Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4775538Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4775740Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4776064Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4776301Z E1204 
11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4776593Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4776826Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4777117Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4777348Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4777652Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4777885Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4778180Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4778411Z E1204 11:10:43.972000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4778716Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4778946Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4779239Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4779436Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4779671Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4779963Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4780157Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4780414Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4780707Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4780939Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4781230Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4781452Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4781660Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4781871Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4782084Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4782250Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4782431Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4782532Z E1204 11:10:43.972000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4782692Z [W1204 11:10:43.245991126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4782704Z 2025-12-04T11:45:24.4783016Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4783343Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4783476Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4783954Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4784207Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4784431Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4784667Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4784871Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4785163Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4785399Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4785690Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4785923Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4786230Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4786460Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4786753Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4786983Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4787288Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4787524Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4787818Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4788051Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4788344Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4788541Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4788770Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4789089Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4789285Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4789519Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4789813Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4790045Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4790337Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4790568Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4790775Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4790976Z E1204 11:10:43.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4791189Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4791357Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4794229Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4794772Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/dc/cdceiiqymbavtugfidsansgslilml2hf2hfzmjtbl4ieixlso6ci.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4794928Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4795150Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4795311Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4795459Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4795750Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4795899Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4796173Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4796314Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4796570Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4796726Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4796998Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4797135Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4797428Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4797624Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4797939Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4798237Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4798368Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4798864Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4799120Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4799349Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4799560Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4799760Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4800053Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4800306Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4800599Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4800832Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4801120Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4801354Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4801643Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4801889Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4802177Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4802410Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4802702Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4802945Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4803235Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4803459Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4803694Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4803984Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4804180Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4804412Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4804730Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4804961Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4805254Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4805474Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4805679Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4805879Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4806103Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4806268Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.4806448Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4806551Z E1204 11:10:43.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4806709Z [W1204 11:10:43.250456064 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4806712Z 2025-12-04T11:45:24.4807021Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4807330Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4807462Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4807936Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4808192Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4808417Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4808624Z E1204 11:10:43.983000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4808854Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4809147Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4809383Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4809672Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4809906Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4810196Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4810441Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4810732Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4810962Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4811257Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4811500Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4811793Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4812023Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4812316Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4812515Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4812746Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4813038Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4813304Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4813538Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4813831Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4814065Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4814357Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4814578Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4814798Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4814997Z E1204 11:10:43.983000 702760 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4815208Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4815374Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.4815553Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4816101Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpnt7682hp/qg/cqgobuc7iba2k2x75stljq77bhgirhnfput6br7r4abeo5sqggud.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.4816248Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.4816467Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.4816625Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.4816772Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.4817058Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.4817192Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.4817449Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.4817612Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.4817868Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.4818024Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.4818295Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.4818430Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.4818710Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.4818913Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.4819230Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.4819526Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4819656Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4820145Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4820400Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4820627Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4820834Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4821036Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4821331Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4821563Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4821879Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4822110Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4822403Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4822636Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4822930Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4823166Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4823502Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4823735Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4824024Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4824259Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4824572Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4824767Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4824999Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4825295Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4825493Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.4825726Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4826016Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4826261Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4826564Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4826787Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4826993Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4827195Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.4827409Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.4827577Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.4827774Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.4827877Z E1204 11:10:43.983000 702760 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.4827932Z ('RERUN', {'yellow': True}) [3.7053s] [100%] 2025-12-04T11:45:24.4828271Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:45.234663849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4828275Z 2025-12-04T11:45:24.4828425Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4828727Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4829026Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4829157Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4829634Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4829890Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4830114Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4830321Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4830542Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4830836Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4831071Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4831361Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4831596Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4831885Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4832129Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4832423Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4832654Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4832946Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4833176Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4833416Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4833612Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4833822Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4834029Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4834262Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4834561Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4834758Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4835018Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4835310Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4835530Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4835724Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4835943Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4836149Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4836360Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4836555Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4836773Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4836980Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4837174Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4837386Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4837619Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4837909Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4838144Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4838436Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4838656Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4838859Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4839066Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4839288Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4839491Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4839723Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4840013Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4840247Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4840553Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4840787Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4841079Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4841310Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4841605Z 
E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4841847Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4842139Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4842370Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4842662Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4842897Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4843186Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4843457Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4843759Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4843995Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4844286Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4844516Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4844809Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4845052Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4845343Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4845560Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4845764Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4845960Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4846267Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4846500Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4846791Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4847023Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4847313Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4847545Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4847835Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4848090Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4848383Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4848612Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4848904Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4849103Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4849310Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4849507Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4849713Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4849912Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4850145Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4850448Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4850643Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4850839Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4851038Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4851232Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4851466Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4851756Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4851987Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4852299Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4852497Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4852705Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4852905Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4853142Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4853471Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4853708Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4853908Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4854109Z 
E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4854315Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4854607Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4854856Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4855148Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4855384Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4855679Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4855916Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4856210Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4856456Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4856768Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4856968Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4857166Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4857387Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4857592Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4857791Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4858004Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4858301Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4858533Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4858829Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4859072Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4859366Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4859600Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4859893Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4860129Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4860425Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4860648Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4860874Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4861073Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4861269Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4861479Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4861681Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4861977Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4862216Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4862416Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4862617Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4862820Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4863114Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4863404Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4863697Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4863932Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4864225Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4864461Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4864756Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4864992Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4865320Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4865553Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4865849Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4866082Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4866376Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4866611Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4866917Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4867150Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4867446Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4867682Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4867988Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4868185Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4868384Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4868618Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4868913Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4869145Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4869438Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4869693Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4869989Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4870223Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4870514Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4870749Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4871043Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4871251Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4871484Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4871777Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4872012Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4872320Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4872536Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4872742Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4872941Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4873142Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4873469Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4873685Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4873909Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4874129Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4874332Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4874626Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4874849Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4875051Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4875250Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4875455Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4875604Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4875798Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4876020Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4876228Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4876436Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4876657Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4876863Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4877059Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4877278Z E1204 
11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4877487Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4877683Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4877902Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4878129Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4878326Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4878524Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4878736Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4878939Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4879140Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4879340Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4879645Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4879856Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4880059Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4880257Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4880451Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4880655Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4880869Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4881070Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4881272Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4881476Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4881769Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4881980Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4882191Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4882400Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4882601Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4882894Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4883106Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4883339Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4883540Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4883754Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4884047Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4884240Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4884443Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4884633Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4884846Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4885059Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4885263Z E1204 11:10:45.974000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4885461Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4885649Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4885831Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4886002Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4886128Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4886247Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4886386Z E1204 11:10:45.974000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4886545Z [W1204 11:10:45.243809422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4886549Z 2025-12-04T11:45:24.4886693Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4886989Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4887283Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4887416Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4887895Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4888161Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4888389Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4888596Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4888816Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4889107Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4889342Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4889637Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4889871Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4890161Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4890393Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4890711Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4890943Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4891235Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4891457Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4891663Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4891861Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4892079Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4892277Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4892507Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4892801Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4892997Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4893239Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4893572Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4893789Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4893985Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4894203Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4894407Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4894601Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4894808Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4895038Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4895246Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4895442Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4895634Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4895867Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4896161Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4896406Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4896697Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4896912Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4897118Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4897313Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4897534Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4897731Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4897964Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4898257Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4898488Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4898780Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4899009Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4899320Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4899553Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4899845Z 
E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4900075Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4900365Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4900607Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4900896Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4901126Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4901416Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4901645Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4901945Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4902179Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4902473Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4902702Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4902993Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4903224Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4903548Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4903793Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4903994Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4904190Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4904483Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4904717Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4905010Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4905261Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4905552Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4905786Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4906078Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4906334Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4906624Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4906858Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4907150Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4907348Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4907543Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4907738Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4907969Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4908168Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4908401Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4908689Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4908883Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4909081Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4909294Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4909488Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4909721Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4910015Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4910247Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4910552Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4910747Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4910954Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4911156Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4911391Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4911688Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4911907Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4912119Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4912327Z 
E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4912530Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4912823Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4913058Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4913387Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4913635Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4913930Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4914161Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4914456Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4914688Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4914993Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4915189Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4915386Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4915607Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4915808Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4916007Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4916209Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4916529Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4916761Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4917054Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4917285Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4917576Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4917809Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4918115Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4918346Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4918641Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4918862Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4919074Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4919273Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4919464Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4919676Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4919875Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4920169Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4920389Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4920592Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4920819Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4921020Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4921314Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4921544Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4921837Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4922069Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4922373Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4922604Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4922896Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4923131Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4923471Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4923704Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4923995Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4924359Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4924654Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4924887Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4925180Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4925443Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4925738Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4925971Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4926262Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4926462Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4926658Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4926906Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4927196Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4927427Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4927722Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4927964Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4928262Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4928496Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4928792Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4929024Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4929317Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4929512Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4929763Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4930054Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4930287Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4930581Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4930795Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4930998Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4931207Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4931410Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4931702Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4931916Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4932117Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4932325Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4932526Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4932818Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4933042Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4933244Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4933483Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4933674Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4933834Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4934042Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4934262Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4934469Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4934664Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4934886Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4935096Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4935305Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4935528Z E1204 
11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4935733Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4935929Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4936151Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4936371Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4936569Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4936764Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4936978Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4937182Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4937385Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4937586Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4937880Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4938119Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4938320Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4938522Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4938712Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4938908Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4939123Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4939326Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4939538Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4939742Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4940037Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4940250Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4940452Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4940663Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4940864Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4941156Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4941373Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4941578Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4941776Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4941979Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4942292Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4942489Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.4942692Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.4942883Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.4943078Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.4943337Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.4943544Z E1204 11:10:45.977000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.4943754Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.4943945Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.4944124Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.4944297Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.4944424Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.4944529Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.4944669Z E1204 11:10:45.977000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.4944827Z [W1204 11:10:45.245963153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.4944829Z 2025-12-04T11:45:24.4944977Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.4945269Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.4945567Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.4945698Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.4946176Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.4946460Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.4946686Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.4946894Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.4947094Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4947386Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4947622Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4947928Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4948162Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4948453Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4948690Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4948992Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4949227Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4949517Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4949743Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4949953Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4950150Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4950359Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4950557Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4950813Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4951108Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4951305Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4951539Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4951832Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4952054Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4952259Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4952479Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4952684Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4952884Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4953079Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4953348Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4953555Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4953751Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.4953947Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4954178Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4954473Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4954703Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4955023Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4955245Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4955451Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4955650Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4955857Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4956059Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4956290Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4956599Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4956832Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4957123Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4957357Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4957658Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4957891Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4958183Z 
E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4958420Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4958713Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4958942Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4959236Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4959488Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4959782Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4960015Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4960304Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4960541Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4960832Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4961076Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4961367Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4961604Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4961898Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4962128Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4962333Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4962529Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.4962825Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4963058Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4963386Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4963619Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4963933Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4964168Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4964457Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4964689Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4964980Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4965213Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4965524Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4965720Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4965918Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4966114Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4966337Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4966535Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4966769Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4967064Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4967259Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4967462Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4967658Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4967853Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4968104Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4968396Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4968630Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4968920Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4969117Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4969323Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4969543Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4969777Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4970069Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4970294Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4970495Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4970705Z 
E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4970904Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4971196Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4971429Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4971722Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4971954Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4972248Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4972500Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4972792Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4973024Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4973346Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4973546Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4973743Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4973977Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4974179Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4974379Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4974584Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4974890Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4975124Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4975416Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4975648Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4975943Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4976176Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4976468Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4976713Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4977026Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4977248Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4977449Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4977648Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4977840Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4978049Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.4978260Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4978551Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4978772Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4978975Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4979186Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4979385Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4979675Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4979908Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4980201Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4980434Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4980724Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4980967Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4981268Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4981504Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4981796Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4982027Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4982319Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4982562Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4982853Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4983084Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4983413Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4983658Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4983951Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4984184Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4984475Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4984674Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4984869Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4985101Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4985424Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4985655Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4985948Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4986180Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4986474Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4986707Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4987011Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4987243Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4987533Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4987731Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4987973Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4988265Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4988497Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.4988793Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4989007Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4989208Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4989406Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4989604Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4989917Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4990133Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4990332Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4990532Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4990732Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4991027Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4991259Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.4991461Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4991659Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4991851Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4992001Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.4992208Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4992430Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4992634Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4992832Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4993054Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4993301Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4993498Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4993719Z E1204 
11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4993953Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.4994148Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4994372Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.4994576Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.4994774Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.4994971Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4995184Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4995402Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4995600Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4995801Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4996095Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4996323Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4996525Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4996722Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4996915Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.4997111Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.4997329Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4997529Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.4997730Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4997930Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4998251Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4998471Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.4998671Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4998870Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.4999070Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.4999365Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.4999587Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.4999789Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.4999986Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5000190Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5000494Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5000689Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5000891Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5001079Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5001277Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5001491Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5001695Z E1204 11:10:45.979000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5001893Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5002081Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5002298Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5002469Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5002598Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5002701Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5002829Z E1204 11:10:45.979000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5002984Z [W1204 11:10:46.290053184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5002988Z 2025-12-04T11:45:24.5003134Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5003459Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5003770Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5003903Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5004381Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5004652Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5004876Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5005084Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5005286Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5005577Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5005814Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5006105Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5006354Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5006659Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5006894Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5007188Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5007419Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5007712Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5007942Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5008149Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5008344Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5008553Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5008755Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5008996Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5009291Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5009486Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5009718Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5010010Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5010229Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5010426Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5010655Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5010870Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5011068Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5011264Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5011486Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5011694Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5011891Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.5012097Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5012331Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5012625Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5012859Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5013159Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5013416Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5013622Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5013819Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5014031Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5014233Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5014465Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5014756Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5015015Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5015306Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5015538Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5015829Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5016064Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5016359Z 
E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5016606Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5016898Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5017132Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5017424Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5017676Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5017966Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5018197Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5018491Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5018728Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5019020Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5019251Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5019565Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5019798Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5020092Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5020312Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5020514Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5020710Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5021017Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5021251Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5021543Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5021776Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5022079Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5022308Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5022599Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5022831Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5023123Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5023393Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5023685Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5023909Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5024103Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5024302Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5024509Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5024708Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5024939Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5025244Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5025441Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5025637Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5025834Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5026028Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5026274Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5026569Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5026801Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5027095Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5027291Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5027500Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5027699Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5027935Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5028250Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5028475Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5028678Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5028875Z 
E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5029078Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5029372Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5029618Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5029910Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5030147Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5030446Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5030690Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5030984Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5031215Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5031511Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5031709Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5031908Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5032131Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5032343Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5032556Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5032760Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5033055Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5033329Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5033624Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5033883Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5034175Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5034487Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5034783Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5035020Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5035329Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5035553Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5035759Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5035959Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5036155Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5036366Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5036568Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5036886Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5037108Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5037313Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5037514Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5037717Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5038010Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5038256Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5038547Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5038781Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5039076Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5039307Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5039613Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5039848Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5040144Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5040376Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5040672Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5040906Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5041219Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5041453Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5041746Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5041980Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5042277Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5042512Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5042817Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5043013Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5043212Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5043481Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5043788Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5044024Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5044315Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5044552Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5044848Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5045082Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5045374Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5045625Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5045932Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5046130Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5046364Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5046655Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5046893Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5047200Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5047415Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5047618Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5047817Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5048021Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5048323Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5048537Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5048737Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5048939Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5049142Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5049438Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5049661Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5049872Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5050087Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5050281Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5050430Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5050628Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5050848Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5051058Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5051265Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5051490Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5051701Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5051900Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5052120Z E1204 
11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5052339Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5052537Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5052756Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5052964Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5053161Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5053402Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5053615Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5053819Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5054034Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5054245Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5054541Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5054752Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5054953Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5055152Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5055346Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5055556Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5055771Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5055974Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.5056174Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5056377Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5056683Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5056897Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5057097Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5057299Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5057502Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5057797Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5058013Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.5058225Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5058438Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5058640Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5058935Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5059134Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5059336Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5059528Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5059733Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5059948Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5060151Z E1204 11:10:46.023000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5060351Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5060542Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5060740Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5060914Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5061040Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5061147Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5061273Z E1204 11:10:46.023000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5061431Z [W1204 11:10:46.292232594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5061434Z 2025-12-04T11:45:24.5061581Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5061875Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5062170Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5062313Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5062806Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5063061Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5063321Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5063531Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5063750Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5064044Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5064280Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5064575Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5064806Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5065113Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5065347Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5065641Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5065877Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5066170Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5066391Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5066598Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5066827Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5067035Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5067237Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5067470Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5067764Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5067962Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5068205Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5068497Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5068716Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5068917Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5069138Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5069356Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5069553Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5069747Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5069971Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5070176Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5070375Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.5070570Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5070804Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5071117Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5071350Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5071641Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5071858Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5072065Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5072261Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5072482Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5072681Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5072911Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5073206Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5073479Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5073773Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5074003Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5074299Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5074534Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5074827Z 
E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5075059Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5075375Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5075608Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5075903Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5076135Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5076429Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5076660Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5076970Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5077201Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5077494Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5077728Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5078030Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5078263Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5078552Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5078773Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5078977Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5079177Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5079472Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5079723Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5080015Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5080247Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5080539Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5080770Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5081062Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5081308Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5081599Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5081833Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5082125Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5082341Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5082538Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5082734Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5082947Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5083148Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5083411Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5083706Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5083905Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5084128Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5084327Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5084525Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5084756Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5085051Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5085284Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5085595Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5085807Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5086109Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5086410Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5086671Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5086988Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5087212Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5087415Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5087618Z 
E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5087930Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5088257Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5088494Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5088809Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5089045Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5089359Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5089594Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5089889Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5090123Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5090429Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5090628Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5090827Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5091052Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5091259Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5091481Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5091684Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5091975Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5092211Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5092505Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5092739Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5093032Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5093327Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5093624Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5093861Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5094156Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5094377Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5094593Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5094795Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5094986Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5095200Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5095401Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5095708Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5095928Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5096132Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5096334Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5096534Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5096829Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5097060Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5097352Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5097608Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5097904Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5098139Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5098433Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5098673Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5098987Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5099220Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5099512Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5099747Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5100052Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5100284Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5100577Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5100811Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5101107Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5101341Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5101633Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5101843Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5102050Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5102286Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5102578Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5102812Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5103109Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5103411Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5106389Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5106630Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5106933Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5107166Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5107488Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5107689Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5107924Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5108220Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5108456Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5108753Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5108968Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5109203Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5109407Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5109609Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5109904Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5110117Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5110320Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5110534Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5110738Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5111032Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5111254Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5111457Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5111666Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5111860Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5112009Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5112210Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5112430Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5112641Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5112839Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5113061Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5113333Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5113529Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5113753Z E1204 
11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5113958Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5114156Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5114379Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5114585Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5114801Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5114997Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5115212Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5115417Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5115618Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5115831Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5116126Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5116343Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5116545Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5116747Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5116939Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5117138Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5117364Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5117589Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.5117790Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5117989Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5118284Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5118498Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5118701Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5118912Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5119113Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5119411Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5119626Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.5119831Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5120039Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5120240Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5120532Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5120732Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5120935Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5121126Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5121324Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5121547Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5121767Z E1204 11:10:46.025000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5121966Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5122159Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5122341Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5122514Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5122643Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5122749Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5122891Z E1204 11:10:46.025000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5123050Z [W1204 11:10:46.294365204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5123053Z 2025-12-04T11:45:24.5123201Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5123543Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5123846Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5123979Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5124488Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5124748Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5124975Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5125185Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5125384Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5125678Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5125940Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5126235Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5126469Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5126764Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5127003Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5127292Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5127540Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5127832Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5128053Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5128262Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5128470Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5128680Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5128879Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5129116Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5129410Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5129609Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5129842Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5130131Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5130375Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5130572Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5130791Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5130995Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5131194Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5131391Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5131622Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5131827Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5132022Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.5132218Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5132448Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5132753Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5132985Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5133318Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5133540Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5133750Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5133948Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5134154Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5134375Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5134620Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5134914Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5135147Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5135437Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5135670Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5135985Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5136222Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5136517Z 
E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5136748Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5137056Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5137286Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5137680Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5137914Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5138207Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5138439Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5138732Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5138989Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5139279Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5139513Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5139804Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5140036Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5140329Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5140558Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5140761Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5140961Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5141310Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5141555Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5141849Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5142084Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5142381Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5142614Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5142905Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5143138Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5143495Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5143727Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5144022Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5144218Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5144414Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5144611Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5144835Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5145036Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5145266Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5145558Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5145756Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5145966Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5146160Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5146356Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5146590Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5146881Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5147115Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5147405Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5147602Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5147835Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5148043Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5148279Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5148572Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5148797Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5148998Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5149210Z 
E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5149410Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5149705Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5149940Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5150251Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5150490Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5150782Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5151018Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5151310Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5151544Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5151836Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5152044Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5152261Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5152484Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5152690Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5152890Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5153095Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5153442Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5153690Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5153984Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5154217Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5154514Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5154760Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5155054Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5155289Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5155585Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5155809Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5156010Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5156211Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5156428Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5156640Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5156843Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5157134Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5157356Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5157562Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5157763Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5157976Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5158270Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5158505Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5158797Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5159042Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5159334Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5159568Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5159863Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5160099Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5160393Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5160623Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5160940Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5161173Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5161467Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5161701Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5161994Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5162243Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5162535Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5162769Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5163062Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5163307Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5163521Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5163753Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5164048Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5164281Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5164580Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5164818Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5165114Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5165373Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5165666Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5165905Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5166199Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5166403Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5166637Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5166955Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5167191Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5167484Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5167700Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5167917Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5168117Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5168321Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5168617Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5168835Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5169035Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5169237Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5169441Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5169757Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5169980Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5170183Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5170384Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5170576Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5170727Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5170934Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5171155Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5171360Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5171563Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5171788Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5172004Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5172202Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5172422Z E1204 
11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5172630Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5172825Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5173049Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5173291Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5173490Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5173702Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5173930Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5174136Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5174333Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5174535Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5174829Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5175043Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5175260Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5175457Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5175650Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5175847Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5176061Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5176279Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.5176478Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5176676Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5176971Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5177185Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5177385Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5177585Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5177795Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5178101Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5178315Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.5178521Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5178720Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5178921Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5179217Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5179424Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5179627Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5179815Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5180013Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5180228Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5180443Z E1204 11:10:46.027000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5180646Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5180837Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5181020Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5181189Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5181322Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5181426Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5181555Z E1204 11:10:46.027000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5181607Z ('RERUN', {'yellow': True}) [1.4964s] [100%] 2025-12-04T11:45:24.5181947Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:10:47.540703035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.5181976Z 2025-12-04T11:45:24.5182129Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5182425Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5182723Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5182853Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5183375Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5183645Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5183871Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5184078Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5184278Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5184588Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5184823Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5185117Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5185356Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5185649Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5185883Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5186171Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5186431Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5186721Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5186942Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5187148Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5187344Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5187554Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5187771Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5188004Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5188296Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5188495Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5188728Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5189033Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5189254Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5189449Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5189671Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5189876Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5190078Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5190274Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5190491Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5190717Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5190915Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5191111Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5191342Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5191634Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5191866Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5192170Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5192391Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5192595Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5192795Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5193001Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5193212Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5193477Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5193771Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5194006Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5194299Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5194532Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5194826Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5195088Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5195384Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5195617Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5195911Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5196142Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5196449Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5196680Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5196973Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5197209Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5197513Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5197747Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5198036Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5198271Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5198561Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5198795Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5199087Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5199322Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5199536Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5199735Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5200028Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5200259Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5200554Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5200799Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5201089Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5201320Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5201612Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5201846Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5202150Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5202381Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5202673Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5202869Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5203066Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5203287Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5203494Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5203717Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5203948Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5204244Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5204438Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5204632Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5204826Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5205019Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5205262Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5205552Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5205782Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5206071Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5209487Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5209697Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5209899Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5210133Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5210426Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5210647Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5210846Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5211058Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5211266Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5211561Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5211792Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5212092Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5212328Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5212631Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5212863Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5213154Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5213425Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5213733Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5213931Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5214128Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5214350Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5214554Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5214752Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5214953Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5215244Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5215501Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5215792Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5216025Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5216316Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5216548Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5216843Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5217089Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5217380Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5217601Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5217802Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5218020Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5218212Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5218421Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5218619Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5218916Z E1204 11:10:47.274000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5219140Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5219339Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5219536Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5219757Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5220049Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5220282Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5220574Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5220806Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5221097Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5221342Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5221632Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5221864Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5222156Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5222397Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5222689Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5222920Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5223212Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5223478Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5223774Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5224006Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5224322Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5224555Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5224845Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5225042Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5225238Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5225471Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5225776Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5226007Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5226302Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5226534Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5226840Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5227072Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5227364Z E1204 
11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5227597Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5227889Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5228084Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5228314Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5228629Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5228865Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5229158Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5229373Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5229575Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5229775Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5229986Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5230279Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5230492Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5230697Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5230898Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5231110Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5231404Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5231623Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5231828Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5232028Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5232219Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5232369Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5232564Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5232806Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5233014Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5233214Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5233465Z E1204 
11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5233671Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5233872Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5234090Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5234322Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5234515Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5234737Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5234943Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5235142Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5235357Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5235570Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5235772Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5235972Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5236175Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5236467Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5236681Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5236897Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5237108Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5237303Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5237500Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5237720Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5237923Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5238122Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5238331Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5238626Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5238840Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5239043Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5239242Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5239452Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5239746Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5239960Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5240166Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5240364Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5240565Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5240859Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5241065Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5241278Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5241469Z E1204 11:10:47.274000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5241666Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5241878Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5242088Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5242288Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5242488Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5242670Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5242839Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5242970Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5243071Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5243200Z E1204 11:10:47.274000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5243391Z [W1204 11:10:47.543013704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5243406Z 2025-12-04T11:45:24.5243553Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5243847Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5244144Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5244276Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5244758Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5245014Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 2025-12-04T11:45:24.5245263Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5245471Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.5245674Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5245963Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5246197Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5246490Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5246738Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5247029Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5247262Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5247556Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5247796Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5248087Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5248306Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5248514Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5248710Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5248917Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5249121Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5249351Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5249667Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5249865Z E1204 
11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5250097Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5250386Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5250608Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5250804Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5251032Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5251238Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5251435Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5251633Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5251849Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5252067Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5252264Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5252457Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5252692Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5252982Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5253216Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5253540Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5253774Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5253990Z 
E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5254186Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5254392Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5254589Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5254822Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5255112Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5255361Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5255652Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5255887Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.5256182Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5256429Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5256719Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5256949Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5257240Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5257474Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5257761Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5257994Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5258309Z E1204 
11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5258545Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5258836Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5259068Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5259362Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5259604Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5259894Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5260124Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5260417Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5260639Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5260854Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5261051Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5261341Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5261575Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5261867Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5262099Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5262390Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5262640Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5262934Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5263166Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5263495Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5263727Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5264020Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5264238Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5264433Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5264630Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5264838Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5265050Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5265280Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5265574Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5265773Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5265967Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5266163Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5266356Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5266587Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5266908Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5267141Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5267435Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5267630Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5267840Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5268043Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5268291Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5268583Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5268805Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5269008Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5269205Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5269416Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5269708Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5269940Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5270235Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5270470Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5270761Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5270995Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5271308Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5271541Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5271833Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5272029Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5272227Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5272451Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5272662Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5272862Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5273062Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5273406Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5273653Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5273946Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5274179Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5274471Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5274706Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5274999Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5275232Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5275556Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5275776Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5275980Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5276177Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5276368Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5276578Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5276778Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5277085Z E1204 11:10:47.276000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5277307Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5277509Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5277707Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5277921Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5278212Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5278445Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5278738Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5278972Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5279271Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5279503Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5279819Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5280051Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5280347Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5280579Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5280869Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5281105Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5281412Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5281645Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5281940Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5282174Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5282485Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5282716Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5283010Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5283208Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5283445Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5283681Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5283973Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5284234Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5284527Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5284761Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5285051Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5285283Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5285574Z E1204 
11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5285821Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5286112Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5286306Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5286543Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5286851Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5287084Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5287376Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5287591Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5287793Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5287991Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5288191Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5288482Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5288717Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5288922Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5289122Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5289322Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5289614Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5289834Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5290046Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5290243Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5290433Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5290582Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5290779Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5291009Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5291217Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5291414Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5291635Z E1204 
11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5291840Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5292037Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5292256Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5292461Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5292678Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5292898Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5293104Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5293336Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5293534Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5293749Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5294079Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5294295Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5294494Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5294787Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5295000Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5295202Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5295412Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5295605Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5295805Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5296022Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5296226Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5296424Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5296625Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5296917Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5297155Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5297357Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5297556Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5297757Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5298054Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5298269Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5298488Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5298685Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5298886Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5299179Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5299377Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5299588Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5299778Z E1204 11:10:47.276000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5299972Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5300189Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5300397Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5300595Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5300783Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5300964Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5301158Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5301286Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5301393Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5301520Z E1204 11:10:47.276000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5301676Z [W1204 11:10:47.545141334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5301679Z 2025-12-04T11:45:24.5301822Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5302117Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5302412Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5302555Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5303038Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5303323Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5303564Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5303769Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.5303968Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5304260Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5304497Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5304790Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5305024Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5305314Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5305572Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5305863Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5306095Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5306384Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5306605Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5306823Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5307020Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5307230Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5307431Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5307663Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5307963Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5308159Z E1204 
11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5308388Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5308680Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5308900Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5309095Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5309313Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5309530Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5309738Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5309934Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5310153Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5310357Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5310552Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5310746Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5310988Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5311278Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5311508Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5311803Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5312022Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5312236Z 
E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5312430Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5312637Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5312838Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5313068Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5313399Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5313628Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5313949Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5314180Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.5314474Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5314705Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5314996Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5315226Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5315532Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5315763Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5316053Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5316284Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5316593Z E1204 
11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5316823Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5317113Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5317344Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5317636Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5317866Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5318154Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5318404Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5318694Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5318915Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5319115Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5319311Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5319604Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5319843Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5320133Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5320364Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5320656Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5320899Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5321188Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5321422Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5321712Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5321945Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5322233Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5322429Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5322643Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5322837Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5323045Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5323243Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5323514Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5323809Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5324016Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5324209Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5324403Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5324597Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5324829Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5325134Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5325364Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5325656Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5325855Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5326066Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5326268Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5326504Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5326797Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5327042Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5327247Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5327445Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5327648Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5327943Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5328178Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5328491Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5328723Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5329015Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5329248Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5329552Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5329786Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5330076Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5330278Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5330475Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5330700Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5330900Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5331117Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5331329Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5331623Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5331857Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5332148Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5332383Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5332688Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5332920Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5333217Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5333480Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5333787Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5334013Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5334217Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5334417Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5334610Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5334844Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5335046Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5335343Z E1204 11:10:47.278000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5335600Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5335801Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5336003Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5336203Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5336496Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5336729Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5337040Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5337272Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5337566Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5337804Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5338106Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5338340Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5338632Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5338867Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5339159Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5339393Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5339686Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5339941Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5340339Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5340575Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5340869Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5341105Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5341396Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5341610Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5341807Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5342041Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5342333Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5342585Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5342882Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5343114Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5343450Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5343682Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5343980Z E1204 
11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5344212Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5344535Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5344733Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5344969Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5345264Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5345495Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5345792Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5346020Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5346224Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5346423Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5346624Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5346932Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5347166Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5347371Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5347568Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5347771Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5348065Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5348285Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5348486Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5348694Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5348896Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5349044Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5349242Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5349461Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5349675Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5349873Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5350102Z E1204 
11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5350309Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5350503Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5350722Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5350928Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5351134Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5351355Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5351558Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5351758Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5351956Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5352171Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5352371Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5352568Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5352779Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5353084Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5353439Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5353639Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5353838Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5354031Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5354234Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5354466Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5354668Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5354868Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5355069Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5355367Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5355591Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5355794Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5355990Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5356194Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5356490Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5356705Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5356908Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5357117Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5357330Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5357624Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5357821Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5358020Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5358212Z E1204 11:10:47.278000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5358409Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5358633Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5358843Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5359040Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5359232Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5359409Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5359593Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5359721Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5359823Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5359951Z E1204 11:10:47.278000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5360108Z [W1204 11:10:47.589587851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5360110Z 2025-12-04T11:45:24.5360258Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5360553Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5360851Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5360982Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5361484Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5361741Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5361966Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5362172Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.5362373Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5362665Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5362908Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5363200Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5363476Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5363766Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5364017Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5364307Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5364541Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5364832Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5365053Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5365258Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5365452Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5365689Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5365889Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5366123Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5366416Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5366611Z E1204 
11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5366844Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5367149Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5367370Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5367564Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5367785Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5367991Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5368201Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5368397Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5368614Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5368822Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5369016Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5369214Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5369444Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5369737Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5369996Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5370289Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5370510Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5370714Z 
E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5370912Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5371118Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5371331Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5371568Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5371858Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5372092Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5372392Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5372630Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.5372924Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5373158Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5373497Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5373727Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5374019Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5374285Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5374577Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5374810Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5375105Z E1204 
11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5375337Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5375627Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5375877Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5376166Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5376399Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5376691Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5376936Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5377226Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5377447Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5377653Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5377848Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5378142Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5378374Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5378684Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5378918Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5379208Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5379441Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5379739Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5379972Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5380275Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5380506Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5380795Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5380992Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5381198Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5381392Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5381600Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5381799Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5382033Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5382328Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5382523Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5382719Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5382933Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5383130Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5383393Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5383684Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5383916Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5384207Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5384419Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5384628Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5384832Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5385067Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5385361Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5385603Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5385805Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5386005Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5386206Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5386501Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5386739Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5387033Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5387291Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5387583Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5387821Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5388112Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5388345Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5388639Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5388847Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5389045Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5389267Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5389473Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5389671Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5389883Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5390178Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5390409Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5390702Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5390935Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5391229Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5391462Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5391779Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5392016Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5392306Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5392528Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5392730Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5392928Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5393129Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5393384Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5393588Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5393885Z E1204 11:10:47.323000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5394122Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5394323Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5394523Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5394722Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5395016Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5395251Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5395543Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5395780Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5396099Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5396338Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5396629Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5396863Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5397158Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5397403Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5397697Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5397929Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5398225Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5398459Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5398767Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5399001Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5399294Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5399529Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5399821Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5400020Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5400228Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5400469Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5400765Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5401001Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5401298Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5401535Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5401850Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5402084Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5402376Z E1204 
11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5402611Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5402912Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5403111Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5403377Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5403673Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5403906Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5404198Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5404413Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5404614Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5404843Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5405046Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5405338Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5405554Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5405759Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5405962Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5406175Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5406469Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5406692Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5406895Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5407094Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5407298Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5407448Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5407644Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5407867Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5408073Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5408273Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5408495Z E1204 
11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5408702Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5408922Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5409142Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5409350Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5409544Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5409765Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5409971Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5410172Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5410382Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5410594Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5410798Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5410996Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5411197Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5411499Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5411713Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5411915Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5412114Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5412310Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5412508Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5412722Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5412922Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5413142Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5413456Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5413751Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5413966Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5414169Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5414369Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5414586Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5414887Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5415098Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5415303Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5415502Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5415714Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5416008Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5416204Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5416406Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5416594Z E1204 11:10:47.323000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5416791Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5419063Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5419278Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5419524Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5419718Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5419902Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5420073Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5420202Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5420307Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5420437Z E1204 11:10:47.323000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5420594Z [W1204 11:10:47.591922258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5420611Z 2025-12-04T11:45:24.5420758Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5421055Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5421353Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5421489Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5421983Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5422240Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5422469Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5422675Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.5422878Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5423169Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5423437Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5423756Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5423996Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5424289Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5424520Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5424812Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5425054Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5425346Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5425563Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5425770Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5425967Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5426190Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5426391Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5426620Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5426912Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5427106Z E1204 
11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5427339Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5427630Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5427859Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5428065Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5428284Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5428494Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5428689Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5428884Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5429102Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5429319Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5429514Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5429707Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5429939Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5430230Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5430475Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5430769Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5430991Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5431198Z 
E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5431394Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5431601Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5431797Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5432040Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5432350Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5432585Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5432878Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5433114Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.5433444Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5433688Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5433979Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5434212Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5434503Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5434749Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5435038Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5435267Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5435562Z E1204 
11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5435796Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5436087Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5436316Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5436640Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5436871Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5437166Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5437396Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5437687Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5437911Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5438127Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5438323Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5438614Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5438847Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5439150Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5439382Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5439674Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5439906Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5440200Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5440433Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5440725Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5440977Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5441266Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5441465Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5441660Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5441856Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5442065Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5442264Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5442507Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5442801Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5442998Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5443191Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5443430Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5443623Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5443855Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5444146Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5444378Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5444671Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5444866Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5445074Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5445302Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5445540Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5445833Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5446055Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5446258Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5446456Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5446672Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5446964Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5447201Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5447495Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5447740Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5448034Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5448265Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5448561Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5448794Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5449088Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5449288Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5449496Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5449730Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5449933Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5450132Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5450332Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5450627Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5450859Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5451164Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5451397Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5451691Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5451925Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5452236Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5452471Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5452764Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5452986Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5453191Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5453419Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5453611Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5453836Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5454052Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5454348Z E1204 11:10:47.325000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5454570Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5454773Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5454970Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5455171Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5455484Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5455717Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5456009Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5456243Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5456555Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5456788Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5457080Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5457314Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5457609Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5457842Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5458133Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5458392Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5458686Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5458923Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5459219Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5459454Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5459759Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5459991Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5460283Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5460481Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5460680Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5460921Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5461216Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5461452Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5461744Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5461978Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5462268Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5462503Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5462817Z E1204 
11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5463051Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5463367Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5463564Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5463802Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5464094Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5464343Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5464635Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5464851Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5465054Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5465268Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5465471Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5465762Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5465979Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5466184Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5466385Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5466589Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5466883Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5467132Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5467336Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5467536Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5467730Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5467878Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5468077Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5468317Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5468527Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5468722Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5468944Z E1204 
11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5469149Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5469357Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5469579Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5469785Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5469983Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5470203Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5470411Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5470611Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5470809Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5471033Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5471245Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5471445Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5471645Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5471941Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5472153Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5472355Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5472572Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5472765Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5472963Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5473179Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5473421Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5473632Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5473833Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5474125Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5474342Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5474546Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5474747Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5474950Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5475247Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5475486Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5475688Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5475886Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5476086Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5476380Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5476575Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5476791Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5476982Z E1204 11:10:47.325000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5477175Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5477395Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5477603Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5477814Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5478005Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5478184Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5478357Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5478484Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5478590Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5478716Z E1204 11:10:47.325000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5478874Z [W1204 11:10:47.594079319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5478877Z 2025-12-04T11:45:24.5479021Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5479336Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5479636Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5479769Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5480252Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5480505Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5480746Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5480953Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.5481155Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5481446Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5481682Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5481986Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5482220Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5482513Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5482746Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5483039Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5483314Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5483602Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5483861Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5484067Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5484267Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5484478Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5484678Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5484912Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5485216Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5485411Z E1204 
11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5485641Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5485933Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5486151Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5486360Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5486581Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5486787Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5486986Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5487180Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5487401Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5487604Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5487799Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5488015Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5488246Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5488542Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5488771Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5489070Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5489285Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5489502Z 
E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5489697Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5489902Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5490102Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5490332Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5490638Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5490869Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5491162Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5491395Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.5491687Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5491918Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5492207Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5492460Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5492751Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5492983Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5493309Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5493540Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5493848Z E1204 
11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5494077Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5494369Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5494602Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5494905Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5495137Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5495426Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5495658Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5495949Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5496176Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5496380Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5496590Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5496894Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5497127Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5497416Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5497644Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5497937Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5498181Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5498473Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5498705Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5498997Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5499241Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5499529Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5499726Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5499920Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5500114Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5500323Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5500522Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5500755Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5501074Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5501272Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5501469Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5501663Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5501857Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5502089Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5502380Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5502621Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5502912Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5503110Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5503343Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5503560Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5503792Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5504085Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5504306Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5504509Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5504708Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5504907Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5505199Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5505458Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5505753Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5505984Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5506276Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5506510Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5506813Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5507045Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5507337Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5507535Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5507733Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5507966Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5508168Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5508364Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5508567Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5508859Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5509091Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5509381Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5509637Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5509932Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5510169Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5510462Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5510694Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5510986Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5511218Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5511419Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5511618Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5511811Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5512024Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5512234Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5512532Z E1204 11:10:47.327000 
702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5512753Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5512957Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5513160Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5513395Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5513690Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5513949Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5514241Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5514474Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5514773Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5515008Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5515300Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5517239Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5517530Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5517763Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5518054Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5518302Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5518605Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5518840Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5519134Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5519366Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5519659Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5519889Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5520196Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5520395Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5520592Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5520824Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5521116Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5521350Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5521656Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5521943Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5522233Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5522466Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5522770Z E1204 
11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5523002Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5523336Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5523533Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5523771Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5524064Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5524295Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5524601Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5524816Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5525019Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5525216Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5525417Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5525709Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5525938Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5526162Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5526360Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5526561Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5526853Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5527087Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5527289Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5527488Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5527681Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5527828Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5528026Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5528249Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5528457Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5528663Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5528884Z E1204 
11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5529091Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5529285Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5529506Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5529711Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5529905Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5530137Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5530356Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5530552Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5530749Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5530962Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5531174Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5531372Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5531570Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5531865Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5532078Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5532282Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5532479Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5532670Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.5532880Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5533093Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5533320Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5533516Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5533717Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5534010Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5534246Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5534461Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5534658Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5534859Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5535156Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5535384Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5535586Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5535785Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5535987Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5536280Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5536478Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5536679Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5536868Z E1204 11:10:47.327000 702760 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5537075Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5537293Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5537502Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5537699Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5537889Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5538069Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5538252Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5538389Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5538493Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5538618Z E1204 11:10:47.327000 702760 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5538660Z FAILED [1.5030s] [100%] 2025-12-04T11:45:24.5538663Z 2025-12-04T11:45:24.5538719Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5538870Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5538918Z Traceback (most recent call last): 2025-12-04T11:45:24.5539081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5539124Z method(*args, **kwargs) 2025-12-04T11:45:24.5539290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5539330Z method(*args, **kwargs) 2025-12-04T11:45:24.5539482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5539519Z with policy(): 2025-12-04T11:45:24.5539673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5539714Z raise RuntimeError(msg) 2025-12-04T11:45:24.5540112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1973420032. 
2025-12-04T11:45:24.5540115Z 2025-12-04T11:45:24.5540195Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5540460Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5540463Z 2025-12-04T11:45:24.5540553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5540631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5540677Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5540735Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5541301Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5541406Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5541443Z graph_break [] 2025-12-04T11:45:24.5541510Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5541585Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5542080Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5542140Z current_size = base.storage().size() 2025-12-04T11:45:24.5542195Z Autotune Choices Stats: 2025-12-04T11:45:24.5542572Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015720000490546227, "best_triton_pos": 0} 2025-12-04T11:45:24.5542644Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5542695Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5542819Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5543061Z triton_mm_35 0.0157 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5543351Z triton_mm_15 0.0160 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5543581Z triton_mm_34 0.0173 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5543808Z triton_mm_13 0.0175 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5544035Z triton_mm_14 0.0178 ms 88.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5544260Z triton_mm_31 0.0182 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5544487Z triton_mm_33 0.0184 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5544722Z triton_mm_32 0.0196 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5544946Z triton_mm_16 0.0197 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5545172Z triton_mm_30 0.0215 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5545304Z SingleProcess AUTOTUNE benchmarking takes 0.1888 seconds and 1.3472 seconds precompiling for 33 choices 2025-12-04T11:45:24.5545451Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5545497Z Traceback (most recent call last): 2025-12-04T11:45:24.5545656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5545697Z 
method(*args, **kwargs) 2025-12-04T11:45:24.5545850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5545908Z method(*args, **kwargs) 2025-12-04T11:45:24.5546078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5546115Z with policy(): 2025-12-04T11:45:24.5546269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5546309Z raise RuntimeError(msg) 2025-12-04T11:45:24.5546709Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1973420032 and is now 3017801728. 2025-12-04T11:45:24.5546711Z 2025-12-04T11:45:24.5546787Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5547061Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5547064Z 2025-12-04T11:45:24.5547153Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5547227Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5547270Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5547327Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5547879Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), 
('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5547980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5548018Z graph_break [] 2025-12-04T11:45:24.5548083Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5548158Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5548647Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5548712Z current_size = base.storage().size() 2025-12-04T11:45:24.5548755Z Autotune Choices Stats: 2025-12-04T11:45:24.5549126Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015720000490546227, "best_triton_pos": 0} 2025-12-04T11:45:24.5549198Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5549248Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5549369Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5549606Z triton_mm_35 0.0157 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5549835Z triton_mm_15 0.0160 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5550085Z triton_mm_34 0.0173 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5550311Z triton_mm_13 0.0175 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5550540Z triton_mm_14 0.0178 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5550764Z triton_mm_31 0.0182 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5551002Z triton_mm_33 0.0184 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5551226Z triton_mm_32 0.0196 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5551453Z triton_mm_16 0.0197 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5551676Z triton_mm_30 0.0215 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5551809Z SingleProcess AUTOTUNE benchmarking takes 0.1888 seconds and 1.3472 seconds precompiling for 33 choices 2025-12-04T11:45:24.5551884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5551926Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5551984Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5552084Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5552550Z inductor [('triton_bundler_save_kernel', 304), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('async_compile_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5552589Z graph_break [] 2025-12-04T11:45:24.5552657Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5552732Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5552774Z Autotune Choices Stats: 2025-12-04T11:45:24.5553243Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "_scaled_mm", "best_time": 0.013240000233054161, "best_triton_pos": 1, "best_triton_time": 0.01592000015079975, "best_triton_kernel": "triton_mm_73", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.5553359Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5553409Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5553547Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5553591Z _scaled_mm 0.0132 ms 100.0% 2025-12-04T11:45:24.5553840Z triton_mm_73 0.0159 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5554071Z triton_mm_53 0.0162 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5554296Z triton_mm_72 0.0174 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5554539Z triton_mm_51 0.0176 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5554768Z triton_mm_52 0.0177 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5554993Z triton_mm_69 0.0186 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5555218Z 
triton_mm_71 0.0188 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5555446Z triton_mm_70 0.0194 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5555670Z triton_mm_54 0.0197 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5555799Z SingleProcess AUTOTUNE benchmarking takes 0.2701 seconds and 0.7819 seconds precompiling for 39 choices 2025-12-04T11:45:24.5555855Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5556000Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5556062Z Traceback (most recent call last): 2025-12-04T11:45:24.5556220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5556265Z method(*args, **kwargs) 2025-12-04T11:45:24.5556418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5556461Z method(*args, **kwargs) 2025-12-04T11:45:24.5556612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5556652Z with policy(): 2025-12-04T11:45:24.5556805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5556847Z raise 
RuntimeError(msg) 2025-12-04T11:45:24.5557240Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 3017801728 and is now 3904897024. 2025-12-04T11:45:24.5557256Z 2025-12-04T11:45:24.5557330Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5557605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5557607Z 2025-12-04T11:45:24.5557695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5557771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5557812Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5557872Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5558433Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5558536Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5558574Z graph_break [] 2025-12-04T11:45:24.5558640Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5558713Z ----------------------------- Captured stderr call ----------------------------- 
2025-12-04T11:45:24.5559208Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5559257Z current_size = base.storage().size() 2025-12-04T11:45:24.5559298Z Autotune Choices Stats: 2025-12-04T11:45:24.5559669Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015720000490546227, "best_triton_pos": 0} 2025-12-04T11:45:24.5559738Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5559789Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5559908Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5560158Z triton_mm_35 0.0157 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5560389Z triton_mm_15 0.0160 ms 98.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5560620Z triton_mm_34 0.0173 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.5560849Z triton_mm_13 0.0175 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5561077Z triton_mm_14 0.0178 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5561315Z triton_mm_31 0.0182 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5561554Z triton_mm_33 0.0184 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5561780Z triton_mm_32 0.0196 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5562004Z triton_mm_16 0.0197 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5562246Z triton_mm_30 0.0215 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5562379Z SingleProcess AUTOTUNE benchmarking takes 0.1888 seconds and 1.3472 seconds precompiling for 33 choices 2025-12-04T11:45:24.5562453Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:45:24.5562497Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5562556Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5562657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5563118Z inductor [('triton_bundler_save_kernel', 304), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('async_compile_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5563159Z graph_break [] 2025-12-04T11:45:24.5563224Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5563342Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5563382Z Autotune Choices Stats: 2025-12-04T11:45:24.5563864Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "_scaled_mm", "best_time": 0.013240000233054161, "best_triton_pos": 1, "best_triton_time": 0.01592000015079975, "best_triton_kernel": "triton_mm_73", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4"} 2025-12-04T11:45:24.5563935Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5563985Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5564108Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5564151Z _scaled_mm 0.0132 ms 100.0% 2025-12-04T11:45:24.5564385Z triton_mm_73 0.0159 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5564612Z triton_mm_53 0.0162 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5564841Z triton_mm_72 0.0174 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5565081Z triton_mm_51 0.0176 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5565323Z triton_mm_52 0.0177 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5565547Z triton_mm_69 0.0186 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5565774Z triton_mm_71 0.0188 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5566013Z triton_mm_70 0.0194 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5566238Z triton_mm_54 0.0197 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5566368Z SingleProcess AUTOTUNE benchmarking takes 0.2701 seconds and 0.7819 seconds precompiling for 39 choices 2025-12-04T11:45:24.5566441Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5566487Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5566543Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5566645Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5567136Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5567176Z graph_break [] 2025-12-04T11:45:24.5567243Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:24.5567316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5567358Z Autotune Choices Stats: 2025-12-04T11:45:24.5567738Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_111", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.015960000455379486, "best_triton_pos": 0} 2025-12-04T11:45:24.5567810Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5567860Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5567980Z dtypes: 
torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5568215Z triton_mm_111 0.0160 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5568445Z triton_mm_91 0.0160 ms 99.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5568683Z triton_mm_90 0.0172 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5568924Z triton_mm_110 0.0174 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5569154Z triton_mm_89 0.0176 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5569382Z triton_mm_107 0.0183 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5569621Z triton_mm_109 0.0187 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5569844Z triton_mm_92 0.0194 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5570070Z triton_mm_108 0.0195 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5570296Z triton_mm_105 0.0215 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5570426Z SingleProcess AUTOTUNE benchmarking takes 0.2735 seconds and 0.6310 seconds precompiling for 39 choices 2025-12-04T11:45:24.5570621Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ce64d83753e4eff6.xml - 2025-12-04T11:45:24.5570682Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5571294Z FAILED [1.5030s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 3017801728 and is now 3904897024. 
2025-12-04T11:45:24.5571297Z 2025-12-04T11:45:24.5571372Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5571637Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5571640Z 2025-12-04T11:45:24.5571730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5571793Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5571865Z ================== 1 failed, 187 deselected, 2 rerun in 6.72s ================== 2025-12-04T11:45:24.5571904Z Got exit code 1 2025-12-04T11:45:24.5572114Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5572243Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.5572404Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4fdffd7a4c5b5d73.xml 2025-12-04T11:45:24.5572464Z ============================= test session starts ============================== 2025-12-04T11:45:24.5572591Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5572632Z cachedir: .pytest_cache 2025-12-04T11:45:24.5572794Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5572842Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5572885Z configfile: pytest.ini 2025-12-04T11:45:24.5573047Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5573128Z collecting ... 
collected 188 items / 79 deselected / 109 selected 2025-12-04T11:45:24.5573183Z stepcurrent: skipping 79 already run items. 2025-12-04T11:45:24.5573231Z Running 109 items in this shard 2025-12-04T11:45:24.5573233Z 2025-12-04T11:45:24.5573491Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7043s] [ 0%] 2025-12-04T11:45:24.5573708Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3846s] [ 0%] 2025-12-04T11:45:24.5573895Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3598s] [ 0%] 2025-12-04T11:45:24.5573897Z 2025-12-04T11:45:24.5573950Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5574092Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5574141Z Traceback (most recent call last): 2025-12-04T11:45:24.5574305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5574347Z method(*args, **kwargs) 2025-12-04T11:45:24.5574507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5574547Z method(*args, **kwargs) 2025-12-04T11:45:24.5574701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5574736Z with policy(): 2025-12-04T11:45:24.5574891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5574933Z raise RuntimeError(msg) 2025-12-04T11:45:24.5575333Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.5575337Z 2025-12-04T11:45:24.5575413Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5575670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5575672Z 2025-12-04T11:45:24.5575759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5575835Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5575877Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5575936Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5576004Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5576116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5576152Z graph_break [] 2025-12-04T11:45:24.5576216Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5576374Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5576420Z Traceback (most recent call last): 2025-12-04T11:45:24.5576573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5576614Z method(*args, **kwargs) 2025-12-04T11:45:24.5576764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5576804Z 
method(*args, **kwargs) 2025-12-04T11:45:24.5576953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5576992Z with policy(): 2025-12-04T11:45:24.5577144Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5577196Z raise RuntimeError(msg) 2025-12-04T11:45:24.5577581Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.5577584Z 2025-12-04T11:45:24.5577657Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5577912Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5577914Z 2025-12-04T11:45:24.5577999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5578076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5578120Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5578178Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5578243Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5578343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5578378Z graph_break [] 2025-12-04T11:45:24.5578440Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5578513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5578555Z frames 
[('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5578620Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5578718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5578783Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5578821Z graph_break [] 2025-12-04T11:45:24.5578881Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5578935Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5579074Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5579120Z Traceback (most recent call last): 2025-12-04T11:45:24.5579273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5579313Z method(*args, **kwargs) 2025-12-04T11:45:24.5579463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5579504Z method(*args, **kwargs) 2025-12-04T11:45:24.5579652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5579702Z with policy(): 2025-12-04T11:45:24.5579855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5579908Z raise RuntimeError(msg) 2025-12-04T11:45:24.5580296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.5580298Z 2025-12-04T11:45:24.5580370Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5580624Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5580627Z 2025-12-04T11:45:24.5580713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5580798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5580840Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5580897Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5580960Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5581058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5581093Z graph_break [] 2025-12-04T11:45:24.5581154Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5581226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5581269Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5581324Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5581421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5581486Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5581524Z graph_break [] 2025-12-04T11:45:24.5581582Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5581655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5581695Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5581750Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5581844Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5581907Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5581942Z graph_break [] 2025-12-04T11:45:24.5582014Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5582209Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4fdffd7a4c5b5d73.xml - 2025-12-04T11:45:24.5582271Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5582844Z FAILED [0.3598s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.5582848Z 2025-12-04T11:45:24.5582919Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5583174Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5583188Z 2025-12-04T11:45:24.5583302Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5583366Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5583454Z ================== 1 failed, 79 deselected, 2 rerun in 2.47s =================== 2025-12-04T11:45:24.5583492Z Got exit code 1 2025-12-04T11:45:24.5583669Z Retrying single test... 
2025-12-04T11:45:24.5583816Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8cbb01f917da212a.xml 2025-12-04T11:45:24.5583872Z ============================= test session starts ============================== 2025-12-04T11:45:24.5583984Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5584027Z cachedir: .pytest_cache 2025-12-04T11:45:24.5584187Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5584233Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5584275Z configfile: pytest.ini 2025-12-04T11:45:24.5584449Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5584526Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5584777Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5584823Z Running 1 items in this shard 2025-12-04T11:45:24.5584825Z 2025-12-04T11:45:24.5585035Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5921s] [100%] 2025-12-04T11:45:24.5585245Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2594s] [100%] 2025-12-04T11:45:24.5585434Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2338s] [100%] 2025-12-04T11:45:24.5585437Z 2025-12-04T11:45:24.5585488Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5585631Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5585676Z Traceback (most recent call last): 2025-12-04T11:45:24.5585832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5585871Z method(*args, **kwargs) 2025-12-04T11:45:24.5586037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5586079Z method(*args, **kwargs) 2025-12-04T11:45:24.5586231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5586270Z with policy(): 2025-12-04T11:45:24.5586425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5586466Z raise RuntimeError(msg) 
2025-12-04T11:45:24.5586852Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.5586855Z 2025-12-04T11:45:24.5586929Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5587187Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5587203Z 2025-12-04T11:45:24.5587303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5587376Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5587421Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5587477Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5587545Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5587643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5587682Z graph_break [] 2025-12-04T11:45:24.5587742Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5587886Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5587932Z Traceback (most recent call last): 2025-12-04T11:45:24.5588097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5588139Z method(*args, **kwargs) 2025-12-04T11:45:24.5588293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.5588333Z method(*args, **kwargs) 2025-12-04T11:45:24.5588485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5588522Z with policy(): 2025-12-04T11:45:24.5588677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5588718Z raise RuntimeError(msg) 2025-12-04T11:45:24.5589107Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.5589111Z 2025-12-04T11:45:24.5589184Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5589441Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5589443Z 2025-12-04T11:45:24.5589530Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5589605Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5589649Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5589728Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5589796Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5589896Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5589934Z graph_break [] 2025-12-04T11:45:24.5589996Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5590074Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.5590116Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5590171Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5590267Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5590333Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5590369Z graph_break [] 2025-12-04T11:45:24.5590431Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5590483Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5590627Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5590708Z Traceback (most recent call last): 2025-12-04T11:45:24.5590866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5590918Z method(*args, **kwargs) 2025-12-04T11:45:24.5591071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5591109Z method(*args, **kwargs) 2025-12-04T11:45:24.5591261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5591299Z with policy(): 2025-12-04T11:45:24.5591455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5591496Z raise RuntimeError(msg) 2025-12-04T11:45:24.5591895Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.5591898Z 2025-12-04T11:45:24.5591971Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5592225Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5592227Z 2025-12-04T11:45:24.5592313Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5592389Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5592431Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5592489Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5592555Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5592655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5592692Z graph_break [] 2025-12-04T11:45:24.5592754Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5592827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5592870Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5592925Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5593023Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5593087Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5593125Z graph_break [] 2025-12-04T11:45:24.5593198Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5593313Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5593356Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5593415Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5593510Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5593575Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5593611Z graph_break [] 2025-12-04T11:45:24.5593671Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5593865Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8cbb01f917da212a.xml - 2025-12-04T11:45:24.5593925Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5594499Z FAILED [0.2338s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.5594530Z 2025-12-04T11:45:24.5594603Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5594858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5594860Z 2025-12-04T11:45:24.5594945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5595008Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5595076Z ================== 1 failed, 187 deselected, 2 rerun in 2.10s ================== 2025-12-04T11:45:24.5595117Z Got exit code 1 2025-12-04T11:45:24.5595160Z Retrying single test... 
2025-12-04T11:45:24.5595321Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f68fa846c3f5052.xml 2025-12-04T11:45:24.5595379Z ============================= test session starts ============================== 2025-12-04T11:45:24.5595492Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5595533Z cachedir: .pytest_cache 2025-12-04T11:45:24.5595693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5595737Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5595780Z configfile: pytest.ini 2025-12-04T11:45:24.5595941Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5596018Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5596274Z stepcurrent: skipping 79 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5596319Z Running 1 items in this shard 2025-12-04T11:45:24.5596321Z 2025-12-04T11:45:24.5596533Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5867s] [100%] 2025-12-04T11:45:24.5596742Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2605s] [100%] 2025-12-04T11:45:24.5596941Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2171s] [100%] 2025-12-04T11:45:24.5596943Z 2025-12-04T11:45:24.5596995Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5597137Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5597185Z Traceback (most recent call last): 2025-12-04T11:45:24.5597344Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5597385Z method(*args, **kwargs) 2025-12-04T11:45:24.5597540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5597579Z method(*args, **kwargs) 2025-12-04T11:45:24.5597730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5597767Z with policy(): 2025-12-04T11:45:24.5597923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5597974Z raise RuntimeError(msg) 
2025-12-04T11:45:24.5598359Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.5598374Z 2025-12-04T11:45:24.5598449Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5598703Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5598705Z 2025-12-04T11:45:24.5598795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5598868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5598914Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5598970Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5599049Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5599147Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5599184Z graph_break [] 2025-12-04T11:45:24.5599245Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5599386Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5599431Z Traceback (most recent call last): 2025-12-04T11:45:24.5599588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5599627Z method(*args, **kwargs) 2025-12-04T11:45:24.5599781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.5599820Z method(*args, **kwargs) 2025-12-04T11:45:24.5599973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5600010Z with policy(): 2025-12-04T11:45:24.5600164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5600205Z raise RuntimeError(msg) 2025-12-04T11:45:24.5600589Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.5600592Z 2025-12-04T11:45:24.5600680Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5600932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5600937Z 2025-12-04T11:45:24.5601026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5601098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5601142Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5601199Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5601266Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5601365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5601402Z graph_break [] 2025-12-04T11:45:24.5601463Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5601539Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.5601594Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5601652Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5601750Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5601828Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5601863Z graph_break [] 2025-12-04T11:45:24.5601925Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5601977Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5602120Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5602166Z Traceback (most recent call last): 2025-12-04T11:45:24.5602326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5602365Z method(*args, **kwargs) 2025-12-04T11:45:24.5602518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5602557Z method(*args, **kwargs) 2025-12-04T11:45:24.5602724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5602762Z with policy(): 2025-12-04T11:45:24.5602916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5602958Z raise RuntimeError(msg) 2025-12-04T11:45:24.5603463Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.5603465Z 2025-12-04T11:45:24.5603541Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5603795Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5603798Z 2025-12-04T11:45:24.5604025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5604097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5604143Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5604199Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5604266Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5604363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5604427Z graph_break [] 2025-12-04T11:45:24.5604489Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5604565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5604607Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5604667Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5604763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5604829Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5604865Z graph_break [] 2025-12-04T11:45:24.5604926Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5604999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5605043Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5605097Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5605196Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5605259Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5605322Z graph_break [] 2025-12-04T11:45:24.5605381Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:24.5605574Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f68fa846c3f5052.xml - 2025-12-04T11:45:24.5605647Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5606224Z FAILED [0.2171s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.5606226Z 2025-12-04T11:45:24.5606299Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5606570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5606573Z 2025-12-04T11:45:24.5606662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5606725Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.5606795Z ================== 1 failed, 187 deselected, 2 rerun in 2.08s ================== 2025-12-04T11:45:24.5606832Z Got exit code 1 2025-12-04T11:45:24.5607041Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5607168Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.5607315Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf843b3d4b7259fd.xml 2025-12-04T11:45:24.5607375Z ============================= test session starts ============================== 2025-12-04T11:45:24.5607488Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5607530Z cachedir: .pytest_cache 2025-12-04T11:45:24.5607690Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5607735Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5607777Z configfile: pytest.ini 2025-12-04T11:45:24.5607937Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5608028Z collecting ... collected 188 items / 80 deselected / 108 selected 2025-12-04T11:45:24.5608083Z stepcurrent: skipping 80 already run items. 
2025-12-04T11:45:24.5608129Z Running 108 items in this shard 2025-12-04T11:45:24.5608131Z 2025-12-04T11:45:24.5608350Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5969s] [ 0%] 2025-12-04T11:45:24.5608568Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2677s] [ 0%] 2025-12-04T11:45:24.5608757Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2394s] [ 0%] 2025-12-04T11:45:24.5608762Z 2025-12-04T11:45:24.5608813Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5608958Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5609005Z Traceback (most recent call last): 2025-12-04T11:45:24.5609178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5609220Z method(*args, **kwargs) 2025-12-04T11:45:24.5609385Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5609425Z method(*args, **kwargs) 2025-12-04T11:45:24.5609579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5609616Z with policy(): 2025-12-04T11:45:24.5609770Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5609811Z raise RuntimeError(msg) 2025-12-04T11:45:24.5610200Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.5610204Z 2025-12-04T11:45:24.5610287Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5610548Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5610550Z 2025-12-04T11:45:24.5610636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5610711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5610754Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5610815Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5610881Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5610982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5611022Z graph_break [] 2025-12-04T11:45:24.5611087Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5611232Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5611278Z Traceback (most recent call last): 2025-12-04T11:45:24.5611434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5611475Z method(*args, **kwargs) 2025-12-04T11:45:24.5611627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5611667Z method(*args, **kwargs) 
2025-12-04T11:45:24.5611831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5611870Z with policy(): 2025-12-04T11:45:24.5612024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5612067Z raise RuntimeError(msg) 2025-12-04T11:45:24.5612457Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.5612459Z 2025-12-04T11:45:24.5612533Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5612793Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5612796Z 2025-12-04T11:45:24.5612882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5612970Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5613024Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5613082Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5613147Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5613277Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5613315Z graph_break [] 2025-12-04T11:45:24.5613379Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5613454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5613498Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:24.5613555Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5613655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5613721Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5613762Z graph_break [] 2025-12-04T11:45:24.5613838Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5613893Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5614034Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5614081Z Traceback (most recent call last): 2025-12-04T11:45:24.5614234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5614276Z method(*args, **kwargs) 2025-12-04T11:45:24.5614429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5614469Z method(*args, **kwargs) 2025-12-04T11:45:24.5614620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5614659Z with policy(): 2025-12-04T11:45:24.5614813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5614855Z raise RuntimeError(msg) 2025-12-04T11:45:24.5615240Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.5615242Z 2025-12-04T11:45:24.5615315Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5615590Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5615594Z 2025-12-04T11:45:24.5615681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5615759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5615801Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5615860Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5615926Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5616027Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5616063Z graph_break [] 2025-12-04T11:45:24.5616128Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5616201Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5616246Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5616301Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5616413Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5616478Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5616534Z graph_break [] 2025-12-04T11:45:24.5616594Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5616670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5616712Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5616768Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5616863Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5616928Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5616964Z graph_break [] 2025-12-04T11:45:24.5617027Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5617218Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf843b3d4b7259fd.xml - 2025-12-04T11:45:24.5617309Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5617898Z FAILED [0.2394s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.5617900Z 2025-12-04T11:45:24.5617972Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5618231Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5618234Z 2025-12-04T11:45:24.5618320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5618387Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5618453Z ================== 1 failed, 80 deselected, 2 rerun in 2.12s =================== 2025-12-04T11:45:24.5618494Z Got exit code 1 2025-12-04T11:45:24.5618534Z Retrying single test... 
2025-12-04T11:45:24.5618682Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a8c740ec84c29afd.xml 2025-12-04T11:45:24.5618739Z ============================= test session starts ============================== 2025-12-04T11:45:24.5618850Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5618902Z cachedir: .pytest_cache 2025-12-04T11:45:24.5619063Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5619109Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5619153Z configfile: pytest.ini 2025-12-04T11:45:24.5619315Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5619393Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5619647Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5619692Z Running 1 items in this shard 2025-12-04T11:45:24.5619694Z 2025-12-04T11:45:24.5619912Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7156s] [100%] 2025-12-04T11:45:24.5620125Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3745s] [100%] 2025-12-04T11:45:24.5620330Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3304s] [100%] 2025-12-04T11:45:24.5620342Z 2025-12-04T11:45:24.5620393Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5620536Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5620582Z Traceback (most recent call last): 2025-12-04T11:45:24.5620741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5620782Z method(*args, **kwargs) 2025-12-04T11:45:24.5620938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5620981Z method(*args, **kwargs) 2025-12-04T11:45:24.5621146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5621185Z with policy(): 2025-12-04T11:45:24.5621340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5621381Z raise 
RuntimeError(msg) 2025-12-04T11:45:24.5621768Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.5621771Z 2025-12-04T11:45:24.5621848Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5622106Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5622110Z 2025-12-04T11:45:24.5622199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5622276Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5622321Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5622378Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5622447Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5622546Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5622583Z graph_break [] 2025-12-04T11:45:24.5622655Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5622797Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5622844Z Traceback (most recent call last): 2025-12-04T11:45:24.5623001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5623042Z method(*args, **kwargs) 2025-12-04T11:45:24.5623195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:24.5623235Z method(*args, **kwargs) 2025-12-04T11:45:24.5623431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5623469Z with policy(): 2025-12-04T11:45:24.5623622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5623664Z raise RuntimeError(msg) 2025-12-04T11:45:24.5624050Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.5624083Z 2025-12-04T11:45:24.5624157Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5624415Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5624417Z 2025-12-04T11:45:24.5624507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5624580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5624629Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5624687Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5624755Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5624854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5624908Z graph_break [] 2025-12-04T11:45:24.5624970Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5625045Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.5625087Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5625144Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5625240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5625307Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5625343Z graph_break [] 2025-12-04T11:45:24.5625406Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5625458Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5625604Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5625650Z Traceback (most recent call last): 2025-12-04T11:45:24.5625808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5625849Z method(*args, **kwargs) 2025-12-04T11:45:24.5626001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5626041Z method(*args, **kwargs) 2025-12-04T11:45:24.5626192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5626229Z with policy(): 2025-12-04T11:45:24.5626396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5626439Z raise RuntimeError(msg) 2025-12-04T11:45:24.5626829Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.5626832Z 2025-12-04T11:45:24.5626905Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5627163Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5627165Z 2025-12-04T11:45:24.5627254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5627328Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5627372Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5627439Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5627507Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5627615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5627653Z graph_break [] 2025-12-04T11:45:24.5627715Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5627792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5627833Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5627890Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5627987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5628054Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5628090Z graph_break [] 2025-12-04T11:45:24.5628152Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5628226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5628271Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5628338Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5628437Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5628501Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5628539Z graph_break [] 2025-12-04T11:45:24.5628599Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5628793Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a8c740ec84c29afd.xml - 2025-12-04T11:45:24.5628852Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5629433Z FAILED [0.3304s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.5629437Z 2025-12-04T11:45:24.5629509Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5629766Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5629768Z 2025-12-04T11:45:24.5629857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5629929Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5629999Z ================== 1 failed, 187 deselected, 2 rerun in 2.44s ================== 2025-12-04T11:45:24.5630038Z Got exit code 1 2025-12-04T11:45:24.5630081Z Retrying single test... 
2025-12-04T11:45:24.5630227Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e633c6f7b6eb2c72.xml 2025-12-04T11:45:24.5630287Z ============================= test session starts ============================== 2025-12-04T11:45:24.5630396Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5630441Z cachedir: .pytest_cache 2025-12-04T11:45:24.5630598Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5630646Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5630687Z configfile: pytest.ini 2025-12-04T11:45:24.5630850Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5630936Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5631195Z stepcurrent: skipping 80 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5631258Z Running 1 items in this shard 2025-12-04T11:45:24.5631260Z 2025-12-04T11:45:24.5631475Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6042s] [100%] 2025-12-04T11:45:24.5631688Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2576s] [100%] 2025-12-04T11:45:24.5631881Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2106s] [100%] 2025-12-04T11:45:24.5631884Z 2025-12-04T11:45:24.5631934Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5632087Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5632137Z Traceback (most recent call last): 2025-12-04T11:45:24.5632294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5632337Z method(*args, **kwargs) 2025-12-04T11:45:24.5632489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5632531Z method(*args, **kwargs) 2025-12-04T11:45:24.5632683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5632723Z with policy(): 2025-12-04T11:45:24.5632876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5632921Z raise 
RuntimeError(msg) 2025-12-04T11:45:24.5633361Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.5633364Z 2025-12-04T11:45:24.5633440Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5633696Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5633698Z 2025-12-04T11:45:24.5633799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5633877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5633922Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5633979Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5634048Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5634146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5634184Z graph_break [] 2025-12-04T11:45:24.5634245Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5634385Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5634430Z Traceback (most recent call last): 2025-12-04T11:45:24.5634585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5634626Z method(*args, **kwargs) 2025-12-04T11:45:24.5634779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:24.5634834Z method(*args, **kwargs) 2025-12-04T11:45:24.5635001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5635039Z with policy(): 2025-12-04T11:45:24.5635190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5635233Z raise RuntimeError(msg) 2025-12-04T11:45:24.5635623Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.5635625Z 2025-12-04T11:45:24.5635701Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5635972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5635975Z 2025-12-04T11:45:24.5636063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5636136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5636181Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5636236Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5636303Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5636401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5636441Z graph_break [] 2025-12-04T11:45:24.5636501Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5636579Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.5636620Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5636678Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5636775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5636843Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5639725Z graph_break [] 2025-12-04T11:45:24.5639797Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5639851Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5639997Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5640044Z Traceback (most recent call last): 2025-12-04T11:45:24.5640223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5640266Z method(*args, **kwargs) 2025-12-04T11:45:24.5640420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5640461Z method(*args, **kwargs) 2025-12-04T11:45:24.5640614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5640652Z with policy(): 2025-12-04T11:45:24.5640805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5640846Z raise RuntimeError(msg) 2025-12-04T11:45:24.5641235Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.5641251Z 2025-12-04T11:45:24.5641328Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5641587Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5641602Z 2025-12-04T11:45:24.5641692Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5641765Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5641810Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5641867Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5641934Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5642032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5642071Z graph_break [] 2025-12-04T11:45:24.5642133Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5642220Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5642263Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5642321Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5642417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5642483Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5642519Z graph_break [] 2025-12-04T11:45:24.5642581Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5642656Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5642699Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5642755Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5642853Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5642918Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.5642956Z graph_break [] 2025-12-04T11:45:24.5643018Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:24.5643214Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e633c6f7b6eb2c72.xml - 2025-12-04T11:45:24.5643314Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5643917Z FAILED [0.2106s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.5643921Z 2025-12-04T11:45:24.5643996Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5644253Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5644256Z 2025-12-04T11:45:24.5644344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5644406Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.5644477Z ================== 1 failed, 187 deselected, 2 rerun in 2.09s ================== 2025-12-04T11:45:24.5644514Z Got exit code 1 2025-12-04T11:45:24.5644726Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5644854Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.5645017Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-eccf5396988198c2.xml 2025-12-04T11:45:24.5645093Z ============================= test session starts ============================== 2025-12-04T11:45:24.5645207Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5645249Z cachedir: .pytest_cache 2025-12-04T11:45:24.5645409Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5645454Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5645497Z configfile: pytest.ini 2025-12-04T11:45:24.5645659Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5645739Z collecting ... collected 188 items / 81 deselected / 107 selected 2025-12-04T11:45:24.5645794Z stepcurrent: skipping 81 already run items. 
2025-12-04T11:45:24.5645840Z Running 107 items in this shard 2025-12-04T11:45:24.5645843Z 2025-12-04T11:45:24.5646072Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1722s] [ 0%] 2025-12-04T11:45:24.5646284Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7611s] [ 0%] 2025-12-04T11:45:24.5646471Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6978s] [ 0%] 2025-12-04T11:45:24.5646474Z 2025-12-04T11:45:24.5646526Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5646668Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5646716Z Traceback (most recent call last): 2025-12-04T11:45:24.5646876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5646917Z method(*args, **kwargs) 2025-12-04T11:45:24.5647073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5647113Z method(*args, **kwargs) 2025-12-04T11:45:24.5647268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5647305Z with policy(): 2025-12-04T11:45:24.5647460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5647513Z raise RuntimeError(msg) 2025-12-04T11:45:24.5647900Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.5647904Z 2025-12-04T11:45:24.5647978Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5648237Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5648239Z 2025-12-04T11:45:24.5648329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5648403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5648447Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5648504Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5649010Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5649119Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5649158Z graph_break [] 2025-12-04T11:45:24.5649222Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5649298Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5649788Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5649852Z current_size = base.storage().size() 2025-12-04T11:45:24.5649893Z Autotune Choices Stats: 2025-12-04T11:45:24.5650276Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.5650338Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5650387Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5650516Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5650755Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5650989Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5651213Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5651452Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5651682Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5651907Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5652136Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5652357Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5652402Z _scaled_mm 0.0264 ms 22.3% 2025-12-04T11:45:24.5652543Z SingleProcess AUTOTUNE benchmarking takes 0.0431 seconds and 0.1945 seconds precompiling for 9 choices 2025-12-04T11:45:24.5652689Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5652746Z Traceback (most recent call last): 2025-12-04T11:45:24.5652904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5652945Z method(*args, **kwargs) 2025-12-04T11:45:24.5653100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5653140Z method(*args, **kwargs) 2025-12-04T11:45:24.5653319Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5653356Z with policy(): 2025-12-04T11:45:24.5653511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5653555Z raise RuntimeError(msg) 2025-12-04T11:45:24.5653960Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.5653963Z 2025-12-04T11:45:24.5654039Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5654294Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5654297Z 2025-12-04T11:45:24.5654385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5654458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5654502Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5654558Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5655042Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5655140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 
1), ('ok', 1)] 2025-12-04T11:45:24.5655178Z graph_break [] 2025-12-04T11:45:24.5655254Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5655330Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5655818Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5655869Z current_size = base.storage().size() 2025-12-04T11:45:24.5655912Z Autotune Choices Stats: 2025-12-04T11:45:24.5656282Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.5656344Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5656393Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5656529Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5656779Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5657009Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5657232Z 
triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5657459Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5657695Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5657917Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5658153Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5658375Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5658422Z _scaled_mm 0.0264 ms 22.3% 2025-12-04T11:45:24.5658550Z SingleProcess AUTOTUNE benchmarking takes 0.0431 seconds and 0.1945 seconds precompiling for 9 choices 2025-12-04T11:45:24.5658626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5658668Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5658726Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5658825Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5659314Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5659355Z graph_break [] 2025-12-04T11:45:24.5659420Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5659495Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5659536Z Autotune Choices Stats: 2025-12-04T11:45:24.5659898Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.5659956Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5660007Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5660128Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5660377Z triton_mm_15 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5660616Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 
2025-12-04T11:45:24.5660843Z triton_mm_13 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5661068Z triton_mm_14 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5661305Z triton_mm_12 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5661534Z triton_mm_8 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5661759Z triton_mm_10 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5661987Z triton_mm_11 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5662029Z _scaled_mm 0.0246 ms 23.9% 2025-12-04T11:45:24.5662160Z SingleProcess AUTOTUNE benchmarking takes 0.0557 seconds and 0.2185 seconds precompiling for 9 choices 2025-12-04T11:45:24.5662214Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5662357Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 
2025-12-04T11:45:24.5662403Z Traceback (most recent call last): 2025-12-04T11:45:24.5662563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5662606Z method(*args, **kwargs) 2025-12-04T11:45:24.5662775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5662818Z method(*args, **kwargs) 2025-12-04T11:45:24.5662970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5663010Z with policy(): 2025-12-04T11:45:24.5663165Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5663208Z raise RuntimeError(msg) 2025-12-04T11:45:24.5663636Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5663638Z 2025-12-04T11:45:24.5663714Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5663970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5663986Z 2025-12-04T11:45:24.5664077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5664164Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5664209Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5664265Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5664750Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5664852Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5664890Z graph_break [] 2025-12-04T11:45:24.5664954Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5665040Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5665529Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5665577Z current_size = base.storage().size() 2025-12-04T11:45:24.5665620Z Autotune Choices Stats: 2025-12-04T11:45:24.5665985Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.5666048Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5666097Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5666221Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5666455Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5666703Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5666928Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5667157Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5667382Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5667605Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5667831Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5668077Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5668119Z _scaled_mm 0.0264 ms 22.3% 2025-12-04T11:45:24.5668249Z SingleProcess AUTOTUNE benchmarking takes 0.0431 seconds and 0.1945 seconds precompiling for 9 choices 2025-12-04T11:45:24.5668323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5668367Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5668424Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5668527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5669020Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5669062Z graph_break [] 2025-12-04T11:45:24.5669123Z 
aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5669198Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5669239Z Autotune Choices Stats: 2025-12-04T11:45:24.5669600Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.5669659Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5669710Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5669831Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5670062Z triton_mm_15 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5670289Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5670526Z triton_mm_13 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5670756Z triton_mm_14 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5670986Z triton_mm_12 0.0060 ms 98.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5671213Z triton_mm_8 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5671439Z triton_mm_10 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5671676Z triton_mm_11 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5671730Z _scaled_mm 0.0246 ms 23.9% 2025-12-04T11:45:24.5671858Z SingleProcess AUTOTUNE benchmarking takes 0.0557 seconds and 0.2185 seconds precompiling for 9 choices 2025-12-04T11:45:24.5671933Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5671975Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5672033Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5672133Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5672625Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5672665Z graph_break [] 
2025-12-04T11:45:24.5672727Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5672801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5672844Z Autotune Choices Stats: 2025-12-04T11:45:24.5673206Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_21", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:24.5673297Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5673345Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5673468Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5673700Z triton_mm_21 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5673931Z triton_mm_16 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5674170Z triton_mm_23 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5674398Z triton_mm_17 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5674629Z triton_mm_20 0.0064 ms 95.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5674857Z triton_mm_18 0.0065 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5675085Z triton_mm_22 0.0065 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5675324Z triton_mm_19 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5675378Z _scaled_mm 0.0237 ms 25.6% 2025-12-04T11:45:24.5675507Z SingleProcess AUTOTUNE benchmarking takes 0.0555 seconds and 0.2149 seconds precompiling for 9 choices 2025-12-04T11:45:24.5675697Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-eccf5396988198c2.xml - 2025-12-04T11:45:24.5675760Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5676361Z FAILED [0.6978s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5676366Z 2025-12-04T11:45:24.5676443Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5676699Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5676704Z 2025-12-04T11:45:24.5676792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5676857Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5676928Z ================== 1 failed, 81 deselected, 2 rerun in 3.65s =================== 2025-12-04T11:45:24.5676968Z Got exit code 1 2025-12-04T11:45:24.5677009Z Retrying single test... 2025-12-04T11:45:24.5677157Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a19f3a10e077f2b4.xml 2025-12-04T11:45:24.5677217Z ============================= test session starts ============================== 2025-12-04T11:45:24.5677330Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5677371Z cachedir: .pytest_cache 2025-12-04T11:45:24.5677532Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5677578Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5677621Z configfile: pytest.ini 2025-12-04T11:45:24.5677783Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5677870Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5678123Z stepcurrent: skipping 81 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5678170Z Running 1 items in this shard 2025-12-04T11:45:24.5678172Z 2025-12-04T11:45:24.5678384Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0053s] [100%] 2025-12-04T11:45:24.5678597Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7372s] [100%] 2025-12-04T11:45:24.5678783Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6418s] [100%] 2025-12-04T11:45:24.5678789Z 2025-12-04T11:45:24.5678838Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5678993Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5679040Z Traceback (most recent call last): 2025-12-04T11:45:24.5679213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5679254Z method(*args, **kwargs) 2025-12-04T11:45:24.5679411Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5679451Z method(*args, **kwargs) 2025-12-04T11:45:24.5679605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5679642Z with policy(): 2025-12-04T11:45:24.5679797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5679839Z raise RuntimeError(msg) 
2025-12-04T11:45:24.5680234Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.5680238Z 2025-12-04T11:45:24.5680313Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5680570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5680572Z 2025-12-04T11:45:24.5680658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5680734Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5680778Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5680839Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5681324Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5681423Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5681462Z graph_break [] 2025-12-04T11:45:24.5681525Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5681601Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5682098Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5682150Z current_size = base.storage().size() 2025-12-04T11:45:24.5682190Z Autotune Choices Stats: 2025-12-04T11:45:24.5682559Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5682616Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5682668Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5682790Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5683039Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5683304Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5683529Z triton_mm_4 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5683755Z triton_mm_7 0.0062 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5683994Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5684223Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5684446Z triton_mm_2 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5684669Z triton_mm_5 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5684712Z _scaled_mm 0.0066 ms 91.6% 2025-12-04T11:45:24.5684841Z SingleProcess AUTOTUNE benchmarking takes 0.0406 seconds and 0.1898 seconds precompiling for 9 choices 2025-12-04T11:45:24.5684985Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5685031Z Traceback (most recent call last): 2025-12-04T11:45:24.5685192Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5685232Z method(*args, **kwargs) 2025-12-04T11:45:24.5685386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5685427Z 
method(*args, **kwargs) 2025-12-04T11:45:24.5685593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5685632Z with policy(): 2025-12-04T11:45:24.5685788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5685830Z raise RuntimeError(msg) 2025-12-04T11:45:24.5686223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.5686225Z 2025-12-04T11:45:24.5686297Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5686558Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5686560Z 2025-12-04T11:45:24.5686649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5686736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5686796Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5686853Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5687331Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5687430Z aot_autograd [('total', 
1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5687469Z graph_break [] 2025-12-04T11:45:24.5687531Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5687608Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5688107Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5688160Z current_size = base.storage().size() 2025-12-04T11:45:24.5688199Z Autotune Choices Stats: 2025-12-04T11:45:24.5688567Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5688628Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5688676Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5688799Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5689039Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5689264Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.5689507Z triton_mm_4 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5689746Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5689968Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5690200Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5690429Z triton_mm_2 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5690661Z triton_mm_5 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5690722Z _scaled_mm 0.0066 ms 91.6% 2025-12-04T11:45:24.5690850Z SingleProcess AUTOTUNE benchmarking takes 0.0406 seconds and 0.1898 seconds precompiling for 9 choices 2025-12-04T11:45:24.5690925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5690967Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5691026Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.5691126Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5691617Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5691657Z graph_break [] 2025-12-04T11:45:24.5691721Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5691794Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5691837Z Autotune Choices Stats: 2025-12-04T11:45:24.5692201Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5692261Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5692312Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5692431Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5692663Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5692887Z triton_mm_14 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5693125Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5693381Z triton_mm_10 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5693613Z triton_mm_12 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5693836Z triton_mm_15 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5694060Z triton_mm_8 0.0067 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5694294Z triton_mm_13 0.0068 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5694363Z _scaled_mm 0.0226 ms 27.0% 2025-12-04T11:45:24.5694500Z SingleProcess AUTOTUNE benchmarking takes 0.0383 seconds and 0.0890 seconds precompiling for 9 choices 2025-12-04T11:45:24.5694553Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5694697Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5694743Z Traceback (most recent call last): 2025-12-04T11:45:24.5694903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5694946Z method(*args, **kwargs) 2025-12-04T11:45:24.5695101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5695142Z method(*args, **kwargs) 2025-12-04T11:45:24.5695309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5695348Z with policy(): 2025-12-04T11:45:24.5695503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5695544Z raise RuntimeError(msg) 2025-12-04T11:45:24.5695930Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5695932Z 2025-12-04T11:45:24.5696009Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5696264Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5696268Z 2025-12-04T11:45:24.5696358Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5696433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5696478Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5696535Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5697034Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5697134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5697174Z graph_break [] 2025-12-04T11:45:24.5697236Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5697313Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5697799Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5697849Z current_size = base.storage().size() 2025-12-04T11:45:24.5697890Z Autotune Choices Stats: 2025-12-04T11:45:24.5698257Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5698339Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5698387Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5698509Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5698748Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5698976Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5699213Z triton_mm_4 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5699439Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5699663Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5699889Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5700116Z triton_mm_2 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5700336Z triton_mm_5 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5700378Z _scaled_mm 0.0066 ms 91.6% 2025-12-04T11:45:24.5700505Z SingleProcess AUTOTUNE benchmarking takes 0.0406 seconds and 0.1898 seconds precompiling for 9 choices 2025-12-04T11:45:24.5700578Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5700619Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5700688Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5700787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5701267Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5701305Z graph_break [] 2025-12-04T11:45:24.5701365Z 
aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5701438Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5701477Z Autotune Choices Stats: 2025-12-04T11:45:24.5701834Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5701902Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5701961Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5702079Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5702310Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5702534Z triton_mm_14 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5702763Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5703005Z triton_mm_10 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5703230Z triton_mm_12 0.0064 ms 95.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5703497Z triton_mm_15 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5703721Z triton_mm_8 0.0067 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5703947Z triton_mm_13 0.0068 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5703987Z _scaled_mm 0.0226 ms 27.0% 2025-12-04T11:45:24.5704115Z SingleProcess AUTOTUNE benchmarking takes 0.0383 seconds and 0.0890 seconds precompiling for 9 choices 2025-12-04T11:45:24.5704187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5704229Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5704285Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5704399Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5704888Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5704925Z graph_break [] 
2025-12-04T11:45:24.5704986Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5705058Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5705099Z Autotune Choices Stats: 2025-12-04T11:45:24.5705455Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.5705533Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5705596Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5705715Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5705943Z triton_mm_23 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5706168Z triton_mm_17 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5706393Z triton_mm_22 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5706629Z triton_mm_19 0.0063 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5706859Z triton_mm_18 0.0064 ms 91.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5707084Z triton_mm_20 0.0065 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5707309Z triton_mm_21 0.0065 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5707534Z triton_mm_16 0.0067 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5707576Z _scaled_mm 0.0256 ms 22.8% 2025-12-04T11:45:24.5707702Z SingleProcess AUTOTUNE benchmarking takes 0.0543 seconds and 0.2161 seconds precompiling for 9 choices 2025-12-04T11:45:24.5707893Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a19f3a10e077f2b4.xml - 2025-12-04T11:45:24.5707954Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5708540Z FAILED [0.6418s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5708545Z 2025-12-04T11:45:24.5708619Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5708876Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5708878Z 2025-12-04T11:45:24.5708965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5709027Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5709095Z ================== 1 failed, 187 deselected, 2 rerun in 3.40s ================== 2025-12-04T11:45:24.5709143Z Got exit code 1 2025-12-04T11:45:24.5709183Z Retrying single test... 2025-12-04T11:45:24.5709326Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-73221792f88fb935.xml 2025-12-04T11:45:24.5709396Z ============================= test session starts ============================== 2025-12-04T11:45:24.5709506Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5709547Z cachedir: .pytest_cache 2025-12-04T11:45:24.5709705Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5709750Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5709790Z configfile: pytest.ini 2025-12-04T11:45:24.5709952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5710027Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5710288Z stepcurrent: skipping 81 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5710333Z Running 1 items in this shard 2025-12-04T11:45:24.5710335Z 2025-12-04T11:45:24.5710545Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0123s] [100%] 2025-12-04T11:45:24.5710754Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7561s] [100%] 2025-12-04T11:45:24.5710941Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6329s] [100%] 2025-12-04T11:45:24.5710943Z 2025-12-04T11:45:24.5710995Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5711136Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5711184Z Traceback (most recent call last): 2025-12-04T11:45:24.5711343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5711384Z method(*args, **kwargs) 2025-12-04T11:45:24.5711535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5711575Z method(*args, **kwargs) 2025-12-04T11:45:24.5711725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5711773Z with policy(): 2025-12-04T11:45:24.5711925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5711968Z raise RuntimeError(msg) 
2025-12-04T11:45:24.5712353Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.5712356Z 2025-12-04T11:45:24.5712429Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5712684Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5712686Z 2025-12-04T11:45:24.5712772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5712847Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5712899Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5712956Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5713523Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5713623Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5713658Z graph_break [] 2025-12-04T11:45:24.5713721Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5713795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5714314Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5714366Z current_size = base.storage().size() 2025-12-04T11:45:24.5714406Z Autotune Choices Stats: 2025-12-04T11:45:24.5714770Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5714829Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5714879Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5715000Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5715234Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5715463Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5715695Z triton_mm_1 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5715938Z triton_mm_4 0.0062 ms 98.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5716163Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5716388Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5716609Z triton_mm_3 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5716836Z triton_mm_0 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5716905Z _scaled_mm 0.0238 ms 25.5% 2025-12-04T11:45:24.5717051Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1915 seconds precompiling for 9 choices 2025-12-04T11:45:24.5717190Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5717238Z Traceback (most recent call last): 2025-12-04T11:45:24.5717396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5717437Z method(*args, **kwargs) 2025-12-04T11:45:24.5717591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5717630Z 
method(*args, **kwargs) 2025-12-04T11:45:24.5717784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5717822Z with policy(): 2025-12-04T11:45:24.5717991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5718034Z raise RuntimeError(msg) 2025-12-04T11:45:24.5718419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.5718422Z 2025-12-04T11:45:24.5718495Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5718755Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5718758Z 2025-12-04T11:45:24.5718845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5718921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5718964Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5719022Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5719500Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5719619Z aot_autograd [('total', 
1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5719658Z graph_break [] 2025-12-04T11:45:24.5719721Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5719798Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5720288Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5720338Z current_size = base.storage().size() 2025-12-04T11:45:24.5720378Z Autotune Choices Stats: 2025-12-04T11:45:24.5720741Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5720811Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5720860Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5720993Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5721227Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5721457Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.5721684Z triton_mm_1 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5721921Z triton_mm_4 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5722148Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5722372Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5722593Z triton_mm_3 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5722821Z triton_mm_0 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5722864Z _scaled_mm 0.0238 ms 25.5% 2025-12-04T11:45:24.5722990Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1915 seconds precompiling for 9 choices 2025-12-04T11:45:24.5723066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5723108Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5723166Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.5723310Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5723792Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5723830Z graph_break [] 2025-12-04T11:45:24.5723893Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5723966Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5724007Z Autotune Choices Stats: 2025-12-04T11:45:24.5724370Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.5724450Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5724499Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5724621Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5724865Z triton_mm_12 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5725093Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5725319Z triton_mm_10 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5725559Z triton_mm_14 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5725787Z triton_mm_13 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5726008Z triton_mm_11 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5726238Z triton_mm_8 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5726467Z triton_mm_15 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5726508Z _scaled_mm 0.0220 ms 27.2% 2025-12-04T11:45:24.5726637Z SingleProcess AUTOTUNE benchmarking takes 0.0342 seconds and 0.1283 seconds precompiling for 9 choices 2025-12-04T11:45:24.5726691Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5726830Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5726878Z Traceback (most recent call last): 2025-12-04T11:45:24.5727048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5727089Z method(*args, **kwargs) 2025-12-04T11:45:24.5727245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5727287Z method(*args, **kwargs) 2025-12-04T11:45:24.5727441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5727478Z with policy(): 2025-12-04T11:45:24.5727633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5727676Z raise RuntimeError(msg) 2025-12-04T11:45:24.5728064Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5728066Z 2025-12-04T11:45:24.5728138Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5728413Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5728426Z 2025-12-04T11:45:24.5728516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5728589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5728634Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5728690Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5729175Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5729274Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5729327Z graph_break [] 2025-12-04T11:45:24.5729389Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5729464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5729948Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5729999Z current_size = base.storage().size() 2025-12-04T11:45:24.5730040Z Autotune Choices Stats: 2025-12-04T11:45:24.5730406Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.5730468Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5730517Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5730640Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5730871Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5731114Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5731341Z triton_mm_1 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5731571Z triton_mm_4 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5731792Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5732015Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5732251Z triton_mm_3 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5732484Z triton_mm_0 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5732527Z _scaled_mm 0.0238 ms 25.5% 2025-12-04T11:45:24.5732654Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1915 seconds precompiling for 9 choices 2025-12-04T11:45:24.5732730Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5732773Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5732832Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5732932Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5733535Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5733572Z graph_break [] 2025-12-04T11:45:24.5733635Z 
aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5733707Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5733749Z Autotune Choices Stats: 2025-12-04T11:45:24.5734109Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.5734170Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5734220Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5734341Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5734576Z triton_mm_12 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5734821Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5735051Z triton_mm_10 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5735276Z triton_mm_14 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5735500Z triton_mm_13 0.0062 ms 96.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5735725Z triton_mm_11 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5735962Z triton_mm_8 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5736203Z triton_mm_15 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5736244Z _scaled_mm 0.0220 ms 27.2% 2025-12-04T11:45:24.5736375Z SingleProcess AUTOTUNE benchmarking takes 0.0342 seconds and 0.1283 seconds precompiling for 9 choices 2025-12-04T11:45:24.5736448Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5736492Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5736548Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5736649Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5737143Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5737184Z graph_break [] 
2025-12-04T11:45:24.5737245Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:24.5737319Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5737359Z Autotune Choices Stats: 2025-12-04T11:45:24.5737726Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.5737788Z AUTOTUNE scaled_mm(1024x32, 32x16, 1024x1, 1x16, 16) 2025-12-04T11:45:24.5737836Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5737957Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5738188Z triton_mm_23 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5738428Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5738653Z triton_mm_19 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5738887Z triton_mm_20 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5739116Z triton_mm_16 0.0062 ms 97.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.5739340Z triton_mm_21 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5739567Z triton_mm_22 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.5739817Z triton_mm_18 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5739860Z _scaled_mm 0.0232 ms 25.9% 2025-12-04T11:45:24.5739987Z SingleProcess AUTOTUNE benchmarking takes 0.0494 seconds and 0.2080 seconds precompiling for 9 choices 2025-12-04T11:45:24.5740179Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-73221792f88fb935.xml - 2025-12-04T11:45:24.5740239Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5740832Z FAILED [0.6329s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.5740837Z 2025-12-04T11:45:24.5740912Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5741173Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5741176Z 2025-12-04T11:45:24.5741265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5741329Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5741400Z ================== 1 failed, 187 deselected, 2 rerun in 3.42s ================== 2025-12-04T11:45:24.5741438Z Got exit code 1 2025-12-04T11:45:24.5741647Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.5741774Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.5741922Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2732c3acdfa22a13.xml 2025-12-04T11:45:24.5741978Z ============================= test session starts ============================== 2025-12-04T11:45:24.5742091Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5742131Z cachedir: .pytest_cache 2025-12-04T11:45:24.5742304Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5742351Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5742392Z configfile: pytest.ini 2025-12-04T11:45:24.5742553Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5742632Z collecting ... 
collected 188 items / 82 deselected / 106 selected 2025-12-04T11:45:24.5742686Z stepcurrent: skipping 82 already run items. 2025-12-04T11:45:24.5742731Z Running 106 items in this shard 2025-12-04T11:45:24.5742733Z 2025-12-04T11:45:24.5743708Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpes8euk4l/7o/c7ojs7eybjpqr6aeub7jvcjr4fj6exjb2hqu3cackp6omgmqf2gf.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5743888Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5744111Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5744271Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5744418Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5744724Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5744859Z E1204 11:12:18.392000 722620 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5745119Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5745258Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5745517Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5745677Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5745953Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5746090Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5746368Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5746580Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5746900Z E1204 11:12:18.392000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5747643Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpes8euk4l/dy/cdylrobswojasgezrfydxkgm4gkxnhmwfib2ey6x7xpxlxbmptu4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5747793Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5748021Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5748200Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5748346Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5748635Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5748768Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5749026Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5749175Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5749430Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5749585Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5749857Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5749993Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5750274Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5750474Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5750793Z E1204 11:12:18.403000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5751556Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpes8euk4l/5a/c5ah736b7363i43ciu7fp2elmfkzhf6xz2pesrbnxo4gwz76ewuv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5751711Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5751933Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5752096Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5752245Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5752546Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5752692Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5752961Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5753105Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5753385Z E1204 11:12:18.447000 722620 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5753565Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5753841Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5753981Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5754266Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5754467Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5754792Z E1204 11:12:18.447000 722620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5754847Z ('RERUN', {'yellow': True}) [2.4592s] [ 0%] 2025-12-04T11:45:24.5755179Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda E1204 11:12:19.666000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5755499Z E1204 11:12:19.666000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5755632Z E1204 11:12:19.666000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5755782Z E1204 11:12:19.668000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5756085Z E1204 11:12:19.668000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5756215Z E1204 11:12:19.668000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5756364Z E1204 11:12:19.670000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5756665Z E1204 11:12:19.670000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5756822Z E1204 11:12:19.670000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5756874Z ('RERUN', {'yellow': True}) [1.1003s] [ 0%] 2025-12-04T11:45:24.5757200Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda E1204 11:12:20.593000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5757503Z E1204 11:12:20.593000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5757632Z E1204 11:12:20.593000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5757794Z E1204 11:12:20.596000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5758096Z E1204 11:12:20.596000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5758227Z E1204 11:12:20.596000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5758373Z E1204 11:12:20.598000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5758675Z E1204 11:12:20.598000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.5758805Z E1204 11:12:20.598000 722620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.5758848Z FAILED [0.9921s] [ 0%] 2025-12-04T11:45:24.5758850Z 2025-12-04T11:45:24.5758907Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.5759054Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5759103Z Traceback (most recent call last): 2025-12-04T11:45:24.5759265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5759309Z method(*args, **kwargs) 2025-12-04T11:45:24.5759477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5759521Z method(*args, **kwargs) 2025-12-04T11:45:24.5759677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5759716Z with policy(): 2025-12-04T11:45:24.5759876Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5759919Z raise RuntimeError(msg) 2025-12-04T11:45:24.5760326Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:24.5760329Z 2025-12-04T11:45:24.5760408Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5760678Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5760693Z 2025-12-04T11:45:24.5760786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5760878Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5760925Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5760988Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5761595Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5761704Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5761746Z graph_break [] 2025-12-04T11:45:24.5761819Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5761918Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5762450Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5762501Z current_size = base.storage().size() 2025-12-04T11:45:24.5762546Z Autotune Choices Stats: 2025-12-04T11:45:24.5762944Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0073990002274513245, "best_triton_pos": 0} 2025-12-04T11:45:24.5763020Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5763074Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5763205Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5763503Z triton_mm_17 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5763764Z triton_mm_7 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5764011Z triton_mm_18 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5764257Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5764498Z triton_mm_10 0.0081 ms 91.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5764746Z triton_mm_13 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5765010Z triton_mm_8 0.0081 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5765266Z triton_mm_14 0.0082 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5765509Z triton_mm_11 0.0082 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5765756Z triton_mm_12 0.0083 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5765899Z SingleProcess AUTOTUNE benchmarking takes 0.0752 seconds and 0.5253 seconds precompiling for 18 choices 2025-12-04T11:45:24.5766069Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5766122Z Traceback (most recent call last): 2025-12-04T11:45:24.5766292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5766338Z method(*args, **kwargs) 
2025-12-04T11:45:24.5766501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5766546Z method(*args, **kwargs) 2025-12-04T11:45:24.5766708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5766749Z with policy(): 2025-12-04T11:45:24.5766914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.5766962Z raise RuntimeError(msg) 2025-12-04T11:45:24.5767384Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:24.5767388Z 2025-12-04T11:45:24.5767468Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5767747Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5767750Z 2025-12-04T11:45:24.5767856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5767938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5767987Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5768050Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5768645Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5768753Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5768793Z graph_break [] 2025-12-04T11:45:24.5768865Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5768946Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5769489Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5769559Z current_size = base.storage().size() 2025-12-04T11:45:24.5769604Z Autotune Choices Stats: 2025-12-04T11:45:24.5770000Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0073990002274513245, "best_triton_pos": 0} 2025-12-04T11:45:24.5770073Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5770127Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5770271Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5770523Z triton_mm_17 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.5770772Z triton_mm_7 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5771018Z triton_mm_18 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5771268Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5771517Z triton_mm_10 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5771766Z triton_mm_13 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5772026Z triton_mm_8 0.0081 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5772273Z triton_mm_14 0.0082 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5772518Z triton_mm_11 0.0082 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5772766Z triton_mm_12 0.0083 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5772908Z SingleProcess AUTOTUNE benchmarking takes 0.0752 seconds and 0.5253 seconds precompiling for 18 choices 2025-12-04T11:45:24.5772992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5773049Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5773113Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5773222Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5773809Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5773849Z graph_break [] 2025-12-04T11:45:24.5773919Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5774000Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5774046Z Autotune Choices Stats: 2025-12-04T11:45:24.5774455Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007439000066369772, "best_triton_pos": 0} 2025-12-04T11:45:24.5774527Z AUTOTUNE 
scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5774581Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5774711Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5774963Z triton_mm_34 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5775211Z triton_mm_31 0.0078 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5775458Z triton_mm_30 0.0080 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5775703Z triton_mm_37 0.0080 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5775946Z triton_mm_33 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5776205Z triton_mm_28 0.0082 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5776453Z triton_mm_38 0.0083 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5776703Z triton_mm_27 0.0084 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5776947Z triton_mm_32 0.0084 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5777193Z triton_mm_35 0.0087 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5777353Z SingleProcess AUTOTUNE benchmarking takes 0.1131 seconds and 0.4659 seconds precompiling for 21 choices 2025-12-04T11:45:24.5777424Z =================================== FAILURES =================================== 2025-12-04T11:45:24.5777581Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.5777631Z Traceback (most recent call last): 2025-12-04T11:45:24.5777803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5777847Z method(*args, **kwargs) 2025-12-04T11:45:24.5778015Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.5778059Z method(*args, **kwargs) 2025-12-04T11:45:24.5778227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.5778267Z with policy(): 2025-12-04T11:45:24.5778447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T11:45:24.5778492Z raise RuntimeError(msg) 2025-12-04T11:45:24.5778919Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:24.5778922Z 2025-12-04T11:45:24.5779002Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5779290Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5779293Z 2025-12-04T11:45:24.5779390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5779472Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5779519Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5779583Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5780206Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5780314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5780357Z graph_break [] 2025-12-04T11:45:24.5780425Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5780508Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:24.5781040Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.5781092Z current_size = base.storage().size() 2025-12-04T11:45:24.5781135Z Autotune Choices Stats: 2025-12-04T11:45:24.5781534Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0073990002274513245, "best_triton_pos": 0} 2025-12-04T11:45:24.5781629Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5781682Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5781814Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5782067Z triton_mm_17 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5782315Z triton_mm_7 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5782568Z triton_mm_18 0.0080 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.5782817Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5783063Z triton_mm_10 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5783340Z triton_mm_13 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5783585Z triton_mm_8 0.0081 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5783829Z triton_mm_14 0.0082 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5784076Z triton_mm_11 0.0082 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5784336Z triton_mm_12 0.0083 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5784480Z SingleProcess AUTOTUNE benchmarking takes 0.0752 seconds and 0.5253 seconds precompiling for 18 choices 2025-12-04T11:45:24.5784560Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.5784609Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5784670Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5784780Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5785315Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5785356Z graph_break [] 2025-12-04T11:45:24.5785424Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5785518Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5785563Z Autotune Choices Stats: 2025-12-04T11:45:24.5785970Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007439000066369772, "best_triton_pos": 0} 2025-12-04T11:45:24.5786040Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.5786092Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5786223Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5786475Z triton_mm_34 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5786740Z 
triton_mm_31 0.0078 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5786986Z triton_mm_30 0.0080 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5787231Z triton_mm_37 0.0080 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5787478Z triton_mm_33 0.0081 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5787724Z triton_mm_28 0.0082 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5787969Z triton_mm_38 0.0083 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5788216Z triton_mm_27 0.0084 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5788473Z triton_mm_32 0.0084 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.5788721Z triton_mm_35 0.0087 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5788863Z SingleProcess AUTOTUNE benchmarking takes 0.1131 seconds and 0.4659 seconds precompiling for 21 choices 2025-12-04T11:45:24.5788945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.5788992Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.5789055Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.5789162Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.5789697Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.5789760Z graph_break [] 2025-12-04T11:45:24.5789827Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.5789905Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.5789952Z Autotune Choices Stats: 2025-12-04T11:45:24.5790343Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_50", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007600000128149986, "best_triton_pos": 0} 2025-12-04T11:45:24.5790413Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 
2025-12-04T11:45:24.5790466Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.5790608Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.5790860Z triton_mm_50 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5791106Z triton_mm_57 0.0077 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5791354Z triton_mm_48 0.0077 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5791602Z triton_mm_51 0.0078 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5791850Z triton_mm_53 0.0078 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5792098Z triton_mm_54 0.0078 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5792353Z triton_mm_58 0.0079 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.5792603Z triton_mm_47 0.0080 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5792850Z triton_mm_52 0.0082 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.5793094Z triton_mm_49 0.0083 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.5793234Z SingleProcess AUTOTUNE benchmarking takes 0.1489 seconds and 0.3272 seconds precompiling for 21 choices 2025-12-04T11:45:24.5793478Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2732c3acdfa22a13.xml - 2025-12-04T11:45:24.5793556Z =========================== short test summary info ============================ 2025-12-04T11:45:24.5794211Z FAILED [0.9921s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 
2025-12-04T11:45:24.5794214Z 2025-12-04T11:45:24.5794298Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.5794582Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5794586Z 2025-12-04T11:45:24.5794683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.5794776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.5794852Z ================== 1 failed, 82 deselected, 2 rerun in 4.57s =================== 2025-12-04T11:45:24.5794893Z Got exit code 1 2025-12-04T11:45:24.5794939Z Retrying single test... 2025-12-04T11:45:24.5795095Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df53b5894a064061.xml 2025-12-04T11:45:24.5795159Z ============================= test session starts ============================== 2025-12-04T11:45:24.5795279Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.5795326Z cachedir: .pytest_cache 2025-12-04T11:45:24.5795498Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.5795551Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.5795596Z configfile: pytest.ini 2025-12-04T11:45:24.5795776Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.5795857Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.5796135Z stepcurrent: skipping 82 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.5796181Z Running 1 items in this shard 2025-12-04T11:45:24.5796183Z 2025-12-04T11:45:24.5796558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:29.060012282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5796562Z 2025-12-04T11:45:24.5796910Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5797238Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5797384Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5797908Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5798199Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5798458Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5798683Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5798904Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5799240Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5799500Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5799821Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5800074Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5800393Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5800645Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5800961Z E1204 11:12:30.066000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5801223Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5801544Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5801801Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5802117Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5802370Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5802684Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5802912Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5803173Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5803553Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5803768Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5804031Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5804352Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5804604Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5804923Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5805162Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5805390Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:24.5805609Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5805836Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5806033Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.5806226Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5806799Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpj21u9d2w/7o/c7ojs7eybjpqr6aeub7jvcjr4fj6exjb2hqu3cackp6omgmqf2gf.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5806958Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5807195Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5807363Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5807536Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5807862Z E1204 11:12:30.066000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5808004Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5808284Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5808434Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5808722Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5808891Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5809183Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5809328Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5809629Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5809840Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5810182Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5810501Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5810652Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5811173Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5811449Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5811693Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5811920Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5812149Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5812485Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5812739Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5813054Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5813386Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5813700Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5813951Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5814264Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5814520Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5814845Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5815096Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5815426Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5815676Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5815993Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5816208Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5816457Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5816774Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5816999Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5817269Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5817583Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5817835Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5818164Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5818402Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5818627Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5818843Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5819072Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5819255Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.5819451Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5819562Z E1204 11:12:30.066000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.5819736Z [W1204 11:12:30.371840842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5819739Z 2025-12-04T11:45:24.5820088Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5820408Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5820552Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5821066Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5821340Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5821598Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5821829Z E1204 11:12:30.105000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5822046Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5822366Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5822620Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5822949Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5823202Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5823564Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5823813Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5824132Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5824382Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5824721Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5824975Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5825291Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5825543Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5825856Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5826068Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5826334Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5826662Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5826873Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5827125Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5827458Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5827709Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5828025Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5828262Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5828485Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5828702Z E1204 11:12:30.105000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5828931Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5829112Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.5829303Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5829892Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpj21u9d2w/dy/cdylrobswojasgezrfydxkgm4gkxnhmwfib2ey6x7xpxlxbmptu4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5830053Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5830286Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5830453Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5830614Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5830944Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5831096Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5831374Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5831523Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5831799Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5831966Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5832272Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5832421Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5832719Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5832930Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5833305Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5833623Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5833763Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5834293Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5834569Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5834816Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5835041Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5835256Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5835588Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5835854Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5836171Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5836424Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5836750Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5837003Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5837320Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5837574Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5837888Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5838139Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5838455Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5838702Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5839026Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5839240Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5839492Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5839807Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5840020Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5840282Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5840610Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5840859Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5841172Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5841410Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5841643Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5841859Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5842086Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5842266Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.5842463Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5842574Z E1204 11:12:30.105000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.5842746Z [W1204 11:12:30.377745980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5842748Z 2025-12-04T11:45:24.5843080Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5843450Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5843593Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5844105Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5844377Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5844621Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5844860Z E1204 11:12:30.111000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5845091Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5845405Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5845657Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5845969Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5846235Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5846548Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5846796Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5847111Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5847364Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5847680Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5847928Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5848259Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5848511Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5848821Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5849033Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5849282Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5849597Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5849830Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5850083Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5850398Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5850645Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5850972Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5851209Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5851433Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5851650Z E1204 11:12:30.111000 728136 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5851881Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5852063Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.5852253Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5852833Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpj21u9d2w/5a/c5ah736b7363i43ciu7fp2elmfkzhf6xz2pesrbnxo4gwz76ewuv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.5852992Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.5853225Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.5853435Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.5853594Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.5853903Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.5854046Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.5854341Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.5854503Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.5854778Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.5854946Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.5855239Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.5855396Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.5855696Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.5855904Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.5856244Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.5856565Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5856704Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5857218Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5857507Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5857751Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5857976Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5858191Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5858508Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5858759Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5859099Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5859350Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5859665Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5859919Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5860246Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5860498Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5860809Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5861059Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5861376Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5861626Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5861940Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5862160Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5862413Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5862729Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5862940Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.5863191Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5863540Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5863821Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5864132Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5864368Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5864590Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5864829Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.5865061Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.5865242Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.5865437Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.5865547Z E1204 11:12:30.111000 728136 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.5865605Z ('RERUN', {'yellow': True}) [2.8806s] [100%] 2025-12-04T11:45:24.5865964Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:31.740573427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5865967Z 2025-12-04T11:45:24.5866127Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5866444Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5866776Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5866919Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5867431Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5867707Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5867949Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5868184Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5868411Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5868723Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5868974Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5869299Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5869552Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5869864Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5870119Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5870433Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5870683Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5870997Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5871231Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5871466Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5871680Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5871906Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5872123Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5872371Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5872692Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5872914Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5873175Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5873525Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5873761Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5873988Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5874224Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5874446Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5874657Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5874869Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5875108Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5875332Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5875544Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.5875753Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5876025Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5876342Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5876594Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5876907Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5877147Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5877383Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5877609Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5877833Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5878047Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5878297Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5878622Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5878873Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5879188Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5879437Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5879755Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5880004Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5880321Z 
E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5880580Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5880894Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5881145Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5881456Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5881706Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5882017Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5882279Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5882620Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5882873Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5883188Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5883481Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5883796Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5884043Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5884358Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5884597Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5886972Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5887188Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5887533Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5887787Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5888103Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5888355Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5888669Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5888917Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5889248Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5889511Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5889827Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5890079Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5890408Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5890623Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5890833Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5891046Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5891269Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5891486Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5891736Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5892051Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5892273Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5892490Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5892704Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5892914Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5893166Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5893524Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5893789Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5894115Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5894325Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5894549Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5894764Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5895034Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5895350Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5895588Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5895805Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5896022Z 
E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5896240Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5896554Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5896820Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5897135Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5897389Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5897706Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5897959Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5898274Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5898547Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5898862Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5899074Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5899288Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5899540Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5899759Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5899975Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5900191Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5900507Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5900759Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5901076Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5901328Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5901655Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5901910Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5902226Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5902478Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5902795Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5903053Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5903326Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5903540Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5903749Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5903975Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5904205Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5904520Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5904759Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5904978Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5905195Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5905414Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5905729Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5905981Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5906309Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5906563Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5906880Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5907129Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5907448Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5907718Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5908053Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5908302Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5908619Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5908885Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5909201Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5909455Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5909771Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5910023Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5910347Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5910599Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5910927Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5911140Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5911355Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5911608Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5911926Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5912179Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5912507Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5912772Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5913088Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5913388Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5913720Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5913973Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5914289Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5914502Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5914756Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5915074Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5915330Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5915645Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5915890Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5916112Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5916326Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5916544Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5916860Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5917092Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5917323Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5917555Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5917772Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5918092Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5918343Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5918562Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5918777Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5918983Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5919147Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5919362Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5919600Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5919823Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5920034Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5920292Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5920518Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5920733Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5920969Z E1204 
11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5921193Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5921405Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5921654Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5921889Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5922100Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5922311Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5922539Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5922771Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5922988Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5923205Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5923565Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5923793Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5924013Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5924227Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5924435Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5924658Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5924889Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5925108Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.5925324Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5925540Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5925854Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5926084Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5926328Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5926542Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5926759Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5927074Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5927318Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.5927535Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5927752Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5927968Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5928285Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5928498Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5928715Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5928921Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5929143Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5929374Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5929596Z E1204 11:12:31.479000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5929810Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5930015Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5930211Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5930399Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5930550Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5930674Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5930811Z E1204 11:12:31.479000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5930984Z [W1204 11:12:31.748901907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5930987Z 2025-12-04T11:45:24.5931142Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5931462Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5931792Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5931938Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5932463Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5932739Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5932987Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5933210Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5933467Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5933795Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5934053Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5934369Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5934619Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5934937Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5935207Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5935538Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5935789Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5936102Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5936341Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5936585Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5936803Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5937026Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5937240Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5937490Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5937814Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5938026Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5938274Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5938602Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5938839Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5939054Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5939288Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5939511Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5939722Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5939944Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5940194Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5940414Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5940626Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.5940835Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5941098Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5941417Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5941666Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5941982Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5942220Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5942441Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5942654Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5942890Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5943105Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5943392Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5943708Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5943956Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5944272Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5944535Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5944861Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5945113Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5945430Z 
E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5945698Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5946011Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5946260Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5946572Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5946824Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5947140Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5947388Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5947719Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5947968Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5948284Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5948534Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5948846Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5949096Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5949420Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5949668Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5949883Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5950097Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.5950429Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5950678Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5950994Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5951243Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5951559Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5951809Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5952122Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5952384Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5952697Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5952954Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5953282Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5953494Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5953706Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5953933Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5954182Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5954396Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5954644Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5954957Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5955183Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5955397Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5955609Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5955820Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5956067Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5956384Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5956633Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5956948Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5957172Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5957400Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5957621Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5957879Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5958199Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5958436Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5958677Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5958891Z 
E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5959107Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5959424Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5959686Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5960004Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5960256Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5960577Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5960830Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5961147Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5961399Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5961726Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5961940Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5962153Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5962394Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5962610Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5962831Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5963049Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5963411Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5963663Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5963979Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5964229Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5964560Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5964813Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5965129Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5965381Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5965701Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5965939Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5966159Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5966387Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5966595Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5966825Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5967041Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5967357Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5967593Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5967827Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5968058Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5968274Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5968593Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5968842Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5969174Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5969425Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5969738Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5969989Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5970309Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5970565Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5970878Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5971146Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5971463Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5971715Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5972029Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5972280Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5972615Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5972875Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5973191Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5973474Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5973804Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5974019Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5974230Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5974481Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5974793Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5975045Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5975358Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5975613Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5975943Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5976194Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5976509Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5976758Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5977073Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5977302Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5977565Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5977880Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5978130Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5978449Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5978690Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5978907Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5979120Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5979336Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5979651Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5979881Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5980098Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5980310Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5980539Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5980860Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5981097Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5981314Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5981527Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5981733Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5981903Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.5982126Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5982361Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5982584Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5982794Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5983043Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5983309Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5983518Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5983756Z E1204 
11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5983977Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.5984190Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5984426Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.5984647Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5984874Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5985083Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5985315Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5985534Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5985748Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5985963Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5986279Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5986534Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5986750Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5986963Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5987168Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.5987395Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.5987626Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5987845Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.5988062Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5988282Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5988602Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5988831Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.5989048Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5989279Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5989495Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5989811Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5990041Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.5990258Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.5990473Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.5990690Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5991028Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5991241Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.5991457Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.5991663Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.5991886Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.5992116Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.5992337Z E1204 11:12:31.482000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.5992548Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.5992755Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.5992951Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.5993138Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.5993321Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.5993432Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.5993568Z E1204 11:12:31.482000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.5993750Z [W1204 11:12:31.751074453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.5993752Z 2025-12-04T11:45:24.5993909Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.5994227Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.5994547Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.5994686Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.5995206Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.5995509Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.5995752Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.5995977Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.5996191Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.5996520Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5996773Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5997088Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5997340Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5997656Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5997909Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5998222Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5998483Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.5998796Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.5999035Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.5999256Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.5999466Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.5999691Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.5999919Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6000181Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6000496Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6000709Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6000957Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6001284Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6001520Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6001729Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6001964Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6002185Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6002396Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6002605Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6002843Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6003073Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6003318Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.6003528Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6003782Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6004103Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6004354Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6004682Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6004933Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6005151Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6005363Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6005613Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6005830Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6006079Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6006394Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6006645Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6006960Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6007211Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6007523Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6007785Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6008235Z 
E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6008483Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6008797Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6009051Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6009381Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6009640Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6009951Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6010202Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6010525Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6010777Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6011088Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6011337Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6011653Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6011905Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6012220Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6012467Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6012686Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6012898Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6013215Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6013581Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6013897Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6014169Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6014497Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6014747Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6015058Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6015322Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6015637Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6015883Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6016196Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6016405Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6016619Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6016830Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6017053Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6017283Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6017532Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6017850Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6018058Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6018266Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6018476Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6018697Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6018955Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6019269Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6019519Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6019831Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6020055Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6020276Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6020492Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6020743Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6021058Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6021296Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6021513Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6021728Z 
E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6021955Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6022273Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6022525Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6022840Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6023091Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6023451Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6023733Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6024047Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6024300Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6024630Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6024843Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6025054Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6025295Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6025514Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6025730Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6025946Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6026259Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6026522Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6026841Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6027093Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6027407Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6027658Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6027976Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6028254Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6028568Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6028805Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6029023Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6029250Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6029460Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6029687Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6029901Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6030215Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6030455Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6030672Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6030885Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6031109Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6031427Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6031681Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6031997Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6032247Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6032559Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6032832Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6033144Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6033444Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6033759Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6034021Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6034418Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6034669Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6034983Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6035233Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6035548Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6035796Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6036124Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6036376Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6036689Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6036902Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6037115Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6037365Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6037709Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6037957Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6038274Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6038522Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6038846Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6039098Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6039411Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6039667Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6039983Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6040197Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6040447Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6040775Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6041028Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6041342Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6041573Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6041791Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6042005Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6042243Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6042571Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6042801Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6043019Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6043234Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6043494Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6043810Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6044047Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6044267Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6044486Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6044694Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6044857Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6045068Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6045319Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6045543Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6045757Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6045993Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6046217Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6046428Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6046678Z E1204 
11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6046919Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6047131Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6047369Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6047591Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6047815Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6048027Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6048256Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6048473Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6048685Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6048903Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6049219Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6049450Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6049677Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6049893Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6050101Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6050313Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6050542Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6050758Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.6050970Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6051198Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6051526Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6051759Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6051979Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6052206Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6052423Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6052739Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6052970Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.6053187Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6053456Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6053673Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6053991Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6054215Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6054436Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6054642Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6054854Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6055083Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6055307Z E1204 11:12:31.484000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6055521Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6055738Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6055948Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6056132Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6056272Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6056385Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6056521Z E1204 11:12:31.484000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.6056579Z ('RERUN', {'yellow': True}) [1.1881s] [100%] 2025-12-04T11:45:24.6056963Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:32.747841115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.6056968Z 2025-12-04T11:45:24.6057127Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6057443Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6057765Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6057907Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6058426Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6058716Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6058964Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6059189Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6059405Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6059724Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6059978Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6060307Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6060572Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6060886Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6061137Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6061461Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6061714Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6062029Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6062269Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6062494Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6062707Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6062931Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6063144Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6063448Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6063764Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6063977Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6064227Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6064544Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6064782Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6065023Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6065260Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6065480Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6065695Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6065905Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6066156Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6066378Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6066589Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6066800Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6067051Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6067368Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6067617Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6067944Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6068181Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6068403Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6068616Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6068838Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6069054Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6069304Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6069642Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6069892Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6070206Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6070455Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6070779Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6071030Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6071342Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6071594Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6071912Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6072161Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6072475Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6072734Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6073050Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6073348Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6073661Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6073909Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6074235Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6074499Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6074811Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6075061Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6075397Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6075633Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6075850Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6076059Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6076373Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6076622Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6076940Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6077190Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6077514Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6077766Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6078079Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6078327Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6078640Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6078901Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6079226Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6079438Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6079652Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6079865Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6080100Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6080313Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6080562Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6080875Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6081084Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6081296Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6081505Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6081713Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6081973Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6082289Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6082539Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6082849Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6083060Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6083331Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6083562Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6083826Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6084141Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6084379Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6084610Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6084826Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6085040Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6085356Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6085606Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6085923Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6086173Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6086486Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6086752Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6087069Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6087321Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6087634Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6087847Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6088070Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6088317Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6088535Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6088748Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6088965Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6089290Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6089543Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6089858Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6090107Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6090422Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6090672Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6090986Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6091245Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6091559Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6091799Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6092017Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6092231Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6092437Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6092676Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6092906Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6093221Z E1204 11:12:32.481000 
728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6093486Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6093702Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6093932Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6094146Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6094463Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6094714Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6095030Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6095279Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6095592Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6095856Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6096170Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6096422Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6096736Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6096987Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6097303Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6097567Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6097894Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6098141Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6098456Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6098719Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6099033Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6099281Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6099595Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6099811Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6100023Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6100274Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6100601Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6100851Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6101169Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6101419Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6101733Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6101982Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6102313Z E1204 
11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6102574Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6102888Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6103100Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6103404Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6103721Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6103973Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6104288Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6104522Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6104741Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6104957Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6105171Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6105502Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6105735Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6105953Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6106168Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6106384Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6106701Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6106966Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6107186Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6107402Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6107610Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6107773Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6107995Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6108235Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6108456Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6108669Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6108906Z E1204 
11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6109131Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6109344Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6109581Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6109817Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6110028Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6110269Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6110490Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6110703Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6110913Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6111163Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6111395Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6111608Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6111824Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6112147Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6112390Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6112607Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6112823Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6113031Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6113242Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6113514Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6113731Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6113944Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6114160Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6114495Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6114729Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6114947Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6115163Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6115377Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6115694Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6115952Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6116171Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6116381Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6116597Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6116928Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6117141Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6117359Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6117562Z E1204 11:12:32.481000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6117774Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6118002Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6118224Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6118436Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6118638Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6118846Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6119030Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6119169Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6119280Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6119420Z E1204 11:12:32.481000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6119589Z [W1204 11:12:32.750202759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:24.6119592Z
2025-12-04T11:45:24.6119747Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:24.6120064Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:24.6120406Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first):
2025-12-04T11:45:24.6120548Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback:
2025-12-04T11:45:24.6121063Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:45:24.6121349Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0
2025-12-04T11:45:24.6121593Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0
2025-12-04T11:45:24.6121817Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548
2025-12-04T11:45:24.6122035Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6122350Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6122605Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6122916Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6123168Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6123538Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6123791Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6124108Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6124358Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6124674Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6124925Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6125162Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6125372Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6125598Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6125813Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6126076Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6126393Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6126601Z E1204 
11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6126856Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6127174Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6127411Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6127623Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6127876Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6128099Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6128312Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6128521Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6128753Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6128974Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6129185Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6129406Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6129668Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6129980Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6130232Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6130557Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6130793Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6131013Z 
E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6131221Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6131445Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6131661Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6131913Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6132227Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6132489Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6132803Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6133053Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.6133401Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6133649Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6133962Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6134238Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6134553Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6134799Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6135111Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6135376Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6135688Z E1204 
11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6135938Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6136250Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6136500Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6136815Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6137064Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6137394Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6137644Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6137958Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6138193Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6138410Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6138622Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6138947Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6139209Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6139523Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6139774Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6140098Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6140347Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6140660Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6140909Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6141225Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6141474Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6141788Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6142012Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6142225Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6142438Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6142663Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6142877Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6143128Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6143495Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6143723Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6143932Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6144144Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6144352Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6144625Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6144939Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6145189Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6145503Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6145717Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6145944Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6146161Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6146414Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6146742Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6146983Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6147203Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6147420Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6147640Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6147956Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6148237Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6148554Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6148805Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6149120Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6149382Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6149699Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6149949Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6150265Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6150480Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6150694Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6150935Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6151163Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6151379Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6151596Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6151915Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6152165Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6152484Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6152749Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6153076Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6153399Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6153717Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6153987Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6154302Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6154542Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6154762Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6154976Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6155188Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6155415Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6155633Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6155966Z E1204 11:12:32.483000 
728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6156204Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6156426Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6156639Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6156856Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6157173Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6157441Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6157768Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6158021Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6158339Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6158607Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6158928Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6159178Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6159496Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6159745Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6160064Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6160316Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6160641Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6160897Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6161214Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6161467Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6161785Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6162036Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6162365Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6162597Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6162811Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6163062Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6163439Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6163696Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6164011Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6164263Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6164578Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6164835Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6165150Z E1204 
11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6165417Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6165732Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6165945Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6166197Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6166513Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6166764Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6167095Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6167342Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6167561Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6167775Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6167992Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6168322Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6168557Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6168775Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6168991Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6169209Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6169523Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6169762Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6169996Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6170212Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6170420Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6170582Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6170796Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6171036Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6171262Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6171485Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6171738Z E1204 
11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6171960Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6172173Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6172411Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6172646Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6172858Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6173094Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6173354Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6173568Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6173782Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6174011Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6174231Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6174460Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6174674Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6174994Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6175224Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6175445Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6175661Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6175870Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6176100Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6176345Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6176563Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6176776Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6176993Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6177324Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6177556Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6177772Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6177988Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6178207Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6178525Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6178757Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6178983Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6179198Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6179414Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6179732Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6179944Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6180162Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6180368Z E1204 11:12:32.483000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6180595Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6180840Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6181063Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6181278Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6181481Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6181689Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6181875Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6182010Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6182124Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6182260Z E1204 11:12:32.483000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6182429Z [W1204 11:12:32.752348855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6182433Z 2025-12-04T11:45:24.6182589Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6182908Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6183227Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6183433Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6183952Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6184225Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6184468Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6184690Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.6184921Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6185248Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6185502Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6185821Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6186071Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6186400Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6186649Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6186966Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6187217Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6187531Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6187770Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6187991Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6188213Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6188441Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6188659Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6188910Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6189223Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6189436Z E1204 
11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6189697Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6190023Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6190258Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6190471Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6190719Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6190942Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6191155Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6191364Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6191602Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6191824Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6192036Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6192244Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6192495Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6192821Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6193072Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6193429Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6193664Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6193886Z 
E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6194110Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6194350Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6194565Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6194814Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6195129Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6195391Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6195708Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6195958Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.6196273Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6196528Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6196842Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6197095Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6197426Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6197678Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6197995Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6198244Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6198565Z E1204 
11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6198814Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6199144Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6199403Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6199719Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6199970Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6200296Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6200548Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6200860Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6201101Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6201320Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6201533Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6201850Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6202111Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6202425Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6202676Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6202993Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6203244Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6203594Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6203863Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6204190Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6204440Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6204752Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6204980Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6205191Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6205402Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6205628Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6205842Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6206096Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6206506Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6206718Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6206943Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6207156Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6207370Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6207617Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6207932Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6208181Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6208507Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6208730Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6208955Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6209173Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6209425Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6209753Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6209992Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6210210Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6210423Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6210646Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6210966Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6211220Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6211549Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6211799Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6212116Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6212364Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6212679Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6212931Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6213293Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6213523Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6213737Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6213979Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6214217Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6214434Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6214651Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6214967Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6215220Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6215540Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6215794Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6216108Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6216376Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6216694Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6216945Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6217261Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6217499Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6217733Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6217958Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6218169Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6218398Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6218614Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6220765Z E1204 11:12:32.485000 
728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6221012Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6221233Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6221447Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6229975Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6230297Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6230549Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6230866Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6231149Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6231470Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6231725Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6232038Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6232291Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6232622Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6232897Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6233211Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6233497Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6233829Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6234083Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6234403Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6234653Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6234971Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6235224Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6235542Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6235769Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6235981Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6236236Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6236551Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6236803Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6237120Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6237385Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6237718Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6237969Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6238286Z E1204 
11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6238547Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6238865Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6239079Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6239330Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6239647Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6239899Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6240217Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6240448Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6240677Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6240896Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6241113Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6241431Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6241662Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6241882Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6242109Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6242338Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6242657Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6242897Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6243130Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6243378Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6243587Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6243747Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6243961Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6244198Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6244427Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6244639Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6244876Z E1204 
11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6245116Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6245333Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6245579Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6245800Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6246012Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6246251Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6246487Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6246723Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6246931Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6247162Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6247381Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6247613Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6247830Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6248148Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6248379Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6248595Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6248813Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6249020Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6249232Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6249460Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6249690Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6249908Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6250126Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6250445Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6250675Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6250894Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6251128Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6251357Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6251676Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6251906Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6252138Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6252351Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6252572Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6252887Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6253100Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6253362Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6253570Z E1204 11:12:32.485000 728136 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6253783Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6254013Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6254252Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6254466Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6254673Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6254870Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6255058Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6255196Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6255309Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6255465Z E1204 11:12:32.485000 728136 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6255523Z FAILED [0.9381s] [100%] 2025-12-04T11:45:24.6255526Z 2025-12-04T11:45:24.6255591Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6255748Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6255802Z Traceback (most recent call last): 2025-12-04T11:45:24.6255979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6256027Z method(*args, **kwargs) 2025-12-04T11:45:24.6256193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6256241Z method(*args, **kwargs) 2025-12-04T11:45:24.6256404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6256462Z with policy(): 2025-12-04T11:45:24.6256627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6256674Z raise RuntimeError(msg) 2025-12-04T11:45:24.6257103Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:24.6257106Z 2025-12-04T11:45:24.6257194Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6257483Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6257489Z 2025-12-04T11:45:24.6257588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6257676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6257724Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6257789Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6258403Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6258516Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6258558Z graph_break [] 2025-12-04T11:45:24.6258633Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6258716Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6259248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6259301Z current_size = base.storage().size() 2025-12-04T11:45:24.6259347Z Autotune Choices Stats: 2025-12-04T11:45:24.6259755Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-12-04T11:45:24.6259852Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6259909Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6260042Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6260303Z triton_mm_11 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6260549Z triton_mm_10 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6260810Z triton_mm_8 0.0077 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6261053Z triton_mm_14 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6261296Z triton_mm_13 0.0078 ms 96.4% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6261541Z triton_mm_17 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6261784Z triton_mm_18 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6262034Z triton_mm_12 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6262280Z triton_mm_7 0.0081 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6262537Z triton_mm_15 0.0082 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6262682Z SingleProcess AUTOTUNE benchmarking takes 0.0819 seconds and 0.7827 seconds precompiling for 18 choices 2025-12-04T11:45:24.6262840Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6262891Z Traceback (most recent call last): 2025-12-04T11:45:24.6263065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6263110Z method(*args, **kwargs) 
2025-12-04T11:45:24.6263366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6263410Z method(*args, **kwargs) 2025-12-04T11:45:24.6263577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6263621Z with policy(): 2025-12-04T11:45:24.6263801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6263848Z raise RuntimeError(msg) 2025-12-04T11:45:24.6264289Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:24.6264292Z 2025-12-04T11:45:24.6264376Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6264661Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6264665Z 2025-12-04T11:45:24.6264763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6264845Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6264893Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6264970Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6265563Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6265674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6265715Z graph_break [] 2025-12-04T11:45:24.6265785Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6265867Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6266391Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6266444Z current_size = base.storage().size() 2025-12-04T11:45:24.6266490Z Autotune Choices Stats: 2025-12-04T11:45:24.6266910Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-12-04T11:45:24.6266985Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6267038Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6267173Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6267428Z triton_mm_11 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.6267674Z triton_mm_10 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6267919Z triton_mm_8 0.0077 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6268173Z triton_mm_14 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6268428Z triton_mm_13 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6268667Z triton_mm_17 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6268909Z triton_mm_18 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6269166Z triton_mm_12 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6269411Z triton_mm_7 0.0081 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6269656Z triton_mm_15 0.0082 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6269799Z SingleProcess AUTOTUNE benchmarking takes 0.0819 seconds and 0.7827 seconds precompiling for 18 choices 2025-12-04T11:45:24.6269881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6269929Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6269993Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6270104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6270635Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6270676Z graph_break [] 2025-12-04T11:45:24.6270746Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6270836Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6270882Z Autotune Choices Stats: 2025-12-04T11:45:24.6271278Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0075599998235702515, "best_triton_pos": 0} 2025-12-04T11:45:24.6271352Z AUTOTUNE 
scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6271406Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6271536Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6271788Z triton_mm_34 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6272032Z triton_mm_30 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6272300Z triton_mm_33 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6272539Z triton_mm_37 0.0076 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6272782Z triton_mm_28 0.0077 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6273027Z triton_mm_31 0.0078 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6273343Z triton_mm_38 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6273591Z triton_mm_32 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6273836Z triton_mm_27 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6274082Z triton_mm_35 0.0083 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6274224Z SingleProcess AUTOTUNE benchmarking takes 0.1384 seconds and 0.4688 seconds precompiling for 21 choices 2025-12-04T11:45:24.6274284Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6274441Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6274491Z Traceback (most recent call last): 2025-12-04T11:45:24.6274661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6274706Z method(*args, **kwargs) 2025-12-04T11:45:24.6274895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6274940Z method(*args, **kwargs) 2025-12-04T11:45:24.6275104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6275146Z with policy(): 2025-12-04T11:45:24.6275314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T11:45:24.6275358Z raise RuntimeError(msg) 2025-12-04T11:45:24.6275784Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:24.6275786Z 2025-12-04T11:45:24.6275868Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6276151Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6276168Z 2025-12-04T11:45:24.6276263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6276358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6276404Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6276469Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6277061Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6277169Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6277212Z graph_break [] 2025-12-04T11:45:24.6277292Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6277467Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:24.6277989Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6278042Z current_size = base.storage().size() 2025-12-04T11:45:24.6278086Z Autotune Choices Stats: 2025-12-04T11:45:24.6278493Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.007519999984651804, "best_triton_pos": 0} 2025-12-04T11:45:24.6278564Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6278619Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6278750Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6279004Z triton_mm_11 0.0075 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6279263Z triton_mm_10 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6279508Z triton_mm_8 0.0077 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.6279752Z triton_mm_14 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6279993Z triton_mm_13 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6280237Z triton_mm_17 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6280497Z triton_mm_18 0.0078 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6280753Z triton_mm_12 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6280999Z triton_mm_7 0.0081 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6281244Z triton_mm_15 0.0082 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6281388Z SingleProcess AUTOTUNE benchmarking takes 0.0819 seconds and 0.7827 seconds precompiling for 18 choices 2025-12-04T11:45:24.6281486Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.6281536Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6281597Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6281708Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6282233Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6282275Z graph_break [] 2025-12-04T11:45:24.6282343Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6282425Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6282473Z Autotune Choices Stats: 2025-12-04T11:45:24.6282866Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0075599998235702515, "best_triton_pos": 0} 2025-12-04T11:45:24.6282937Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6282989Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6283134Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6283435Z triton_mm_34 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6283683Z 
triton_mm_30 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6283926Z triton_mm_33 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6284170Z triton_mm_37 0.0076 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6284414Z triton_mm_28 0.0077 ms 97.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6284689Z triton_mm_31 0.0078 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6284935Z triton_mm_38 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6285179Z triton_mm_32 0.0078 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6285426Z triton_mm_27 0.0080 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.6285682Z triton_mm_35 0.0083 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6285826Z SingleProcess AUTOTUNE benchmarking takes 0.1384 seconds and 0.4688 seconds precompiling for 21 choices 2025-12-04T11:45:24.6285908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6285955Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6286020Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6286129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6286652Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6286695Z graph_break [] 2025-12-04T11:45:24.6286764Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6286844Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6286890Z Autotune Choices Stats: 2025-12-04T11:45:24.6287292Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_50", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007600000128149986, "best_triton_pos": 0} 2025-12-04T11:45:24.6287364Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 
2025-12-04T11:45:24.6287416Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6287549Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6287797Z triton_mm_50 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6288039Z triton_mm_57 0.0076 ms 99.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6288283Z triton_mm_48 0.0077 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6288541Z triton_mm_51 0.0078 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6288798Z triton_mm_53 0.0078 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6289043Z triton_mm_47 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6289290Z triton_mm_54 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.6289547Z triton_mm_52 0.0080 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6289790Z triton_mm_58 0.0080 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6290036Z triton_mm_55 0.0081 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6290178Z SingleProcess AUTOTUNE benchmarking takes 0.1170 seconds and 0.3216 seconds precompiling for 21 choices 2025-12-04T11:45:24.6290388Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-df53b5894a064061.xml - 2025-12-04T11:45:24.6290457Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6291093Z FAILED [0.9381s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 
2025-12-04T11:45:24.6291096Z 2025-12-04T11:45:24.6291177Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6291470Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6291473Z 2025-12-04T11:45:24.6291570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6291641Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.6291718Z ================== 1 failed, 187 deselected, 2 rerun in 5.03s ================== 2025-12-04T11:45:24.6291760Z Got exit code 1 2025-12-04T11:45:24.6291805Z Retrying single test... 2025-12-04T11:45:24.6291964Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f4dc44b6e5b0ad29.xml 2025-12-04T11:45:24.6292028Z ============================= test session starts ============================== 2025-12-04T11:45:24.6292149Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6292197Z cachedir: .pytest_cache 2025-12-04T11:45:24.6292369Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6292439Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6292484Z configfile: pytest.ini 2025-12-04T11:45:24.6292677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6292758Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6293031Z stepcurrent: skipping 82 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6293080Z Running 1 items in this shard 2025-12-04T11:45:24.6293082Z 2025-12-04T11:45:24.6293497Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:41.002425738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6293500Z 2025-12-04T11:45:24.6293858Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6294186Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6294333Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6294856Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6295133Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6295380Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6295605Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6295841Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6296160Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6296415Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6296731Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6296985Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6297314Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6297584Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6297899Z E1204 11:12:42.039000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6298149Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6298475Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6298725Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6299039Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6299294Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6299606Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6299822Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6300071Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6300386Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6300608Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6300860Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6301177Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6301426Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6301743Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6301995Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6302231Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:24.6302449Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6302678Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6302863Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.6303057Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6303690Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp89rqtj9z/7o/c7ojs7eybjpqr6aeub7jvcjr4fj6exjb2hqu3cackp6omgmqf2gf.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.6303849Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.6304087Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.6304259Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.6304418Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.6304733Z E1204 11:12:42.039000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.6304875Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.6305168Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.6305320Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.6305600Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.6305768Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.6306061Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.6306209Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.6306508Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.6306735Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.6307089Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6307408Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6307550Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6308087Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6308362Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6308607Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6308834Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6309052Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6309373Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6309626Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6309952Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6310208Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6310521Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6310773Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6311090Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6311357Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6311686Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6311939Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6312257Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6312520Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6312839Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6313052Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6313342Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6313660Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6313873Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6314126Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6314444Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6314722Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6315044Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6315286Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6315510Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6315728Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6315957Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6316180Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.6316387Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6316499Z E1204 11:12:42.039000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.6316671Z [W1204 11:12:42.344098654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6316674Z 2025-12-04T11:45:24.6317012Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6317345Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6317490Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6318008Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6318282Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6318547Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6318788Z E1204 11:12:42.077000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6319006Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6319337Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6319596Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6319912Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6320166Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6320481Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6320749Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6321094Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6321347Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6321671Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6321925Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6322270Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6322532Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6322856Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6323071Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6323366Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6323687Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6323901Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6324176Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6324500Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6324755Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6325074Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6325315Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6325540Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6325785Z E1204 11:12:42.077000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6326035Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6326218Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.6326413Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6327005Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp89rqtj9z/5a/c5ah736b7363i43ciu7fp2elmfkzhf6xz2pesrbnxo4gwz76ewuv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.6327168Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.6327407Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.6327578Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.6327740Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.6328057Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.6328201Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.6328485Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.6328637Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.6328933Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.6329106Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.6329409Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.6329556Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.6329860Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.6330073Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.6330435Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6330769Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6330912Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6331442Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6331745Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6331996Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6332224Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6332448Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6332774Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6333034Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6333392Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6333683Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6334018Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6334280Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6334602Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6334860Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6335187Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6335482Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6335804Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6336062Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6336386Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6336622Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6336886Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6337208Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6337425Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6337688Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6338013Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6338271Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6338609Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6338859Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6339088Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6339309Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6339542Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6339728Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.6339947Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6340081Z E1204 11:12:42.077000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.6340260Z [W1204 11:12:42.348367548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6340262Z 2025-12-04T11:45:24.6340604Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6340935Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6341090Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6341643Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6341929Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6342182Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6342416Z E1204 11:12:42.081000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6342641Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6342970Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6343248Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6343604Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6343867Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6344198Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6344460Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6344785Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6345083Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6345414Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6345674Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6346003Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6346281Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6346609Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6346870Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6347129Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6347458Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6347677Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6347941Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6348297Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6348560Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6348890Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6349134Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6349366Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6349591Z E1204 11:12:42.081000 733392 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6349844Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6350046Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.6350247Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6350840Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp89rqtj9z/dy/cdylrobswojasgezrfydxkgm4gkxnhmwfib2ey6x7xpxlxbmptu4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.6351006Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.6351283Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.6351458Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.6351624Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.6351946Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.6352098Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.6352389Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.6352543Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.6352837Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.6353031Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.6353380Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.6353534Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.6353849Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.6354072Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.6354429Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6354782Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6354945Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6355484Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6355768Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6356044Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6356283Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6356508Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6356839Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6357106Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6357437Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6357699Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6358046Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6358309Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6358646Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6358910Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6359239Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6359501Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6359848Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6360130Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6360462Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6360682Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6360962Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6361293Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6361538Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6361801Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6362130Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6362395Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6362723Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6362987Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6363222Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6363509Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.6363747Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.6363932Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.6364139Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.6364255Z E1204 11:12:42.081000 733392 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.6364334Z ('RERUN', {'yellow': True}) [2.8700s] [100%] 2025-12-04T11:45:24.6364711Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:43.618525115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6364735Z 2025-12-04T11:45:24.6364903Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6365234Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.6365567Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6365721Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6366277Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6366565Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6366822Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6367057Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6367285Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6367614Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6367898Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6368229Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6368492Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6368827Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6369090Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6369423Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6369729Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6370060Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6370307Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6370538Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6370785Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6371019Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6371244Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6371506Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6371843Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6372067Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6372330Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6372656Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6372927Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6373150Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6373462Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6373694Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6373912Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6374132Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6374396Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6374652Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6374874Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.6375094Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6375356Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6375708Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6375972Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6376297Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6376545Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6376783Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6377004Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6377240Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6377472Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6377756Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6378086Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6378347Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6378677Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6378939Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6379281Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6379554Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6379883Z 
E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6380148Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6380491Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6380754Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6381079Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6381338Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6381670Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6381935Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6382262Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6382544Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6382877Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6383139Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6383511Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6383772Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6384100Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6384371Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6384615Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6384835Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6385162Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6385451Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6385780Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6386038Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6386367Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6386626Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6386955Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6387214Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6387559Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6387818Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6388148Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6388370Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6388594Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6388816Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6389076Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6389315Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6389574Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6389899Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6390120Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6390356Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6390579Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6390797Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6391057Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6391391Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6391651Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6391980Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6392198Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6392450Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6392678Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6392943Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6393323Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6393572Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6393799Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6394045Z 
E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6394286Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6394615Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6394880Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6395233Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6395495Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6395833Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6396097Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6396428Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6396690Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6397020Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6397261Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6397483Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6397733Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6397959Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6398183Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6398410Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6398749Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6399030Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6399374Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6399635Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6399962Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6400247Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6400577Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6400840Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6401172Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6401431Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6401659Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6401881Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6402116Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6402352Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6402580Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6402910Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6403160Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6403443Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6403688Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6403930Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6404257Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6404519Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6404849Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6405129Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6405458Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6405725Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6406055Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6406317Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6406653Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6406915Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6407267Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6407532Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6407860Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6408122Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6408457Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6408734Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6409082Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6409345Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6409674Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6409896Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6410135Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6410398Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6410726Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6410992Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6411322Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6411584Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6411914Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6412190Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6412518Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6412775Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6413102Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6413502Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6413762Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6414122Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6414382Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6414711Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6414951Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6415202Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6415424Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6415646Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6415973Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6416212Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6416438Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6416657Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6416879Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6417223Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6417472Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6417696Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6417923Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6418136Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6418304Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6418543Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6418798Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6419028Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6419244Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6419490Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6419742Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6419961Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6420208Z E1204 
11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6420436Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6420656Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6420900Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6421130Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6421347Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6421564Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6421824Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6422049Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6422269Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6422487Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6422808Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6423040Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6423316Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6423547Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6423755Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6423971Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6424203Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6424452Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.6424670Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6424891Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6425213Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6425449Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6425676Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6425893Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6426113Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6426456Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6426691Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.6426911Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6427127Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6427348Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6427668Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6427903Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6428135Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6428343Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6428558Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6428792Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6429032Z E1204 11:12:43.357000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6429247Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6429455Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6429658Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6429848Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6429989Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6430105Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6430244Z E1204 11:12:43.357000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.6430420Z [W1204 11:12:43.627047233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6430423Z 2025-12-04T11:45:24.6430579Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6430923Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.6431248Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6431390Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6431919Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6432198Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6432471Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6432697Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6432913Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6433234Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6433557Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6433882Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6434140Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6434460Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6434716Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6435035Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6435288Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6435621Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6435863Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6436087Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6436307Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6436532Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6436747Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6437018Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6437350Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6437564Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6437817Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6438134Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6438389Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6438601Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6438840Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6439068Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6439282Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6439494Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6439734Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6439960Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6440192Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.6440405Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6440659Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6440975Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6441227Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6441543Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6441802Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6442045Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6442257Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6442483Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6442700Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6442974Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6443332Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6443582Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6443899Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6444154Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6444471Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6444724Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6445066Z 
E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6445322Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6445646Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6445897Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6446214Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6446484Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6446815Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6447071Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6447391Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6447668Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6447988Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6448240Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6448555Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6448808Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6449127Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6449364Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6449584Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6449814Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6450137Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6450390Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6450707Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6450964Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6451294Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6451567Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6451881Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6452134Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6452462Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6452719Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6453038Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6453320Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6453533Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6453746Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6453979Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6454194Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6454469Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6454786Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6455000Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6455213Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6455426Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6455640Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6455892Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6456241Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6456493Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6456809Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6457021Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6457266Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6457491Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6457753Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6458076Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6458320Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6458540Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6458755Z 
E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6458974Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6459311Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6459567Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6459890Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6460144Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6460471Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6460748Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6461074Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6461328Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6461647Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6461882Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6462099Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6462339Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6462560Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6462776Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6463000Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6463356Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6463610Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6463949Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6464211Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6464533Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6464789Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6465110Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6465361Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6465708Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6465967Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6466186Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6466405Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6466625Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6466857Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6467073Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6467389Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6467628Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6467848Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6468071Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6468288Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6468623Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6468875Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6469196Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6469449Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6469765Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6470017Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6470351Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6470617Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6470937Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6471190Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6471524Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6471782Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6472098Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6472348Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6472667Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6472920Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6473237Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6473570Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6473888Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6474105Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6474317Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6474579Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6474896Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6475170Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6475504Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6475758Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6476082Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6476356Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6476677Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6476929Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6477244Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6477462Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6477714Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6478031Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6478313Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6478645Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6478881Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6479100Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6479315Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6479533Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6479872Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6480122Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6480341Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6480558Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6480775Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6481116Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6481356Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6481575Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6481790Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6482001Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6482166Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6482379Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6482618Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6482858Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6483072Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6483357Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6483586Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6483801Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6484041Z E1204 
11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6484290Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6484515Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6484756Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6484979Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6485197Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6485411Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6485662Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6485883Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6486098Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6486317Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6486637Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6486871Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6487089Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6487307Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6487535Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6487754Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6487985Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6488202Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.6488421Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6488642Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6488979Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6489220Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6489440Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6489657Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6489871Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6490210Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6490445Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.6490664Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6490880Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6491102Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6491424Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6491635Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6491854Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6492074Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6492291Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6492521Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6492744Z E1204 11:12:43.360000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6492966Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6493172Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6493433Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6493634Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6493772Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6493882Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6494021Z E1204 11:12:43.360000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.6494195Z [W1204 11:12:43.629241049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6494198Z 2025-12-04T11:45:24.6494356Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6494703Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.6495024Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6495166Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6495687Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6495967Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6496210Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6496454Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6496671Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6496988Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6497243Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6497556Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6497810Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6498146Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6498410Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6498726Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6498976Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6499305Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6499548Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6499771Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6499983Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6500210Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6500426Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6500676Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6500996Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6501223Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6501477Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6501792Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6502035Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6502249Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6502487Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6502731Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6502956Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6503168Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6503450Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6503675Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6503920Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.6504131Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6504382Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6504698Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6504953Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6505267Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6505505Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6505730Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6505963Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6506197Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6506413Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6506665Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6506981Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6507233Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6507587Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6507840Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6508156Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6508404Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6508739Z 
E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6508990Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6509306Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6509558Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6509871Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6510125Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6510440Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6510708Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6511023Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6511283Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6511600Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6511850Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6512181Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6512447Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6512761Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6512998Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6513214Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6513503Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6513820Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6514072Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6514386Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6514640Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6514955Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6515208Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6515542Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6515793Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6516111Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6516362Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6516679Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6516913Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6517136Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6517350Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6517573Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6517792Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6518058Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6518375Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6518587Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6518798Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6519012Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6519224Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6519475Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6519787Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6520054Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6520377Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6520589Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6520814Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6521032Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6521288Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6521623Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6521880Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6522099Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6522314Z 
E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6522532Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6522864Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6523116Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6523486Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6523740Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6524064Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6524314Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6524635Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6524905Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6525223Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6525437Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6525651Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6525891Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6526108Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6526344Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6526577Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6526896Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6527148Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6527486Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6527738Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6528051Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6528305Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6528626Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6528880Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6529198Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6529452Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6529671Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6529887Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6530097Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6530322Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6530540Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6530856Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6531145Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6531364Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6531579Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6531798Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6532129Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6532381Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6532696Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6532949Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6533306Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6533559Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6533874Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6534146Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6534464Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6534724Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6535042Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6535295Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6535612Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6535884Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6536212Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6536463Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6536782Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6537052Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6537375Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6537588Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6537803Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6538055Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6538373Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6538624Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6538954Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6539210Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6545299Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6545585Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6545907Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6546158Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6546518Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6546750Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6546998Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6547311Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6547585Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6547905Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6548135Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6548360Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6548573Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6548793Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6549104Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6549335Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6549573Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6549789Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6550006Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6550324Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6550566Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6550782Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6551018Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6551239Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6551397Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6551608Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6551844Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6552076Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6552283Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6552515Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6552732Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6552939Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6553172Z E1204 
11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6553443Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6553657Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6553925Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6554141Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6554349Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6554555Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6554779Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6554992Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6555202Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6555433Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6555755Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6555977Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6556196Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6556411Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6556630Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6556835Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6557057Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6557268Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.6557477Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6557689Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6557995Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6558218Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6558447Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6558657Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6558874Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6559180Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6559406Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.6559616Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6559850Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6560072Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6560377Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6560585Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6560794Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6561010Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6561215Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6561437Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6561653Z E1204 11:12:43.362000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6561859Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6562067Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6562260Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6562446Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6562575Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6562699Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6562827Z E1204 11:12:43.362000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.6562882Z ('RERUN', {'yellow': True}) [1.0840s] [100%] 2025-12-04T11:45:24.6563218Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:12:44.521926877 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.6563222Z 2025-12-04T11:45:24.6563410Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6563710Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6564012Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6564181Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6564671Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6564930Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6565187Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6565395Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.6565596Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6565890Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6566126Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6566419Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6566654Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6566950Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6567203Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6567497Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6567729Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6568020Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6568240Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6568463Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6568670Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6568882Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6569082Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6569315Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6569630Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6569826Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6570058Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6570351Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6570572Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6570769Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6570986Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6571199Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6571407Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6571607Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6571827Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6572031Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6572226Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6572420Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6572667Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6572977Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6573208Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6573539Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6573766Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6573998Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6574196Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6574404Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6574602Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6574836Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6575129Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6575362Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6575669Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6575899Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6576196Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6576429Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6576725Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6576958Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6577274Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6577517Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6577806Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6578040Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6578348Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6578581Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6578874Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6579105Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6579399Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6579629Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6579924Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6580180Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6580473Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6580694Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6580893Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6581090Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6581382Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6581632Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6581931Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6582170Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6582464Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6582710Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6583008Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6583238Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6583570Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6583805Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6584094Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6584289Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6584516Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6584711Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6584919Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6585120Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6585355Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6585644Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6585861Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6586078Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6586271Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6586465Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6586696Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6587005Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6587239Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6587532Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6587728Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6587941Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6588145Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6588381Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6588673Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6588911Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6589120Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6589322Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6589524Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6589819Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6590054Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6590361Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6590616Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6590908Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6591142Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6591454Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6591690Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6591982Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6592182Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6592379Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6592599Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6592802Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6593002Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6593217Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6593545Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6593780Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6594079Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6594314Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6594626Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6594873Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6595165Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6595401Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6595714Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6595936Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6596138Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6596337Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6596531Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6596743Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6596946Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6597241Z E1204 11:12:44.255000 
733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6597483Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6597687Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6597887Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6598089Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6598379Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6598614Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6598927Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6599174Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6599467Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6599700Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6600014Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6600248Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6600543Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6600777Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6601070Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6601305Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6601596Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6601845Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6602140Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6602381Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6602680Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6602911Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6603207Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6603457Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6603670Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6603907Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6604203Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6604456Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6604751Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6604993Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6605285Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6605520Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6605817Z E1204 
11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6606047Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6606357Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6606555Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6606795Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6607088Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6607330Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6607624Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6607858Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6608071Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6608269Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6608471Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6608765Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6608994Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6609200Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6609396Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6609600Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6609898Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6610126Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6610328Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6610526Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6610731Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6610882Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6611080Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6611301Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6611508Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6611707Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6611951Z E1204 
11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6612170Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6612368Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6612589Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6612794Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6613012Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6613238Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6613492Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6613687Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6613884Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6614099Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6614301Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6614499Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6614699Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6615012Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6615228Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6615432Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6615633Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6615824Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6616022Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6616256Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6616473Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6616669Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6616871Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6617164Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6617395Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6617597Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6617794Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6617994Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6618291Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6618505Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6618708Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6618916Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6619147Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6620984Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6621197Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6621398Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6621591Z E1204 11:12:44.255000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6621788Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6622034Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6622258Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6622457Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6622651Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6622833Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6623009Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6623140Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6623245Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6623412Z E1204 11:12:44.255000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6623570Z [W1204 11:12:44.524302170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6623572Z 2025-12-04T11:45:24.6623723Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6624017Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6624319Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6624452Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6624955Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6625270Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6625501Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6625711Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.6625911Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6626203Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6626471Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6626760Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6626994Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6627287Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6627523Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6627814Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6628045Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6628336Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6628556Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6628763Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6628961Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6629194Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6629393Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6629651Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6629944Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6630138Z E1204 
11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6630370Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6630681Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6630913Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6631109Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6631327Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6631539Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6631736Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6631932Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6632149Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6632355Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6632550Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6632745Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6632977Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6633311Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6633559Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6633852Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6634096Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6634301Z 
E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6634499Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6634707Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6634925Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6635182Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6635474Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6635706Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6635995Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6636228Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.6636519Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6636752Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6637043Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6637276Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6637572Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6637816Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6638110Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6638358Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6638647Z E1204 
11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6638877Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6639169Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6639415Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6639716Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6639946Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6640242Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6640473Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6640768Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6640985Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6641186Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6641384Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6641675Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6641907Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6642210Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6642443Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6642745Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6642978Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6643306Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6643535Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6643847Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6644092Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6644382Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6644579Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6644776Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6644973Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6645178Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6645376Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6645607Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6645907Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6646103Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6646299Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6646510Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6646705Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6646951Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6647241Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6647471Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6647762Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6647975Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6648193Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6648395Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6648629Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6648922Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6649146Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6649347Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6649546Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6649748Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6650040Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6650276Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6650571Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6650823Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6651120Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6651367Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6651660Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6651894Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6652189Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6652408Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6652618Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6652838Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6653041Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6653241Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6653471Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6653766Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6653999Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6654297Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6654529Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6654825Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6655062Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6655371Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6655608Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6655917Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6656137Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6656340Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6656536Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6656742Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6656964Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6657164Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6657461Z E1204 11:12:44.257000 
733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6657685Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6657887Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6658086Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6658287Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6658579Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6658815Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6659108Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6659339Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6659644Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6659876Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6660185Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6660417Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6660716Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6660957Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6661262Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6661494Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6661786Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6662021Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6662314Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6662548Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6662841Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6663076Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6663407Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6663609Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6663805Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6664052Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6664346Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6664592Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6664886Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6665119Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6665421Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6665668Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6665957Z E1204 
11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6666191Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6666483Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6666680Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6666929Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6667223Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6667461Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6667754Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6667970Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6668171Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6668381Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6668584Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6668895Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6669113Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6669312Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6669512Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6669725Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6670032Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6670253Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6670454Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6670652Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6670845Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6670995Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6671196Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6671418Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6671624Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6671821Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6672043Z E1204 
11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6672247Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6672455Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6672674Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6672892Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6673092Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6673348Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6673554Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6673750Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6673958Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6674185Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6674395Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6674592Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6674793Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6675088Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6675299Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6675504Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6675702Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6675894Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6676090Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6676304Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6676507Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6676725Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6676925Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6677231Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6677444Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6677644Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6677842Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6678056Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6678362Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6678575Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6678779Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6678982Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6679183Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6679475Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6679676Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6679878Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6680070Z E1204 11:12:44.257000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6680267Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6680481Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6680684Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6680899Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6681089Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6681283Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6681455Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6681582Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6681687Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6681816Z E1204 11:12:44.257000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6681973Z [W1204 11:12:44.526436467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.6681985Z 2025-12-04T11:45:24.6682132Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.6682437Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.6682732Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.6682884Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.6683411Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.6683665Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.6683896Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.6684102Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.6684302Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6684596Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6684832Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6685154Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6685387Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6685698Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6685930Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6686223Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6686468Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6686771Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6686992Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6687196Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6687395Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6687602Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6687807Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6688052Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6688345Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6688541Z E1204 
11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6688772Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6689065Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6689283Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6689491Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6689711Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6689929Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6690127Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6690328Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6690548Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6690765Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6690973Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6691168Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6691404Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6691697Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6691929Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6692224Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6692441Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6692651Z 
E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6692848Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6693060Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6693297Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6693539Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6693846Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6694098Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6694391Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6694623Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.6694919Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6695166Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6695471Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6695706Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6696000Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6696231Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6696524Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6696757Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6697047Z E1204 
11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6697281Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6697573Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6697806Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6698109Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6698339Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6698643Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6698879Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6699173Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6699390Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6699607Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6699822Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.6700113Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6700347Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6700641Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6700876Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6701170Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6701403Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6701696Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6701931Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6702225Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6702470Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6702760Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6702968Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6703178Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.6703413Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6703620Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6703821Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6704067Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6704381Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6704577Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6704773Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6704971Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6705167Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6705400Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6705691Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6705923Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6706215Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6706412Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6706622Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6706842Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6707077Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6707383Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6707611Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6707813Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6708013Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6708231Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6708539Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6708772Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6709067Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6709303Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6709596Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6709831Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6710128Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6710361Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6710660Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6710858Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6711056Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6711287Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6711493Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6711707Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6711908Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6712204Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6712438Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6712749Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6712992Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6713320Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6713559Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6713854Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6714088Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6714379Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6714601Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6714806Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6715008Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6715200Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6715408Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.6715622Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6715927Z E1204 11:12:44.259000 
733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6716152Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6716353Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6716554Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6716759Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6717070Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6717318Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6717610Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6717843Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6718136Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6718370Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6718664Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6718902Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6719199Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6719435Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6719730Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6719977Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6720269Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6720512Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6720805Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6721040Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6721346Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6721594Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6721886Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6722088Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6722286Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6722522Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6722814Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6723046Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6723378Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6723612Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6723903Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6724135Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6724446Z E1204 
11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6724681Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6724986Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6725184Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6725418Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6725716Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6725979Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.6726272Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6726493Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6726695Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6726894Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6727095Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6727386Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6727602Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6727803Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6728007Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6728208Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6728499Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6728732Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.6728935Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6729147Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6729337Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.6729491Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.6729689Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6729921Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6730138Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6730334Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6730554Z E1204 
11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6730759Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6730955Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6731175Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6731381Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.6731578Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6731799Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.6732007Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.6732204Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.6732402Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6732614Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6732833Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6733030Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6733289Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6733584Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6733796Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6733997Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6734210Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6734419Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.6734614Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.6734831Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6735039Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6735238Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6735444Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6735739Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6735954Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6736155Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6736356Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6736555Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6736852Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6737079Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.6737286Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.6737503Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.6737705Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.6738002Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.6738197Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.6738411Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.6738612Z E1204 11:12:44.259000 733392 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.6738806Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.6739022Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.6739229Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.6739439Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.6739630Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.6739818Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.6739990Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.6740117Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.6740222Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.6740348Z E1204 11:12:44.259000 733392 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.6740394Z FAILED [0.9409s] [100%] 2025-12-04T11:45:24.6740397Z 2025-12-04T11:45:24.6740454Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6740602Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6740650Z Traceback (most recent call last): 2025-12-04T11:45:24.6740816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6740860Z method(*args, **kwargs) 2025-12-04T11:45:24.6741026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6741067Z method(*args, **kwargs) 2025-12-04T11:45:24.6741219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6741259Z with policy(): 2025-12-04T11:45:24.6741425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6741468Z raise RuntimeError(msg) 2025-12-04T11:45:24.6741869Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:24.6741872Z 2025-12-04T11:45:24.6741952Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6742217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6742241Z 2025-12-04T11:45:24.6742334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6742413Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6742459Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6742523Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6743080Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6743183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6743224Z graph_break [] 2025-12-04T11:45:24.6743335Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6743415Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6743907Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6743957Z current_size = base.storage().size() 2025-12-04T11:45:24.6744000Z Autotune Choices Stats: 2025-12-04T11:45:24.6744377Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007319999858736992, "best_triton_pos": 0} 2025-12-04T11:45:24.6744450Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6744498Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6744623Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6744862Z triton_mm_10 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6745110Z triton_mm_14 0.0076 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6745359Z triton_mm_11 0.0077 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6745591Z triton_mm_17 0.0079 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6745833Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6746059Z triton_mm_8 0.0080 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6746299Z triton_mm_13 0.0080 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6746536Z triton_mm_18 0.0082 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6746768Z triton_mm_7 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6746994Z triton_mm_12 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6747129Z SingleProcess AUTOTUNE benchmarking takes 0.0784 seconds and 0.8485 seconds precompiling for 18 choices 2025-12-04T11:45:24.6747278Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6747324Z Traceback (most recent call last): 2025-12-04T11:45:24.6747484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6747525Z method(*args, **kwargs) 
2025-12-04T11:45:24.6747677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6747718Z method(*args, **kwargs) 2025-12-04T11:45:24.6747880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6747922Z with policy(): 2025-12-04T11:45:24.6748076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6748117Z raise RuntimeError(msg) 2025-12-04T11:45:24.6748516Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:24.6748518Z 2025-12-04T11:45:24.6748594Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6748878Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6748880Z 2025-12-04T11:45:24.6748970Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6749048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6749093Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6749160Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6749711Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6749811Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6749849Z graph_break [] 2025-12-04T11:45:24.6749913Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6750007Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6750496Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6750559Z current_size = base.storage().size() 2025-12-04T11:45:24.6750604Z Autotune Choices Stats: 2025-12-04T11:45:24.6750976Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007319999858736992, "best_triton_pos": 0} 2025-12-04T11:45:24.6751042Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6751094Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6751217Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6751451Z triton_mm_10 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:24.6751676Z triton_mm_14 0.0076 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6751902Z triton_mm_11 0.0077 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6752130Z triton_mm_17 0.0079 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6752358Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6752587Z triton_mm_8 0.0080 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6752825Z triton_mm_13 0.0080 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6753059Z triton_mm_18 0.0082 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6753324Z triton_mm_7 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6753550Z triton_mm_12 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6753683Z SingleProcess AUTOTUNE benchmarking takes 0.0784 seconds and 0.8485 seconds precompiling for 18 choices 2025-12-04T11:45:24.6753758Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6753821Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6753878Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6753993Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6754485Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6754525Z graph_break [] 2025-12-04T11:45:24.6754592Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6754667Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6754711Z Autotune Choices Stats: 2025-12-04T11:45:24.6755070Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_37", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-12-04T11:45:24.6755135Z AUTOTUNE 
scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6755182Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6755302Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6755531Z triton_mm_37 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6755757Z triton_mm_34 0.0076 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6755985Z triton_mm_31 0.0076 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6756209Z triton_mm_38 0.0077 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6756465Z triton_mm_33 0.0078 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6756693Z triton_mm_32 0.0078 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6756933Z triton_mm_30 0.0080 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6757158Z triton_mm_28 0.0081 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6757388Z triton_mm_27 0.0086 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6757615Z triton_mm_35 0.0086 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6757768Z SingleProcess AUTOTUNE benchmarking takes 0.1389 seconds and 0.4629 seconds precompiling for 21 choices 2025-12-04T11:45:24.6757822Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6757966Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6758019Z Traceback (most recent call last): 2025-12-04T11:45:24.6758176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6758221Z method(*args, **kwargs) 2025-12-04T11:45:24.6758378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6758423Z method(*args, **kwargs) 2025-12-04T11:45:24.6758577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6758621Z with policy(): 2025-12-04T11:45:24.6758775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T11:45:24.6758819Z raise RuntimeError(msg) 2025-12-04T11:45:24.6759209Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:24.6759212Z 2025-12-04T11:45:24.6759292Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6759560Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6759565Z 2025-12-04T11:45:24.6759654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6759729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6759773Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6759833Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6760393Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6760495Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6760532Z graph_break [] 2025-12-04T11:45:24.6760599Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6760686Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:24.6761185Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6761233Z current_size = base.storage().size() 2025-12-04T11:45:24.6761288Z Autotune Choices Stats: 2025-12-04T11:45:24.6761658Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007319999858736992, "best_triton_pos": 0} 2025-12-04T11:45:24.6761759Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6761810Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6761931Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6762164Z triton_mm_10 0.0073 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6762389Z triton_mm_14 0.0076 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6762620Z triton_mm_11 0.0077 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.6762845Z triton_mm_17 0.0079 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6763076Z triton_mm_15 0.0080 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6763339Z triton_mm_8 0.0080 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6763564Z triton_mm_13 0.0080 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6763789Z triton_mm_18 0.0082 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6764019Z triton_mm_7 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6764263Z triton_mm_12 0.0084 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6764396Z SingleProcess AUTOTUNE benchmarking takes 0.0784 seconds and 0.8485 seconds precompiling for 18 choices 2025-12-04T11:45:24.6764488Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.6764532Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6764591Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6764691Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6765181Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6765235Z graph_break [] 2025-12-04T11:45:24.6765298Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6765375Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6765431Z Autotune Choices Stats: 2025-12-04T11:45:24.6765796Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_37", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.007360000163316727, "best_triton_pos": 0} 2025-12-04T11:45:24.6765860Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 2025-12-04T11:45:24.6765909Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6766029Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6766261Z triton_mm_37 0.0074 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6766488Z 
triton_mm_34 0.0076 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6766717Z triton_mm_31 0.0076 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6766945Z triton_mm_38 0.0077 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6767174Z triton_mm_33 0.0078 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6767406Z triton_mm_32 0.0078 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6767629Z triton_mm_30 0.0080 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6767867Z triton_mm_28 0.0081 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6768098Z triton_mm_27 0.0086 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.6768337Z triton_mm_35 0.0086 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6768470Z SingleProcess AUTOTUNE benchmarking takes 0.1389 seconds and 0.4629 seconds precompiling for 21 choices 2025-12-04T11:45:24.6768544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6768590Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6768646Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6768749Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6769243Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6769304Z graph_break [] 2025-12-04T11:45:24.6769366Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:24.6769442Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6769484Z Autotune Choices Stats: 2025-12-04T11:45:24.6769845Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_53", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007600000128149986, "best_triton_pos": 0} 2025-12-04T11:45:24.6769911Z AUTOTUNE scaled_mm(1024x32, 32x2048, 1024x1, 1x2048, 2048) 
2025-12-04T11:45:24.6769959Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6770083Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6770313Z triton_mm_53 0.0076 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6770537Z triton_mm_58 0.0077 ms 99.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6770760Z triton_mm_48 0.0077 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6770988Z triton_mm_54 0.0077 ms 98.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6771213Z triton_mm_50 0.0078 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6771444Z triton_mm_57 0.0078 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6771688Z triton_mm_47 0.0078 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.6771924Z triton_mm_51 0.0079 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6772153Z triton_mm_52 0.0079 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6772378Z triton_mm_55 0.0082 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6772510Z SingleProcess AUTOTUNE benchmarking takes 0.1165 seconds and 0.3217 seconds precompiling for 21 choices 2025-12-04T11:45:24.6772713Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f4dc44b6e5b0ad29.xml - 2025-12-04T11:45:24.6772776Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6773417Z FAILED [0.9409s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 
2025-12-04T11:45:24.6773420Z 2025-12-04T11:45:24.6773492Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6773756Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6773759Z 2025-12-04T11:45:24.6773846Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6773913Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.6773982Z ================== 1 failed, 187 deselected, 2 rerun in 4.92s ================== 2025-12-04T11:45:24.6774022Z Got exit code 1 2025-12-04T11:45:24.6774229Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6774357Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.6774502Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8e24318a35e24281.xml 2025-12-04T11:45:24.6774564Z ============================= test session starts ============================== 2025-12-04T11:45:24.6774686Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6774729Z cachedir: .pytest_cache 2025-12-04T11:45:24.6774890Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6774937Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6774982Z configfile: pytest.ini 2025-12-04T11:45:24.6775144Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6775225Z collecting ... 
collected 188 items / 83 deselected / 105 selected 2025-12-04T11:45:24.6775279Z stepcurrent: skipping 83 already run items. 2025-12-04T11:45:24.6775325Z Running 105 items in this shard 2025-12-04T11:45:24.6775344Z 2025-12-04T11:45:24.6775563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8999s] [ 0%] 2025-12-04T11:45:24.6775789Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5344s] [ 0%] 2025-12-04T11:45:24.6775975Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.4885s] [ 0%] 2025-12-04T11:45:24.6775977Z 2025-12-04T11:45:24.6776030Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6776170Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6776218Z Traceback (most recent call last): 2025-12-04T11:45:24.6776377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6776443Z method(*args, **kwargs) 2025-12-04T11:45:24.6776598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6776656Z method(*args, **kwargs) 2025-12-04T11:45:24.6776808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6776848Z with policy(): 2025-12-04T11:45:24.6777001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6777047Z raise RuntimeError(msg) 2025-12-04T11:45:24.6777432Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:24.6777438Z 2025-12-04T11:45:24.6777513Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6777781Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6777785Z 2025-12-04T11:45:24.6777872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6777950Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6777993Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6778054Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6778536Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6778639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6778677Z graph_break [] 2025-12-04T11:45:24.6778742Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6778816Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6779306Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: 
TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6779369Z current_size = base.storage().size() 2025-12-04T11:45:24.6779412Z Autotune Choices Stats: 2025-12-04T11:45:24.6779795Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6779858Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6779910Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6780032Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6780277Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6780506Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6780753Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6780991Z triton_mm_0 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6781035Z _scaled_mm 0.0263 ms 23.1% 2025-12-04T11:45:24.6781168Z SingleProcess AUTOTUNE benchmarking takes 0.0244 seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:24.6781308Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6781357Z Traceback (most recent call last): 2025-12-04T11:45:24.6781513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6781559Z method(*args, **kwargs) 2025-12-04T11:45:24.6781712Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6781755Z method(*args, **kwargs) 2025-12-04T11:45:24.6781906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6781946Z with policy(): 2025-12-04T11:45:24.6782104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6782146Z raise RuntimeError(msg) 2025-12-04T11:45:24.6782536Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:24.6782540Z 2025-12-04T11:45:24.6782617Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6782872Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6782877Z 2025-12-04T11:45:24.6782965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6783042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6783087Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6783159Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6783681Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6783785Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6783823Z graph_break [] 2025-12-04T11:45:24.6783887Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6783961Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6784454Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6784517Z current_size = base.storage().size() 2025-12-04T11:45:24.6784575Z Autotune Choices Stats: 2025-12-04T11:45:24.6784945Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6785003Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6785053Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6785176Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6785412Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6785640Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6785872Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6786096Z triton_mm_0 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6786142Z _scaled_mm 0.0263 ms 23.1% 2025-12-04T11:45:24.6786270Z 
SingleProcess AUTOTUNE benchmarking takes 0.0244 seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:24.6786348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6786392Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6786457Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6786557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6787039Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6787091Z graph_break [] 2025-12-04T11:45:24.6787152Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6787229Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6787269Z Autotune Choices Stats: 2025-12-04T11:45:24.6787644Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6787700Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6787751Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6787870Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6788104Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6788341Z triton_mm_6 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6788580Z triton_mm_7 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6788807Z triton_mm_4 0.0075 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6788849Z _scaled_mm 0.0260 ms 23.7% 2025-12-04T11:45:24.6788982Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1226 seconds precompiling for 5 choices 2025-12-04T11:45:24.6789038Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6789180Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6789228Z Traceback (most recent call last): 2025-12-04T11:45:24.6789505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6789547Z method(*args, **kwargs) 2025-12-04T11:45:24.6789707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6789747Z method(*args, **kwargs) 2025-12-04T11:45:24.6789903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6789942Z with 
policy(): 2025-12-04T11:45:24.6790097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6790140Z raise RuntimeError(msg) 2025-12-04T11:45:24.6790527Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1172307968. 2025-12-04T11:45:24.6790530Z 2025-12-04T11:45:24.6790604Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6790861Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6790863Z 2025-12-04T11:45:24.6790971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6791047Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6791094Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6791151Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6791652Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6791751Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6791790Z graph_break [] 2025-12-04T11:45:24.6791850Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 
2025-12-04T11:45:24.6791926Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6792413Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6792484Z current_size = base.storage().size() 2025-12-04T11:45:24.6792525Z Autotune Choices Stats: 2025-12-04T11:45:24.6792897Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6792953Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6793004Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6793126Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6793398Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6793626Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6793853Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6794078Z triton_mm_0 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6794122Z _scaled_mm 0.0263 ms 23.1% 2025-12-04T11:45:24.6794256Z SingleProcess AUTOTUNE benchmarking takes 0.0244 seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:24.6794329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6794374Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6794430Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6794533Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6795036Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6795078Z graph_break [] 2025-12-04T11:45:24.6795141Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6795231Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6795275Z Autotune Choices Stats: 2025-12-04T11:45:24.6795636Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 
0} 2025-12-04T11:45:24.6795692Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6795743Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6795864Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6796112Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6796352Z triton_mm_6 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6796579Z triton_mm_7 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6796804Z triton_mm_4 0.0075 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6796849Z _scaled_mm 0.0260 ms 23.7% 2025-12-04T11:45:24.6796979Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1226 seconds precompiling for 5 choices 2025-12-04T11:45:24.6797057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6797099Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6797159Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6797258Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6797713Z inductor [('triton_bundler_save_kernel', 32), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('async_compile_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6797750Z graph_break [] 2025-12-04T11:45:24.6797813Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6797885Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6797929Z Autotune Choices Stats: 2025-12-04T11:45:24.6798397Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "_scaled_mm", "best_time": 0.006039999891072512, "best_triton_pos": 1, "best_triton_time": 0.006039999891072512, "best_triton_kernel": "triton_mm_10", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1"} 2025-12-04T11:45:24.6798452Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6798520Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6798645Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6798688Z _scaled_mm 0.0060 ms 100.0% 2025-12-04T11:45:24.6798931Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6799163Z triton_mm_11 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6799390Z triton_mm_9 0.0061 ms 
98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6799618Z triton_mm_8 0.0076 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6799762Z SingleProcess AUTOTUNE benchmarking takes 0.0330 seconds and 0.2202 seconds precompiling for 5 choices 2025-12-04T11:45:24.6799965Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8e24318a35e24281.xml - 2025-12-04T11:45:24.6800027Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6800604Z FAILED [0.4885s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1172307968. 2025-12-04T11:45:24.6800607Z 2025-12-04T11:45:24.6800684Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6800947Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6800950Z 2025-12-04T11:45:24.6801040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6801102Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.6801172Z ================== 1 failed, 83 deselected, 2 rerun in 2.94s =================== 2025-12-04T11:45:24.6801212Z Got exit code 1 2025-12-04T11:45:24.6801256Z Retrying single test... 2025-12-04T11:45:24.6801405Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e8c60ec61ebc5761.xml 2025-12-04T11:45:24.6801465Z ============================= test session starts ============================== 2025-12-04T11:45:24.6801577Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6801621Z cachedir: .pytest_cache 2025-12-04T11:45:24.6801782Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6801831Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6801872Z configfile: pytest.ini 2025-12-04T11:45:24.6802035Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6802109Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6802377Z stepcurrent: skipping 83 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6802421Z Running 1 items in this shard 2025-12-04T11:45:24.6802424Z 2025-12-04T11:45:24.6802638Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9177s] [100%] 2025-12-04T11:45:24.6802860Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5450s] [100%] 2025-12-04T11:45:24.6803046Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.5987s] [100%] 2025-12-04T11:45:24.6803048Z 2025-12-04T11:45:24.6803101Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6803241Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6803318Z Traceback (most recent call last): 2025-12-04T11:45:24.6803476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6803541Z method(*args, **kwargs) 2025-12-04T11:45:24.6803695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6803755Z method(*args, **kwargs) 2025-12-04T11:45:24.6803907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6803946Z with policy(): 2025-12-04T11:45:24.6804101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6804144Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6804536Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:24.6804539Z 2025-12-04T11:45:24.6804616Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6804875Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6804880Z 2025-12-04T11:45:24.6804967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6805042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6805085Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6805144Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6805622Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6805725Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6809400Z graph_break [] 2025-12-04T11:45:24.6809483Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6809565Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6810104Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6810155Z current_size = base.storage().size() 2025-12-04T11:45:24.6810199Z Autotune Choices Stats: 2025-12-04T11:45:24.6810711Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.6810774Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6810826Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6810947Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6811190Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6811439Z triton_mm_1 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6811676Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6811900Z triton_mm_0 0.0074 ms 81.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6811945Z _scaled_mm 0.0264 ms 22.8% 2025-12-04T11:45:24.6812081Z SingleProcess AUTOTUNE benchmarking takes 0.0233 seconds and 0.1415 seconds precompiling for 5 choices 2025-12-04T11:45:24.6812225Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6812274Z Traceback (most recent call last): 2025-12-04T11:45:24.6812437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6812479Z method(*args, **kwargs) 2025-12-04T11:45:24.6812634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6812676Z method(*args, **kwargs) 2025-12-04T11:45:24.6812827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6812871Z with policy(): 2025-12-04T11:45:24.6813026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6813069Z raise RuntimeError(msg) 2025-12-04T11:45:24.6813494Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:24.6813497Z 2025-12-04T11:45:24.6813575Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6813831Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6813833Z 2025-12-04T11:45:24.6813922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6814018Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6814064Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6814122Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6814632Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6814736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6814777Z graph_break [] 2025-12-04T11:45:24.6814844Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6814918Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6815405Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6815485Z current_size = base.storage().size() 2025-12-04T11:45:24.6815529Z Autotune Choices Stats: 2025-12-04T11:45:24.6815897Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.6815954Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6816004Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6816127Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6816364Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6816598Z triton_mm_1 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6816824Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6817052Z triton_mm_0 0.0074 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6817097Z _scaled_mm 0.0264 ms 22.8% 2025-12-04T11:45:24.6817225Z SingleProcess 
AUTOTUNE benchmarking takes 0.0233 seconds and 0.1415 seconds precompiling for 5 choices 2025-12-04T11:45:24.6817301Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6817344Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6817401Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6817501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6817996Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6818037Z graph_break [] 2025-12-04T11:45:24.6818109Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6818182Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6818227Z Autotune Choices Stats: 2025-12-04T11:45:24.6818607Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6818662Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6818713Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6818834Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6819076Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6819314Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6819557Z triton_mm_7 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6819782Z triton_mm_4 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6819830Z _scaled_mm 0.0276 ms 22.0% 2025-12-04T11:45:24.6819957Z SingleProcess AUTOTUNE benchmarking takes 0.0209 seconds and 0.1054 seconds precompiling for 5 choices 2025-12-04T11:45:24.6820014Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6820156Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6820206Z Traceback (most recent call last): 2025-12-04T11:45:24.6820364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6820413Z method(*args, **kwargs) 2025-12-04T11:45:24.6820568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6820608Z method(*args, **kwargs) 2025-12-04T11:45:24.6820761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6820798Z with policy(): 
2025-12-04T11:45:24.6820956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6820999Z raise RuntimeError(msg) 2025-12-04T11:45:24.6821397Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:24.6821399Z 2025-12-04T11:45:24.6821476Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6821750Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6821753Z 2025-12-04T11:45:24.6821841Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6821919Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6821962Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6822023Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6822509Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6822611Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6822651Z graph_break [] 2025-12-04T11:45:24.6822712Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 
2025-12-04T11:45:24.6822789Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6823327Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6823403Z current_size = base.storage().size() 2025-12-04T11:45:24.6823445Z Autotune Choices Stats: 2025-12-04T11:45:24.6823815Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.6823869Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6823922Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6824046Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6824287Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6824516Z triton_mm_1 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6824740Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6824966Z triton_mm_0 0.0074 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6825008Z _scaled_mm 0.0264 ms 22.8% 2025-12-04T11:45:24.6825137Z SingleProcess AUTOTUNE benchmarking takes 0.0233 seconds and 0.1415 seconds precompiling for 5 choices 2025-12-04T11:45:24.6825211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6825255Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6825312Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6825413Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6825924Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6825965Z graph_break [] 2025-12-04T11:45:24.6826042Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6826117Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6826157Z Autotune Choices Stats: 2025-12-04T11:45:24.6826524Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 
0} 2025-12-04T11:45:24.6826580Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6826629Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6826770Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6827003Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6827242Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6827468Z triton_mm_7 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6827695Z triton_mm_4 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6827738Z _scaled_mm 0.0276 ms 22.0% 2025-12-04T11:45:24.6827871Z SingleProcess AUTOTUNE benchmarking takes 0.0209 seconds and 0.1054 seconds precompiling for 5 choices 2025-12-04T11:45:24.6827946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6827995Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6828052Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6828153Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6828627Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6828666Z graph_break [] 2025-12-04T11:45:24.6828729Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6828803Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6828850Z Autotune Choices Stats: 2025-12-04T11:45:24.6829209Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.6829263Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6829329Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6829452Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6829700Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6829943Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6830170Z triton_mm_10 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6830395Z triton_mm_8 0.0079 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6830456Z _scaled_mm 0.0256 ms 23.5% 2025-12-04T11:45:24.6830581Z SingleProcess AUTOTUNE benchmarking takes 0.0297 seconds and 0.2152 seconds precompiling for 5 choices 2025-12-04T11:45:24.6830792Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e8c60ec61ebc5761.xml - 2025-12-04T11:45:24.6830853Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6831431Z FAILED [0.5987s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:24.6831434Z 2025-12-04T11:45:24.6831508Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6831766Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6831769Z 2025-12-04T11:45:24.6831857Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6831920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.6831992Z ================== 1 failed, 187 deselected, 2 rerun in 3.08s ================== 2025-12-04T11:45:24.6832029Z Got exit code 1 2025-12-04T11:45:24.6832072Z Retrying single test... 2025-12-04T11:45:24.6832216Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5f1139134777cd1.xml 2025-12-04T11:45:24.6832276Z ============================= test session starts ============================== 2025-12-04T11:45:24.6832395Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6832441Z cachedir: .pytest_cache 2025-12-04T11:45:24.6832601Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6832650Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6832691Z configfile: pytest.ini 2025-12-04T11:45:24.6832856Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6832932Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6833195Z stepcurrent: skipping 83 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6833241Z Running 1 items in this shard 2025-12-04T11:45:24.6833243Z 2025-12-04T11:45:24.6833489Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9251s] [100%] 2025-12-04T11:45:24.6833718Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5581s] [100%] 2025-12-04T11:45:24.6833907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6201s] [100%] 2025-12-04T11:45:24.6833909Z 2025-12-04T11:45:24.6833963Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6834106Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6834158Z Traceback (most recent call last): 2025-12-04T11:45:24.6834334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6834377Z method(*args, **kwargs) 2025-12-04T11:45:24.6834545Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6834587Z method(*args, **kwargs) 2025-12-04T11:45:24.6834743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6834782Z with policy(): 2025-12-04T11:45:24.6834934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6834977Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6835367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:24.6835370Z 2025-12-04T11:45:24.6835446Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6835701Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6835704Z 2025-12-04T11:45:24.6835794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6835868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6835912Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6835969Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6836454Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6836559Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6836596Z graph_break [] 2025-12-04T11:45:24.6836658Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6836731Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6837237Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6837288Z current_size = base.storage().size() 2025-12-04T11:45:24.6837330Z Autotune Choices Stats: 2025-12-04T11:45:24.6837712Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6837769Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6837818Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6837941Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6838172Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6838425Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6838674Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6838899Z triton_mm_0 0.0074 ms 81.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6838942Z _scaled_mm 0.0293 ms 20.8% 2025-12-04T11:45:24.6839071Z SingleProcess AUTOTUNE benchmarking takes 0.0253 seconds and 0.1379 seconds precompiling for 5 choices 2025-12-04T11:45:24.6839212Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6839258Z Traceback (most recent call last): 2025-12-04T11:45:24.6839418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6839459Z method(*args, **kwargs) 2025-12-04T11:45:24.6839621Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6839661Z method(*args, **kwargs) 2025-12-04T11:45:24.6839816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6839853Z with policy(): 2025-12-04T11:45:24.6840009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6840050Z raise RuntimeError(msg) 2025-12-04T11:45:24.6840439Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:24.6840443Z 2025-12-04T11:45:24.6840518Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6840777Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6840779Z 2025-12-04T11:45:24.6840869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6840958Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6841004Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6841062Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6841556Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6841656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6841693Z graph_break [] 2025-12-04T11:45:24.6841754Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6841829Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6842314Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6842389Z current_size = base.storage().size() 2025-12-04T11:45:24.6842435Z Autotune Choices Stats: 2025-12-04T11:45:24.6842799Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6842856Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6842906Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6843033Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6843323Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6843557Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6843786Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6844011Z triton_mm_0 0.0074 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6844056Z _scaled_mm 0.0293 ms 20.8% 2025-12-04T11:45:24.6844184Z SingleProcess 
AUTOTUNE benchmarking takes 0.0253 seconds and 0.1379 seconds precompiling for 5 choices 2025-12-04T11:45:24.6844262Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6844304Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6844363Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6844461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6844964Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6845003Z graph_break [] 2025-12-04T11:45:24.6845066Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6845139Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6845182Z Autotune Choices Stats: 2025-12-04T11:45:24.6845565Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:24.6845622Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6845671Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6845793Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6846021Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6846283Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6846511Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6846734Z triton_mm_4 0.0075 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6846777Z _scaled_mm 0.0248 ms 24.7% 2025-12-04T11:45:24.6846906Z SingleProcess AUTOTUNE benchmarking takes 0.0240 seconds and 0.1184 seconds precompiling for 5 choices 2025-12-04T11:45:24.6846961Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6847101Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6847150Z Traceback (most recent call last): 2025-12-04T11:45:24.6847305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6847349Z method(*args, **kwargs) 2025-12-04T11:45:24.6847502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6847545Z method(*args, **kwargs) 2025-12-04T11:45:24.6847696Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6847735Z with policy(): 
2025-12-04T11:45:24.6847893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6847936Z raise RuntimeError(msg) 2025-12-04T11:45:24.6848325Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:24.6848329Z 2025-12-04T11:45:24.6848403Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6848685Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6848687Z 2025-12-04T11:45:24.6848775Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6848851Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6848895Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6848965Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6849445Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6849547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6849584Z graph_break [] 2025-12-04T11:45:24.6849647Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 
2025-12-04T11:45:24.6849739Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6850231Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6850293Z current_size = base.storage().size() 2025-12-04T11:45:24.6850334Z Autotune Choices Stats: 2025-12-04T11:45:24.6850696Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6850751Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6850803Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6850921Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6851154Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6851383Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6851618Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6851844Z triton_mm_0 0.0074 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6851889Z _scaled_mm 0.0293 ms 20.8% 2025-12-04T11:45:24.6852018Z SingleProcess AUTOTUNE benchmarking takes 0.0253 seconds and 0.1379 seconds precompiling for 5 choices 2025-12-04T11:45:24.6852091Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6852134Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6852189Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6852289Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6852788Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6852855Z graph_break [] 2025-12-04T11:45:24.6852915Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6852988Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6853028Z Autotune Choices Stats: 2025-12-04T11:45:24.6853431Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 
0} 2025-12-04T11:45:24.6853483Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6853549Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6853666Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6853922Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6854151Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6854377Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6854603Z triton_mm_4 0.0075 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6854648Z _scaled_mm 0.0248 ms 24.7% 2025-12-04T11:45:24.6854775Z SingleProcess AUTOTUNE benchmarking takes 0.0240 seconds and 0.1184 seconds precompiling for 5 choices 2025-12-04T11:45:24.6854848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6854890Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6854946Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6855045Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6855524Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6855564Z graph_break [] 2025-12-04T11:45:24.6855625Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:24.6855698Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6855744Z Autotune Choices Stats: 2025-12-04T11:45:24.6856104Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.6856178Z AUTOTUNE scaled_mm(1x1024, 1024x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.6856227Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6856358Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6856606Z triton_mm_10 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6856840Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6857066Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6857291Z triton_mm_8 0.0074 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6857342Z _scaled_mm 0.0288 ms 21.1% 2025-12-04T11:45:24.6857490Z SingleProcess AUTOTUNE benchmarking takes 0.0359 seconds and 0.2093 seconds precompiling for 5 choices 2025-12-04T11:45:24.6857679Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5f1139134777cd1.xml - 2025-12-04T11:45:24.6857742Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6858325Z FAILED [0.6201s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:24.6858329Z 2025-12-04T11:45:24.6858402Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6858662Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6858664Z 2025-12-04T11:45:24.6858749Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6858815Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.6858883Z ================== 1 failed, 187 deselected, 2 rerun in 3.12s ================== 2025-12-04T11:45:24.6858923Z Got exit code 1 2025-12-04T11:45:24.6859127Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6859259Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.6859404Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3fc3d95921a59af2.xml 2025-12-04T11:45:24.6859462Z ============================= test session starts ============================== 2025-12-04T11:45:24.6859576Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6859623Z cachedir: .pytest_cache 2025-12-04T11:45:24.6859787Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6859834Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6859882Z configfile: pytest.ini 2025-12-04T11:45:24.6860061Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6860142Z collecting ... collected 188 items / 84 deselected / 104 selected 2025-12-04T11:45:24.6860197Z stepcurrent: skipping 84 already run items. 
2025-12-04T11:45:24.6860243Z Running 104 items in this shard 2025-12-04T11:45:24.6860246Z 2025-12-04T11:45:24.6860478Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3406s] [ 0%] 2025-12-04T11:45:24.6860698Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8816s] [ 0%] 2025-12-04T11:45:24.6860887Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.8048s] [ 0%] 2025-12-04T11:45:24.6860890Z 2025-12-04T11:45:24.6860943Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6861084Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6861146Z Traceback (most recent call last): 2025-12-04T11:45:24.6861304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6861356Z method(*args, **kwargs) 2025-12-04T11:45:24.6861510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6861552Z method(*args, **kwargs) 2025-12-04T11:45:24.6861707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6861748Z with policy(): 2025-12-04T11:45:24.6861904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6861945Z raise RuntimeError(msg) 2025-12-04T11:45:24.6862338Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.6862342Z 2025-12-04T11:45:24.6862419Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6862686Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6862688Z 2025-12-04T11:45:24.6862781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6862856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6862899Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6862957Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6863483Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6863584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6863620Z graph_break [] 2025-12-04T11:45:24.6863685Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6863760Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6864261Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6864312Z current_size = base.storage().size() 2025-12-04T11:45:24.6864367Z Autotune Choices Stats: 2025-12-04T11:45:24.6864740Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.6864807Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6864863Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6864985Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6865239Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6865483Z triton_mm_16 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6865713Z triton_mm_7 0.0067 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6865939Z triton_mm_12 0.0071 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6866169Z triton_mm_6 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6866400Z triton_mm_9 0.0075 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6866624Z triton_mm_14 0.0084 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6866852Z triton_mm_10 0.0085 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6867080Z triton_mm_5 0.0087 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6867311Z triton_mm_18 0.0100 ms 62.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6867448Z SingleProcess AUTOTUNE benchmarking takes 0.0815 seconds and 0.3839 seconds precompiling for 20 choices 2025-12-04T11:45:24.6867597Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6867646Z Traceback (most recent call last): 2025-12-04T11:45:24.6867821Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6867867Z method(*args, **kwargs) 2025-12-04T11:45:24.6868025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6868086Z method(*args, **kwargs) 2025-12-04T11:45:24.6868246Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6868291Z with policy(): 2025-12-04T11:45:24.6868446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6868494Z raise RuntimeError(msg) 2025-12-04T11:45:24.6868888Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.6868905Z 2025-12-04T11:45:24.6868977Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6869243Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6869257Z 2025-12-04T11:45:24.6869343Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6869417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6869460Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6869518Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6870006Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6870106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6870144Z graph_break [] 2025-12-04T11:45:24.6870213Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6870285Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6870771Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6870819Z current_size = base.storage().size() 2025-12-04T11:45:24.6870859Z Autotune Choices Stats: 2025-12-04T11:45:24.6871228Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.6871293Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6871343Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6871462Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6871715Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6871943Z triton_mm_16 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6872183Z triton_mm_7 0.0067 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6872407Z triton_mm_12 0.0071 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6872637Z triton_mm_6 0.0072 ms 87.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6872863Z triton_mm_9 0.0075 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6873107Z triton_mm_14 0.0084 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6873367Z triton_mm_10 0.0085 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6873592Z triton_mm_5 0.0087 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6873821Z triton_mm_18 0.0100 ms 62.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6873953Z SingleProcess AUTOTUNE benchmarking takes 0.0815 seconds and 0.3839 seconds precompiling for 20 choices 2025-12-04T11:45:24.6874026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6874072Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6874132Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6874232Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6874724Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6874763Z graph_break [] 2025-12-04T11:45:24.6874826Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6874900Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6874940Z Autotune Choices Stats: 2025-12-04T11:45:24.6875302Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6875378Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6875432Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6875554Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6875803Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6876032Z triton_mm_35 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6876254Z triton_mm_31 0.0069 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6876480Z triton_mm_26 0.0072 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6876722Z triton_mm_25 0.0075 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6876964Z triton_mm_28 0.0081 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6877186Z triton_mm_29 0.0086 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6877413Z triton_mm_33 0.0086 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6877645Z triton_mm_24 0.0088 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6877873Z triton_mm_37 0.0099 ms 62.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6878001Z SingleProcess 
AUTOTUNE benchmarking takes 0.1092 seconds and 0.2756 seconds precompiling for 20 choices 2025-12-04T11:45:24.6878054Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6878196Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6878243Z Traceback (most recent call last): 2025-12-04T11:45:24.6878399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6878440Z method(*args, **kwargs) 2025-12-04T11:45:24.6878596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6878636Z method(*args, **kwargs) 2025-12-04T11:45:24.6878786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6878823Z with policy(): 2025-12-04T11:45:24.6878975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6879014Z raise RuntimeError(msg) 2025-12-04T11:45:24.6879414Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.6879418Z 2025-12-04T11:45:24.6879510Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6879770Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6879773Z 2025-12-04T11:45:24.6879861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6879933Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6879975Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6880033Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6880521Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6880656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6880693Z graph_break [] 2025-12-04T11:45:24.6880755Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6880829Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6881310Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6881359Z current_size = base.storage().size() 2025-12-04T11:45:24.6881401Z Autotune Choices Stats: 2025-12-04T11:45:24.6881769Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.6881837Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6881890Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6882012Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6882244Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6882474Z triton_mm_16 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6882698Z triton_mm_7 0.0067 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6882930Z triton_mm_12 0.0071 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6883160Z triton_mm_6 0.0072 ms 87.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6883440Z triton_mm_9 0.0075 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6883668Z triton_mm_14 0.0084 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6883892Z triton_mm_10 0.0085 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6884120Z triton_mm_5 0.0087 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6884374Z triton_mm_18 0.0100 ms 62.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6884517Z SingleProcess AUTOTUNE benchmarking takes 0.0815 seconds and 0.3839 seconds precompiling for 20 choices 2025-12-04T11:45:24.6884592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6884636Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6884695Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6884796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6885281Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6885320Z graph_break [] 2025-12-04T11:45:24.6885383Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6885455Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6885500Z Autotune Choices Stats: 2025-12-04T11:45:24.6885867Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6885933Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6885983Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6886108Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6886345Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6886572Z triton_mm_35 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6886812Z triton_mm_31 0.0069 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6887039Z triton_mm_26 0.0072 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6887286Z triton_mm_25 0.0075 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6887512Z triton_mm_28 0.0081 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6887746Z triton_mm_29 0.0086 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6887985Z triton_mm_33 0.0086 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6888221Z triton_mm_24 0.0088 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6888451Z triton_mm_37 0.0099 ms 62.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6888579Z SingleProcess 
AUTOTUNE benchmarking takes 0.1092 seconds and 0.2756 seconds precompiling for 20 choices 2025-12-04T11:45:24.6888653Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6888697Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6888756Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6888858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6889345Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6889384Z graph_break [] 2025-12-04T11:45:24.6889446Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6889521Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6889562Z Autotune Choices Stats: 2025-12-04T11:45:24.6889925Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006198999937623739, "best_triton_pos": 0} 2025-12-04T11:45:24.6889987Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6890039Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6890158Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6890403Z triton_mm_54 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, 
BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6890629Z triton_mm_55 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6890871Z triton_mm_45 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6891095Z triton_mm_50 0.0071 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6891325Z triton_mm_44 0.0075 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6891553Z triton_mm_47 0.0082 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6891796Z triton_mm_52 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6892023Z triton_mm_48 0.0084 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6892250Z triton_mm_43 0.0088 ms 70.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6892477Z triton_mm_49 0.0100 ms 62.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6892611Z SingleProcess AUTOTUNE benchmarking takes 0.1296 seconds and 0.2590 seconds precompiling for 20 choices 2025-12-04T11:45:24.6892803Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3fc3d95921a59af2.xml - 2025-12-04T11:45:24.6892866Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6893478Z FAILED [0.8048s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.6893481Z 2025-12-04T11:45:24.6893556Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6893816Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6893820Z 2025-12-04T11:45:24.6893908Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6893971Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.6894039Z ================== 1 failed, 84 deselected, 2 rerun in 4.05s =================== 2025-12-04T11:45:24.6894078Z Got exit code 1 2025-12-04T11:45:24.6894118Z Retrying single test... 2025-12-04T11:45:24.6894279Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-95abaec75b43cbcd.xml 2025-12-04T11:45:24.6894338Z ============================= test session starts ============================== 2025-12-04T11:45:24.6894453Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6894509Z cachedir: .pytest_cache 2025-12-04T11:45:24.6894678Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6894724Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6894767Z configfile: pytest.ini 2025-12-04T11:45:24.6894927Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6895005Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6895257Z stepcurrent: skipping 84 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6895322Z Running 1 items in this shard 2025-12-04T11:45:24.6895325Z 2025-12-04T11:45:24.6895539Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.5091s] [100%] 2025-12-04T11:45:24.6895766Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9523s] [100%] 2025-12-04T11:45:24.6895952Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.8624s] [100%] 2025-12-04T11:45:24.6895955Z 2025-12-04T11:45:24.6896008Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6896149Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6896198Z Traceback (most recent call last): 2025-12-04T11:45:24.6896355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6896406Z method(*args, **kwargs) 2025-12-04T11:45:24.6896565Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6896604Z method(*args, **kwargs) 2025-12-04T11:45:24.6896757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6896795Z with policy(): 2025-12-04T11:45:24.6896949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6896990Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6897378Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.6897382Z 2025-12-04T11:45:24.6897457Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6897715Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6897717Z 2025-12-04T11:45:24.6897802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6897875Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6897918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6897985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6898483Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6898590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6898633Z graph_break [] 2025-12-04T11:45:24.6898694Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6898769Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6899257Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6899317Z current_size = base.storage().size() 2025-12-04T11:45:24.6899368Z Autotune Choices Stats: 2025-12-04T11:45:24.6899741Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6899805Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6899858Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6899979Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6900216Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6900451Z triton_mm_16 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6900679Z triton_mm_7 0.0068 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6900906Z triton_mm_12 0.0070 ms 
88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6901136Z triton_mm_6 0.0075 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6901368Z triton_mm_9 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6901593Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6901832Z triton_mm_14 0.0085 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6902058Z triton_mm_5 0.0089 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6902298Z triton_mm_18 0.0098 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6902429Z SingleProcess AUTOTUNE benchmarking takes 0.0930 seconds and 0.4046 seconds precompiling for 20 choices 2025-12-04T11:45:24.6902569Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6902617Z Traceback (most recent call last): 2025-12-04T11:45:24.6902772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6902815Z method(*args, **kwargs) 2025-12-04T11:45:24.6902969Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6903023Z method(*args, **kwargs) 2025-12-04T11:45:24.6903186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6903226Z with policy(): 2025-12-04T11:45:24.6903413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6903454Z raise RuntimeError(msg) 2025-12-04T11:45:24.6903846Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.6903849Z 2025-12-04T11:45:24.6903922Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6904184Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6904187Z 2025-12-04T11:45:24.6904274Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6904348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6904391Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6904456Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6904940Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6905043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6905080Z graph_break [] 2025-12-04T11:45:24.6905144Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6905218Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6905706Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6905770Z current_size = base.storage().size() 2025-12-04T11:45:24.6905810Z Autotune Choices Stats: 2025-12-04T11:45:24.6906191Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6906259Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6906311Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6906431Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6906666Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6906895Z triton_mm_16 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6907146Z triton_mm_7 0.0068 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6907383Z triton_mm_12 0.0070 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6907616Z triton_mm_6 0.0075 ms 82.4% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6907847Z triton_mm_9 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6908071Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6908296Z triton_mm_14 0.0085 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6908518Z triton_mm_5 0.0089 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6908748Z triton_mm_18 0.0098 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6908878Z SingleProcess AUTOTUNE benchmarking takes 0.0930 seconds and 0.4046 seconds precompiling for 20 choices 2025-12-04T11:45:24.6908955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6909003Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6909058Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6909160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6909663Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6909703Z graph_break [] 2025-12-04T11:45:24.6909764Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6909838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6909893Z Autotune Choices Stats: 2025-12-04T11:45:24.6910260Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.6910323Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6910374Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6910495Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6910743Z triton_mm_35 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6910987Z triton_mm_36 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6911211Z triton_mm_31 0.0065 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6911438Z triton_mm_26 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6911673Z triton_mm_25 0.0074 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6911904Z triton_mm_28 0.0075 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6912126Z triton_mm_29 0.0079 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6912349Z triton_mm_33 0.0079 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6912573Z triton_mm_24 0.0084 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6912801Z triton_mm_37 0.0098 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6912931Z SingleProcess 
AUTOTUNE benchmarking takes 0.1201 seconds and 0.2759 seconds precompiling for 20 choices 2025-12-04T11:45:24.6912983Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6913134Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6913180Z Traceback (most recent call last): 2025-12-04T11:45:24.6913372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6913415Z method(*args, **kwargs) 2025-12-04T11:45:24.6913588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6913628Z method(*args, **kwargs) 2025-12-04T11:45:24.6913781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6913818Z with policy(): 2025-12-04T11:45:24.6913972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6914012Z raise RuntimeError(msg) 2025-12-04T11:45:24.6914401Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.6914416Z 2025-12-04T11:45:24.6914490Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6914764Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6914766Z 2025-12-04T11:45:24.6914854Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6914928Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6914975Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6915031Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6915514Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6915614Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6915652Z graph_break [] 2025-12-04T11:45:24.6915713Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6915788Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6916280Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6916327Z current_size = base.storage().size() 2025-12-04T11:45:24.6916367Z Autotune Choices Stats: 2025-12-04T11:45:24.6916736Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6916806Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6916856Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6916978Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6917228Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6917472Z triton_mm_16 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6917700Z triton_mm_7 0.0068 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6917927Z triton_mm_12 0.0070 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6918162Z triton_mm_6 0.0075 ms 82.4% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6918401Z triton_mm_9 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6918637Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6918860Z triton_mm_14 0.0085 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6919084Z triton_mm_5 0.0089 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6919318Z triton_mm_18 0.0098 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6919447Z SingleProcess AUTOTUNE benchmarking takes 0.0930 seconds and 0.4046 seconds precompiling for 20 choices 2025-12-04T11:45:24.6919522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6919567Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6919627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6919725Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6920220Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6920259Z graph_break [] 2025-12-04T11:45:24.6920323Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6920395Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6920435Z Autotune Choices Stats: 2025-12-04T11:45:24.6920808Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.6920870Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6920921Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6921046Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6921290Z triton_mm_35 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6921518Z triton_mm_36 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6921744Z triton_mm_31 0.0065 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6921969Z triton_mm_26 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6922224Z triton_mm_25 0.0074 ms 81.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6922449Z triton_mm_28 0.0075 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6922673Z triton_mm_29 0.0079 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6922897Z triton_mm_33 0.0079 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6923124Z triton_mm_24 0.0084 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6923392Z triton_mm_37 0.0098 ms 61.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6923522Z SingleProcess 
AUTOTUNE benchmarking takes 0.1201 seconds and 0.2759 seconds precompiling for 20 choices 2025-12-04T11:45:24.6923598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6923640Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6923700Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6923800Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6924285Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6924324Z graph_break [] 2025-12-04T11:45:24.6924385Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6924460Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6924520Z Autotune Choices Stats: 2025-12-04T11:45:24.6924903Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005998999811708927, "best_triton_pos": 0} 2025-12-04T11:45:24.6924966Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6925016Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6925136Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6925370Z triton_mm_54 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, 
BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6925595Z triton_mm_55 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6925837Z triton_mm_45 0.0067 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6926081Z triton_mm_50 0.0069 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6926123Z _scaled_mm 0.0070 ms 86.2% 2025-12-04T11:45:24.6926352Z triton_mm_44 0.0070 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6926580Z triton_mm_47 0.0078 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6926806Z triton_mm_48 0.0081 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6927028Z triton_mm_52 0.0082 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.6927254Z triton_mm_43 0.0086 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6927384Z SingleProcess AUTOTUNE benchmarking takes 0.1322 seconds and 0.2529 seconds precompiling for 20 choices 2025-12-04T11:45:24.6927576Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-95abaec75b43cbcd.xml - 2025-12-04T11:45:24.6927640Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6928229Z FAILED [0.8624s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.6928232Z 2025-12-04T11:45:24.6928320Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6928582Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6928585Z 2025-12-04T11:45:24.6928673Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6928745Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.6928815Z ================== 1 failed, 187 deselected, 2 rerun in 4.34s ================== 2025-12-04T11:45:24.6928852Z Got exit code 1 2025-12-04T11:45:24.6928893Z Retrying single test... 
2025-12-04T11:45:24.6929037Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04f37de4bce51f22.xml 2025-12-04T11:45:24.6929096Z ============================= test session starts ============================== 2025-12-04T11:45:24.6929207Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6929261Z cachedir: .pytest_cache 2025-12-04T11:45:24.6929423Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6929480Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6929523Z configfile: pytest.ini 2025-12-04T11:45:24.6929685Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6929762Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6930013Z stepcurrent: skipping 84 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6930057Z Running 1 items in this shard 2025-12-04T11:45:24.6930059Z 2025-12-04T11:45:24.6930274Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.2896s] [100%] 2025-12-04T11:45:24.6930489Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8713s] [100%] 2025-12-04T11:45:24.6930679Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.7970s] [100%] 2025-12-04T11:45:24.6930682Z 2025-12-04T11:45:24.6930733Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6930871Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6930923Z Traceback (most recent call last): 2025-12-04T11:45:24.6931080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6931122Z method(*args, **kwargs) 2025-12-04T11:45:24.6931277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6931320Z method(*args, **kwargs) 2025-12-04T11:45:24.6931474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6931512Z with policy(): 2025-12-04T11:45:24.6931665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6931707Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6932109Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.6932112Z 2025-12-04T11:45:24.6932187Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6932446Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6932460Z 2025-12-04T11:45:24.6932547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6932623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6932667Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6932727Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6933217Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6933368Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6933421Z graph_break [] 2025-12-04T11:45:24.6933487Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6933561Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6934047Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6934099Z current_size = base.storage().size() 2025-12-04T11:45:24.6934139Z Autotune Choices Stats: 2025-12-04T11:45:24.6934510Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6934574Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6934625Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6934744Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6934988Z triton_mm_16 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6935218Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6935444Z triton_mm_12 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6935672Z triton_mm_7 0.0069 ms 
89.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6935912Z triton_mm_6 0.0076 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6936140Z triton_mm_9 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6936378Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6936602Z triton_mm_14 0.0087 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6936827Z triton_mm_5 0.0087 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6937055Z triton_mm_11 0.0098 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6937200Z SingleProcess AUTOTUNE benchmarking takes 0.0727 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:24.6937357Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6937409Z Traceback (most recent call last): 2025-12-04T11:45:24.6937563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6937605Z method(*args, **kwargs) 2025-12-04T11:45:24.6937756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6937797Z method(*args, **kwargs) 2025-12-04T11:45:24.6937947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6937985Z with policy(): 2025-12-04T11:45:24.6938136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6938178Z raise RuntimeError(msg) 2025-12-04T11:45:24.6938564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.6938566Z 2025-12-04T11:45:24.6938639Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6938898Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6938901Z 2025-12-04T11:45:24.6938986Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6939061Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6939107Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6939165Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6939655Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6939766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6939802Z graph_break [] 2025-12-04T11:45:24.6939865Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6939939Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6940433Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6940481Z current_size = base.storage().size() 2025-12-04T11:45:24.6940524Z Autotune Choices Stats: 2025-12-04T11:45:24.6940893Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6940968Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6941018Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6941149Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6941383Z triton_mm_16 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6941613Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6941839Z triton_mm_12 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6942067Z triton_mm_7 0.0069 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6942297Z triton_mm_6 0.0076 ms 80.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6942524Z triton_mm_9 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6942748Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6942976Z triton_mm_14 0.0087 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6943198Z triton_mm_5 0.0087 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6943458Z triton_mm_11 0.0098 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6943602Z SingleProcess AUTOTUNE benchmarking takes 0.0727 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:24.6943677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6943722Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6943778Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6943892Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6944376Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6944414Z graph_break [] 2025-12-04T11:45:24.6944476Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6944549Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6944603Z Autotune Choices Stats: 2025-12-04T11:45:24.6944966Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6945041Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6945093Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6945212Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6945444Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6945680Z triton_mm_35 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6945910Z triton_mm_26 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6946138Z triton_mm_25 0.0070 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6946361Z triton_mm_31 0.0071 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6946589Z triton_mm_28 0.0081 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6946812Z triton_mm_29 0.0083 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6947035Z triton_mm_33 0.0084 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6947269Z triton_mm_24 0.0088 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6947498Z triton_mm_30 0.0098 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6947646Z SingleProcess AUTOTUNE 
benchmarking takes 0.1072 seconds and 0.2828 seconds precompiling for 20 choices 2025-12-04T11:45:24.6947699Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6947839Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6947885Z Traceback (most recent call last): 2025-12-04T11:45:24.6948040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6948080Z method(*args, **kwargs) 2025-12-04T11:45:24.6948232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6948283Z method(*args, **kwargs) 2025-12-04T11:45:24.6948434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6948482Z with policy(): 2025-12-04T11:45:24.6948636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6948678Z raise RuntimeError(msg) 2025-12-04T11:45:24.6949067Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.6949070Z 2025-12-04T11:45:24.6949144Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6949406Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6949410Z 2025-12-04T11:45:24.6949501Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6949575Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6949619Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6949676Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6950167Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6950266Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6950307Z graph_break [] 2025-12-04T11:45:24.6950368Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6950446Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6950928Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.6950976Z current_size = base.storage().size() 2025-12-04T11:45:24.6951020Z Autotune Choices Stats: 2025-12-04T11:45:24.6951397Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6951474Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6951525Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6951646Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6951884Z triton_mm_16 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6952125Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6952368Z triton_mm_12 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6952608Z triton_mm_7 0.0069 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6952839Z triton_mm_6 0.0076 ms 80.6% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6953066Z triton_mm_9 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6953319Z triton_mm_10 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6953545Z triton_mm_14 0.0087 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6953770Z triton_mm_5 0.0087 ms 70.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6953998Z triton_mm_11 0.0098 ms 62.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6954132Z SingleProcess AUTOTUNE benchmarking takes 0.0727 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:24.6954209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6954253Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6954313Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6954412Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6954913Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6954950Z graph_break [] 2025-12-04T11:45:24.6955015Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6955088Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6955134Z Autotune Choices Stats: 2025-12-04T11:45:24.6955508Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.6955573Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6955624Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6955752Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6955988Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6956251Z triton_mm_35 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6956478Z triton_mm_26 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6956706Z triton_mm_25 0.0070 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6956932Z triton_mm_31 0.0071 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6957162Z triton_mm_28 0.0081 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6957387Z triton_mm_29 0.0083 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6957611Z triton_mm_33 0.0084 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6957833Z triton_mm_24 0.0088 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6958070Z triton_mm_30 0.0098 ms 63.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6958199Z SingleProcess AUTOTUNE 
benchmarking takes 0.1072 seconds and 0.2828 seconds precompiling for 20 choices 2025-12-04T11:45:24.6958273Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6958316Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6958374Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6958484Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6958975Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.6959015Z graph_break [] 2025-12-04T11:45:24.6959078Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:24.6959152Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.6959192Z Autotune Choices Stats: 2025-12-04T11:45:24.6959562Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:24.6959638Z AUTOTUNE scaled_mm(1x1024, 1024x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.6959694Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.6959825Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.6960064Z triton_mm_54 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.6960292Z triton_mm_55 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6960519Z triton_mm_50 0.0065 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6960746Z triton_mm_45 0.0068 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6960976Z triton_mm_44 0.0074 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.6961200Z triton_mm_48 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6961429Z triton_mm_47 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6961658Z triton_mm_52 0.0082 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6961884Z triton_mm_43 0.0086 ms 71.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.6962117Z triton_mm_49 0.0098 ms 62.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.6962261Z SingleProcess AUTOTUNE benchmarking takes 0.1342 seconds and 0.2605 seconds precompiling for 20 choices 2025-12-04T11:45:24.6962452Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-04f37de4bce51f22.xml - 2025-12-04T11:45:24.6962516Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6963109Z FAILED [0.7970s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.6963112Z 2025-12-04T11:45:24.6963188Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6963485Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6963500Z 2025-12-04T11:45:24.6963591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6963670Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.6963740Z ================== 1 failed, 187 deselected, 2 rerun in 3.98s ================== 2025-12-04T11:45:24.6963778Z Got exit code 1 2025-12-04T11:45:24.6963991Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.6964118Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.6964265Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d69be3be89cd7bc.xml 2025-12-04T11:45:24.6964323Z ============================= test session starts ============================== 2025-12-04T11:45:24.6964434Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6964475Z cachedir: .pytest_cache 2025-12-04T11:45:24.6964635Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6964682Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6964722Z configfile: pytest.ini 2025-12-04T11:45:24.6964883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6964958Z collecting ... collected 188 items / 85 deselected / 103 selected 2025-12-04T11:45:24.6965012Z stepcurrent: skipping 85 already run items. 
2025-12-04T11:45:24.6965059Z Running 103 items in this shard 2025-12-04T11:45:24.6965060Z 2025-12-04T11:45:24.6965274Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6000s] [ 0%] 2025-12-04T11:45:24.6965481Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2641s] [ 0%] 2025-12-04T11:45:24.6965666Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2191s] [ 0%] 2025-12-04T11:45:24.6965668Z 2025-12-04T11:45:24.6965721Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6965857Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6965902Z Traceback (most recent call last): 2025-12-04T11:45:24.6966073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6966115Z method(*args, **kwargs) 2025-12-04T11:45:24.6966269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6966309Z method(*args, **kwargs) 2025-12-04T11:45:24.6966480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6966520Z with policy(): 2025-12-04T11:45:24.6966676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6966719Z raise RuntimeError(msg) 2025-12-04T11:45:24.6967100Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.6967102Z 2025-12-04T11:45:24.6967176Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6967440Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6967454Z 2025-12-04T11:45:24.6967542Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6967615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6967658Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6967714Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6967781Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6967881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6967922Z graph_break [] 2025-12-04T11:45:24.6967986Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6968129Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6968175Z Traceback (most recent call last): 2025-12-04T11:45:24.6968332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6968371Z method(*args, **kwargs) 2025-12-04T11:45:24.6968524Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6968563Z method(*args, **kwargs) 2025-12-04T11:45:24.6968712Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6968749Z with policy(): 2025-12-04T11:45:24.6968904Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6968944Z raise RuntimeError(msg) 2025-12-04T11:45:24.6969331Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.6969334Z 2025-12-04T11:45:24.6969407Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6969658Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6969660Z 2025-12-04T11:45:24.6969746Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6969831Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6969876Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6969936Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6970007Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6970104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6970154Z graph_break [] 2025-12-04T11:45:24.6970214Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6970289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6970330Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6970387Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6970482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6970550Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6970588Z graph_break [] 2025-12-04T11:45:24.6970648Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6970714Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6970855Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6970915Z Traceback (most recent call last): 2025-12-04T11:45:24.6971075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6971115Z method(*args, **kwargs) 2025-12-04T11:45:24.6971269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6971309Z method(*args, **kwargs) 2025-12-04T11:45:24.6971461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6971499Z with policy(): 2025-12-04T11:45:24.6971654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6971697Z raise RuntimeError(msg) 2025-12-04T11:45:24.6972075Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.6972078Z 2025-12-04T11:45:24.6972153Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6972405Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6972407Z 2025-12-04T11:45:24.6972495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6972568Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6972612Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6972668Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6972744Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6972843Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6972881Z graph_break [] 2025-12-04T11:45:24.6972939Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6973012Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6973054Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6973111Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6973208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6973330Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6973368Z graph_break [] 2025-12-04T11:45:24.6973431Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6973508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6973550Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6973618Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6973717Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6973781Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6973819Z graph_break [] 2025-12-04T11:45:24.6973876Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6974068Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3d69be3be89cd7bc.xml - 2025-12-04T11:45:24.6974128Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6974699Z FAILED [0.2191s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.6974727Z 2025-12-04T11:45:24.6974802Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6975054Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6975056Z 2025-12-04T11:45:24.6975149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6975212Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.6975283Z ================== 1 failed, 85 deselected, 2 rerun in 2.10s =================== 2025-12-04T11:45:24.6975319Z Got exit code 1 2025-12-04T11:45:24.6975362Z Retrying single test... 
2025-12-04T11:45:24.6975513Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51d13268d02d60f4.xml 2025-12-04T11:45:24.6975573Z ============================= test session starts ============================== 2025-12-04T11:45:24.6975683Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6975727Z cachedir: .pytest_cache 2025-12-04T11:45:24.6975883Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6975931Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6975972Z configfile: pytest.ini 2025-12-04T11:45:24.6976137Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6976213Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6976463Z stepcurrent: skipping 85 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6976510Z Running 1 items in this shard 2025-12-04T11:45:24.6976512Z 2025-12-04T11:45:24.6976727Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6034s] [100%] 2025-12-04T11:45:24.6976932Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2595s] [100%] 2025-12-04T11:45:24.6977127Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2179s] [100%] 2025-12-04T11:45:24.6977130Z 2025-12-04T11:45:24.6977185Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6977324Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6977385Z Traceback (most recent call last): 2025-12-04T11:45:24.6977543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6977585Z method(*args, **kwargs) 2025-12-04T11:45:24.6977738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6977782Z method(*args, **kwargs) 2025-12-04T11:45:24.6977938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6977976Z with policy(): 2025-12-04T11:45:24.6978127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6978184Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6978577Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.6978580Z 2025-12-04T11:45:24.6978654Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6978905Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6978909Z 2025-12-04T11:45:24.6978996Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6979073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6979114Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6979178Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6979246Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6979346Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6979382Z graph_break [] 2025-12-04T11:45:24.6979443Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6979577Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6979624Z Traceback (most recent call last): 2025-12-04T11:45:24.6979779Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6979819Z method(*args, **kwargs) 2025-12-04T11:45:24.6979972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.6980021Z method(*args, **kwargs) 2025-12-04T11:45:24.6980171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6980208Z with policy(): 2025-12-04T11:45:24.6980360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6980403Z raise RuntimeError(msg) 2025-12-04T11:45:24.6980796Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.6980799Z 2025-12-04T11:45:24.6980873Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6981125Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6981128Z 2025-12-04T11:45:24.6981230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6981303Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6981349Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6981404Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6981471Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6981569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6981608Z graph_break [] 2025-12-04T11:45:24.6981668Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6981746Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.6981804Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6981860Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6981969Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6982036Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6982073Z graph_break [] 2025-12-04T11:45:24.6982133Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6982187Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6982326Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6982372Z Traceback (most recent call last): 2025-12-04T11:45:24.6982529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6982574Z method(*args, **kwargs) 2025-12-04T11:45:24.6982731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6982774Z method(*args, **kwargs) 2025-12-04T11:45:24.6982924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6982962Z with policy(): 2025-12-04T11:45:24.6983115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6983157Z raise RuntimeError(msg) 2025-12-04T11:45:24.6983563Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.6983566Z 2025-12-04T11:45:24.6983641Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6983894Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6983897Z 2025-12-04T11:45:24.6983984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6984058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6984101Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6984157Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6984226Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6984339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6984381Z graph_break [] 2025-12-04T11:45:24.6984444Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6984519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6984560Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6984616Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6984727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6984792Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6984829Z graph_break [] 2025-12-04T11:45:24.6984889Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6984960Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6985003Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6985058Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6985156Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6985236Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6985274Z graph_break [] 2025-12-04T11:45:24.6985332Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6985543Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51d13268d02d60f4.xml - 2025-12-04T11:45:24.6985603Z =========================== short test summary info ============================ 2025-12-04T11:45:24.6986170Z FAILED [0.2179s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.6986173Z 2025-12-04T11:45:24.6986249Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6986500Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6986503Z 2025-12-04T11:45:24.6986591Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6986653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.6986724Z ================== 1 failed, 187 deselected, 2 rerun in 2.10s ================== 2025-12-04T11:45:24.6986761Z Got exit code 1 2025-12-04T11:45:24.6986804Z Retrying single test... 
2025-12-04T11:45:24.6986951Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0dd293d582b0f21b.xml 2025-12-04T11:45:24.6987011Z ============================= test session starts ============================== 2025-12-04T11:45:24.6987121Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.6987165Z cachedir: .pytest_cache 2025-12-04T11:45:24.6987324Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.6987374Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.6987414Z configfile: pytest.ini 2025-12-04T11:45:24.6987574Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.6987648Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.6987907Z stepcurrent: skipping 85 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6987956Z Running 1 items in this shard 2025-12-04T11:45:24.6987958Z 2025-12-04T11:45:24.6988170Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7258s] [100%] 2025-12-04T11:45:24.6988392Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3716s] [100%] 2025-12-04T11:45:24.6988577Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3135s] [100%] 2025-12-04T11:45:24.6988579Z 2025-12-04T11:45:24.6988634Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.6988771Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6988819Z Traceback (most recent call last): 2025-12-04T11:45:24.6988976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6989030Z method(*args, **kwargs) 2025-12-04T11:45:24.6989183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6989239Z method(*args, **kwargs) 2025-12-04T11:45:24.6989392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6989435Z with policy(): 2025-12-04T11:45:24.6989589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6989633Z raise RuntimeError(msg) 
2025-12-04T11:45:24.6990013Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.6990018Z 2025-12-04T11:45:24.6990090Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6990344Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6990346Z 2025-12-04T11:45:24.6990432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6990508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6990550Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6990611Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6990681Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6990784Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6990824Z graph_break [] 2025-12-04T11:45:24.6990884Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6991020Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6991070Z Traceback (most recent call last): 2025-12-04T11:45:24.6991223Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6991265Z method(*args, **kwargs) 2025-12-04T11:45:24.6991417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.6991457Z method(*args, **kwargs) 2025-12-04T11:45:24.6991607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6991663Z with policy(): 2025-12-04T11:45:24.6991820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6991865Z raise RuntimeError(msg) 2025-12-04T11:45:24.6992251Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.6992255Z 2025-12-04T11:45:24.6992329Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6992582Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6992584Z 2025-12-04T11:45:24.6992671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6992746Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6992799Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6992857Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6992943Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6993043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6993079Z graph_break [] 2025-12-04T11:45:24.6993139Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6993214Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.6993292Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6993347Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6993449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6993516Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6993555Z graph_break [] 2025-12-04T11:45:24.6993613Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6993667Z =================================== FAILURES =================================== 2025-12-04T11:45:24.6993806Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.6993853Z Traceback (most recent call last): 2025-12-04T11:45:24.6994006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6994047Z method(*args, **kwargs) 2025-12-04T11:45:24.6994197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.6994241Z method(*args, **kwargs) 2025-12-04T11:45:24.6994391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.6994430Z with policy(): 2025-12-04T11:45:24.6994584Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.6994631Z raise RuntimeError(msg) 2025-12-04T11:45:24.6995007Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.6995010Z 2025-12-04T11:45:24.6995085Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.6995355Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.6995360Z 2025-12-04T11:45:24.6995446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.6995522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6995565Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6995621Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6995699Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6995797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6995834Z graph_break [] 2025-12-04T11:45:24.6995897Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6995974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6996014Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6996069Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6998583Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6998680Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6998718Z graph_break [] 2025-12-04T11:45:24.6998777Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6998865Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.6998911Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.6998967Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.6999063Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.6999127Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.6999162Z graph_break [] 2025-12-04T11:45:24.6999220Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:24.6999415Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0dd293d582b0f21b.xml - 2025-12-04T11:45:24.6999476Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7000050Z FAILED [0.3135s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.7000053Z 2025-12-04T11:45:24.7000126Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7000377Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7000380Z 2025-12-04T11:45:24.7000466Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7000531Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.7000599Z ================== 1 failed, 187 deselected, 2 rerun in 2.43s ================== 2025-12-04T11:45:24.7000636Z Got exit code 1 2025-12-04T11:45:24.7000839Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7000968Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.7001111Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8f5361418660ead.xml 2025-12-04T11:45:24.7001169Z ============================= test session starts ============================== 2025-12-04T11:45:24.7001299Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7001342Z cachedir: .pytest_cache 2025-12-04T11:45:24.7001500Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7001546Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7001587Z configfile: pytest.ini 2025-12-04T11:45:24.7001762Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7001838Z collecting ... collected 188 items / 86 deselected / 102 selected 2025-12-04T11:45:24.7001892Z stepcurrent: skipping 86 already run items. 
2025-12-04T11:45:24.7001936Z Running 102 items in this shard 2025-12-04T11:45:24.7001939Z 2025-12-04T11:45:24.7002154Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7825s] [ 0%] 2025-12-04T11:45:24.7002364Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3478s] [ 0%] 2025-12-04T11:45:24.7002558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3094s] [ 0%] 2025-12-04T11:45:24.7002571Z 2025-12-04T11:45:24.7002622Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7002762Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7002808Z Traceback (most recent call last): 2025-12-04T11:45:24.7002968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7003007Z method(*args, **kwargs) 2025-12-04T11:45:24.7003161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7003200Z method(*args, **kwargs) 2025-12-04T11:45:24.7003399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7003436Z with policy(): 2025-12-04T11:45:24.7003590Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7003631Z raise RuntimeError(msg) 2025-12-04T11:45:24.7004013Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.7004016Z 2025-12-04T11:45:24.7004088Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7004348Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7004351Z 2025-12-04T11:45:24.7004439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7004514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7004557Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7004612Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7004678Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7004775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7004812Z graph_break [] 2025-12-04T11:45:24.7004872Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7005025Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7005070Z Traceback (most recent call last): 2025-12-04T11:45:24.7005226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7005265Z method(*args, **kwargs) 2025-12-04T11:45:24.7005434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7005474Z method(*args, **kwargs) 2025-12-04T11:45:24.7005626Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7005661Z with policy(): 2025-12-04T11:45:24.7005813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7005853Z raise RuntimeError(msg) 2025-12-04T11:45:24.7006234Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.7006251Z 2025-12-04T11:45:24.7006339Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7006600Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7006602Z 2025-12-04T11:45:24.7006688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7006763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7006806Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7006861Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7006927Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7007025Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7007061Z graph_break [] 2025-12-04T11:45:24.7007121Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7007198Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7007238Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7007297Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7007392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7007456Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7007491Z graph_break [] 2025-12-04T11:45:24.7007550Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7007606Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7007751Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7007799Z Traceback (most recent call last): 2025-12-04T11:45:24.7007956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7007998Z method(*args, **kwargs) 2025-12-04T11:45:24.7008149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7008189Z method(*args, **kwargs) 2025-12-04T11:45:24.7008340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7008377Z with policy(): 2025-12-04T11:45:24.7008531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7008583Z raise RuntimeError(msg) 2025-12-04T11:45:24.7008973Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.7008977Z 2025-12-04T11:45:24.7009063Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7009315Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7009317Z 2025-12-04T11:45:24.7009403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7009476Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7009520Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7009575Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7009660Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7009758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7009807Z graph_break [] 2025-12-04T11:45:24.7009866Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7009941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7009983Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7010039Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7010134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7010198Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7010233Z graph_break [] 2025-12-04T11:45:24.7010294Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7010367Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7010412Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7010465Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7010562Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7010628Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7010667Z graph_break [] 2025-12-04T11:45:24.7010723Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7010917Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8f5361418660ead.xml - 2025-12-04T11:45:24.7010977Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7011545Z FAILED [0.3094s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.7011549Z 2025-12-04T11:45:24.7011624Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7011876Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7011879Z 2025-12-04T11:45:24.7011967Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7012029Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7012109Z ================== 1 failed, 86 deselected, 2 rerun in 2.46s =================== 2025-12-04T11:45:24.7012147Z Got exit code 1 2025-12-04T11:45:24.7012191Z Retrying single test... 
2025-12-04T11:45:24.7012337Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bdad7a524b397c5d.xml 2025-12-04T11:45:24.7012397Z ============================= test session starts ============================== 2025-12-04T11:45:24.7012517Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7012561Z cachedir: .pytest_cache 2025-12-04T11:45:24.7012718Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7012766Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7012806Z configfile: pytest.ini 2025-12-04T11:45:24.7012972Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7013047Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7013344Z stepcurrent: skipping 86 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7013402Z Running 1 items in this shard 2025-12-04T11:45:24.7013404Z 2025-12-04T11:45:24.7013619Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6147s] [100%] 2025-12-04T11:45:24.7013827Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2693s] [100%] 2025-12-04T11:45:24.7014014Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2260s] [100%] 2025-12-04T11:45:24.7014017Z 2025-12-04T11:45:24.7014071Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7014210Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7014256Z Traceback (most recent call last): 2025-12-04T11:45:24.7014413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7014456Z method(*args, **kwargs) 2025-12-04T11:45:24.7014609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7014652Z method(*args, **kwargs) 2025-12-04T11:45:24.7014803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7014839Z with policy(): 2025-12-04T11:45:24.7014992Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7015035Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7015427Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.7015430Z 2025-12-04T11:45:24.7015504Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7015759Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7015763Z 2025-12-04T11:45:24.7015849Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7015937Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7015978Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7016036Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7016102Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7016201Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7016251Z graph_break [] 2025-12-04T11:45:24.7016314Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7016452Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7016500Z Traceback (most recent call last): 2025-12-04T11:45:24.7016651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7016692Z method(*args, **kwargs) 2025-12-04T11:45:24.7016844Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.7016883Z method(*args, **kwargs) 2025-12-04T11:45:24.7017047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7017096Z with policy(): 2025-12-04T11:45:24.7017247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7017290Z raise RuntimeError(msg) 2025-12-04T11:45:24.7017682Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.7017684Z 2025-12-04T11:45:24.7017758Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7018010Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7018013Z 2025-12-04T11:45:24.7018101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7018176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7018219Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7018274Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7018342Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7018439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7018477Z graph_break [] 2025-12-04T11:45:24.7018535Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7018611Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.7018652Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7018712Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7018808Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7018875Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7018914Z graph_break [] 2025-12-04T11:45:24.7018973Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7019025Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7019166Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7019213Z Traceback (most recent call last): 2025-12-04T11:45:24.7019366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7019419Z method(*args, **kwargs) 2025-12-04T11:45:24.7019573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7019615Z method(*args, **kwargs) 2025-12-04T11:45:24.7019764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7019816Z with policy(): 2025-12-04T11:45:24.7019974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7020019Z raise RuntimeError(msg) 2025-12-04T11:45:24.7020401Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.7020404Z 2025-12-04T11:45:24.7020478Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7020741Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7020754Z 2025-12-04T11:45:24.7020844Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7020917Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7020961Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7021016Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7021083Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7021179Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7021215Z graph_break [] 2025-12-04T11:45:24.7021275Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7021350Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7021392Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7021449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7021546Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7021615Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7021652Z graph_break [] 2025-12-04T11:45:24.7021710Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7021782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7021825Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7021879Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7021977Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7022041Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7022078Z graph_break [] 2025-12-04T11:45:24.7022137Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7022329Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bdad7a524b397c5d.xml - 2025-12-04T11:45:24.7022391Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7022963Z FAILED [0.2260s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.7022966Z 2025-12-04T11:45:24.7023050Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7023337Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7023341Z 2025-12-04T11:45:24.7023448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7023510Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7023580Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:24.7023618Z Got exit code 1 2025-12-04T11:45:24.7023659Z Retrying single test... 
2025-12-04T11:45:24.7023804Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f264162c5ab7b8e2.xml 2025-12-04T11:45:24.7023864Z ============================= test session starts ============================== 2025-12-04T11:45:24.7023975Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7024030Z cachedir: .pytest_cache 2025-12-04T11:45:24.7024188Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7024250Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7024291Z configfile: pytest.ini 2025-12-04T11:45:24.7024454Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7024528Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7024779Z stepcurrent: skipping 86 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7024826Z Running 1 items in this shard 2025-12-04T11:45:24.7024828Z 2025-12-04T11:45:24.7025039Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7286s] [100%] 2025-12-04T11:45:24.7025248Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3132s] [100%] 2025-12-04T11:45:24.7025434Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2831s] [100%] 2025-12-04T11:45:24.7025437Z 2025-12-04T11:45:24.7025489Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7025628Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7025679Z Traceback (most recent call last): 2025-12-04T11:45:24.7025838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7025881Z method(*args, **kwargs) 2025-12-04T11:45:24.7026035Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7026074Z method(*args, **kwargs) 2025-12-04T11:45:24.7026231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7026270Z with policy(): 2025-12-04T11:45:24.7026425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7026467Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7026864Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.7026869Z 2025-12-04T11:45:24.7026942Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7027196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7027211Z 2025-12-04T11:45:24.7027298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7027373Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7027415Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7027471Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7027536Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7027636Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7027672Z graph_break [] 2025-12-04T11:45:24.7027733Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7027885Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7027930Z Traceback (most recent call last): 2025-12-04T11:45:24.7028095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7028137Z method(*args, **kwargs) 2025-12-04T11:45:24.7028286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.7028329Z method(*args, **kwargs) 2025-12-04T11:45:24.7028486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7028525Z with policy(): 2025-12-04T11:45:24.7028679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7028724Z raise RuntimeError(msg) 2025-12-04T11:45:24.7029106Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.7029110Z 2025-12-04T11:45:24.7029183Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7029438Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7029440Z 2025-12-04T11:45:24.7029526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7029601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7029643Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7029701Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7029766Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7029866Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7029902Z graph_break [] 2025-12-04T11:45:24.7029963Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7030035Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.7030077Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7030132Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7030230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7030293Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7030341Z graph_break [] 2025-12-04T11:45:24.7030400Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7030457Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7030597Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7030655Z Traceback (most recent call last): 2025-12-04T11:45:24.7030809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7030850Z method(*args, **kwargs) 2025-12-04T11:45:24.7030999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7031038Z method(*args, **kwargs) 2025-12-04T11:45:24.7031188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7031226Z with policy(): 2025-12-04T11:45:24.7031378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7031434Z raise RuntimeError(msg) 2025-12-04T11:45:24.7031814Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.7031827Z 2025-12-04T11:45:24.7031902Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7032152Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7032157Z 2025-12-04T11:45:24.7032243Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7032317Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7032360Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7032417Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7032483Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7032584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7032621Z graph_break [] 2025-12-04T11:45:24.7032683Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7032755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7032796Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7032851Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7032948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7033012Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7033049Z graph_break [] 2025-12-04T11:45:24.7033106Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7033180Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7033221Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7033311Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7033407Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7033471Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7033508Z graph_break [] 2025-12-04T11:45:24.7033567Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:24.7033757Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f264162c5ab7b8e2.xml - 2025-12-04T11:45:24.7033832Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7034415Z FAILED [0.2831s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.7034419Z 2025-12-04T11:45:24.7034492Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7034743Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7034747Z 2025-12-04T11:45:24.7034832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7034895Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.7034975Z ================== 1 failed, 187 deselected, 2 rerun in 2.34s ================== 2025-12-04T11:45:24.7035013Z Got exit code 1 2025-12-04T11:45:24.7035232Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7035361Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.7035506Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5dca2daca0ca5552.xml 2025-12-04T11:45:24.7035564Z ============================= test session starts ============================== 2025-12-04T11:45:24.7035672Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7035715Z cachedir: .pytest_cache 2025-12-04T11:45:24.7035871Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7035919Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7035959Z configfile: pytest.ini 2025-12-04T11:45:24.7036120Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7036195Z collecting ... collected 188 items / 87 deselected / 101 selected 2025-12-04T11:45:24.7036249Z stepcurrent: skipping 87 already run items. 
2025-12-04T11:45:24.7036292Z Running 101 items in this shard 2025-12-04T11:45:24.7036294Z 2025-12-04T11:45:24.7036505Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7575s] [ 0%] 2025-12-04T11:45:24.7036713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3969s] [ 0%] 2025-12-04T11:45:24.7036897Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.3519s] [ 0%] 2025-12-04T11:45:24.7036900Z 2025-12-04T11:45:24.7036952Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7037089Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7037135Z Traceback (most recent call last): 2025-12-04T11:45:24.7037291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7037332Z method(*args, **kwargs) 2025-12-04T11:45:24.7037483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7037542Z method(*args, **kwargs) 2025-12-04T11:45:24.7037693Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7037731Z with policy(): 2025-12-04T11:45:24.7037883Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7037935Z raise RuntimeError(msg) 2025-12-04T11:45:24.7038316Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:24.7038318Z 2025-12-04T11:45:24.7038392Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7038642Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7038654Z 2025-12-04T11:45:24.7038741Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7038814Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7038869Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7038925Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7039412Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7039510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7039546Z graph_break [] 2025-12-04T11:45:24.7039606Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7039679Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7040168Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7040216Z current_size = base.storage().size() 2025-12-04T11:45:24.7040258Z Autotune Choices Stats: 2025-12-04T11:45:24.7040628Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.7040682Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7040730Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7040856Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7041090Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7041132Z _scaled_mm 0.0257 ms 23.9% 2025-12-04T11:45:24.7041258Z SingleProcess AUTOTUNE benchmarking takes 0.0128 seconds and 0.0660 seconds precompiling for 2 choices 2025-12-04T11:45:24.7041406Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7041453Z Traceback (most recent call last): 2025-12-04T11:45:24.7041609Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7041651Z method(*args, **kwargs) 2025-12-04T11:45:24.7041816Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7041859Z method(*args, **kwargs) 2025-12-04T11:45:24.7042008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7042046Z with policy(): 2025-12-04T11:45:24.7042197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7042239Z raise RuntimeError(msg) 2025-12-04T11:45:24.7042619Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:24.7042634Z 2025-12-04T11:45:24.7042709Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7042971Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7042973Z 2025-12-04T11:45:24.7043061Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7043135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7043180Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7043235Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7043771Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7043875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7043911Z graph_break [] 2025-12-04T11:45:24.7043973Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7044047Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7044532Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7044579Z current_size = base.storage().size() 2025-12-04T11:45:24.7044620Z Autotune Choices Stats: 2025-12-04T11:45:24.7044985Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.7045038Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7045088Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7045211Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7045457Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7045502Z _scaled_mm 0.0257 ms 23.9% 
2025-12-04T11:45:24.7045628Z SingleProcess AUTOTUNE benchmarking takes 0.0128 seconds and 0.0660 seconds precompiling for 2 choices 2025-12-04T11:45:24.7045717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7045759Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7045816Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7045914Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7046397Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7046449Z graph_break [] 2025-12-04T11:45:24.7046507Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7046581Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7046634Z Autotune Choices Stats: 2025-12-04T11:45:24.7046990Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:24.7047041Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7047088Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7047209Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7047439Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7047481Z _scaled_mm 0.0261 ms 23.9% 2025-12-04T11:45:24.7047609Z SingleProcess AUTOTUNE benchmarking takes 0.0117 seconds and 0.0561 seconds precompiling for 2 choices 2025-12-04T11:45:24.7047662Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7047799Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7047844Z Traceback (most recent call last): 2025-12-04T11:45:24.7048004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7048044Z method(*args, **kwargs) 2025-12-04T11:45:24.7048201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7048242Z method(*args, **kwargs) 2025-12-04T11:45:24.7048392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7048430Z with policy(): 2025-12-04T11:45:24.7048583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7048625Z raise RuntimeError(msg) 2025-12-04T11:45:24.7049005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 
2025-12-04T11:45:24.7049007Z 2025-12-04T11:45:24.7049092Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7049346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7049350Z 2025-12-04T11:45:24.7049449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7049522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7049564Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7049619Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7050104Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7050214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7050249Z graph_break [] 2025-12-04T11:45:24.7050309Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7050401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7050888Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7050934Z current_size = base.storage().size() 2025-12-04T11:45:24.7050974Z Autotune Choices Stats: 2025-12-04T11:45:24.7051336Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.7051391Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7051439Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7051561Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7051792Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7051835Z _scaled_mm 0.0257 ms 23.9% 2025-12-04T11:45:24.7051961Z SingleProcess AUTOTUNE benchmarking takes 0.0128 seconds and 0.0660 seconds precompiling for 2 choices 2025-12-04T11:45:24.7052036Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7052079Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7052137Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7052236Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7052725Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7052764Z graph_break [] 2025-12-04T11:45:24.7052822Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7052911Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7052951Z Autotune Choices Stats: 2025-12-04T11:45:24.7053357Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:24.7053408Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7053455Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7053576Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7053809Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7053849Z _scaled_mm 0.0261 ms 23.9% 2025-12-04T11:45:24.7053977Z SingleProcess AUTOTUNE benchmarking takes 0.0117 seconds and 0.0561 seconds precompiling for 2 choices 2025-12-04T11:45:24.7054064Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7054122Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7054178Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7054277Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7054757Z inductor [('triton_bundler_save_kernel', 16), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7054795Z graph_break [] 2025-12-04T11:45:24.7054855Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7054930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7054971Z Autotune Choices Stats: 2025-12-04T11:45:24.7055330Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:24.7055381Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7055429Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7055550Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7055777Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7055820Z _scaled_mm 0.0288 ms 20.8% 2025-12-04T11:45:24.7055945Z SingleProcess AUTOTUNE benchmarking takes 0.0119 seconds and 0.0561 seconds precompiling for 2 choices 2025-12-04T11:45:24.7056141Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5dca2daca0ca5552.xml - 2025-12-04T11:45:24.7056201Z =========================== short test summary info 
============================ 2025-12-04T11:45:24.7056791Z FAILED [0.3519s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 2025-12-04T11:45:24.7056795Z 2025-12-04T11:45:24.7056868Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7057132Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7057136Z 2025-12-04T11:45:24.7057222Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7057283Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7057350Z ================== 1 failed, 87 deselected, 2 rerun in 2.52s =================== 2025-12-04T11:45:24.7057387Z Got exit code 1 2025-12-04T11:45:24.7057426Z Retrying single test... 
2025-12-04T11:45:24.7057570Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-198b5fad40f3e1de.xml 2025-12-04T11:45:24.7057627Z ============================= test session starts ============================== 2025-12-04T11:45:24.7057748Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7057789Z cachedir: .pytest_cache 2025-12-04T11:45:24.7057958Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7058005Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7058045Z configfile: pytest.ini 2025-12-04T11:45:24.7058208Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7058281Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7058530Z stepcurrent: skipping 87 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7058572Z Running 1 items in this shard 2025-12-04T11:45:24.7058575Z 2025-12-04T11:45:24.7058783Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9108s] [100%] 2025-12-04T11:45:24.7058993Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4901s] [100%] 2025-12-04T11:45:24.7059174Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6266s] [100%] 2025-12-04T11:45:24.7059176Z 2025-12-04T11:45:24.7059227Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7059362Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7059410Z Traceback (most recent call last): 2025-12-04T11:45:24.7059566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7059608Z method(*args, **kwargs) 2025-12-04T11:45:24.7059759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7059802Z method(*args, **kwargs) 2025-12-04T11:45:24.7059954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7059991Z with policy(): 2025-12-04T11:45:24.7060143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7060185Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7060575Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:24.7060579Z 2025-12-04T11:45:24.7060653Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7060916Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7060918Z 2025-12-04T11:45:24.7061004Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7061080Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7061121Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7061178Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7061658Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7061780Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7061816Z graph_break [] 2025-12-04T11:45:24.7061875Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7061948Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7062431Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: 
UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7062479Z current_size = base.storage().size() 2025-12-04T11:45:24.7062520Z Autotune Choices Stats: 2025-12-04T11:45:24.7062882Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7062934Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7062982Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7063103Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7063367Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7063409Z _scaled_mm 0.0286 ms 20.5% 2025-12-04T11:45:24.7063536Z SingleProcess AUTOTUNE benchmarking takes 0.0131 seconds and 0.0638 seconds precompiling for 2 choices 2025-12-04T11:45:24.7063674Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7063720Z Traceback (most recent call last): 2025-12-04T11:45:24.7063875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7063915Z method(*args, **kwargs) 2025-12-04T11:45:24.7064067Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7064107Z method(*args, **kwargs) 2025-12-04T11:45:24.7064272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7064310Z with policy(): 2025-12-04T11:45:24.7064462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7064502Z raise RuntimeError(msg) 2025-12-04T11:45:24.7064897Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:24.7064899Z 2025-12-04T11:45:24.7064972Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7065226Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7065228Z 2025-12-04T11:45:24.7065314Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7065406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7065447Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7065517Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7066003Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7066102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7066137Z graph_break [] 2025-12-04T11:45:24.7066197Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7066270Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7066757Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7066804Z current_size = base.storage().size() 2025-12-04T11:45:24.7066845Z Autotune Choices Stats: 2025-12-04T11:45:24.7067204Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7067256Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7067305Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7067424Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7067657Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7067697Z _scaled_mm 0.0286 ms 20.5% 
2025-12-04T11:45:24.7067824Z SingleProcess AUTOTUNE benchmarking takes 0.0131 seconds and 0.0638 seconds precompiling for 2 choices 2025-12-04T11:45:24.7067896Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7067939Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7068005Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7068105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7068598Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7068637Z graph_break [] 2025-12-04T11:45:24.7068697Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7068770Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7068809Z Autotune Choices Stats: 2025-12-04T11:45:24.7069164Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:24.7069226Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7069283Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7069404Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7069632Z triton_mm_1 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7069674Z _scaled_mm 0.0268 ms 22.1% 2025-12-04T11:45:24.7069799Z SingleProcess AUTOTUNE benchmarking takes 0.0116 seconds and 0.0550 seconds precompiling for 2 choices 2025-12-04T11:45:24.7069854Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7069994Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7070042Z Traceback (most recent call last): 2025-12-04T11:45:24.7070197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7070242Z method(*args, **kwargs) 2025-12-04T11:45:24.7070393Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7070433Z method(*args, **kwargs) 2025-12-04T11:45:24.7070583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7070621Z with policy(): 2025-12-04T11:45:24.7070773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7070814Z raise RuntimeError(msg) 2025-12-04T11:45:24.7071196Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 
2025-12-04T11:45:24.7071200Z 2025-12-04T11:45:24.7071275Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7071528Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7071532Z 2025-12-04T11:45:24.7071618Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7071691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7071743Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7071802Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7075552Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7075657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7075693Z graph_break [] 2025-12-04T11:45:24.7075753Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7075825Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7076307Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7076377Z current_size = base.storage().size() 2025-12-04T11:45:24.7076433Z Autotune Choices Stats: 2025-12-04T11:45:24.7076794Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7076845Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7076894Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7077016Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7077248Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7077289Z _scaled_mm 0.0286 ms 20.5% 2025-12-04T11:45:24.7077416Z SingleProcess AUTOTUNE benchmarking takes 0.0131 seconds and 0.0638 seconds precompiling for 2 choices 2025-12-04T11:45:24.7077491Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7077533Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7077588Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7077686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7078162Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7078201Z graph_break [] 2025-12-04T11:45:24.7078259Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7078336Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7078375Z Autotune Choices Stats: 2025-12-04T11:45:24.7078730Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:24.7078779Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7078840Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7078959Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7079204Z triton_mm_1 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7079246Z _scaled_mm 0.0268 ms 22.1% 2025-12-04T11:45:24.7079372Z SingleProcess AUTOTUNE benchmarking takes 0.0116 seconds and 0.0550 seconds precompiling for 2 choices 2025-12-04T11:45:24.7079444Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7079485Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7079543Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7079643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7080122Z inductor [('triton_bundler_save_kernel', 16), 
('async_compile_cache_miss', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7080180Z graph_break [] 2025-12-04T11:45:24.7080242Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7080316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7080360Z Autotune Choices Stats: 2025-12-04T11:45:24.7080717Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005799999926239252, "best_triton_pos": 0} 2025-12-04T11:45:24.7080769Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7080816Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7080941Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7081174Z triton_mm_2 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7081216Z _scaled_mm 0.0292 ms 19.8% 2025-12-04T11:45:24.7081343Z SingleProcess AUTOTUNE benchmarking takes 0.0158 seconds and 0.1669 seconds precompiling for 2 choices 2025-12-04T11:45:24.7081532Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-198b5fad40f3e1de.xml - 2025-12-04T11:45:24.7081595Z =========================== short test summary info 
============================ 2025-12-04T11:45:24.7082167Z FAILED [0.6266s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 2025-12-04T11:45:24.7082171Z 2025-12-04T11:45:24.7082245Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7082498Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7082500Z 2025-12-04T11:45:24.7082597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7082661Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7082732Z ================== 1 failed, 187 deselected, 2 rerun in 3.05s ================== 2025-12-04T11:45:24.7082770Z Got exit code 1 2025-12-04T11:45:24.7082812Z Retrying single test... 
2025-12-04T11:45:24.7082971Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a2b836bc970f2274.xml 2025-12-04T11:45:24.7083030Z ============================= test session starts ============================== 2025-12-04T11:45:24.7083145Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7083186Z cachedir: .pytest_cache 2025-12-04T11:45:24.7083374Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7083423Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7083466Z configfile: pytest.ini 2025-12-04T11:45:24.7083627Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7083719Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7083981Z stepcurrent: skipping 87 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7084025Z Running 1 items in this shard 2025-12-04T11:45:24.7084027Z 2025-12-04T11:45:24.7084235Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9502s] [100%] 2025-12-04T11:45:24.7084442Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4942s] [100%] 2025-12-04T11:45:24.7084624Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5164s] [100%] 2025-12-04T11:45:24.7084627Z 2025-12-04T11:45:24.7084679Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7084817Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7084864Z Traceback (most recent call last): 2025-12-04T11:45:24.7085021Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7085062Z method(*args, **kwargs) 2025-12-04T11:45:24.7085213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7085256Z method(*args, **kwargs) 2025-12-04T11:45:24.7085409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7085449Z with policy(): 2025-12-04T11:45:24.7085601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7085646Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7086031Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:24.7086033Z 2025-12-04T11:45:24.7086108Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7086368Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7086383Z 2025-12-04T11:45:24.7086471Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7086547Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7086590Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7086649Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7087138Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7087239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7087274Z graph_break [] 2025-12-04T11:45:24.7087337Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7087409Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7087906Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: 
UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7087966Z current_size = base.storage().size() 2025-12-04T11:45:24.7088006Z Autotune Choices Stats: 2025-12-04T11:45:24.7088371Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7088423Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7088475Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7088597Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7088831Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7088872Z _scaled_mm 0.0274 ms 21.5% 2025-12-04T11:45:24.7089001Z SingleProcess AUTOTUNE benchmarking takes 0.0136 seconds and 0.0632 seconds precompiling for 2 choices 2025-12-04T11:45:24.7089137Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7089185Z Traceback (most recent call last): 2025-12-04T11:45:24.7089339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7089384Z method(*args, **kwargs) 2025-12-04T11:45:24.7089538Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7089582Z method(*args, **kwargs) 2025-12-04T11:45:24.7089735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7089774Z with policy(): 2025-12-04T11:45:24.7089931Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7089973Z raise RuntimeError(msg) 2025-12-04T11:45:24.7090370Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:24.7090373Z 2025-12-04T11:45:24.7090446Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7090709Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7090713Z 2025-12-04T11:45:24.7090801Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7090879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7090922Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7090984Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7091468Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7091578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7091627Z graph_break [] 2025-12-04T11:45:24.7091688Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7091762Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7092251Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7092298Z current_size = base.storage().size() 2025-12-04T11:45:24.7092338Z Autotune Choices Stats: 2025-12-04T11:45:24.7092707Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7092761Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7092811Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7092930Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7093164Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7093204Z _scaled_mm 0.0274 ms 21.5% 
2025-12-04T11:45:24.7093358Z SingleProcess AUTOTUNE benchmarking takes 0.0136 seconds and 0.0632 seconds precompiling for 2 choices 2025-12-04T11:45:24.7093432Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7093475Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7093532Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7093631Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7094106Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7094162Z graph_break [] 2025-12-04T11:45:24.7094223Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7094297Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7094337Z Autotune Choices Stats: 2025-12-04T11:45:24.7094707Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.7094759Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7094805Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7094925Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7095159Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7095215Z _scaled_mm 0.0286 ms 21.3% 2025-12-04T11:45:24.7095341Z SingleProcess AUTOTUNE benchmarking takes 0.0140 seconds and 0.0550 seconds precompiling for 2 choices 2025-12-04T11:45:24.7095409Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7095546Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7095592Z Traceback (most recent call last): 2025-12-04T11:45:24.7095745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7095786Z method(*args, **kwargs) 2025-12-04T11:45:24.7095938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7095978Z method(*args, **kwargs) 2025-12-04T11:45:24.7096129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7096167Z with policy(): 2025-12-04T11:45:24.7096318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7096362Z raise RuntimeError(msg) 2025-12-04T11:45:24.7096743Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1153433600. 
2025-12-04T11:45:24.7096745Z 2025-12-04T11:45:24.7096817Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7097071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7097074Z 2025-12-04T11:45:24.7097160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7097236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7097279Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7097336Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7097815Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7097930Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7097966Z graph_break [] 2025-12-04T11:45:24.7098028Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7098100Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7098596Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7098643Z current_size = base.storage().size() 2025-12-04T11:45:24.7098683Z Autotune Choices Stats: 2025-12-04T11:45:24.7099046Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7099108Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7099156Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7099287Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7099522Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7099562Z _scaled_mm 0.0274 ms 21.5% 2025-12-04T11:45:24.7099690Z SingleProcess AUTOTUNE benchmarking takes 0.0136 seconds and 0.0632 seconds precompiling for 2 choices 2025-12-04T11:45:24.7099764Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7099807Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7099862Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7099963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7100444Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7100481Z graph_break [] 2025-12-04T11:45:24.7100540Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7100613Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7100654Z Autotune Choices Stats: 2025-12-04T11:45:24.7101013Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.7101066Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7101113Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7101234Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7101462Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7101503Z _scaled_mm 0.0286 ms 21.3% 2025-12-04T11:45:24.7101639Z SingleProcess AUTOTUNE benchmarking takes 0.0140 seconds and 0.0550 seconds precompiling for 2 choices 2025-12-04T11:45:24.7101714Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7101756Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7101813Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7101925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7102371Z inductor [('triton_bundler_save_kernel', 8), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('async_compile_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.7102407Z graph_break [] 2025-12-04T11:45:24.7102466Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:24.7102540Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7102581Z Autotune Choices Stats: 2025-12-04T11:45:24.7103062Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "_scaled_mm", "best_time": 0.006118999794125557, "best_triton_pos": 1, "best_triton_time": 0.0061599998734891415, "best_triton_kernel": "triton_mm_2", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1"} 2025-12-04T11:45:24.7103122Z AUTOTUNE scaled_mm(1x32, 32x16, 1x1, 1x16, 16) 2025-12-04T11:45:24.7103171Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7103331Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7103373Z _scaled_mm 0.0061 ms 100.0% 2025-12-04T11:45:24.7103602Z triton_mm_2 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7103732Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.1697 seconds precompiling for 2 choices 2025-12-04T11:45:24.7103922Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a2b836bc970f2274.xml - 
2025-12-04T11:45:24.7103983Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7104555Z FAILED [0.5164s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1153433600. 2025-12-04T11:45:24.7104560Z 2025-12-04T11:45:24.7104634Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7104892Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7104894Z 2025-12-04T11:45:24.7104980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7105046Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.7105114Z ================== 1 failed, 187 deselected, 2 rerun in 2.98s ================== 2025-12-04T11:45:24.7105153Z Got exit code 1 2025-12-04T11:45:24.7105354Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7105497Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.7105642Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c23cb191ce9eb0a.xml 2025-12-04T11:45:24.7105702Z ============================= test session starts ============================== 2025-12-04T11:45:24.7105829Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7105872Z cachedir: .pytest_cache 2025-12-04T11:45:24.7106031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7106077Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7106119Z configfile: pytest.ini 2025-12-04T11:45:24.7106283Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7106361Z collecting ... collected 188 items / 88 deselected / 100 selected 2025-12-04T11:45:24.7106417Z stepcurrent: skipping 88 already run items. 
2025-12-04T11:45:24.7106476Z Running 100 items in this shard 2025-12-04T11:45:24.7106478Z 2025-12-04T11:45:24.7106693Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1365s] [ 1%] 2025-12-04T11:45:24.7106917Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7722s] [ 1%] 2025-12-04T11:45:24.7107100Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6440s] [ 1%] 2025-12-04T11:45:24.7107103Z 2025-12-04T11:45:24.7107155Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7107295Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7107343Z Traceback (most recent call last): 2025-12-04T11:45:24.7107503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7107547Z method(*args, **kwargs) 2025-12-04T11:45:24.7107701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7107745Z method(*args, **kwargs) 2025-12-04T11:45:24.7107895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7107933Z with policy(): 2025-12-04T11:45:24.7108085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7108128Z raise RuntimeError(msg) 2025-12-04T11:45:24.7108518Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.7108521Z 2025-12-04T11:45:24.7108598Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7108854Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7108858Z 2025-12-04T11:45:24.7108946Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7109020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7109062Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7109119Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7109610Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7109722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7109759Z graph_break [] 2025-12-04T11:45:24.7109822Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7109894Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7110380Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7110437Z current_size = base.storage().size() 2025-12-04T11:45:24.7110480Z Autotune Choices Stats: 2025-12-04T11:45:24.7110867Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7110930Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7110982Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7111102Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7111339Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7111569Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7111796Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7112021Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7112245Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7112472Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7112699Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7112924Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7112964Z _scaled_mm 0.0296 ms 19.9% 2025-12-04T11:45:24.7113104Z SingleProcess AUTOTUNE benchmarking takes 0.0355 seconds and 0.1798 seconds precompiling for 9 choices 2025-12-04T11:45:24.7113245Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7113340Z Traceback (most recent call last): 2025-12-04T11:45:24.7113512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7113555Z method(*args, **kwargs) 2025-12-04T11:45:24.7113707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7113749Z method(*args, **kwargs) 2025-12-04T11:45:24.7113899Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7113939Z with policy(): 2025-12-04T11:45:24.7114090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7114132Z raise RuntimeError(msg) 2025-12-04T11:45:24.7114538Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.7114557Z 2025-12-04T11:45:24.7114632Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7114890Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7114892Z 2025-12-04T11:45:24.7114978Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7115054Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7115096Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7115156Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7115639Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7115740Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), 
('ok', 1)] 2025-12-04T11:45:24.7115776Z graph_break [] 2025-12-04T11:45:24.7115841Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7115914Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7116400Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7116450Z current_size = base.storage().size() 2025-12-04T11:45:24.7116492Z Autotune Choices Stats: 2025-12-04T11:45:24.7116858Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7116918Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7116982Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7117106Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7117341Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7117577Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7117802Z triton_mm_3 
0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7118027Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7118258Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7118493Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7118713Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7118943Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7118984Z _scaled_mm 0.0296 ms 19.9% 2025-12-04T11:45:24.7119112Z SingleProcess AUTOTUNE benchmarking takes 0.0355 seconds and 0.1798 seconds precompiling for 9 choices 2025-12-04T11:45:24.7119188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7119231Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7119286Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7119389Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7119868Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('async_compile_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7119908Z graph_break [] 2025-12-04T11:45:24.7119969Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7120042Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7120087Z Autotune Choices Stats: 2025-12-04T11:45:24.7120449Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.7120507Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7120555Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7120688Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7120921Z triton_mm_11 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7121166Z triton_mm_13 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.7121394Z triton_mm_14 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7121620Z triton_mm_15 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7121845Z triton_mm_8 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7122096Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7122320Z triton_mm_10 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7122545Z triton_mm_12 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7122590Z _scaled_mm 0.0284 ms 20.6% 2025-12-04T11:45:24.7122717Z SingleProcess AUTOTUNE benchmarking takes 0.0478 seconds and 0.1993 seconds precompiling for 9 choices 2025-12-04T11:45:24.7122772Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7122913Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 
2025-12-04T11:45:24.7122960Z Traceback (most recent call last): 2025-12-04T11:45:24.7123116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7123157Z method(*args, **kwargs) 2025-12-04T11:45:24.7123339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7123380Z method(*args, **kwargs) 2025-12-04T11:45:24.7123533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7123571Z with policy(): 2025-12-04T11:45:24.7123724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7123765Z raise RuntimeError(msg) 2025-12-04T11:45:24.7124152Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7124154Z 2025-12-04T11:45:24.7124228Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7124500Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7124502Z 2025-12-04T11:45:24.7124590Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7124665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7124709Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7124780Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7125260Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7125361Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7125399Z graph_break [] 2025-12-04T11:45:24.7125460Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7125562Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7126048Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7126111Z current_size = base.storage().size() 2025-12-04T11:45:24.7126152Z Autotune Choices Stats: 2025-12-04T11:45:24.7126517Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7126576Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7126626Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7126747Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7126981Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7127209Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7127442Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7127673Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7127897Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7128120Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7128351Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7128583Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7128638Z _scaled_mm 0.0296 ms 19.9% 2025-12-04T11:45:24.7128766Z SingleProcess AUTOTUNE benchmarking takes 0.0355 seconds and 0.1798 seconds precompiling for 9 choices 2025-12-04T11:45:24.7128842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7128883Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7128941Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7129041Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7129526Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('async_compile_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7129588Z graph_break [] 2025-12-04T11:45:24.7129651Z 
aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7129725Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7129768Z Autotune Choices Stats: 2025-12-04T11:45:24.7130127Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.7130186Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7130234Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7130356Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7130589Z triton_mm_11 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7130818Z triton_mm_13 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7131050Z triton_mm_14 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7131277Z triton_mm_15 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7131508Z triton_mm_8 0.0059 ms 99.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7131733Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7131971Z triton_mm_10 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7132198Z triton_mm_12 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7132241Z _scaled_mm 0.0284 ms 20.6% 2025-12-04T11:45:24.7132386Z SingleProcess AUTOTUNE benchmarking takes 0.0478 seconds and 0.1993 seconds precompiling for 9 choices 2025-12-04T11:45:24.7132461Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7132505Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7132561Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7132663Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7133142Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7133204Z graph_break [] 
2025-12-04T11:45:24.7133303Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7133380Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7133421Z Autotune Choices Stats: 2025-12-04T11:45:24.7133790Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:24.7133848Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7133898Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7134017Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7134248Z triton_mm_17 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7134478Z triton_mm_18 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7134707Z triton_mm_22 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7134933Z triton_mm_23 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7135163Z triton_mm_19 0.0062 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7135392Z triton_mm_20 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7135630Z triton_mm_21 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7135860Z triton_mm_16 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7135903Z _scaled_mm 0.0276 ms 22.2% 2025-12-04T11:45:24.7136045Z SingleProcess AUTOTUNE benchmarking takes 0.0587 seconds and 0.2018 seconds precompiling for 9 choices 2025-12-04T11:45:24.7136238Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5c23cb191ce9eb0a.xml - 2025-12-04T11:45:24.7136299Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7136879Z FAILED [0.6440s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7136893Z 2025-12-04T11:45:24.7136967Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7137244Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7137246Z 2025-12-04T11:45:24.7137333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7137396Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7137465Z ================== 1 failed, 88 deselected, 2 rerun in 3.57s =================== 2025-12-04T11:45:24.7137503Z Got exit code 1 2025-12-04T11:45:24.7137547Z Retrying single test... 2025-12-04T11:45:24.7137689Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-96489007c290be99.xml 2025-12-04T11:45:24.7137755Z ============================= test session starts ============================== 2025-12-04T11:45:24.7137868Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7137913Z cachedir: .pytest_cache 2025-12-04T11:45:24.7138074Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7138121Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7138161Z configfile: pytest.ini 2025-12-04T11:45:24.7138324Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7138399Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7138651Z stepcurrent: skipping 88 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7138694Z Running 1 items in this shard 2025-12-04T11:45:24.7138696Z 2025-12-04T11:45:24.7138911Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9961s] [100%] 2025-12-04T11:45:24.7139121Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7273s] [100%] 2025-12-04T11:45:24.7139308Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6412s] [100%] 2025-12-04T11:45:24.7139310Z 2025-12-04T11:45:24.7139363Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7139516Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7139565Z Traceback (most recent call last): 2025-12-04T11:45:24.7139723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7139768Z method(*args, **kwargs) 2025-12-04T11:45:24.7139937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7139981Z method(*args, **kwargs) 2025-12-04T11:45:24.7140132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7140173Z with policy(): 2025-12-04T11:45:24.7140326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7140369Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7140754Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.7140775Z 2025-12-04T11:45:24.7140853Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7141112Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7141114Z 2025-12-04T11:45:24.7141207Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7141282Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7141326Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7141383Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7141867Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7141968Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7142004Z graph_break [] 2025-12-04T11:45:24.7142067Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7142141Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7142632Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7142681Z current_size = base.storage().size() 2025-12-04T11:45:24.7142725Z Autotune Choices Stats: 2025-12-04T11:45:24.7143094Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.7143153Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7143201Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7143371Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7143608Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7143856Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7144083Z triton_mm_6 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7144309Z triton_mm_1 0.0061 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7144539Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7144778Z triton_mm_5 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7145013Z triton_mm_7 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7145237Z triton_mm_2 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7145279Z _scaled_mm 0.0240 ms 25.3% 2025-12-04T11:45:24.7145409Z SingleProcess AUTOTUNE benchmarking takes 0.0392 seconds and 0.1889 seconds precompiling for 9 choices 2025-12-04T11:45:24.7145550Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7145601Z Traceback (most recent call last): 2025-12-04T11:45:24.7145758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7145801Z method(*args, **kwargs) 2025-12-04T11:45:24.7145953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7145994Z 
method(*args, **kwargs) 2025-12-04T11:45:24.7146148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7146189Z with policy(): 2025-12-04T11:45:24.7146346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7146393Z raise RuntimeError(msg) 2025-12-04T11:45:24.7146783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.7146786Z 2025-12-04T11:45:24.7146862Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7147118Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7147122Z 2025-12-04T11:45:24.7147218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7147294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7147338Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7147400Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7147894Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7147997Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7148035Z graph_break [] 2025-12-04T11:45:24.7148098Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7148172Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7148659Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7148728Z current_size = base.storage().size() 2025-12-04T11:45:24.7148771Z Autotune Choices Stats: 2025-12-04T11:45:24.7149142Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.7149202Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7149251Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7149373Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7149607Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7149837Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.7150063Z triton_mm_6 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7150290Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7150523Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7150745Z triton_mm_5 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7150971Z triton_mm_7 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7151210Z triton_mm_2 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7151252Z _scaled_mm 0.0240 ms 25.3% 2025-12-04T11:45:24.7151393Z SingleProcess AUTOTUNE benchmarking takes 0.0392 seconds and 0.1889 seconds precompiling for 9 choices 2025-12-04T11:45:24.7151466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7151511Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7151568Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.7151669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7152147Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7152199Z graph_break [] 2025-12-04T11:45:24.7152259Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7152348Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7152389Z Autotune Choices Stats: 2025-12-04T11:45:24.7152751Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.7152809Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7152859Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7152983Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7153225Z triton_mm_12 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7153500Z triton_mm_8 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7153725Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7153951Z triton_mm_10 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7154177Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7154403Z triton_mm_13 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7154629Z triton_mm_15 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7154875Z triton_mm_14 0.0066 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7154918Z _scaled_mm 0.0257 ms 23.5% 2025-12-04T11:45:24.7155045Z SingleProcess AUTOTUNE benchmarking takes 0.0388 seconds and 0.0797 seconds precompiling for 9 choices 2025-12-04T11:45:24.7155113Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7155257Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7155306Z Traceback (most recent call last): 2025-12-04T11:45:24.7155463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7155504Z method(*args, **kwargs) 2025-12-04T11:45:24.7155658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7155702Z method(*args, **kwargs) 2025-12-04T11:45:24.7155855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7155910Z with policy(): 2025-12-04T11:45:24.7156063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7156118Z raise RuntimeError(msg) 2025-12-04T11:45:24.7156502Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7156507Z 2025-12-04T11:45:24.7156581Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7156838Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7156841Z 2025-12-04T11:45:24.7156929Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7157007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7157051Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7157108Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7157590Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7157692Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7157728Z graph_break [] 2025-12-04T11:45:24.7157791Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7157865Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7158353Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7158404Z current_size = base.storage().size() 2025-12-04T11:45:24.7158445Z Autotune Choices Stats: 2025-12-04T11:45:24.7158820Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.7158879Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7158932Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7159061Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7159297Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7159528Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7159762Z triton_mm_6 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7160001Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7160240Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7160464Z triton_mm_5 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7160685Z triton_mm_7 0.0062 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7160911Z triton_mm_2 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7160954Z _scaled_mm 0.0240 ms 25.3% 2025-12-04T11:45:24.7161081Z SingleProcess AUTOTUNE benchmarking takes 0.0392 seconds and 0.1889 seconds precompiling for 9 choices 2025-12-04T11:45:24.7161155Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7161200Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7161256Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7161357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7161840Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7161882Z graph_break [] 2025-12-04T11:45:24.7161945Z 
aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7162018Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7162064Z Autotune Choices Stats: 2025-12-04T11:45:24.7162435Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.7162497Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7162545Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7162667Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7162906Z triton_mm_12 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7163135Z triton_mm_8 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7163394Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7163632Z triton_mm_10 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7163879Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7164100Z triton_mm_13 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7164326Z triton_mm_15 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7164550Z triton_mm_14 0.0066 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7164595Z _scaled_mm 0.0257 ms 23.5% 2025-12-04T11:45:24.7164723Z SingleProcess AUTOTUNE benchmarking takes 0.0388 seconds and 0.0797 seconds precompiling for 9 choices 2025-12-04T11:45:24.7164798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7164841Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7164906Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7165006Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7165488Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7165528Z graph_break [] 
2025-12-04T11:45:24.7165586Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7165663Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7165704Z Autotune Choices Stats: 2025-12-04T11:45:24.7166076Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_21", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:24.7166133Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7166185Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7166303Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7166546Z triton_mm_21 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7166772Z triton_mm_16 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7166999Z triton_mm_17 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7167235Z triton_mm_22 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7167478Z triton_mm_23 0.0062 ms 95.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7167708Z triton_mm_18 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7167937Z triton_mm_19 0.0065 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7168162Z triton_mm_20 0.0066 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7168207Z _scaled_mm 0.0260 ms 22.7% 2025-12-04T11:45:24.7168337Z SingleProcess AUTOTUNE benchmarking takes 0.0539 seconds and 0.2006 seconds precompiling for 9 choices 2025-12-04T11:45:24.7168525Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-96489007c290be99.xml - 2025-12-04T11:45:24.7168587Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7169159Z FAILED [0.6412s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7169163Z 2025-12-04T11:45:24.7169237Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7169495Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7169497Z 2025-12-04T11:45:24.7169583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7169649Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7169716Z ================== 1 failed, 187 deselected, 2 rerun in 3.38s ================== 2025-12-04T11:45:24.7169765Z Got exit code 1 2025-12-04T11:45:24.7169806Z Retrying single test... 2025-12-04T11:45:24.7169953Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78e59f9ad25bddb1.xml 2025-12-04T11:45:24.7170011Z ============================= test session starts ============================== 2025-12-04T11:45:24.7170136Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7170178Z cachedir: .pytest_cache 2025-12-04T11:45:24.7170340Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7170386Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7170428Z configfile: pytest.ini 2025-12-04T11:45:24.7170589Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7170666Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7170917Z stepcurrent: skipping 88 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7170971Z Running 1 items in this shard 2025-12-04T11:45:24.7171003Z 2025-12-04T11:45:24.7171219Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1709s] [100%] 2025-12-04T11:45:24.7171431Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7599s] [100%] 2025-12-04T11:45:24.7171617Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6427s] [100%] 2025-12-04T11:45:24.7171619Z 2025-12-04T11:45:24.7171670Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7171812Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7171858Z Traceback (most recent call last): 2025-12-04T11:45:24.7172019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7172061Z method(*args, **kwargs) 2025-12-04T11:45:24.7172215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7172254Z method(*args, **kwargs) 2025-12-04T11:45:24.7172407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7172444Z with policy(): 2025-12-04T11:45:24.7172600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7172640Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7173023Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.7173028Z 2025-12-04T11:45:24.7173102Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7173392Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7173394Z 2025-12-04T11:45:24.7173481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7173553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7173617Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7173674Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7174170Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7174271Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7174309Z graph_break [] 2025-12-04T11:45:24.7174369Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7174444Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7174929Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7175008Z current_size = base.storage().size() 2025-12-04T11:45:24.7175051Z Autotune Choices Stats: 2025-12-04T11:45:24.7175419Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7175479Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7175527Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7175648Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7177469Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7177704Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7177932Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7178166Z triton_mm_0 0.0059 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7178388Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7178611Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7178831Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7179067Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7179109Z _scaled_mm 0.0292 ms 20.2% 2025-12-04T11:45:24.7179237Z SingleProcess AUTOTUNE benchmarking takes 0.0378 seconds and 0.1751 seconds precompiling for 9 choices 2025-12-04T11:45:24.7179379Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7179439Z Traceback (most recent call last): 2025-12-04T11:45:24.7179599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7179640Z method(*args, **kwargs) 2025-12-04T11:45:24.7179793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7179833Z 
method(*args, **kwargs) 2025-12-04T11:45:24.7179987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7180024Z with policy(): 2025-12-04T11:45:24.7180177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7180229Z raise RuntimeError(msg) 2025-12-04T11:45:24.7180628Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.7180631Z 2025-12-04T11:45:24.7180706Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7180960Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7180962Z 2025-12-04T11:45:24.7181050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7181125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7181169Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7181225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7181709Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7181806Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7181843Z graph_break [] 2025-12-04T11:45:24.7181903Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7181978Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7182469Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7182517Z current_size = base.storage().size() 2025-12-04T11:45:24.7182559Z Autotune Choices Stats: 2025-12-04T11:45:24.7182931Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7182991Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7183041Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7183161Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7183445Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7183672Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:24.7183894Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7184119Z triton_mm_0 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7184377Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7184599Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7184822Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7185044Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7185090Z _scaled_mm 0.0292 ms 20.2% 2025-12-04T11:45:24.7185217Z SingleProcess AUTOTUNE benchmarking takes 0.0378 seconds and 0.1751 seconds precompiling for 9 choices 2025-12-04T11:45:24.7185293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7185334Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7185390Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.7185496Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7185969Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7186007Z graph_break [] 2025-12-04T11:45:24.7186068Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7186143Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7186182Z Autotune Choices Stats: 2025-12-04T11:45:24.7186548Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00583899999037385, "best_triton_pos": 0} 2025-12-04T11:45:24.7186617Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7186666Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7186785Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7187027Z triton_mm_15 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7187256Z triton_mm_8 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7187482Z triton_mm_10 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7187710Z triton_mm_12 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7187947Z triton_mm_14 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7188181Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7188406Z triton_mm_13 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7188638Z triton_mm_11 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7188680Z _scaled_mm 0.0262 ms 22.3% 2025-12-04T11:45:24.7188807Z SingleProcess AUTOTUNE benchmarking takes 0.0414 seconds and 0.1769 seconds precompiling for 9 choices 2025-12-04T11:45:24.7188861Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7189001Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7189049Z Traceback (most recent call last): 2025-12-04T11:45:24.7189204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7189245Z method(*args, **kwargs) 2025-12-04T11:45:24.7189397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7189437Z method(*args, **kwargs) 2025-12-04T11:45:24.7189588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7189627Z with policy(): 2025-12-04T11:45:24.7189780Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7189822Z raise RuntimeError(msg) 2025-12-04T11:45:24.7190206Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7190208Z 2025-12-04T11:45:24.7190282Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7190550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7190553Z 2025-12-04T11:45:24.7190641Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7190727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7190772Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7190829Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7191309Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7191408Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7191456Z graph_break [] 2025-12-04T11:45:24.7191518Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7191591Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7192087Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7192133Z current_size = base.storage().size() 2025-12-04T11:45:24.7192174Z Autotune Choices Stats: 2025-12-04T11:45:24.7192540Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.7192600Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7192649Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7192771Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7193004Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7193233Z triton_mm_4 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7193499Z triton_mm_7 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7193724Z triton_mm_0 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7193946Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7194179Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7194399Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7194637Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7194678Z _scaled_mm 0.0292 ms 20.2% 2025-12-04T11:45:24.7194806Z SingleProcess AUTOTUNE benchmarking takes 0.0378 seconds and 0.1751 seconds precompiling for 9 choices 2025-12-04T11:45:24.7194879Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7194922Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7194977Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7195078Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7195567Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7195618Z graph_break [] 2025-12-04T11:45:24.7195677Z 
aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7195751Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7195791Z Autotune Choices Stats: 2025-12-04T11:45:24.7196150Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00583899999037385, "best_triton_pos": 0} 2025-12-04T11:45:24.7196208Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7196257Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7196379Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7196609Z triton_mm_15 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7196838Z triton_mm_8 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7197066Z triton_mm_10 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7197297Z triton_mm_12 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7197521Z triton_mm_14 0.0058 ms 100.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7197745Z triton_mm_9 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7197978Z triton_mm_13 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7198219Z triton_mm_11 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7198262Z _scaled_mm 0.0262 ms 22.3% 2025-12-04T11:45:24.7198389Z SingleProcess AUTOTUNE benchmarking takes 0.0414 seconds and 0.1769 seconds precompiling for 9 choices 2025-12-04T11:45:24.7198462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7198504Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7198561Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7198661Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7199146Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7199204Z graph_break [] 
2025-12-04T11:45:24.7199265Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:24.7199339Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7199379Z Autotune Choices Stats: 2025-12-04T11:45:24.7199742Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:24.7199801Z AUTOTUNE scaled_mm(1x32, 32x2048, 1x1, 1x2048, 2048) 2025-12-04T11:45:24.7199849Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7199968Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7200200Z triton_mm_17 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7200427Z triton_mm_19 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7200652Z triton_mm_18 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7200878Z triton_mm_16 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7201107Z triton_mm_20 0.0064 ms 92.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7201329Z triton_mm_23 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7201560Z triton_mm_22 0.0065 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7201784Z triton_mm_21 0.0066 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7201837Z _scaled_mm 0.0076 ms 78.0% 2025-12-04T11:45:24.7201964Z SingleProcess AUTOTUNE benchmarking takes 0.0575 seconds and 0.2046 seconds precompiling for 9 choices 2025-12-04T11:45:24.7202156Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78e59f9ad25bddb1.xml - 2025-12-04T11:45:24.7202217Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7202795Z FAILED [0.6427s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.7202819Z 2025-12-04T11:45:24.7202894Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7203151Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7203153Z 2025-12-04T11:45:24.7203239Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7203337Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7203405Z ================== 1 failed, 187 deselected, 2 rerun in 3.59s ================== 2025-12-04T11:45:24.7203443Z Got exit code 1 2025-12-04T11:45:24.7203648Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7203776Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.7203920Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3351e77a28a3a39f.xml 2025-12-04T11:45:24.7203978Z ============================= test session starts ============================== 2025-12-04T11:45:24.7204089Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7204132Z cachedir: .pytest_cache 2025-12-04T11:45:24.7204288Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7204336Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7204376Z configfile: pytest.ini 2025-12-04T11:45:24.7204541Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7204614Z collecting ... 
collected 188 items / 89 deselected / 99 selected 2025-12-04T11:45:24.7204670Z stepcurrent: skipping 89 already run items. 2025-12-04T11:45:24.7204713Z Running 99 items in this shard 2025-12-04T11:45:24.7204715Z 2025-12-04T11:45:24.7204934Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4598s] [ 1%] 2025-12-04T11:45:24.7205145Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9946s] [ 1%] 2025-12-04T11:45:24.7205348Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.8565s] [ 1%] 2025-12-04T11:45:24.7205351Z 2025-12-04T11:45:24.7205404Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7205545Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7205606Z Traceback (most recent call last): 2025-12-04T11:45:24.7205765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7205807Z method(*args, **kwargs) 2025-12-04T11:45:24.7205960Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7206000Z method(*args, **kwargs) 2025-12-04T11:45:24.7206151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7206189Z with policy(): 2025-12-04T11:45:24.7206342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7206396Z raise RuntimeError(msg) 2025-12-04T11:45:24.7206780Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.7206796Z 2025-12-04T11:45:24.7206871Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7207126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7207130Z 2025-12-04T11:45:24.7207216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7207289Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7207335Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7207392Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7207884Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7207982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7208018Z graph_break [] 2025-12-04T11:45:24.7208081Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7208154Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7208640Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: 
TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7208688Z current_size = base.storage().size() 2025-12-04T11:45:24.7208729Z Autotune Choices Stats: 2025-12-04T11:45:24.7209101Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7209176Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7209228Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7209351Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7209602Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7209828Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7210053Z triton_mm_16 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7210280Z triton_mm_8 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7210536Z triton_mm_18 0.0078 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7210760Z triton_mm_12 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7210983Z triton_mm_11 0.0080 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7211206Z triton_mm_15 0.0082 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7211430Z triton_mm_13 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7211660Z triton_mm_9 0.0088 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7211790Z SingleProcess AUTOTUNE benchmarking takes 0.0814 seconds and 0.3640 seconds precompiling for 20 choices 2025-12-04T11:45:24.7211934Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7211981Z Traceback (most recent call last): 2025-12-04T11:45:24.7212138Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7212180Z method(*args, **kwargs) 2025-12-04T11:45:24.7212333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7212374Z method(*args, **kwargs) 2025-12-04T11:45:24.7212522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7212560Z with policy(): 2025-12-04T11:45:24.7212711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7212751Z raise RuntimeError(msg) 2025-12-04T11:45:24.7213159Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.7213163Z 2025-12-04T11:45:24.7213238Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7213545Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7213547Z 2025-12-04T11:45:24.7213636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7213708Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7213751Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7213807Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7214303Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7214427Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7214464Z graph_break [] 2025-12-04T11:45:24.7214527Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7214599Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7215083Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7215130Z current_size = base.storage().size() 2025-12-04T11:45:24.7215171Z Autotune Choices Stats: 2025-12-04T11:45:24.7215538Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7215599Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7215648Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7215767Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7216002Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7216237Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7216464Z triton_mm_16 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7216691Z triton_mm_8 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7216932Z triton_mm_18 0.0078 ms 79.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7217166Z triton_mm_12 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7217390Z triton_mm_11 0.0080 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7217611Z triton_mm_15 0.0082 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7217833Z triton_mm_13 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7218071Z triton_mm_9 0.0088 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7218209Z SingleProcess AUTOTUNE benchmarking takes 0.0814 seconds and 0.3640 seconds precompiling for 20 choices 2025-12-04T11:45:24.7218284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7218326Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7218384Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7218482Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7218963Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7219002Z graph_break [] 2025-12-04T11:45:24.7219064Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7219136Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7219176Z Autotune Choices Stats: 2025-12-04T11:45:24.7219539Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7219597Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7219650Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7219768Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7220001Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7220042Z _scaled_mm 0.0063 ms 98.7% 2025-12-04T11:45:24.7220267Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7220500Z triton_mm_27 0.0067 ms 
92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7220736Z triton_mm_35 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7220964Z triton_mm_37 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7221187Z triton_mm_31 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7221410Z triton_mm_34 0.0081 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7221642Z triton_mm_30 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7221874Z triton_mm_32 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7222002Z SingleProcess AUTOTUNE benchmarking takes 0.1181 seconds and 0.2955 seconds precompiling for 20 choices 2025-12-04T11:45:24.7222055Z =================================== FAILURES =================================== 
2025-12-04T11:45:24.7222197Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7222244Z Traceback (most recent call last): 2025-12-04T11:45:24.7222401Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7222442Z method(*args, **kwargs) 2025-12-04T11:45:24.7222594Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7222635Z method(*args, **kwargs) 2025-12-04T11:45:24.7222786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7222822Z with policy(): 2025-12-04T11:45:24.7222975Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7223018Z raise RuntimeError(msg) 2025-12-04T11:45:24.7223440Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.7223443Z 2025-12-04T11:45:24.7223516Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7223777Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7223779Z 2025-12-04T11:45:24.7223864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7223936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7223978Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7224047Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7224550Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7224651Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7224689Z graph_break [] 2025-12-04T11:45:24.7224753Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7224825Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7225306Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7225368Z current_size = base.storage().size() 2025-12-04T11:45:24.7225407Z Autotune Choices Stats: 2025-12-04T11:45:24.7225794Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7225852Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7225902Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7226020Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7226254Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7226480Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7226706Z triton_mm_16 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7226933Z triton_mm_8 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7227161Z triton_mm_18 0.0078 ms 79.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7227385Z triton_mm_12 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7227606Z triton_mm_11 0.0080 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7227828Z triton_mm_15 0.0082 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7228060Z triton_mm_13 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7228296Z triton_mm_9 0.0088 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7228427Z SingleProcess AUTOTUNE benchmarking takes 0.0814 seconds and 0.3640 seconds precompiling for 20 choices 2025-12-04T11:45:24.7228501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7228545Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7228602Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7228703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7229188Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7229248Z graph_break [] 2025-12-04T11:45:24.7229309Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7229383Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7229424Z Autotune Choices Stats: 2025-12-04T11:45:24.7229784Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7229844Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7229895Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7230015Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7230246Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7230288Z _scaled_mm 0.0063 ms 98.7% 2025-12-04T11:45:24.7230510Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7230738Z triton_mm_27 0.0067 ms 
92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7230964Z triton_mm_35 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7231193Z triton_mm_37 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7231417Z triton_mm_31 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7231647Z triton_mm_34 0.0081 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7231882Z triton_mm_30 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7232106Z triton_mm_32 0.0083 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7232235Z SingleProcess AUTOTUNE benchmarking takes 0.1181 seconds and 0.2955 seconds precompiling for 20 choices 2025-12-04T11:45:24.7232308Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.7232350Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7232408Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7232510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7233011Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7233060Z graph_break [] 2025-12-04T11:45:24.7233123Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7233196Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7233237Z Autotune Choices Stats: 2025-12-04T11:45:24.7233642Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.7233704Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7233756Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7233875Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7234105Z triton_mm_55 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7234332Z triton_mm_46 0.0066 ms 95.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7234552Z triton_mm_52 0.0071 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7234781Z triton_mm_54 0.0073 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7235008Z triton_mm_56 0.0082 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7235244Z triton_mm_50 0.0083 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7235468Z triton_mm_49 0.0086 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7235702Z triton_mm_53 0.0087 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7235925Z triton_mm_51 0.0087 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 
2025-12-04T11:45:24.7236153Z triton_mm_47 0.0093 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7236285Z SingleProcess AUTOTUNE benchmarking takes 0.1408 seconds and 0.2730 seconds precompiling for 20 choices 2025-12-04T11:45:24.7236488Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3351e77a28a3a39f.xml - 2025-12-04T11:45:24.7236568Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7237155Z FAILED [0.8565s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.7237158Z 2025-12-04T11:45:24.7237232Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7237491Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7237495Z 2025-12-04T11:45:24.7237583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7237646Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7237714Z ================== 1 failed, 89 deselected, 2 rerun in 4.33s =================== 2025-12-04T11:45:24.7237750Z Got exit code 1 2025-12-04T11:45:24.7237790Z Retrying single test... 
2025-12-04T11:45:24.7237933Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b5b8c1f33852a43.xml 2025-12-04T11:45:24.7237989Z ============================= test session starts ============================== 2025-12-04T11:45:24.7238100Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7238143Z cachedir: .pytest_cache 2025-12-04T11:45:24.7238302Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7238350Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7238391Z configfile: pytest.ini 2025-12-04T11:45:24.7238552Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7238627Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7238882Z stepcurrent: skipping 89 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7238925Z Running 1 items in this shard 2025-12-04T11:45:24.7238938Z 2025-12-04T11:45:24.7239153Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4933s] [100%] 2025-12-04T11:45:24.7239362Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.0480s] [100%] 2025-12-04T11:45:24.7239561Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.8460s] [100%] 2025-12-04T11:45:24.7239563Z 2025-12-04T11:45:24.7239614Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7239753Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7239798Z Traceback (most recent call last): 2025-12-04T11:45:24.7239958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7239999Z method(*args, **kwargs) 2025-12-04T11:45:24.7240176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7240227Z method(*args, **kwargs) 2025-12-04T11:45:24.7240377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7240415Z with policy(): 2025-12-04T11:45:24.7240566Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7240609Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7240994Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.7240997Z 2025-12-04T11:45:24.7241071Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7241332Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7241335Z 2025-12-04T11:45:24.7241422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7241495Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7241538Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7241594Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7242081Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7242181Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7242217Z graph_break [] 2025-12-04T11:45:24.7242281Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7242353Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7242836Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7242897Z current_size = base.storage().size() 2025-12-04T11:45:24.7242939Z Autotune Choices Stats: 2025-12-04T11:45:24.7243362Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7243424Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7243474Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7243595Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7243830Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7244057Z triton_mm_8 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7244308Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7244531Z triton_mm_16 0.0069 ms 90.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7244573Z _scaled_mm 0.0073 ms 85.2% 2025-12-04T11:45:24.7244803Z triton_mm_18 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7245026Z triton_mm_12 0.0078 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7245251Z triton_mm_11 0.0081 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7245477Z triton_mm_15 0.0081 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7245701Z triton_mm_13 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7245829Z SingleProcess AUTOTUNE benchmarking takes 0.0909 seconds and 0.3896 seconds precompiling for 20 choices 2025-12-04T11:45:24.7245970Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7246017Z Traceback (most recent call last): 2025-12-04T11:45:24.7246173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T11:45:24.7246213Z method(*args, **kwargs) 2025-12-04T11:45:24.7246364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7246403Z method(*args, **kwargs) 2025-12-04T11:45:24.7246554Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7246604Z with policy(): 2025-12-04T11:45:24.7246757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7246799Z raise RuntimeError(msg) 2025-12-04T11:45:24.7247197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.7247200Z 2025-12-04T11:45:24.7247273Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7247530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7247532Z 2025-12-04T11:45:24.7247620Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7247693Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7247746Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7247802Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7248301Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7248399Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7248435Z graph_break [] 2025-12-04T11:45:24.7248497Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7248571Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7249053Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7249101Z current_size = base.storage().size() 2025-12-04T11:45:24.7249140Z Autotune Choices Stats: 2025-12-04T11:45:24.7249505Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7249567Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7249617Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7249738Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7249976Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7250204Z triton_mm_8 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7250437Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7250662Z triton_mm_16 0.0069 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7250705Z _scaled_mm 0.0073 ms 85.2% 2025-12-04T11:45:24.7250944Z 
triton_mm_18 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7251167Z triton_mm_12 0.0078 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7251390Z triton_mm_11 0.0081 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7251623Z triton_mm_15 0.0081 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7251857Z triton_mm_13 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7251985Z SingleProcess AUTOTUNE benchmarking takes 0.0909 seconds and 0.3896 seconds precompiling for 20 choices 2025-12-04T11:45:24.7252057Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7252099Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7252155Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7252256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7252743Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 
19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7252782Z graph_break [] 2025-12-04T11:45:24.7252845Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7252917Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7252957Z Autotune Choices Stats: 2025-12-04T11:45:24.7253346Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:24.7253406Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7253455Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7253577Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7253812Z triton_mm_36 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7254044Z triton_mm_27 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7254282Z triton_mm_35 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7254528Z triton_mm_33 0.0068 ms 90.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7254757Z triton_mm_37 0.0078 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7254979Z triton_mm_31 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7255203Z triton_mm_34 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7255438Z triton_mm_30 0.0082 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7255674Z triton_mm_32 0.0082 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7255906Z triton_mm_28 0.0088 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7256033Z SingleProcess AUTOTUNE benchmarking takes 0.1218 seconds and 0.2947 seconds precompiling for 20 choices 2025-12-04T11:45:24.7256087Z =================================== FAILURES =================================== 
2025-12-04T11:45:24.7256226Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7256275Z Traceback (most recent call last): 2025-12-04T11:45:24.7256429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7256470Z method(*args, **kwargs) 2025-12-04T11:45:24.7256623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7256667Z method(*args, **kwargs) 2025-12-04T11:45:24.7256818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7256857Z with policy(): 2025-12-04T11:45:24.7257009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7257051Z raise RuntimeError(msg) 2025-12-04T11:45:24.7257438Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.7257441Z 2025-12-04T11:45:24.7257516Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7257773Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7257776Z 2025-12-04T11:45:24.7257872Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7257948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7257992Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7258049Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7258548Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7258647Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7258684Z graph_break [] 2025-12-04T11:45:24.7258745Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7258818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7259313Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7259369Z current_size = base.storage().size() 2025-12-04T11:45:24.7259409Z Autotune Choices Stats: 2025-12-04T11:45:24.7259772Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.7259834Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7259885Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7260006Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7260242Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7260469Z triton_mm_8 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7260693Z triton_mm_14 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7260917Z triton_mm_16 0.0069 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7260962Z _scaled_mm 0.0073 ms 85.2% 2025-12-04T11:45:24.7261188Z 
triton_mm_18 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7261413Z triton_mm_12 0.0078 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7261646Z triton_mm_11 0.0081 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7261869Z triton_mm_15 0.0081 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7262107Z triton_mm_13 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7262235Z SingleProcess AUTOTUNE benchmarking takes 0.0909 seconds and 0.3896 seconds precompiling for 20 choices 2025-12-04T11:45:24.7262310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7262352Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7262410Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7262511Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7263008Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 
19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7263056Z graph_break [] 2025-12-04T11:45:24.7263117Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7263191Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7263230Z Autotune Choices Stats: 2025-12-04T11:45:24.7263619Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:24.7263679Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7263730Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7263851Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7264084Z triton_mm_36 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7264314Z triton_mm_27 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7264542Z triton_mm_35 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7264764Z triton_mm_33 0.0068 ms 90.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7264995Z triton_mm_37 0.0078 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7265220Z triton_mm_31 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7265458Z triton_mm_34 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7265695Z triton_mm_30 0.0082 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7265919Z triton_mm_32 0.0082 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7266148Z triton_mm_28 0.0088 ms 69.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7266277Z SingleProcess AUTOTUNE benchmarking takes 0.1218 seconds and 0.2947 seconds precompiling for 20 choices 2025-12-04T11:45:24.7266363Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.7266406Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7266473Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7266575Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7267058Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7267095Z graph_break [] 2025-12-04T11:45:24.7267156Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7267230Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7267272Z Autotune Choices Stats: 2025-12-04T11:45:24.7267635Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.7267694Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7267744Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7267861Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7268096Z triton_mm_55 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7268326Z triton_mm_46 0.0068 ms 92.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7268551Z triton_mm_52 0.0069 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7268774Z triton_mm_54 0.0072 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7269021Z triton_mm_56 0.0082 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7269246Z triton_mm_50 0.0083 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7269477Z triton_mm_53 0.0085 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7269701Z triton_mm_49 0.0086 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7269924Z triton_mm_51 0.0086 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 
2025-12-04T11:45:24.7270161Z triton_mm_47 0.0092 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7270302Z SingleProcess AUTOTUNE benchmarking takes 0.1278 seconds and 0.2803 seconds precompiling for 20 choices 2025-12-04T11:45:24.7270493Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8b5b8c1f33852a43.xml - 2025-12-04T11:45:24.7270555Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7271140Z FAILED [0.8460s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.7271146Z 2025-12-04T11:45:24.7271220Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7271479Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7271480Z 2025-12-04T11:45:24.7271567Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7271629Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7271695Z ================== 1 failed, 187 deselected, 2 rerun in 4.41s ================== 2025-12-04T11:45:24.7271735Z Got exit code 1 2025-12-04T11:45:24.7271774Z Retrying single test... 
2025-12-04T11:45:24.7271917Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0832e2583a031acb.xml 2025-12-04T11:45:24.7271974Z ============================= test session starts ============================== 2025-12-04T11:45:24.7272087Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7272127Z cachedir: .pytest_cache 2025-12-04T11:45:24.7272285Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7272329Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7272370Z configfile: pytest.ini 2025-12-04T11:45:24.7272531Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7272619Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7272869Z stepcurrent: skipping 89 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7272914Z Running 1 items in this shard 2025-12-04T11:45:24.7272917Z 2025-12-04T11:45:24.7273139Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3887s] [100%] 2025-12-04T11:45:24.7273385Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9089s] [100%] 2025-12-04T11:45:24.7273572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.8336s] [100%] 2025-12-04T11:45:24.7273574Z 2025-12-04T11:45:24.7273625Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7273767Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7273829Z Traceback (most recent call last): 2025-12-04T11:45:24.7273987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7274042Z method(*args, **kwargs) 2025-12-04T11:45:24.7274195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7274236Z method(*args, **kwargs) 2025-12-04T11:45:24.7274388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7274424Z with policy(): 2025-12-04T11:45:24.7274576Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7274616Z raise RuntimeError(msg) 
2025-12-04T11:45:24.7275010Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:24.7275015Z 2025-12-04T11:45:24.7275089Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7275350Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7275352Z 2025-12-04T11:45:24.7275439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7275511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7275554Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7275611Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7276100Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7276200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7276238Z graph_break [] 2025-12-04T11:45:24.7276299Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7276371Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7276867Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7276917Z current_size = base.storage().size() 2025-12-04T11:45:24.7276969Z Autotune Choices Stats: 2025-12-04T11:45:24.7277345Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006479000207036734, "best_triton_pos": 0} 2025-12-04T11:45:24.7277407Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7277456Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7277580Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7277815Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7278064Z triton_mm_8 0.0067 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7278291Z triton_mm_16 0.0069 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7278516Z triton_mm_14 0.0071 ms 91.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7278745Z triton_mm_18 0.0076 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7278971Z triton_mm_12 0.0084 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7279196Z triton_mm_15 0.0085 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7279419Z triton_mm_11 0.0085 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7279642Z triton_mm_13 0.0086 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7279868Z triton_mm_9 0.0086 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7279996Z SingleProcess AUTOTUNE benchmarking takes 0.0757 seconds and 0.3951 seconds precompiling for 20 choices 2025-12-04T11:45:24.7280136Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7280187Z Traceback (most recent call last): 2025-12-04T11:45:24.7280354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7280395Z method(*args, **kwargs) 2025-12-04T11:45:24.7280548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7280591Z method(*args, **kwargs) 2025-12-04T11:45:24.7280753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7280789Z with policy(): 2025-12-04T11:45:24.7280943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7280983Z raise RuntimeError(msg) 2025-12-04T11:45:24.7281376Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:24.7281379Z 2025-12-04T11:45:24.7281463Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7281722Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7281740Z 2025-12-04T11:45:24.7281826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7281901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7281943Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7282000Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7282486Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7282586Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7282627Z graph_break [] 2025-12-04T11:45:24.7282691Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7282765Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7283246Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7283330Z current_size = base.storage().size() 2025-12-04T11:45:24.7283369Z Autotune Choices Stats: 2025-12-04T11:45:24.7283740Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006479000207036734, "best_triton_pos": 0} 2025-12-04T11:45:24.7283802Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7283853Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7283971Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7284222Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7284450Z triton_mm_8 0.0067 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7284689Z triton_mm_16 0.0069 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7284915Z triton_mm_14 0.0071 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7285142Z triton_mm_18 0.0076 ms 84.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7285363Z triton_mm_12 0.0084 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7285616Z triton_mm_15 0.0085 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7285839Z triton_mm_11 0.0085 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7286062Z triton_mm_13 0.0086 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7286290Z triton_mm_9 0.0086 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7286421Z SingleProcess AUTOTUNE benchmarking takes 0.0757 seconds and 0.3951 seconds precompiling for 20 choices 2025-12-04T11:45:24.7286494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7286537Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7286593Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7286692Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7287173Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7287211Z graph_break [] 2025-12-04T11:45:24.7287272Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7287347Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7287386Z Autotune Choices Stats: 2025-12-04T11:45:24.7287750Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.7287808Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7287869Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7287990Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7288223Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7288461Z triton_mm_27 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7288684Z triton_mm_33 0.0070 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7288908Z triton_mm_35 0.0070 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7289147Z triton_mm_37 0.0077 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7289380Z triton_mm_32 0.0082 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7289604Z triton_mm_31 0.0082 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7289826Z triton_mm_34 0.0084 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7290050Z triton_mm_30 0.0085 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7290277Z triton_mm_28 0.0092 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7290405Z SingleProcess AUTOTUNE 
benchmarking takes 0.1086 seconds and 0.2956 seconds precompiling for 20 choices 2025-12-04T11:45:24.7290459Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7290600Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7290647Z Traceback (most recent call last): 2025-12-04T11:45:24.7290805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7290847Z method(*args, **kwargs) 2025-12-04T11:45:24.7291001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7291042Z method(*args, **kwargs) 2025-12-04T11:45:24.7291193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7291230Z with policy(): 2025-12-04T11:45:24.7291382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7291426Z raise RuntimeError(msg) 2025-12-04T11:45:24.7291833Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:24.7291836Z 2025-12-04T11:45:24.7291912Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7292178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7292181Z 2025-12-04T11:45:24.7292269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7292341Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7292387Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7292443Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7292931Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7293053Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7293089Z graph_break [] 2025-12-04T11:45:24.7293153Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7293225Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7293739Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7293786Z current_size = base.storage().size() 2025-12-04T11:45:24.7293826Z Autotune Choices Stats: 2025-12-04T11:45:24.7294195Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006479000207036734, "best_triton_pos": 0} 2025-12-04T11:45:24.7294257Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7294307Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7294426Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7294664Z triton_mm_17 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7294892Z triton_mm_8 0.0067 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7295117Z triton_mm_16 0.0069 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7295339Z triton_mm_14 0.0071 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7295580Z triton_mm_18 0.0076 ms 84.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7295817Z triton_mm_12 0.0084 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7296040Z triton_mm_15 0.0085 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7296261Z triton_mm_11 0.0085 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7296481Z triton_mm_13 0.0086 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7296730Z triton_mm_9 0.0086 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7296871Z SingleProcess AUTOTUNE benchmarking takes 0.0757 seconds and 0.3951 seconds precompiling for 20 choices 2025-12-04T11:45:24.7296945Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7296986Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7297042Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7297140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7297625Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7297664Z graph_break [] 2025-12-04T11:45:24.7297725Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7297799Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7297838Z Autotune Choices Stats: 2025-12-04T11:45:24.7298203Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.7298262Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7298314Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7298431Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7298665Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7298894Z triton_mm_27 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7299129Z triton_mm_33 0.0070 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7299352Z triton_mm_35 0.0070 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7299591Z triton_mm_37 0.0077 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7299815Z triton_mm_32 0.0082 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7300042Z triton_mm_31 0.0082 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7300274Z triton_mm_34 0.0084 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7300508Z triton_mm_30 0.0085 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7300735Z triton_mm_28 0.0092 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7300863Z SingleProcess AUTOTUNE 
benchmarking takes 0.1086 seconds and 0.2956 seconds precompiling for 20 choices 2025-12-04T11:45:24.7300938Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7300981Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7301037Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7301137Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7301619Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7301656Z graph_break [] 2025-12-04T11:45:24.7301716Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:24.7301789Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7301828Z Autotune Choices Stats: 2025-12-04T11:45:24.7302196Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0063599999994039536, "best_triton_pos": 0} 2025-12-04T11:45:24.7302258Z AUTOTUNE scaled_mm(257x1024, 1024x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.7302308Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7302427Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7302659Z triton_mm_55 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7302894Z triton_mm_52 0.0065 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7303130Z triton_mm_46 0.0071 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7303386Z triton_mm_54 0.0072 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7303612Z triton_mm_56 0.0082 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.7303833Z triton_mm_50 0.0084 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7304070Z triton_mm_53 0.0085 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7304307Z triton_mm_49 0.0086 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7304530Z triton_mm_51 0.0087 ms 73.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.7304756Z triton_mm_47 0.0092 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7304887Z SingleProcess AUTOTUNE benchmarking takes 0.1192 seconds and 0.2734 seconds precompiling for 20 choices 2025-12-04T11:45:24.7305078Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0832e2583a031acb.xml - 2025-12-04T11:45:24.7305138Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7305720Z FAILED [0.8336s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:24.7305725Z 2025-12-04T11:45:24.7305799Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7306060Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7306063Z 2025-12-04T11:45:24.7306149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7306212Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.7306279Z ================== 1 failed, 187 deselected, 2 rerun in 4.15s ================== 2025-12-04T11:45:24.7306316Z Got exit code 1 2025-12-04T11:45:24.7306534Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.7306660Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.7306804Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c96fb1b5c74ee19e.xml 2025-12-04T11:45:24.7306875Z ============================= test session starts ============================== 2025-12-04T11:45:24.7306985Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7307029Z cachedir: .pytest_cache 2025-12-04T11:45:24.7307187Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7307233Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7307273Z configfile: pytest.ini 2025-12-04T11:45:24.7307439Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7307514Z collecting ... collected 188 items / 90 deselected / 98 selected 2025-12-04T11:45:24.7307579Z stepcurrent: skipping 90 already run items. 2025-12-04T11:45:24.7307622Z Running 98 items in this shard 2025-12-04T11:45:24.7307635Z 2025-12-04T11:45:24.7308564Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/ft/cftwcaw26yfqzncfx5ay6f5yphk4oxcp3y77myr6vnwvxrtxokcj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7308718Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7308937Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7309096Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7309241Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7309528Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7309662Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7309918Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7310058Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7310311Z E1204 11:16:17.605000 793539 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7310468Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7310751Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7310888Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7311178Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7311372Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7311688Z E1204 11:16:17.605000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7312418Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/oo/coo5sfmg7qr4pihsbxzvqsogrsg5aawknpdukhcjbvazesc5apdb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7312583Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7312796Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7312951Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7313097Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7313437Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7313568Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7313821Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7313958Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7314209Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7314366Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7314634Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7314766Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7315054Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7315247Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7315576Z E1204 11:16:17.632000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7316301Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/4d/c4dp4toeegb6ene2zm275qlmwwitm25znvmalu5xokcx7a6nwjhb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7316460Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7316685Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7316838Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7316980Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7317263Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7317393Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7317651Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7317791Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7318044Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7318197Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7318463Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7318597Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7318870Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7319060Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7319383Z E1204 11:16:17.633000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7320121Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/iu/ciumaqddts74dhpdg5buodcpruo2dpqxjnctbnp2hfb4oovwo2uu.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7320266Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7320478Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7320641Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7320797Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7321078Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7321208Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7321463Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7321597Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7321850Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7322004Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7322273Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7322408Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7322685Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7322879Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7323189Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7323958Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/h3/ch34ji4wmoxtg6rrwvgfhbpc4aea2stvtsact5ya6u7y27qtyje4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7324116Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7324328Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7324480Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7324624Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7324907Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7325064Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7325318Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7325452Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7325704Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7325857Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7326126Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7326258Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7326530Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7326722Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7327043Z E1204 11:16:17.637000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7327771Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpsl3k5t1m/7b/c7blspwwishxxvhrwit3xw6m4nwpsxa4hhvy4cvmz76pstq2luos.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7327933Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7328142Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7328306Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7328448Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7328730Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7328859Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7329111Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7329255Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7329519Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7329672Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7329939Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7330075Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7330351Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7330543Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7330852Z E1204 11:16:17.641000 793539 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7330903Z ('RERUN', {'yellow': True}) [3.4066s] [ 1%] 2025-12-04T11:45:24.7331224Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:16:19.770000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7331525Z E1204 11:16:19.770000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7331654Z E1204 11:16:19.770000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7331796Z E1204 11:16:19.773000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7332099Z E1204 11:16:19.773000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7332226Z E1204 11:16:19.773000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7332368Z E1204 11:16:19.775000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7332671Z E1204 11:16:19.775000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7332796Z E1204 11:16:19.775000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7332937Z E1204 11:16:19.839000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7333227Z E1204 11:16:19.839000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7333439Z E1204 11:16:19.839000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7333594Z E1204 11:16:19.841000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7333887Z E1204 11:16:19.841000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7334017Z E1204 11:16:19.841000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7334159Z E1204 11:16:19.843000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7334448Z E1204 11:16:19.843000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7334578Z E1204 11:16:19.843000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7334627Z ('RERUN', {'yellow': True}) [1.7788s] [ 1%] 2025-12-04T11:45:24.7334944Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:16:21.361000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7335238Z E1204 11:16:21.361000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7335362Z E1204 11:16:21.361000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7335504Z E1204 11:16:21.363000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7335797Z E1204 11:16:21.363000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7335921Z E1204 11:16:21.363000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7336061Z E1204 11:16:21.365000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7336373Z E1204 11:16:21.365000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7336501Z E1204 11:16:21.365000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7336653Z E1204 11:16:21.408000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7336943Z E1204 11:16:21.408000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7337068Z E1204 11:16:21.408000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7337212Z E1204 11:16:21.410000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7337502Z E1204 11:16:21.410000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7337651Z E1204 11:16:21.410000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7337791Z E1204 11:16:21.412000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7338081Z E1204 11:16:21.412000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.7338205Z E1204 11:16:21.412000 793539 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7338247Z FAILED [1.6091s] [ 1%] 2025-12-04T11:45:24.7338249Z 2025-12-04T11:45:24.7338304Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.7338447Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7338494Z Traceback (most recent call last): 2025-12-04T11:45:24.7338651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7338695Z method(*args, **kwargs) 2025-12-04T11:45:24.7338846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7338887Z method(*args, **kwargs) 2025-12-04T11:45:24.7339036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7339077Z with policy(): 2025-12-04T11:45:24.7339227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7339268Z raise RuntimeError(msg) 2025-12-04T11:45:24.7339661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1973420032. 
2025-12-04T11:45:24.7339665Z 2025-12-04T11:45:24.7339741Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7340001Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7340004Z 2025-12-04T11:45:24.7340109Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7340184Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7340230Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7340287Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7340857Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7340960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7340997Z graph_break [] 2025-12-04T11:45:24.7341068Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7341142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7341643Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7341707Z current_size = base.storage().size() 2025-12-04T11:45:24.7341751Z Autotune Choices Stats: 2025-12-04T11:45:24.7342121Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.7342190Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7342241Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7342362Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7342600Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7342828Z triton_mm_21 0.0104 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7343057Z triton_mm_34 0.0107 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7343316Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7343362Z _scaled_mm 0.0110 ms 88.3% 2025-12-04T11:45:24.7343583Z 
triton_mm_30 0.0113 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7343806Z triton_mm_22 0.0114 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7344044Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7344270Z triton_mm_23 0.0122 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7344510Z triton_mm_25 0.0126 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7344642Z SingleProcess AUTOTUNE benchmarking takes 0.1677 seconds and 1.0801 seconds precompiling for 33 choices 2025-12-04T11:45:24.7344789Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7344835Z Traceback (most recent call last): 2025-12-04T11:45:24.7344995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7345049Z method(*args, **kwargs) 2025-12-04T11:45:24.7345201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7345255Z method(*args, **kwargs) 
2025-12-04T11:45:24.7345407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7345445Z with policy(): 2025-12-04T11:45:24.7345598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7345637Z raise RuntimeError(msg) 2025-12-04T11:45:24.7346030Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1973420032 and is now 2940207104. 2025-12-04T11:45:24.7346034Z 2025-12-04T11:45:24.7346107Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7346366Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7346369Z 2025-12-04T11:45:24.7346456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7346530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7346573Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7346630Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7347203Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 
2025-12-04T11:45:24.7347304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7347342Z graph_break [] 2025-12-04T11:45:24.7347404Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7347479Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7347976Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7348025Z current_size = base.storage().size() 2025-12-04T11:45:24.7348065Z Autotune Choices Stats: 2025-12-04T11:45:24.7348440Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.7348509Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7348559Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7348680Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7348912Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7349148Z triton_mm_21 0.0104 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7349392Z triton_mm_34 0.0107 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7349619Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7349661Z _scaled_mm 0.0110 ms 88.3% 2025-12-04T11:45:24.7349885Z triton_mm_30 0.0113 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7350109Z triton_mm_22 0.0114 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7350333Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7350559Z triton_mm_23 0.0122 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7350782Z triton_mm_25 0.0126 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7350915Z SingleProcess AUTOTUNE 
benchmarking takes 0.1677 seconds and 1.0801 seconds precompiling for 33 choices 2025-12-04T11:45:24.7350991Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7351034Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7351091Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7351191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7351690Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7351728Z graph_break [] 2025-12-04T11:45:24.7351792Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7351866Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7351917Z Autotune Choices Stats: 2025-12-04T11:45:24.7352277Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.010239999741315842, "best_triton_pos": 0} 2025-12-04T11:45:24.7352345Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7352394Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7352516Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7352756Z triton_mm_67 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7352997Z triton_mm_72 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7353226Z triton_mm_71 0.0107 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7353486Z triton_mm_59 0.0110 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7353710Z triton_mm_54 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7353935Z triton_mm_60 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7354160Z triton_mm_68 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7354386Z triton_mm_61 0.0120 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7354610Z triton_mm_63 0.0126 ms 81.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7354838Z triton_mm_53 0.0132 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7354966Z SingleProcess AUTOTUNE benchmarking takes 0.2612 seconds and 0.8759 seconds precompiling for 39 choices 2025-12-04T11:45:24.7355020Z =================================== FAILURES =================================== 2025-12-04T11:45:24.7355165Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.7355225Z Traceback (most recent call last): 2025-12-04T11:45:24.7355383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7355427Z method(*args, **kwargs) 2025-12-04T11:45:24.7355578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.7355631Z method(*args, **kwargs) 2025-12-04T11:45:24.7355783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.7355822Z with policy(): 2025-12-04T11:45:24.7355974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.7356017Z raise RuntimeError(msg) 2025-12-04T11:45:24.7356412Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. 
CUDA driver allocated memory was 2940207104 and is now 3906994176. 2025-12-04T11:45:24.7356436Z 2025-12-04T11:45:24.7356509Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7356783Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7356786Z 2025-12-04T11:45:24.7356873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7356948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7356991Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7357049Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7357604Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7357708Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7357745Z graph_break [] 2025-12-04T11:45:24.7357810Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7357882Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7358367Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.7358416Z current_size = base.storage().size() 2025-12-04T11:45:24.7358455Z Autotune Choices Stats: 2025-12-04T11:45:24.7358826Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.7358892Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7358943Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7359061Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7359305Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7359543Z triton_mm_21 0.0104 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7359778Z triton_mm_34 0.0107 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7360007Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7360049Z _scaled_mm 0.0110 ms 88.3% 2025-12-04T11:45:24.7360274Z 
triton_mm_30 0.0113 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7360508Z triton_mm_22 0.0114 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7360743Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7360968Z triton_mm_23 0.0122 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7361190Z triton_mm_25 0.0126 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7361321Z SingleProcess AUTOTUNE benchmarking takes 0.1677 seconds and 1.0801 seconds precompiling for 33 choices 2025-12-04T11:45:24.7361396Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.7361440Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7361499Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7361601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7362083Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 
38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7362122Z graph_break [] 2025-12-04T11:45:24.7362186Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7362261Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7362300Z Autotune Choices Stats: 2025-12-04T11:45:24.7362664Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.010239999741315842, "best_triton_pos": 0} 2025-12-04T11:45:24.7362730Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7362791Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7362911Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7363141Z triton_mm_67 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7363417Z triton_mm_72 0.0104 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7363647Z triton_mm_71 0.0107 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7363877Z triton_mm_59 0.0110 ms 
93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7364118Z triton_mm_54 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7364355Z triton_mm_60 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7364577Z triton_mm_68 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7364803Z triton_mm_61 0.0120 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7365027Z triton_mm_63 0.0126 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7365254Z triton_mm_53 0.0132 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7365383Z SingleProcess AUTOTUNE benchmarking takes 0.2612 seconds and 0.8759 seconds precompiling for 39 choices 2025-12-04T11:45:24.7365455Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.7365498Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.7365555Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.7365656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.7366145Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.7366184Z graph_break [] 2025-12-04T11:45:24.7366246Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.7366321Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.7366362Z Autotune Choices Stats: 2025-12-04T11:45:24.7366737Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009638999588787556, "best_triton_pos": 0} 2025-12-04T11:45:24.7366806Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.7366866Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.7366987Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.7367223Z triton_mm_105 0.0096 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7367458Z triton_mm_109 0.0103 ms 93.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7367687Z triton_mm_110 0.0107 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7367934Z triton_mm_97 0.0110 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7368157Z triton_mm_98 0.0111 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7368200Z _scaled_mm 0.0111 ms 86.7% 2025-12-04T11:45:24.7368424Z triton_mm_106 0.0113 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7368648Z triton_mm_92 0.0116 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7368878Z triton_mm_99 0.0120 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.7369103Z triton_mm_101 0.0126 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.7369233Z SingleProcess AUTOTUNE benchmarking takes 0.2617 seconds and 0.7215 seconds precompiling for 39 choices 2025-12-04T11:45:24.7369425Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c96fb1b5c74ee19e.xml - 2025-12-04T11:45:24.7369488Z =========================== short test summary info ============================ 2025-12-04T11:45:24.7370088Z FAILED [1.6091s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2940207104 and is now 3906994176. 2025-12-04T11:45:24.7370091Z 2025-12-04T11:45:24.7370165Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.7370436Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7370439Z 2025-12-04T11:45:24.7370527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.7370591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.7370683Z ================== 1 failed, 90 deselected, 2 rerun in 6.81s =================== 2025-12-04T11:45:24.7370721Z Got exit code 1 2025-12-04T11:45:24.7370762Z Retrying single test... 
2025-12-04T11:45:24.7370908Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-58dfbbc01b2bd0df.xml 2025-12-04T11:45:24.7370964Z ============================= test session starts ============================== 2025-12-04T11:45:24.7371078Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.7371121Z cachedir: .pytest_cache 2025-12-04T11:45:24.7371283Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.7371341Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.7371384Z configfile: pytest.ini 2025-12-04T11:45:24.7371545Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.7371630Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.7371884Z stepcurrent: skipping 90 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.7371928Z Running 1 items in this shard 2025-12-04T11:45:24.7371930Z 2025-12-04T11:45:24.7372265Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:31.536126998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7372268Z 2025-12-04T11:45:24.7372584Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7372885Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7373016Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7373540Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7373801Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7374027Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7374239Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7374456Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7374757Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7375013Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7375306Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7375540Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7375846Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7376091Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7376380Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7376610Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7376904Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7377137Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7377428Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7377659Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7377949Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7378145Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7378375Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7378663Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7378867Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7379098Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7379408Z E1204 11:16:31.630000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7379640Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7379930Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7380162Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7380379Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7380579Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7380790Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7380957Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7381137Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7381669Z E1204 11:16:31.630000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/h3/ch34ji4wmoxtg6rrwvgfhbpc4aea2stvtsact5ya6u7y27qtyje4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7381817Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7382034Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7382189Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7382336Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7382625Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7382758Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7383024Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7383162Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7383445Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7383615Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7383887Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7384021Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7384297Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7384502Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7384832Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7385123Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7385254Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7385734Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7385987Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7386222Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7386431Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7386633Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7386926Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7387158Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7387467Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7387699Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7388000Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7388230Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7388524Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7388768Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7389071Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7389303Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7389592Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7389826Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7390117Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7390312Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7390543Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7390832Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7391032Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7391264Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7391553Z E1204 11:16:31.630000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7391792Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7392083Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7392315Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7392520Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7392722Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7392931Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7393110Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7393345Z E1204 11:16:31.630000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7393448Z E1204 11:16:31.630000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7393607Z [W1204 11:16:31.961190623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7393609Z 2025-12-04T11:45:24.7393917Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7394212Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7394343Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7394817Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7395071Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7395301Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7395508Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7395710Z E1204 11:16:31.697000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7396014Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7396248Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7396552Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7396785Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7397078Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7397309Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7397614Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7397859Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7398152Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7398382Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7398674Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7398904Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7399192Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7399387Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7399619Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7399909Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7400105Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7400351Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7400639Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7400884Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7401173Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7401394Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7401598Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7401809Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7402036Z E1204 11:16:31.697000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7402200Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7402377Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7402911Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/4d/c4dp4toeegb6ene2zm275qlmwwitm25znvmalu5xokcx7a6nwjhb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7403062Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7403312Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7403468Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7403613Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7403898Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7404031Z E1204 11:16:31.697000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7404285Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7406446Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7406725Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7406886Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7407173Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7407310Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7407586Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7407781Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7408109Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7408415Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7408546Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7409024Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7409280Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7409510Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7409715Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7409915Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7410208Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7410442Z E1204 
11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7410732Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7410972Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7411263Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7411505Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7411799Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7412030Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7412320Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7412572Z E1204 11:16:31.697000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7412860Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7413090Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7413480Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7413678Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7413908Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7414206Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7414403Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7414635Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7414929Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7415157Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7415460Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7415680Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7415899Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7416099Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7416308Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7416477Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7416653Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7416770Z E1204 11:16:31.697000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7416943Z [W1204 11:16:31.970609181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7416946Z 2025-12-04T11:45:24.7417255Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7417549Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7417678Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7418155Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7418405Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7418633Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7418839Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7419039Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7419329Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7419571Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7419861Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7420102Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7420393Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7420623Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7420917Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7421168Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7421469Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7421699Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7421988Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7422219Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7422510Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7422703Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7422934Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7423223Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7423454Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7423683Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7423987Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7424217Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7424521Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7424740Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7424944Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7425144Z E1204 11:16:31.704000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7425365Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7425549Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7425728Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7426254Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/7b/c7blspwwishxxvhrwit3xw6m4nwpsxa4hhvy4cvmz76pstq2luos.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7426401Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7426615Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7426770Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7426914Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7427199Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7427330Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7427587Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7427726Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7427979Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7428134Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7428411Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7428546Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7428831Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7429023Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7429338Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7429638Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7429784Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7430263Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7430517Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7430742Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7430949Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7431148Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7431440Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7431673Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7431965Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7432196Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7432496Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7432729Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7433030Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7433303Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7433592Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7433824Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7434129Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7434373Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7434660Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7434856Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7435087Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7435379Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7435573Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7435804Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7436094Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7436325Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7436616Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7436833Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7437053Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7437257Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7437486Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7437651Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.7437828Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7437931Z E1204 11:16:31.704000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7438086Z [W1204 11:16:31.974271686 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7438099Z 2025-12-04T11:45:24.7438406Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7438706Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7438836Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7439308Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7439563Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7439791Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7439997Z E1204 11:16:31.707000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7440196Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7440489Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7440721Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7441013Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7441252Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7441552Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7441782Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7442076Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7442306Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7442607Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7442848Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7443136Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7443401Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7443692Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7443888Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7444119Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7444412Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7444606Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7444837Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7445128Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7445358Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7445661Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7445893Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7446098Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7446297Z E1204 11:16:31.707000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7446507Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7446673Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7446861Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7447397Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/ft/cftwcaw26yfqzncfx5ay6f5yphk4oxcp3y77myr6vnwvxrtxokcj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7447544Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7447758Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7447915Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7448061Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7448345Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7448475Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7448731Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7448869Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7449124Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7449278Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7449544Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7449686Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7449961Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7450168Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7450481Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7450772Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7450901Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7451392Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7451653Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7451879Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7452084Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7452284Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7452573Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7452805Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7453097Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7453365Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7453660Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7453890Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7454200Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7454442Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7454734Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7454962Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7455254Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7455498Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7455803Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7455999Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7456229Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7456518Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7456713Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7456942Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7457233Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7457463Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7457754Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7457971Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7458177Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7458386Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7458596Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7458773Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.7458950Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7459052Z E1204 11:16:31.707000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7459205Z [W1204 11:16:31.982739228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7459207Z 2025-12-04T11:45:24.7459515Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7459819Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7459959Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7460439Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7460689Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7460915Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7461120Z E1204 11:16:31.716000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7461319Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7461607Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7461841Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7462132Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7462361Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7462664Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7462896Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7463196Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7463468Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7463757Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7464004Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7464308Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7464538Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7464827Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7465022Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7465258Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7465549Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7465745Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7465974Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7466268Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7466498Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7466788Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7467018Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7467225Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7467441Z E1204 11:16:31.716000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7467651Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7467815Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7467991Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7468526Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/iu/ciumaqddts74dhpdg5buodcpruo2dpqxjnctbnp2hfb4oovwo2uu.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7468688Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7468900Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7469055Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7469199Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7469485Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7469617Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7469874Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7470010Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7470264Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7470420Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7470689Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7470822Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7471105Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7471296Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7471618Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7471913Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7472044Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7472519Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7472791Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7473015Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7473223Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7473460Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7473751Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7473984Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7474274Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7474508Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7474798Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7475029Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7475319Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7475563Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7475853Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7476097Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7476387Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7476616Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7476923Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7477133Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7477362Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7477651Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7477844Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7478075Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7478363Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7478593Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7478884Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7479108Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7479317Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7479517Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7479726Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7479901Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.7480079Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7480192Z E1204 11:16:31.716000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7480346Z [W1204 11:16:31.984087678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7480348Z 2025-12-04T11:45:24.7480655Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7480947Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7481086Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7481575Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7481827Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7482050Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7482255Z E1204 11:16:31.717000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7482453Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7482743Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7482976Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7483305Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7483536Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7483830Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7484073Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7484364Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7484608Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7484898Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7485128Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7485416Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7485679Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7485968Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7486167Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7486397Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7486690Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7486889Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7487119Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7487410Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7487642Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7487932Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7488150Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7488368Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7488569Z E1204 11:16:31.717000 799458 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7488790Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7488956Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.7489131Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7489658Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpupw7iyob/oo/coo5sfmg7qr4pihsbxzvqsogrsg5aawknpdukhcjbvazesc5apdb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.7489814Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.7490041Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.7490197Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.7490340Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.7490631Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.7490762Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.7491022Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.7491158Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.7491412Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.7491566Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.7491836Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.7491972Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.7492246Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.7492436Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.7492761Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7493066Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7493195Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7493696Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7493947Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7494197Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7494403Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7494601Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7494893Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7495125Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7495422Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7495653Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7495942Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7496173Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7496464Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7496694Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7496994Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7497224Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7497526Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7497757Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7498046Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7498241Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7498482Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7498787Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7498981Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7499211Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7499501Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7499733Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7500025Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7500244Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7500449Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7500650Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.7500858Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.7501022Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.7501214Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.7501316Z E1204 11:16:31.717000 799458 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.7501371Z ('RERUN', {'yellow': True}) [3.5005s] [100%] 2025-12-04T11:45:24.7501724Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:33.807030785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7501728Z 2025-12-04T11:45:24.7501873Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7502166Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7502460Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7502600Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7503086Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7503368Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7503592Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7503798Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7503996Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7504285Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7504523Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7504818Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7505047Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7505337Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7505579Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7505870Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7506114Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7506404Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7506622Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7506838Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7507050Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7507255Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7507453Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7507682Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7507972Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7508168Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7508397Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7508687Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7508904Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7509100Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7509320Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7509524Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7509729Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7509924Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7510154Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7510359Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7510554Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7510746Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7510977Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7511288Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7511520Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7511811Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7512029Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7512234Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7512427Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7512633Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7512830Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7513060Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7513390Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7513620Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7513914Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7514155Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7514457Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7514687Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7514977Z 
E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7515209Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7515511Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7515753Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7516041Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7516273Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7516565Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7516795Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7517084Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7517314Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7517602Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7517833Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7518122Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7518370Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7518666Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7518898Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7519098Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7519294Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7519584Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7519824Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7520125Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7520354Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7520643Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7520877Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7521172Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7521401Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7521692Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7521924Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7522214Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7522408Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7522602Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7522808Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7523016Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7523229Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7523495Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7523784Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7523979Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7524186Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7524395Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7524587Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7524818Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7525109Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7525342Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7525639Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7525832Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7526040Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7526240Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7526475Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7526769Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7527004Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7527207Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7527421Z 
E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7527622Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7527918Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7528152Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7528457Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7528699Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7528991Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7529223Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7529516Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7529749Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7530040Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7530240Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7530436Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7530659Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7530860Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7531058Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7531268Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7531561Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7531804Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7532094Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7532327Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7532624Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7532882Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7533172Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7533436Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7533728Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7533949Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7534151Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7534348Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7534541Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7534750Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7534956Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7535249Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7535471Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7535688Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7535886Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7536106Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7536397Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7536630Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7536922Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7537183Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7537483Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7537720Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7538067Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7538331Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7538668Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7538931Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7539264Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7539497Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7539796Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7540029Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7540332Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7540565Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7540875Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7541113Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7541494Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7541709Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7541917Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7542150Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7542454Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7542684Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7542978Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7543210Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7543531Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7543764Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7544057Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7544288Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7544585Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7544797Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7545030Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7545336Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7545568Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7545863Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7546089Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7546308Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7546507Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7546711Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7547013Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7547237Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7547439Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7547641Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7547839Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7548137Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7548363Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7548564Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7548768Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7548971Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7549120Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7549337Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7549560Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7549766Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7549960Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7550179Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7550407Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7550610Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7550835Z E1204 
11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7551045Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7551238Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7551466Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7551678Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7551873Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7552068Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7552280Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7552483Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7552681Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7552879Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7553188Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7553443Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7553659Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7553857Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7554048Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7554243Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7554471Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7554685Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7554882Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7555081Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7555373Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7555587Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7555788Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7555986Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7556187Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7556480Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7556694Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7556894Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7557092Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7557305Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7557598Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7557803Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7558003Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7558192Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7558388Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7558612Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7558827Z E1204 11:16:33.546000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7559024Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7559211Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7559392Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7559563Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7559695Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7559799Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7559926Z E1204 11:16:33.546000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7560082Z [W1204 11:16:33.815868432 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7560085Z 2025-12-04T11:45:24.7560231Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7560530Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7560829Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7560958Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7561445Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7561700Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7561939Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7562144Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7562344Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7562636Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7562886Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7563191Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7563472Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7563764Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7563997Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7564288Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7564518Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7564811Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7565032Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7565241Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7565438Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7565663Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7565862Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7566115Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7566408Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7566602Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7566833Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7567136Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7567373Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7567570Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7567789Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7567994Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7568188Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7568383Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7568603Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7568806Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7569001Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7569195Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7569428Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7569721Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7569970Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7570265Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7570497Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7570702Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7570896Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7571102Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7571310Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7571553Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7571846Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7572081Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7572376Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7572608Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7572898Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7573129Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7573471Z 
E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7573703Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7573992Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7574222Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7574532Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7574776Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7575070Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7575302Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7575594Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7575837Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7576141Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7576371Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7576662Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7576895Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7577189Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7577408Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7577608Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7577802Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7578093Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7578323Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7578613Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7578853Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7579163Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7579396Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7579690Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7579921Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7580222Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7580464Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7580752Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7580950Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7581144Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7581340Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7581547Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7581746Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7581977Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7582267Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7582463Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7582656Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7582849Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7583060Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7583320Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7583626Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7583858Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7584150Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7584358Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7584577Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7584777Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7585010Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7585303Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7585524Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7585726Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7585923Z 
E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7586126Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7586420Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7586654Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7586947Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7587178Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7587482Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7587726Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7588020Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7588251Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7588550Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7588756Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7588964Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7589184Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7589385Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7589583Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7589782Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7590076Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7590308Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7590599Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7590834Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7591126Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7591357Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7591657Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7591889Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7592191Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7592411Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7592613Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7592810Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7593014Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7593237Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7593474Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7593767Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7593986Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7594189Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7594386Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7594584Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7594876Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7595108Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7595406Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7595639Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7595945Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7596177Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7596483Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7596716Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7597008Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7597239Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7597561Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7597794Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7598087Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7598318Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7598611Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7598843Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7599137Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7599368Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7599661Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7599857Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7600055Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7600305Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7600595Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7600839Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7601130Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7601362Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7601653Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7601905Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7602196Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7602427Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7602722Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7602918Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7603153Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7603479Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7603711Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7604007Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7604222Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7604424Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7604636Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7604839Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7605150Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7605364Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7605564Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7605761Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7605976Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7606283Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7606502Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7606705Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7606903Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7607097Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7607246Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7607444Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7607663Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7607871Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7608067Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7608288Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7608493Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7608686Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7608916Z E1204 
11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7609121Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7609330Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7609552Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7609760Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7609957Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7610160Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7610384Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7610583Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7610781Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7610982Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7611279Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7611492Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7611697Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7611895Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7612085Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7612282Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7612494Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7612694Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7612914Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7613116Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7613438Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7613652Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7613856Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7614056Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7614277Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7614584Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7614797Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7615001Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7615197Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7615400Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7615694Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7615888Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7616088Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7616278Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7616475Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7616688Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7616893Z E1204 11:16:33.549000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7617100Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7617289Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7617480Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7617654Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7617778Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7617884Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7618009Z E1204 11:16:33.549000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7618166Z [W1204 11:16:33.818052479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7618179Z 2025-12-04T11:45:24.7618324Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7618631Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7618928Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7619057Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7619538Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7619791Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7620018Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7620223Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7620421Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7620719Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7620953Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7621256Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7621486Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7621790Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7622022Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7622312Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7622543Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7622844Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7623075Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7623329Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7623525Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7623731Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7623931Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7624162Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7624450Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7624644Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7624875Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7625166Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7625388Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7625596Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7625817Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7626036Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7626230Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7626424Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7626644Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7626865Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7627073Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7627266Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7627498Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7627793Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7628025Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7628316Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7628534Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7628737Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7628933Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7629141Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7629341Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7629570Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7629874Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7630107Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7630418Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7630650Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7630939Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7631181Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7631481Z 
E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7631710Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7632002Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7632233Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7632526Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7632754Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7633044Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7633308Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7633600Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7633830Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7634119Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7634367Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7634673Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7634908Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7635199Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7635416Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7635636Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7635847Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7636137Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7636368Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7636658Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7636892Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7637187Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7637420Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7637711Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7637943Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7638233Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7638463Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7638762Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7638959Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7639167Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7639366Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7639574Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7639772Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7640018Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7640320Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7640514Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7640708Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7640902Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7641096Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7641326Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7641618Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7641849Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7642140Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7642336Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7642542Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7642753Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7642986Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7643320Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7643541Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7643740Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7643940Z 
E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7644153Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7644461Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7644692Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7644986Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7645220Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7645515Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7645748Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7646040Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7646277Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7646572Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7646770Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7646966Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7647202Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7647405Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7647615Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7647819Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7648111Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7648346Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7648655Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7648898Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7649189Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7649422Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7649714Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7649946Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7650238Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7650461Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7650662Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7650864Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7651058Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7651268Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7651478Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7651771Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7652003Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7652203Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7652400Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7652598Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7652907Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7653150Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7653478Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7653712Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7654004Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7654240Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7654532Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7654765Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7655059Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7655293Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7655590Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7655836Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7656128Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7656379Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7656672Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7656906Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7657198Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7657457Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7657749Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7657946Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7658144Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7658378Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7658673Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7658904Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7659197Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7659429Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7659723Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7659954Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7660258Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7660496Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7660798Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7660995Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7661227Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7661520Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7661764Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7662072Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7662286Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7662488Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7662688Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7662892Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7663186Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7663425Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7663626Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7663827Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7664028Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7664321Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7664556Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7664760Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7664970Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7665166Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7665313Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7665508Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7665728Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7665948Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7666155Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7666374Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7666581Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7666775Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7666998Z E1204 
11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7667205Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7667400Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7667622Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7667831Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7668028Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7668222Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7668435Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7668648Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7668847Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7669058Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7669350Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7669567Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7669769Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7669981Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7670184Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7670377Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7670590Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7670791Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7670990Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7671191Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7671490Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7671702Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7671904Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7672106Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7672305Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7672598Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7672819Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7673019Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7673227Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7673467Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7673760Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7673955Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7674174Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7674379Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7674574Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7674785Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7674990Z E1204 11:16:33.551000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78
2025-12-04T11:45:24.7675186Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360
2025-12-04T11:45:24.7675377Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767
2025-12-04T11:45:24.7675558Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58
2025-12-04T11:45:24.7675727Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360
2025-12-04T11:45:24.7675855Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0
2025-12-04T11:45:24.7675956Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] .
2025-12-04T11:45:24.7676083Z E1204 11:16:33.551000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice.
2025-12-04T11:45:24.7676241Z [W1204 11:16:33.860487491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:24.7676243Z
2025-12-04T11:45:24.7676389Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:24.7676683Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:24.7676994Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first):
2025-12-04T11:45:24.7677125Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback:
2025-12-04T11:45:24.7677615Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:45:24.7677870Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0
2025-12-04T11:45:24.7678095Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0
2025-12-04T11:45:24.7678318Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548
2025-12-04T11:45:24.7678529Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240
2025-12-04T11:45:24.7678821Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714
2025-12-04T11:45:24.7679057Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7679348Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7679582Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7679873Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7680107Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7680396Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7680631Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7680924Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7681154Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7681360Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7681567Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7681775Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7681973Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7682204Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7682497Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7682710Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7682941Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7683237Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7683489Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7683684Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7683904Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7684109Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7684305Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7684499Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7684717Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7684924Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7685118Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7685327Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7685562Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7685866Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7686098Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7686388Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7686607Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7686822Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7687034Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7687239Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7687434Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7687667Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7687962Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7688194Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7688485Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7688718Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7689010Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7689241Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7689531Z 
E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7689770Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7690062Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7690304Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7690593Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7690824Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7691114Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7691369Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7691658Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7691891Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7692180Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7692414Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7692709Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7692938Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7693231Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7693479Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7693682Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7693876Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7694182Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7694414Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7694723Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7694956Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7695250Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7695480Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7695797Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7696027Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7696320Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7696551Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7696841Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7697037Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7697234Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7697433Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7697641Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7697842Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7698070Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7698361Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7698567Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7698760Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7698968Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7699161Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7699391Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7699685Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7699927Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7700234Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7700428Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7700635Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7700836Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7701070Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7701361Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7701584Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7701785Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7701987Z 
E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7702187Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7702480Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7702723Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7703016Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7703300Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7703593Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7703825Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7704117Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7704383Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7704677Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7704874Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7705072Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7705293Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7705496Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7705697Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7705896Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7706190Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7706424Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7706720Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7706954Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7707258Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7707503Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7709235Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7710085Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7710380Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7710605Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7710806Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7711018Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7711215Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7711425Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7711623Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7711917Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7712137Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7712340Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7712539Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7712738Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7713031Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7713306Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7713626Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7713873Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7714165Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7714444Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7714757Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7714990Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7720129Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7720380Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7720685Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7720924Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7721220Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7721455Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7721750Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7721982Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7722277Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7722512Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7722830Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7723031Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7723241Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7723529Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7723823Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7724083Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7724377Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7724609Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7724904Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7725137Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7725433Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7725666Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7725961Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7726162Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7726397Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7726691Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7726926Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7727235Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7727450Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7727666Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7727870Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7728084Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7728391Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7728604Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7728808Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7729007Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7729210Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7729504Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7729725Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7729931Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7730131Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7730326Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7730474Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7730671Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7730894Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7731102Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7731302Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7731531Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7731739Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7731945Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7732180Z E1204 
11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7732405Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7732601Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7732820Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7733025Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7733225Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7733463Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7733676Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7733877Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7734076Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7734278Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7734575Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7734787Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7734987Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7735189Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7735380Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7735593Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7735806Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7736021Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7736231Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7736430Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7736740Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7736955Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7737157Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7737354Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7737556Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7737850Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7738063Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7738265Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7738462Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7738664Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7738959Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7739159Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7739362Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7739551Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7739761Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7739974Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7740190Z E1204 11:16:33.593000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7740397Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7740588Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7740783Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7740955Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7741083Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7741188Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7741320Z E1204 11:16:33.593000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7741477Z [W1204 11:16:33.862695328 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7741481Z 2025-12-04T11:45:24.7741630Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7741927Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7742228Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7742360Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7742848Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7743107Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7743353Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7743563Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7743763Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7744075Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7744325Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7744619Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7744869Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7745177Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7745410Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7745703Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7745939Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7746234Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7746456Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7746662Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7746860Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7747070Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7747272Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7747639Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7747933Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7748129Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7748381Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7748677Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7748907Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7749118Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7749339Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7749562Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7749755Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7749952Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7750171Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7750375Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7750573Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7750770Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7751003Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7751296Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7751530Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7751827Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7752048Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7752254Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7752452Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7752669Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7752869Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7753116Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7753465Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7753719Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7754010Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7754243Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7754533Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7754767Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7755059Z 
E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7755290Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7755582Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7755813Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7756106Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7756340Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7756631Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7756864Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7757171Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7757403Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7757710Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7757957Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7758260Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7758490Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7758782Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7759001Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7759204Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7759401Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7759693Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7759925Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7760220Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7760454Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7760744Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7760978Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7761268Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7761516Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7761808Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7762049Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7762356Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7762567Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7762763Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7762959Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7763168Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7763415Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7763649Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7763941Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7764136Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7764334Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7764530Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7764730Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7764964Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7765255Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7765487Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7765796Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7765994Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7766222Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7766427Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7766679Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7766988Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7767213Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7767416Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7767618Z 
E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7767820Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7768119Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7768354Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7768647Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7768883Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7769177Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7769415Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7769713Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7769948Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7770255Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7770453Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7770664Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7770896Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7771116Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7771317Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7771522Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7771825Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7772061Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7772361Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7772596Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7772895Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7773130Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7773459Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7773695Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7773988Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7774218Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7774423Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7774641Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7774834Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7775066Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7775287Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7775597Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7775822Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7776025Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7776227Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7776432Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7776735Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7776972Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7777268Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7777506Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7777803Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7778042Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7778339Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7778574Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7778891Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7779128Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7779439Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7779687Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7779987Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7780241Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7780536Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7780773Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7781070Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7781312Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7781610Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7781815Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7782020Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7782258Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7782557Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7782792Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7783093Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7783350Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7783673Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7783931Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7784226Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7784482Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7784797Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7785001Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7785241Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7785539Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7785780Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7786076Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7786301Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7786507Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7786714Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7786923Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7787218Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7787439Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7787643Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7787862Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7788065Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7788380Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7788673Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7788879Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7789115Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7789310Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7789466Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7789666Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7789890Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7790102Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7790303Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7790532Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7790740Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7790950Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7791176Z E1204 
11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7791390Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7791591Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7791816Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7792027Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7792244Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7792447Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7792675Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7792898Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7793099Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7793356Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7793652Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7793873Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7794085Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7794287Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7794488Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7794689Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7794908Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7795112Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7795317Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7795526Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7795827Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7796046Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7796248Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7796470Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7796676Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7796995Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7797224Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7797455Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7797690Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7797898Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7798225Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7798446Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7798660Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7798872Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7799083Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7799307Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7799522Z E1204 11:16:33.596000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7799727Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7799921Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7800113Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7800294Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7800431Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7800540Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7800676Z E1204 11:16:33.596000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7800850Z [W1204 11:16:33.864836636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7800859Z 2025-12-04T11:45:24.7801008Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7801323Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.7801636Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7801788Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7802283Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7802550Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7802788Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7802999Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7803220Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7803559Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7803808Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7804110Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7804350Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7804651Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7804888Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7805192Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7805448Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7805762Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7805991Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7806216Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7806447Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7806657Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7806867Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7807102Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7807405Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7807607Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7807851Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7808152Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7808377Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7808580Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7808800Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7809013Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7809211Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7809412Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7809654Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7809865Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7810086Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.7810284Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7810539Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7810850Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7811089Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7811389Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7811614Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7811828Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7812026Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7812244Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7812450Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7812690Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7812991Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7813225Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7813559Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7813794Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7814109Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7814343Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7814658Z 
E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7814920Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7815230Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7815470Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7815763Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7816005Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7816299Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7816539Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7816839Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7817077Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7817379Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7817616Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7817915Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7818155Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7818448Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7818705Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7818909Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7819121Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7819429Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7819682Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7819979Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7820214Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7820510Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7820743Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7821040Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7821274Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7821570Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7821805Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7822102Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7822304Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7822502Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7822703Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7822912Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7823130Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7823414Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7823721Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7823936Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7824148Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7824350Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7824547Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7824785Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7825081Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7825317Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7825613Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7825811Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7826025Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7826229Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7826471Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7826773Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7826995Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7827203Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7827420Z 
E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7827625Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7827931Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7828181Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7828490Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7828723Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7829026Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7829262Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7829559Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7829793Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7830089Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7830291Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7830492Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7830719Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7830920Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7831124Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7831327Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7831629Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7831877Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7832171Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7832424Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7832729Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7832978Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7833324Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7833561Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7833862Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7834087Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7834294Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7834493Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7834691Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7834902Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7835108Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7835404Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7835627Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7835831Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7836031Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7836254Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7836548Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7836799Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7837108Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7837359Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7837654Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7837887Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7838184Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7838421Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7838721Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7838958Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7839255Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7839493Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7839788Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7840025Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7840321Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7840555Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7840872Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7841120Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7841419Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7841638Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7841853Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7842094Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7842391Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7842630Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7842927Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7843166Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7843506Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7843748Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7844050Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7844284Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7844586Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7844786Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7845025Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7845337Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7845592Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7845896Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7846291Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7846520Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7846721Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7846931Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7847229Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7847449Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7847660Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7847862Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7848071Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7848373Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7848604Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7848808Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7849014Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7849214Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7849365Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7849570Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7849813Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7850026Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7850236Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7850476Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7850698Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7850900Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7851128Z E1204 
11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7851335Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7851538Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7851763Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7851975Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7852175Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7852376Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7852594Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7852802Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7853010Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7853214Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7853555Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7853770Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7853993Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7854197Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7854403Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7854617Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7854831Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7855054Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.7855256Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7855464Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7855761Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7855982Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7856190Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7856391Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7856599Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7856895Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7857116Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.7857320Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7857525Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7857733Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7858032Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7858247Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7858450Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7858657Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7858869Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7859090Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7859314Z E1204 11:16:33.598000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7859517Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7859713Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7859898Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7860075Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7860205Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7860315Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7860444Z E1204 11:16:33.598000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.7860503Z ('RERUN', {'yellow': True}) [1.6588s] [100%] 2025-12-04T11:45:24.7860846Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:35.273808524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.7860850Z 2025-12-04T11:45:24.7861001Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7861301Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7861600Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7861738Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7862227Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7862499Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7862728Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7862947Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.7863162Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7863507Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7863748Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7864043Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7864281Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7864582Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7864818Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7865115Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7865348Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7865643Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7865867Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7866079Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7866282Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7866495Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7866701Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7866957Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7867270Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7867467Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7867718Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7868035Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7868255Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7868457Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7868678Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7868890Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7869089Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7869292Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7869514Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7869725Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7869928Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7870124Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7870360Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7870654Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7870892Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7871196Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7871422Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7871643Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7871851Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7872063Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7872274Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7872510Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7872803Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7873040Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7873363Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7873596Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7873894Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7874129Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7874426Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7874660Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7874956Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7875193Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7875501Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7875738Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7876045Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7876298Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7876597Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7876845Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7877142Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7877376Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7877674Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7877908Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7878205Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7878433Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7878638Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7878844Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7879140Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7879379Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7879673Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7879910Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7880219Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7880466Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7880764Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7881016Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7881329Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7881566Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7881859Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7882062Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7882260Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7882460Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7882670Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7882873Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7883108Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7883428Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7883628Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7883827Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7884026Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7884222Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7884472Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7884764Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7885013Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7885321Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7885531Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7885742Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7885946Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7886186Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7886479Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7886705Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7886908Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7887108Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7887313Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7887607Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7887843Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7888141Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7888379Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7888688Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7888921Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7889229Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7889473Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7889768Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7889981Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7890177Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7890402Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7890606Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7890810Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7891014Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7891311Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7891543Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7891840Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7892081Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7892377Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7892616Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7892914Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7893240Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7893567Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7893808Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7894030Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7894245Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7894444Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7894656Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7894862Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7895159Z E1204 11:16:35.007000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7895388Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7895598Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7895800Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7896007Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7896305Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7896546Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7896841Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7897080Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7897376Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7897646Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7897946Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7898203Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7898508Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7898757Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7899051Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7899288Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7899584Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7899822Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7900115Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7900354Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7900652Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7900889Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7901186Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7901385Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7901587Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7901820Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7902127Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7902362Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7902669Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7902918Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7903224Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7903549Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7903844Z E1204 
11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7904078Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7904375Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7904572Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7904811Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7905107Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7905344Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7905643Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7905860Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7906064Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7906264Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7906484Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7906779Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7907008Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7907226Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7907426Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7907643Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7907937Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7908161Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7908364Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7908567Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7908762Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7908910Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7909112Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7909335Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7909548Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7909747Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7909975Z E1204 
11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7910182Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7910385Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7910609Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7910829Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7911030Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7911264Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7911486Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7911697Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7911899Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7912120Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7912329Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7912533Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7912736Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7913033Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7913246Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7913485Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7913687Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7913886Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.7914087Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7914302Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7914511Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7914713Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7914951Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7915247Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7915477Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7915702Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7915902Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7916129Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7916424Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7916644Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7916852Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7917056Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7917258Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7917557Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7917760Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7917964Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7918157Z E1204 11:16:35.007000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7918352Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7918568Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7918774Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7918978Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7919182Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7919361Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7919535Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7919673Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7919790Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7919916Z E1204 11:16:35.007000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7920090Z [W1204 11:16:35.276121009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7920092Z 2025-12-04T11:45:24.7920239Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7920535Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7920837Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7920968Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7921459Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7921713Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7921943Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7922149Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.7922352Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7922647Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7922883Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7923180Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7923464Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7923873Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7924120Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7924429Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7924695Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7924986Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7925211Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7925418Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7925619Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7925828Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7926031Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7926270Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7926566Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7926768Z E1204 
11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7927000Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7927297Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7927517Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7927717Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7927953Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7928159Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7928374Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7928572Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7928807Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7929036Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7929236Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7929433Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7929667Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7929962Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7930199Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7930496Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7930717Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7930926Z 
E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7931126Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7931336Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7931540Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7931772Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7932067Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7932312Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7932607Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7932854Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.7933164Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7933433Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7933724Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7933959Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7934253Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7934489Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7934783Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7935015Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7935312Z E1204 
11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7935548Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7935842Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7936075Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7936368Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7936606Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7936914Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7937162Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7937456Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7937695Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7937918Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7938116Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7938414Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7938647Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7938947Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7939178Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7939480Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7939717Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7940011Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7940253Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7940549Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7940789Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7941081Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7941298Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7941501Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.7941717Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7941944Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7942157Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7942396Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7942696Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7942898Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7943098Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7943334Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7943539Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7943773Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7944070Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7944305Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7944605Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7944807Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7945019Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7945228Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7945481Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7945783Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7946020Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7946248Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7946453Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7946672Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7946972Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7947208Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7947512Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7947748Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7948048Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7948287Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7948583Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7948823Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7949116Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7949321Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7949522Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7949750Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7949971Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7950171Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7950389Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7950697Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7950948Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7951243Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7951482Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7951780Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7952017Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7952320Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7952554Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7952854Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7953078Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7953313Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7953521Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7953717Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7953933Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7954135Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7954455Z E1204 11:16:35.009000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7954681Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7954909Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7955127Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7955346Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7955645Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7955883Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7956184Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7956419Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7956719Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7956961Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7957258Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7957500Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7957797Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7958037Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7958331Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7958571Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7958882Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7959117Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7959429Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7959675Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7959989Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7960229Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7960524Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7960730Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7960930Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7961170Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7961466Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7961703Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7962001Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7962237Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7962531Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7962766Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7963064Z E1204 
11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7963364Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7963662Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7963882Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7964132Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7964450Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7964683Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7964980Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7965198Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7965401Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7965604Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7965806Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7966102Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7966318Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7966526Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7966729Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7966934Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7967230Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7967452Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7967669Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7967867Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7968072Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.7968221Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.7968430Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7968668Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7968876Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7969077Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7969301Z E1204 
11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7969513Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7969712Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7969937Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7970146Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.7970350Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7970577Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.7970785Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7970988Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7971186Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7971405Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7971609Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7971826Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7972034Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7972343Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7972571Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7972787Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7972995Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7973189Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.7973411Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7973630Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7973836Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7974039Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7974241Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7974542Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7974758Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7974967Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7975167Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7975372Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7975674Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7975890Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.7976116Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7976317Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.7976536Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7976846Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7977059Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.7977268Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.7977460Z E1204 11:16:35.009000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.7977664Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.7977880Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.7978092Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.7978292Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.7978486Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.7978668Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.7978844Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.7978976Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.7979086Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.7979220Z E1204 11:16:35.009000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.7979379Z [W1204 11:16:35.278278546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.7979381Z 2025-12-04T11:45:24.7979532Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.7979826Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.7980129Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.7980279Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.7980782Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.7981053Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.7981296Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.7981509Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.7981710Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7982010Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7982253Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7982550Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7982793Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7983089Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7983369Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7983668Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7983907Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7984208Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7984428Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7984660Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7984860Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7985074Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7985289Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7985542Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7985863Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7986062Z E1204 
11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7986301Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7986597Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7986823Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7987021Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7987247Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7987456Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7987655Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7987858Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7988079Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7988290Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7988488Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7988687Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.7988921Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7989229Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7989478Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7989782Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7990017Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7990225Z 
E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.7990425Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.7990635Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.7990840Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.7991077Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7991373Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7991611Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7991905Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7992142Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.7992438Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7992675Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7992973Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7993205Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7993556Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7993790Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7994099Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7994351Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7994667Z E1204 
11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7994904Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7995199Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7995439Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7995734Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7995970Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7996266Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7996499Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7996798Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7997018Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.7997226Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.7997423Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.7997719Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7997973Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7998266Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7998514Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7998834Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7999092Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7999388Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.7999625Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.7999925Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8000160Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8000458Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8000657Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8000858Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8001058Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8001268Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8001473Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8001707Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8002010Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8002211Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8002423Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8002620Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8002831Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8003080Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8003421Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8003658Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8003951Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8004154Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8004364Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8004572Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8004812Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8005108Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8005338Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8005542Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8005746Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8005947Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8006246Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8006484Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8006797Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8007035Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8007342Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8007593Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8007901Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8008139Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8008439Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8008638Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8008841Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8009068Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8009280Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8009480Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8009688Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8009988Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8010223Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8010521Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8010756Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8011067Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8011307Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8011614Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8011870Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8012165Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8012402Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8012608Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8012813Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8013011Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8013225Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8013471Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8013770Z E1204 11:16:35.011000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8013997Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8014201Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8014406Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8014607Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8014912Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8015152Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8015467Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8015707Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8016014Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8016268Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8016565Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8016816Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8017116Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8017351Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8017650Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8017886Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8018184Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8018422Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8018721Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8018963Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8019258Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8019498Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8019793Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8020009Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8020212Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8020458Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8020766Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8021012Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8021314Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8021554Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8021848Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8022089Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8022384Z E1204 
11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8022622Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8022917Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8023118Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8023394Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8023691Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8023932Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8024229Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8024465Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8024669Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8024888Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8025096Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8025405Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8025638Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8025842Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8026048Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8026251Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8026551Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8026779Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8026983Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8027190Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8027385Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8027541Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8027741Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8027969Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8028181Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8028386Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8028612Z E1204 
11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8028833Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8029035Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8029277Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8029500Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8029708Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8029935Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8030145Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8030344Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8030546Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8030762Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8030969Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8031169Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8031375Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8031672Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8031893Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8032099Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8032299Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8032496Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8032695Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8032928Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8037339Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8037587Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8037793Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8038108Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8038343Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8038545Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8038751Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8038953Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8039251Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8039468Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8039675Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8039879Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8040083Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8040382Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8040579Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8040785Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8040976Z E1204 11:16:35.011000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8041176Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8041409Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8041617Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8041832Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8042024Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8042221Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8042404Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8042536Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8042644Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8042773Z E1204 11:16:35.011000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8042932Z [W1204 11:16:35.320422733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8042937Z 2025-12-04T11:45:24.8043085Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8043431Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8043730Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8043864Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8044357Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8044617Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8044846Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8045053Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8045255Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8045548Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8045802Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8046108Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8046344Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8046656Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8046906Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8047201Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8047433Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8047727Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8047949Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8048156Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8048354Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8048563Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8048764Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8048999Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8049294Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8049490Z E1204 
11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8049725Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8050042Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8050263Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8050471Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8050701Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8050911Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8051122Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8051323Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8051547Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8051753Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8051952Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8052148Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8052386Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8052680Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8052915Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8053213Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8053465Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8053680Z 
E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8053877Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8054091Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8054307Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8054545Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8054852Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8055101Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8055412Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8055644Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8055942Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8056177Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8056472Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8056709Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8057002Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8057236Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8057529Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8057765Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8058057Z E1204 
11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8058293Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8058590Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8058841Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8059135Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8059377Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8059682Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8059927Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8060219Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8060441Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8060645Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8060847Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8061141Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8061378Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8061673Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8061905Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8062201Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8062432Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8062726Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8062960Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8063309Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8063543Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8063847Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8064061Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8064282Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8064483Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8064693Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8064898Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8065134Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8065429Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8065629Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8065825Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8066023Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8066218Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8066456Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8066752Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8066984Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8067278Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8067491Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8067703Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8067919Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8068158Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8068466Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8068702Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8068907Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8069108Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8069312Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8069606Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8069845Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8070144Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8070379Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8070675Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8070910Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8071207Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8071441Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8071739Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8071950Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8072149Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8072385Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8072600Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8072802Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8073016Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8073350Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8073589Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8073885Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8074122Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8074415Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8074655Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8074951Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8075191Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8075487Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8075710Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8075917Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8076116Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8076330Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8076542Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8076760Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8077071Z E1204 11:16:35.053000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8077309Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8077516Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8077716Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8077922Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8078219Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8078460Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8078758Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8078991Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8079289Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8079525Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8079824Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8080058Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8080354Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8080592Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8080898Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8081146Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8081455Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8081704Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8082005Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8082246Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8082543Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8082778Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8083076Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8083305Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8083507Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8083744Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8084041Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8084279Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8084576Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8084813Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8085122Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8085360Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8085672Z E1204 
11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8085917Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8086214Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8086428Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8086664Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8086960Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8087198Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8087496Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8087710Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8087916Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8088115Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8088320Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8088615Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8088833Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8089037Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8089240Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8089456Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8089749Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8089987Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8090202Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8090404Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8090614Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8090764Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8090964Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8091187Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8091397Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8091599Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8091824Z E1204 
11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8092032Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8092231Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8092454Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8092663Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8092861Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8093083Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8093321Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8093521Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8093737Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8093957Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8094174Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8094390Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8094590Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8094901Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8095114Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8095320Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8095519Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8095715Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8095914Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8096129Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8096334Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8096533Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8096738Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8097033Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8097250Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8097455Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8097654Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8097869Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8098164Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8098404Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8098619Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8098821Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8099038Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8099331Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8099531Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8099733Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8099926Z E1204 11:16:35.053000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8100123Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8100341Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8100547Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8100751Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8100947Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8101131Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8101305Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8101432Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8101542Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8101670Z E1204 11:16:35.053000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8101830Z [W1204 11:16:35.322611270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8101834Z 2025-12-04T11:45:24.8101995Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8102294Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8102602Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8102746Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8103237Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8103532Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8103760Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8103969Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8104171Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8104468Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8104704Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8105003Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8105239Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8105537Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8105775Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8106068Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8106304Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8106611Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8106848Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8107055Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8107268Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8107499Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8107699Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8107938Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8108232Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8108431Z E1204 
11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8108666Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8108962Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8109186Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8109383Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8109606Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8109812Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8110011Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8110209Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8110433Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8110648Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8110849Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8111045Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8111288Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8111594Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8111839Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8112134Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8112355Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8112567Z 
E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8112768Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8112977Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8113179Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8113436Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8113732Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8113966Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8114264Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8114500Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8114795Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8115056Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8115348Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8115597Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8115902Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8116151Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8116446Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8116679Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8116974Z E1204 
11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8117209Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8117507Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8117743Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8118033Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8118269Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8118562Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8118797Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8119089Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8119313Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8119534Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8119734Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8120043Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8120289Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8120597Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8120830Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8121127Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8121362Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8121654Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8121892Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8122186Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8122420Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8122715Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8122912Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8123109Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8123336Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8123549Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8123748Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8123999Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8124290Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8124501Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8124711Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8124921Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8125118Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8125349Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8125643Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8125874Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8126169Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8126367Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8126574Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8126780Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8127014Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8127310Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8127531Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8127735Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8127936Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8128154Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8128451Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8128694Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8129000Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8129253Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8129549Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8129786Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8130080Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8130316Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8130611Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8130814Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8131013Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8131240Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8131450Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8131651Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8131857Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8132151Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8132390Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8132698Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8132937Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8133245Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8133530Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8133842Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8134079Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8134377Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8134602Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8134807Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8135011Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8135204Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8135418Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8135619Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8135916Z E1204 11:16:35.056000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8136137Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8136344Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8136545Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8136744Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8137056Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8137290Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8137600Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8137846Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8138152Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8138387Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8138683Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8138918Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8139214Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8139449Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8139745Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8139979Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8140275Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8140508Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8140803Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8141038Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8141346Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8141582Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8141887Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8142101Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8142297Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8142548Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8142841Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8143078Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8143392Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8143627Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8143922Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8144155Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8144450Z E1204 
11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8144687Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8144980Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8145180Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8145413Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8145709Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8145963Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8146272Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8146488Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8146708Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8146924Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8147126Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8147422Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8147636Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8147841Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8148045Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8148247Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8148546Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8148769Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8148977Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8149176Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8149374Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8149523Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8149724Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8149950Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8150170Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8150374Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8150608Z E1204 
11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8150830Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8151041Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8151267Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8151476Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8151672Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8151895Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8152104Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8152307Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8152505Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8152724Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8152928Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8153133Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8153381Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8153677Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8153894Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8154096Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8154312Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8154506Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8154720Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8154954Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8155158Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8155375Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8155575Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8155874Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8156088Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8156294Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8156492Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8156694Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8156991Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8157206Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8157416Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8157616Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8157818Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8158112Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8158311Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8158525Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8158714Z E1204 11:16:35.056000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8158922Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8159136Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8159354Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8159564Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8159757Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8159939Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8160110Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8160239Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8160346Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8160475Z E1204 11:16:35.056000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8160630Z [W1204 11:16:35.324703128 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8160633Z 2025-12-04T11:45:24.8160778Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8161073Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8161373Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8161504Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8161993Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8162251Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8162478Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8162695Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8162895Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8163207Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8163497Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8163805Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8164040Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8164335Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8164571Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8164863Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8165097Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8165391Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8165612Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8165824Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8166021Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8166230Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8166429Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8166666Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8166973Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8167169Z E1204 
11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8167416Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8167708Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8167950Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8168157Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8168378Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8168586Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8168782Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8168979Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8169200Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8169407Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8169603Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8169800Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8170033Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8170326Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8170560Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8170852Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8171073Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8171292Z 
E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8171491Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8171713Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8171923Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8172156Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8172461Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8172695Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8172987Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8173222Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8173551Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8173787Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8174084Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8174316Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8174611Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8174842Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8175136Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8175370Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8175678Z E1204 
11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8175912Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8176220Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8176466Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8176772Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8177005Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8177300Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8177533Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8177828Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8178207Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8178416Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8178618Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8178915Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8179155Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8179448Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8179686Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8179979Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8180246Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8180542Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8180787Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8181098Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8181344Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8181640Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8181839Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8182040Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8182241Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8182452Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8182656Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8182889Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8183186Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8183423Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8183625Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8183825Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8184023Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8184260Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8184552Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8184804Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8185111Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8185311Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8185536Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8185755Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8185993Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8186287Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8186513Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8186716Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8186921Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8187123Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8187422Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8187661Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8187958Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8188198Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8188496Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8188735Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8189044Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8189279Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8189589Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8189799Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8190000Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8190238Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8190446Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8190652Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8190855Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8191155Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8191389Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8191689Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8191924Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8192223Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8192462Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8192756Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8193001Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8193327Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8193568Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8193773Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8193991Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8194211Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8194448Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8194655Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8194953Z E1204 11:16:35.058000 
799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8195179Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8195387Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8195595Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8195800Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8196096Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8196336Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8196630Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8196870Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8197165Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8197406Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8197708Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8197957Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8198255Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8198499Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8198808Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8199053Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8199353Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8199592Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8199887Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8200129Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8200426Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8200665Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8200965Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8201166Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8201369Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8201605Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8201903Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8202137Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8202449Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8202693Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8202998Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8203278Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8203589Z E1204 
11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8203828Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8204123Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8204328Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8204567Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8204860Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8205099Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8205396Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8205617Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8205825Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8206026Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8206234Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8206530Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8206763Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8206967Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8207186Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8207390Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8207701Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8207940Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8208145Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8208351Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8208546Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8208697Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8208896Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8209121Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8209336Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8209535Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8209761Z E1204 
11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8209970Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8210170Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8210391Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8210602Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8210798Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8211036Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8211248Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8211457Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8211676Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8211892Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8212114Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8212316Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8212523Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8212822Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8213035Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8213240Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8213475Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8213673Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8213870Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8214087Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8214291Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8214491Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8214694Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8214988Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8215218Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8215419Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8215634Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8215834Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8216144Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8216374Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8216577Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8216777Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8216978Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8217274Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8217470Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8217673Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8217865Z E1204 11:16:35.058000 799458 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8218062Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8218278Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8218485Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8218684Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8218876Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8219060Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8219232Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8219371Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8219479Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8219607Z E1204 11:16:35.058000 799458 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8219653Z FAILED [1.4698s] [100%] 2025-12-04T11:45:24.8219666Z 2025-12-04T11:45:24.8219732Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.8219897Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.8219948Z Traceback (most recent call last): 2025-12-04T11:45:24.8220131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8220179Z method(*args, **kwargs) 2025-12-04T11:45:24.8220336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8220377Z method(*args, **kwargs) 2025-12-04T11:45:24.8220530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.8220572Z with policy(): 2025-12-04T11:45:24.8220730Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.8220776Z raise RuntimeError(msg) 2025-12-04T11:45:24.8221180Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1973420032. 
2025-12-04T11:45:24.8221184Z 2025-12-04T11:45:24.8221270Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.8221539Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.8221542Z 2025-12-04T11:45:24.8221636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.8221722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8221768Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8221833Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8222395Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8222502Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8222542Z graph_break [] 2025-12-04T11:45:24.8222611Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8222690Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8223183Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.8223239Z current_size = base.storage().size() 2025-12-04T11:45:24.8223312Z Autotune Choices Stats: 2025-12-04T11:45:24.8223707Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.8223808Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8223862Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8224004Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8224243Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8224495Z triton_mm_34 0.0106 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8224724Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8224953Z triton_mm_21 0.0110 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8225180Z triton_mm_22 0.0113 ms 85.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8225407Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8225632Z triton_mm_30 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8225858Z triton_mm_23 0.0118 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8226085Z triton_mm_25 0.0130 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8226313Z triton_mm_15 0.0131 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8226448Z SingleProcess AUTOTUNE benchmarking takes 0.1624 seconds and 1.2426 seconds precompiling for 33 choices 2025-12-04T11:45:24.8226594Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.8226646Z Traceback (most recent call last): 2025-12-04T11:45:24.8226806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8226850Z method(*args, **kwargs) 
2025-12-04T11:45:24.8227005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8227050Z method(*args, **kwargs) 2025-12-04T11:45:24.8227217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.8227260Z with policy(): 2025-12-04T11:45:24.8227416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.8227457Z raise RuntimeError(msg) 2025-12-04T11:45:24.8227872Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1973420032 and is now 2940207104. 2025-12-04T11:45:24.8227885Z 2025-12-04T11:45:24.8227966Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.8228245Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.8228247Z 2025-12-04T11:45:24.8228336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.8228418Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8228461Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8228526Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8229085Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8229194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8229235Z graph_break [] 2025-12-04T11:45:24.8229308Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8229387Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8229877Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.8229932Z current_size = base.storage().size() 2025-12-04T11:45:24.8229974Z Autotune Choices Stats: 2025-12-04T11:45:24.8230350Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.8230423Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8230480Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8230604Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8230841Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:24.8231071Z triton_mm_34 0.0106 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8231316Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8231562Z triton_mm_21 0.0110 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8231789Z triton_mm_22 0.0113 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8232031Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8232266Z triton_mm_30 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8232499Z triton_mm_23 0.0118 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8232730Z triton_mm_25 0.0130 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8232958Z triton_mm_15 0.0131 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8233094Z SingleProcess AUTOTUNE benchmarking takes 0.1624 seconds and 1.2426 seconds precompiling for 33 choices 2025-12-04T11:45:24.8233169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8233216Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8233303Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8233409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8233901Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8233945Z graph_break [] 2025-12-04T11:45:24.8234011Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8234089Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8234132Z Autotune Choices Stats: 2025-12-04T11:45:24.8234499Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.010080000385642052, "best_triton_pos": 0} 2025-12-04T11:45:24.8234570Z AUTOTUNE 
scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8234623Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8234748Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8235004Z triton_mm_67 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8235232Z triton_mm_59 0.0103 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8235278Z _scaled_mm 0.0104 ms 97.3% 2025-12-04T11:45:24.8235522Z triton_mm_71 0.0107 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8235763Z triton_mm_72 0.0108 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8236005Z triton_mm_68 0.0112 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8236232Z triton_mm_60 0.0115 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8236459Z triton_mm_54 0.0116 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8236692Z triton_mm_61 0.0119 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8236919Z triton_mm_63 0.0130 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8237054Z SingleProcess AUTOTUNE benchmarking takes 0.2423 seconds and 0.8534 seconds precompiling for 39 choices 2025-12-04T11:45:24.8237109Z =================================== FAILURES =================================== 2025-12-04T11:45:24.8237259Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.8237308Z Traceback (most recent call last): 2025-12-04T11:45:24.8237471Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8237516Z method(*args, **kwargs) 2025-12-04T11:45:24.8237676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.8237718Z method(*args, **kwargs) 2025-12-04T11:45:24.8237877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.8237918Z with policy(): 2025-12-04T11:45:24.8238076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.8238124Z raise RuntimeError(msg) 2025-12-04T11:45:24.8238518Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2940207104 and is now 3906994176. 2025-12-04T11:45:24.8238521Z 2025-12-04T11:45:24.8238603Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.8238877Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.8238879Z 2025-12-04T11:45:24.8238972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.8239047Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8239096Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8239168Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8239726Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8239861Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8239900Z graph_break [] 2025-12-04T11:45:24.8239969Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8240044Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8240536Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: 
UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.8240585Z current_size = base.storage().size() 2025-12-04T11:45:24.8240634Z Autotune Choices Stats: 2025-12-04T11:45:24.8241000Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009680000133812428, "best_triton_pos": 0} 2025-12-04T11:45:24.8241072Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8241124Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8241250Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8241484Z triton_mm_29 0.0097 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8241718Z triton_mm_34 0.0106 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8241956Z triton_mm_33 0.0109 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8242181Z triton_mm_21 0.0110 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8242410Z triton_mm_22 0.0113 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8242637Z triton_mm_16 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8242881Z triton_mm_30 0.0116 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8243121Z triton_mm_23 0.0118 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8243404Z triton_mm_25 0.0130 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8243649Z triton_mm_15 0.0131 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8243780Z SingleProcess AUTOTUNE benchmarking takes 0.1624 seconds and 1.2426 seconds precompiling for 33 choices 2025-12-04T11:45:24.8243859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8243903Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8243964Z stats [('calls_captured', 
1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8244067Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8244561Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8244602Z graph_break [] 2025-12-04T11:45:24.8244671Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8244747Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8244795Z Autotune Choices Stats: 2025-12-04T11:45:24.8245157Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.010080000385642052, "best_triton_pos": 0} 2025-12-04T11:45:24.8245226Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8245280Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8245404Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8245638Z triton_mm_67 0.0101 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8245865Z triton_mm_59 0.0103 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8245913Z _scaled_mm 0.0104 ms 97.3% 2025-12-04T11:45:24.8246140Z triton_mm_71 0.0107 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8246373Z triton_mm_72 0.0108 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8246615Z triton_mm_68 0.0112 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8246860Z triton_mm_60 0.0115 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8247087Z triton_mm_54 0.0116 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8247327Z triton_mm_61 0.0119 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8247566Z triton_mm_63 0.0130 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8247697Z SingleProcess AUTOTUNE 
benchmarking takes 0.2423 seconds and 0.8534 seconds precompiling for 39 choices 2025-12-04T11:45:24.8247776Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.8247822Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.8247885Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.8247986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.8248479Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.8248524Z graph_break [] 2025-12-04T11:45:24.8248589Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.8248669Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.8248713Z Autotune Choices Stats: 2025-12-04T11:45:24.8249081Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009960000403225422, "best_triton_pos": 0} 2025-12-04T11:45:24.8249151Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.8249206Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.8249328Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.8249567Z triton_mm_105 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8249613Z _scaled_mm 0.0105 ms 95.0% 2025-12-04T11:45:24.8249848Z triton_mm_109 0.0106 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8250080Z triton_mm_110 0.0108 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.8250320Z triton_mm_97 0.0109 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8250554Z triton_mm_106 0.0113 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8250792Z triton_mm_98 0.0115 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8251031Z triton_mm_92 0.0116 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8251269Z triton_mm_99 0.0119 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.8251499Z triton_mm_101 0.0130 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.8251630Z SingleProcess AUTOTUNE benchmarking takes 0.2547 seconds and 0.7042 seconds precompiling for 39 choices 2025-12-04T11:45:24.8251827Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-58dfbbc01b2bd0df.xml - 2025-12-04T11:45:24.8251892Z =========================== short test summary info ============================ 2025-12-04T11:45:24.8252491Z FAILED [1.4698s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2940207104 and is now 3906994176. 2025-12-04T11:45:24.8252494Z 2025-12-04T11:45:24.8252570Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.8252838Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.8252841Z 2025-12-04T11:45:24.8252933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.8252999Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.8253077Z ================== 1 failed, 187 deselected, 2 rerun in 6.65s ================== 2025-12-04T11:45:24.8253120Z Got exit code 1 2025-12-04T11:45:24.8253166Z Retrying single test... 
2025-12-04T11:45:24.8253345Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4e1b69be281ad4ef.xml 2025-12-04T11:45:24.8253409Z ============================= test session starts ============================== 2025-12-04T11:45:24.8253523Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.8253569Z cachedir: .pytest_cache 2025-12-04T11:45:24.8253735Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.8253784Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.8253830Z configfile: pytest.ini 2025-12-04T11:45:24.8253997Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.8254098Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.8254352Z stepcurrent: skipping 90 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.8254405Z Running 1 items in this shard 2025-12-04T11:45:24.8254407Z 2025-12-04T11:45:24.8254757Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:44.505380846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8254774Z 2025-12-04T11:45:24.8255096Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8255419Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8255562Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8256056Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8256317Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8256551Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8256761Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8256968Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8257273Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8257513Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8257815Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8258053Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8258349Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8258597Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8258895Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8259146Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8259448Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8259694Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8259987Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8260228Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8260525Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8260729Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8260971Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8261265Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8261466Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8261698Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8261997Z E1204 11:16:44.600000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8262230Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8262529Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8262758Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8262976Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8263185Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8263450Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8263624Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8263817Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8264360Z E1204 11:16:44.600000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/ft/cftwcaw26yfqzncfx5ay6f5yphk4oxcp3y77myr6vnwvxrtxokcj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8264513Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8264734Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8264896Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8265045Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8265342Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8265480Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8265744Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8265889Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8266146Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8266310Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8266582Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8266722Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8267001Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8267202Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8267535Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8267847Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8267985Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8268479Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8268750Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8268979Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8269192Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8269397Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8269696Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8269940Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8270236Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8270473Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8270768Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8271006Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8271303Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8271538Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8271845Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8272080Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8272394Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8272651Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8272956Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8273158Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8273417Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8273713Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8273912Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8274150Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8274448Z E1204 11:16:44.600000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8274686Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8274984Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8275209Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8275422Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8275626Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8275843Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8276017Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8276213Z E1204 11:16:44.600000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8276322Z E1204 11:16:44.600000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8276481Z [W1204 11:16:44.919308437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8276483Z 2025-12-04T11:45:24.8276814Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8277129Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8277282Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8277765Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8278019Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8278252Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8278457Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8278663Z E1204 11:16:44.652000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8278958Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8279199Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8279501Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8279736Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8280035Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8280269Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8280579Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8280812Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8281121Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8281369Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8281677Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8281914Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8282210Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8282410Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8282646Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8282938Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8283135Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8283447Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8283742Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8283975Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8284276Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8284502Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8284710Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8284915Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8285141Z E1204 11:16:44.652000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8285314Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8285508Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8286046Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/h3/ch34ji4wmoxtg6rrwvgfhbpc4aea2stvtsact5ya6u7y27qtyje4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8286214Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8286433Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8286597Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8286745Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8287041Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8287176Z E1204 11:16:44.652000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8287439Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8287580Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8287843Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8288005Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8288278Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8288418Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8288700Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8288898Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8289216Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8289528Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8289662Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8290157Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8290439Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8290666Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8290877Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8291078Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8291381Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8291622Z E1204 
11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8291914Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8292150Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8292444Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8292682Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8292978Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8293212Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8293544Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8293795Z E1204 11:16:44.652000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8294195Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8294446Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8294759Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8294977Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8295212Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8295510Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8295708Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8295948Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8296246Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8296483Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8296779Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8297002Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8297217Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8297420Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8297635Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8297804Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8297989Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8298098Z E1204 11:16:44.652000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8298270Z [W1204 11:16:44.924065355 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8298273Z 2025-12-04T11:45:24.8298591Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8298901Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8299049Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8299618Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8299880Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8300109Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8300317Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8300525Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8300821Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8301063Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8301358Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8301600Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8301898Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8302131Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8302426Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8302658Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8302968Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8303217Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8303543Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8303795Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8304102Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8304304Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8304537Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8304832Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8305035Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8305269Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8305565Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8305801Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8306099Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8306319Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8306534Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8306742Z E1204 11:16:44.657000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8306956Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8307147Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8307326Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8307863Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/7b/c7blspwwishxxvhrwit3xw6m4nwpsxa4hhvy4cvmz76pstq2luos.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8308022Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8308257Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8308417Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8308564Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8308858Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8308993Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8309255Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8309396Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8309657Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8309815Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8310091Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8310230Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8310514Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8310714Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8311028Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8311325Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8311470Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8311971Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8312229Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8312465Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8312690Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8312892Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8313191Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8313464Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8313761Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8313997Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8314290Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8314529Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8314823Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8315062Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8315359Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8315594Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8315889Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8316138Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8316447Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8316647Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8316898Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8317209Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8317405Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8317646Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8317940Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8318178Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8318475Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8318697Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8318909Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8319112Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8319332Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8319499Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.8319683Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8319786Z E1204 11:16:44.657000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8319950Z [W1204 11:16:44.927460634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8319954Z 2025-12-04T11:45:24.8320282Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8320576Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8320723Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8321200Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8321489Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8321719Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8321926Z E1204 11:16:44.660000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8322131Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8322427Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8322667Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8322962Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8323201Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8323536Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8323771Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8324066Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8324300Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8324623Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8324876Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8325186Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8325427Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8325733Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8325950Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8326183Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8326481Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8326679Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8326915Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8327216Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8327451Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8327745Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8327968Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8328179Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8328381Z E1204 11:16:44.660000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8328593Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8328762Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8328940Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8329478Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/4d/c4dp4toeegb6ene2zm275qlmwwitm25znvmalu5xokcx7a6nwjhb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8329638Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8329872Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8330029Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8330192Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8330486Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8330619Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8330881Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8331024Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8331286Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8331443Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8331721Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8331856Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8332138Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8332338Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8332653Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8332953Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8333086Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8333626Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8333886Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8334127Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8334355Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8334570Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8334868Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8338313Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8338620Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8338856Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8339150Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8339383Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8339674Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8339910Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8340201Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8340435Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8340730Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8340963Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8341279Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8341477Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8341720Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8342032Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8342242Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8342475Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8342769Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8343003Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8343334Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8343555Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8343762Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8343962Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8344176Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8344343Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.8344526Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8344630Z E1204 11:16:44.660000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8344791Z [W1204 11:16:44.932757055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8344794Z 2025-12-04T11:45:24.8345107Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8345421Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8345554Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8346045Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8346316Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8346558Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8346780Z E1204 11:16:44.666000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8346983Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8347276Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8347510Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8347802Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8348034Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8348326Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8348559Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8348852Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8349082Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8349378Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8349610Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8349915Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8350145Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8350455Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8350661Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8350907Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8351199Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8351394Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8351630Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8351925Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8352160Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8352453Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8352673Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8352880Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8353082Z E1204 11:16:44.666000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8353331Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8353498Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8353677Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8354243Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/oo/coo5sfmg7qr4pihsbxzvqsogrsg5aawknpdukhcjbvazesc5apdb.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8354393Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8354609Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8354781Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8354940Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8355229Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8355377Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8355637Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8355776Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8356031Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8356186Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8356459Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8356594Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8356875Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8357068Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8357388Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8357682Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8357812Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8358293Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8358560Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8358789Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8359013Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8359223Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8359529Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8359763Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8360057Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8360289Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8360583Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8360819Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8361112Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8361346Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8361636Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8361870Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8362163Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8362394Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8362687Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8362897Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8363130Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8363483Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8363693Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8363940Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8364231Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8364465Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8364755Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8364977Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8365185Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8365385Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8365597Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8365763Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.8365941Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8366044Z E1204 11:16:44.666000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8366202Z [W1204 11:16:44.933956847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8366204Z 2025-12-04T11:45:24.8366514Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8366808Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8366939Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8367430Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8367695Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8367932Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8368154Z E1204 11:16:44.667000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8368354Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8368647Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8368881Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8369173Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8369406Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8369696Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8369929Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8370220Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8370456Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8370748Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8370978Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8371272Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8371515Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8371809Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8372016Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8372259Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8372554Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8372761Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8372994Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8373319Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8373553Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8373847Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8374066Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8374274Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8374475Z E1204 11:16:44.667000 805382 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8374686Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8374852Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.8375031Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8375555Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpifp6vs3_/iu/ciumaqddts74dhpdg5buodcpruo2dpqxjnctbnp2hfb4oovwo2uu.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.8375702Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.8375942Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.8376100Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.8376261Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.8376548Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.8376696Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.8376966Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.8377105Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.8377382Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.8377538Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.8377809Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.8377944Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.8378222Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.8378416Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.8378732Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8379026Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8379156Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8379636Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8379890Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8380131Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8380335Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8380549Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8380843Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8381088Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8381393Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8381625Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8381920Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8382155Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8382448Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8382679Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8382969Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8383201Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8383533Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8383765Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8384057Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8384255Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8384489Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8384794Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8385005Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8385235Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8385540Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8385785Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8386074Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8386294Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8386500Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8386701Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.8386914Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.8387081Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.8387261Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.8387363Z E1204 11:16:44.667000 805382 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.8387416Z ('RERUN', {'yellow': True}) [3.7164s] [100%] 2025-12-04T11:45:24.8387751Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:46.987619126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8387754Z 2025-12-04T11:45:24.8387900Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8388194Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8388490Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8388619Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8389122Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8389392Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8389627Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8389849Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8390048Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8390343Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8390576Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8390867Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8391101Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8391393Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8391627Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8391934Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8392166Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8392457Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8392675Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8392882Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8393090Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8393336Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8393535Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8393781Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8394086Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8394294Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8394525Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8394817Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8395039Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8395233Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8395453Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8395659Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8395855Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8396052Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8396274Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8396478Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8396673Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8396868Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8397099Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8397407Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8397639Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8397941Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8398170Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8398388Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8398586Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8398792Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8398991Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8399224Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8399517Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8399748Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8400039Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8400271Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8400564Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8400796Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8401089Z 
E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8401319Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8401610Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8401853Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8402159Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8402390Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8402694Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8402935Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8403230Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8403493Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8403786Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8404019Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8404309Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8404540Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8404833Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8405052Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8405253Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8405452Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8405745Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8405976Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8406285Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8406517Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8406825Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8407075Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8407381Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8407611Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8407905Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8408136Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8408430Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8408624Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8408820Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8409016Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8409223Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8409424Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8409653Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8409946Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8410143Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8410339Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8410544Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8410737Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8410979Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8411280Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8411523Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8411812Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8412009Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8412216Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8412418Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8412656Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8412950Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8413172Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8413500Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8413703Z 
E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8413902Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8414197Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8414432Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8414725Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8414977Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8415284Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8415517Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8415825Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8416073Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8416366Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8416563Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8416762Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8416984Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8417189Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8417389Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8417594Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8417890Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8418127Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8418423Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8418656Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8418953Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8419202Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8419496Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8419746Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8420049Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8420283Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8420486Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8420687Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8420882Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8421094Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8421299Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8421592Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8421817Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8422021Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8422222Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8422427Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8422722Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8422959Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8423279Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8423531Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8423823Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8424078Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8424390Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8424640Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8424937Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8425171Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8425467Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8425702Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8425999Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8426234Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8426528Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8426767Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8427068Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8427303Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8427599Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8427798Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8428010Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8428243Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8428550Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8428794Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8429104Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8429340Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8429635Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8429872Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8430166Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8430403Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8430697Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8430897Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8431131Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8431427Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8431665Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8431958Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8432175Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8432388Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8432590Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8432802Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8433097Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8433354Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8433573Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8433772Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8433976Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8434272Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8434498Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8434702Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8434902Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8435094Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8435245Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8435443Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8435668Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8435875Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8436075Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8436302Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8436509Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8436724Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8436945Z E1204 
11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8437169Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8437379Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8437615Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8437822Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8438019Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8438215Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8438431Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8438638Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8438838Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8439040Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8439334Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8439550Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8439753Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8439951Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8440144Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8440340Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8440554Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8440777Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8440977Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8441188Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8441484Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8441709Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8441922Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8442120Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8442320Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8442617Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8442832Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8443036Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8443235Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8443466Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8443762Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8443959Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8444163Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8444353Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8444548Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8444764Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8444985Z E1204 11:16:46.726000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8445184Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8445388Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8445571Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8445755Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8445896Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8446000Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8446128Z E1204 11:16:46.726000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8446285Z [W1204 11:16:46.996027570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8446288Z 2025-12-04T11:45:24.8446433Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8446730Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8447028Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8447159Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8447643Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8447898Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8448126Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8448331Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8448532Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8448824Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8449071Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8449365Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8449608Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8449913Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8450157Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8450450Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8450681Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8450974Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8451196Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8451403Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8451600Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8451808Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8452009Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8452242Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8452539Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8452735Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8452967Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8453292Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8453527Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8453723Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8453954Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8454173Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8454368Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8454590Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8454810Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8455014Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8455211Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8455405Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8455640Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8455932Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8456166Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8456460Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8456680Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8456889Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8457085Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8457292Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8457491Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8457745Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8458039Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8458282Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8458585Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8458829Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8459124Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8459359Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8459654Z 
E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8459889Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8460180Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8460415Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8460707Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8460941Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8461234Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8461466Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8461762Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8461993Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8462299Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8462530Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8462834Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8463077Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8463416Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8463637Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8463839Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8464041Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8464338Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8464572Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8464868Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8465100Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8465393Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8465625Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8465919Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8466148Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8466445Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8466702Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8466992Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8467205Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8467419Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8467632Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8467839Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8468041Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8468278Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8468572Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8468773Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8468969Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8469167Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8469364Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8469597Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8469890Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8470122Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8470418Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8470613Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8470838Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8471042Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8471289Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8471585Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8471829Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8472046Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8472245Z 
E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8472449Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8472742Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8472976Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8473316Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8473552Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8473849Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8474084Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8474381Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8474615Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8474910Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8475111Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8475323Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8475546Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8475764Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8475980Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8476181Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8476490Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8476726Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8477020Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8477257Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8477552Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8477787Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8478080Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8478318Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8478616Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8478836Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8479041Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8479243Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8479438Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8479660Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8479864Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8480170Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8480401Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8480621Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8480821Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8481022Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8481325Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8481566Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8481862Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8482095Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8482391Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8482625Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8482921Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8483158Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8483484Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8483719Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8484027Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8484263Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8484571Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8484818Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8485128Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8485361Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8485659Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8485892Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8486187Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8486387Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8486588Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8486824Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8487120Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8487356Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8487652Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8487889Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8488184Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8488429Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8488726Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8488975Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8489281Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8489488Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8489724Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8490020Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8490255Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8490553Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8490772Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8490975Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8491174Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8491377Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8491675Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8491888Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8492093Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8492290Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8492495Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8492800Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8493025Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8493239Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8493480Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8493674Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8493836Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8494034Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8494256Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8494465Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8494663Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8494886Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8495094Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8495294Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8495518Z E1204 
11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8495725Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8495924Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8496143Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8496353Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8496552Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8496751Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8496983Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8497186Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8497399Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8497612Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8497911Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8498135Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8498340Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8498542Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8498735Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8498935Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8499149Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8499355Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8499555Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8499760Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8500061Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8500274Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8500478Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8500677Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8500881Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8501189Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8501406Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8501626Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8501837Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8502037Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8502342Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8502538Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8502741Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8502933Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8503129Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8503383Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8503590Z E1204 11:16:46.729000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8503789Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8503982Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8504165Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8504338Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8504464Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8504567Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8504693Z E1204 11:16:46.729000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8504852Z [W1204 11:16:46.998207577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8504854Z 2025-12-04T11:45:24.8505002Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8505319Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8505616Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8505762Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8506258Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8506528Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8506755Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8506962Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8507161Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8507456Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8507691Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8507985Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8508218Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8508516Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8508749Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8509041Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8509275Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8509579Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8509802Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8510022Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8510222Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8510443Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8510655Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8510891Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8511185Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8511383Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8511615Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8511911Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8512135Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8512331Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8512554Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8512760Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8512960Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8513157Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8513407Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8513617Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8513830Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8514026Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8514271Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8514564Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8514807Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8515114Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8515333Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8515540Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8515737Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8515945Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8516145Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8516377Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8516670Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8516904Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8517196Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8517428Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8517722Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8517956Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8518261Z 
E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8518493Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8518795Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8519035Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8519344Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8519574Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8519867Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8520099Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8520396Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8520628Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8520920Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8521153Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8521444Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8521678Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8521972Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8522190Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8522392Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8522601Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8522894Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8523138Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8523473Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8523721Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8524011Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8524244Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8524535Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8524767Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8525061Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8525295Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8525588Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8525785Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8525982Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8526176Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8526384Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8526583Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8526816Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8527125Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8527322Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8527532Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8527743Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8527950Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8528182Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8528480Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8528713Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8529004Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8529204Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8529411Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8529620Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8529855Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8530152Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8530376Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8530578Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8530777Z 
E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8530977Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8531286Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8531521Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8531827Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8532076Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8532381Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8532620Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8532914Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8533151Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8533482Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8533682Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8533884Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8534106Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8534312Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8534514Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8534717Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8535014Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8535251Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8535548Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8535799Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8536107Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8536339Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8536657Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8536906Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8537202Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8537424Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8537626Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8537827Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8538019Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8538230Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8538430Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8538728Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8538952Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8539157Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8539359Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8539559Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8539853Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8540098Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8540403Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8540637Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8540941Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8541186Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8541481Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8541716Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8542009Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8542244Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8542537Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8542770Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8543065Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8543333Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8543629Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8543864Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8544159Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8544408Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8544702Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8544918Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8545128Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8545362Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8545671Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8545903Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8546199Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8546434Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8546729Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8546961Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8547255Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8547489Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8547783Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8547980Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8548213Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8548509Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8548756Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8549051Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8549277Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8549490Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8549689Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8549905Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8550202Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8550415Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8550618Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8550819Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8551021Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8551317Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8551540Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8551744Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8551943Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8552136Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8552285Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8552481Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8552704Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8552912Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8553121Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8553383Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8553615Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8553824Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8554059Z E1204 
11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8554266Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8554460Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8554682Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8554888Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8555087Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8555283Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8555501Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8555704Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8555903Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8556105Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8556399Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8556613Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8556814Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8557015Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8557223Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8557423Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8557649Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8557866Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8558068Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8558285Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8558583Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8558798Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8559003Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8559205Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8559407Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8559705Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8559917Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8560125Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8560324Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8560527Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8560825Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8561020Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8561223Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8561425Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8561623Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8561847Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8562065Z E1204 11:16:46.731000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8562277Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8562483Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8562667Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8562839Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8562969Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8563073Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8563201Z E1204 11:16:46.731000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8563402Z [W1204 11:16:46.040629140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8563404Z 2025-12-04T11:45:24.8563551Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8563847Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8564145Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8564279Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8564766Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8565024Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8565252Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8565462Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8565682Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8565989Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8566231Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8566537Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8566791Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8567085Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8567322Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8567615Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8567849Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8568142Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8568363Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8568572Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8568769Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8568981Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8569182Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8569416Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8569712Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8569928Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8570166Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8570469Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8570704Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8570904Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8571136Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8571343Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8571542Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8571742Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8571961Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8572171Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8572375Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8572573Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8572810Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8573105Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8573366Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8573659Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8573881Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8574089Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8574302Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8574514Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8574725Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8574976Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8575269Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8575519Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8575817Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8576055Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8576354Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8576592Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8576887Z 
E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8577118Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8577413Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8577648Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8577939Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8578172Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8578463Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8578708Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8579002Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8579247Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8579550Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8579793Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8580086Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8580317Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8580611Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8580831Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8581036Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8581236Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8581527Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8581761Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8582152Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8582395Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8582688Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8582924Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8583233Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8583496Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8583805Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8584053Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8584366Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8584563Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8584758Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8584956Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8585163Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8585366Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8585597Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8585891Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8586090Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8586288Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8586489Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8586685Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8586921Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8587213Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8587450Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8587757Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8587967Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8588178Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8588394Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8588643Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8588938Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8589164Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8589366Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8589572Z 
E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8589777Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8590075Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8590313Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8590609Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8590849Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8591144Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8591380Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8591677Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8591923Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8592220Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8592429Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8592644Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8592877Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8593086Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8593322Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8593524Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8593821Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8594056Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8594351Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8594584Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8594881Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8595117Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8595412Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8595653Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8595947Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8596172Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8596391Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8596592Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8596801Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8597025Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8597241Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8597536Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8597762Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8597966Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8598169Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8598372Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8598666Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8598903Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8599197Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8599435Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8599731Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8599966Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8600266Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8600501Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8600807Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8601051Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8601347Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8601602Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8601906Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8602140Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8602434Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8602673Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8602968Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8603205Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8603538Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8603737Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8603938Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8604171Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8604467Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8604700Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8604999Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8605258Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8605565Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8605802Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8606110Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8606363Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8606660Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8606857Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8607095Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8607392Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8607629Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8607924Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8608141Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8608348Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8608548Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8608752Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8609046Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8609263Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8609478Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8609679Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8609897Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8610191Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8610428Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8610645Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8610846Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8611040Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8611191Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8611387Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8611612Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8611830Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8612031Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8612256Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8612462Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8612663Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8612884Z E1204 
11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8613097Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8613331Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8613552Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8613778Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8613975Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8614190Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8614420Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8614642Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8614840Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8615043Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8615343Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8615556Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8615761Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8615960Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8616154Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8616351Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8616569Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8616776Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8616973Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8617176Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8617471Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8617689Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8617903Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8618105Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8618323Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8618632Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8618860Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8619064Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8619267Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8619471Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8619769Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8619967Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8620171Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8620364Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8620561Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8620776Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8620983Z E1204 11:16:46.774000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8621184Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8621374Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8621566Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8621740Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8621867Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8621984Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8622113Z E1204 11:16:46.774000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8622271Z [W1204 11:16:46.042839847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8622273Z 2025-12-04T11:45:24.8622427Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8622734Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8623042Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8623175Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8623698Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8623952Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8624181Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8624388Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8624590Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8624883Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8625123Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8625421Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8625656Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8625954Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8626188Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8626495Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8626728Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8627035Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8627270Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8627492Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8627693Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8627903Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8628107Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8628344Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8628639Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8628837Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8629070Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8629367Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8629589Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8629788Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8630008Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8630214Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8630415Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8630624Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8630846Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8631063Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8631263Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8631479Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8631727Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8632198Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8632431Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8632725Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8632947Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8633157Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8633405Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8633616Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8633818Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8634052Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8634347Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8634578Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8634873Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8635104Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8635420Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8635670Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8635974Z 
E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8636208Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8636514Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8636749Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8637041Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8637276Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8637571Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8637802Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8638098Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8638331Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8638626Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8638860Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8639151Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8639386Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8639689Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8639912Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8640126Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8640339Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8640634Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8640878Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8641173Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8641405Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8641699Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8641932Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8642225Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8642460Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8642754Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8642990Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8643314Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8643515Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8643711Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8643910Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8644135Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8644336Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8644584Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8644890Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8645106Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8645303Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8645505Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8645702Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8645934Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8646229Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8646461Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8646756Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8646953Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8647163Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8647369Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8647604Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8647902Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8648125Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8648343Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8648543Z 
E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8648759Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8649053Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8649301Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8649616Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8649853Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8652958Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8653196Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8653531Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8653767Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8654060Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8654262Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8654460Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8654683Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8654888Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8655090Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8655291Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8655612Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8655848Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8656157Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8656406Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8656714Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8656952Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8657249Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8657483Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8657780Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8658002Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8658208Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8658408Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8658601Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8658816Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8659017Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8659312Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8659536Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8659744Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8659953Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8660155Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8660459Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8660706Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8661013Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8661246Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8661540Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8661774Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8662072Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8662303Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8662595Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8662829Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8663124Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8663389Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8663684Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8663917Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8664210Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8664460Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8664769Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8665001Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8665309Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8665522Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8665718Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8665953Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8666245Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8666480Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8666775Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8667009Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8667304Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8667536Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8667833Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8668066Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8668358Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8668556Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8668803Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8669099Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8669343Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8669659Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8669886Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8670088Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8670286Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8670488Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8670780Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8670995Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8671199Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8671399Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8671603Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8671899Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8672122Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8672326Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8672523Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8672717Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8672865Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8673074Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8673317Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8673539Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8673749Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8673987Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8674195Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8674390Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8674612Z E1204 
11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8674819Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8675015Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8675235Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8675442Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8675638Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8675833Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8676049Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8676251Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8676450Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8676651Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8676945Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8677174Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8677374Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8677583Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8677787Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8677983Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8678207Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8678410Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8678608Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8678808Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8679102Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8679316Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8679517Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8679714Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8679916Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8680210Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8680426Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8680627Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8680825Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8681028Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8681333Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8681529Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8681741Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8681942Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8682138Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8682365Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8682573Z E1204 11:16:46.776000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8682772Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8682962Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8683143Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8683352Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8683478Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8683581Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8683708Z E1204 11:16:46.776000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8683866Z [W1204 11:16:46.044981645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8683869Z 2025-12-04T11:45:24.8684019Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8684315Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.8684614Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8684745Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8685235Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8685504Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8685731Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8685960Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8686172Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8686480Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8686715Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8687010Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8687247Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8687539Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8687772Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8688063Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8688295Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8688586Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8688808Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8689014Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8689211Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8689423Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8689623Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8689867Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8690169Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8690366Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8690608Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8690910Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8691129Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8691324Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8691543Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8691749Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8691945Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8692138Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8692357Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8692562Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8692757Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.8692951Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8693182Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8693505Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8693737Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8694047Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8694266Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8694483Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8694693Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8694899Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8695115Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8695344Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8695636Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8695869Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8696165Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8696399Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8696689Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8696921Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8697213Z 
E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8697444Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8697737Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8697967Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8698269Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8698502Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8698810Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8699051Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8699340Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8699583Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8699875Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8700108Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8700400Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8700634Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8700928Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8701251Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8701455Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8701654Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8702071Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8702304Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8702596Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8702829Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8703140Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8703413Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8703706Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8703951Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8704258Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8704488Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8704780Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8704975Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8705171Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8705365Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8705572Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8705771Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8706002Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8706297Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8706491Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8706686Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8706880Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8707073Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8707322Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8707615Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8707856Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8708161Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8708369Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8708576Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8708780Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8709013Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8709308Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8709530Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8709731Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8709931Z 
E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8710131Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8710430Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8710661Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8710954Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8711186Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8711490Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8711724Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8712026Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8712271Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8712564Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8712777Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8712974Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8713194Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8713433Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8713632Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8713834Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8714128Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8714361Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8714654Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8714889Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8715183Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8715415Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8715710Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8715957Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8716251Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8716485Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8716703Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8716921Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8717114Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8717326Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8717527Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8717821Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8718043Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8718243Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8718442Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8718642Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8718936Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8719170Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8719464Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8719699Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8719992Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8720237Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8720530Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8720773Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8721077Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8721322Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8721616Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8721850Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8722146Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8722379Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8722672Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8722905Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8723197Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8723463Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8723755Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8723953Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8724152Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8724386Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8724694Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8724928Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8725233Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8725478Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8725785Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8726018Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8726310Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8726545Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8726841Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8727039Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8727273Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8727566Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8727799Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8728093Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8728307Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8728508Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8728709Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8728922Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8729215Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8729438Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8729653Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8729852Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8730065Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8730358Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8730579Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8730782Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8730985Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8731181Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8731331Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8731527Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8731749Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8731956Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8732156Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8732377Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8732584Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8732780Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8733000Z E1204 
11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8733220Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8733447Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8733684Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8733906Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8734118Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8734312Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8734529Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8734734Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8734932Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8735134Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8735432Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8735652Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8735853Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8736054Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8736250Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8736445Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8736661Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8736862Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.8737060Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8737279Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8737574Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8737800Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8738011Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8738210Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8738420Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8738716Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8738928Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.8739131Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8739329Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8739530Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8739825Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8740021Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8740223Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8740412Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8740606Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8740821Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8741027Z E1204 11:16:46.778000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8741225Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8741425Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8741606Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8741775Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8741912Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8742030Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8742157Z E1204 11:16:46.778000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.8742221Z ('RERUN', {'yellow': True}) [1.6614s] [100%] 2025-12-04T11:45:24.8742559Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:16:48.469752516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:24.8742562Z 2025-12-04T11:45:24.8742707Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8743003Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8743329Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8743460Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8743943Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8744197Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8744426Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8744637Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.8744836Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8745132Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8745367Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8745677Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8745908Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8746213Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8746460Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8746765Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8746998Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8747289Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8747512Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8747718Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8747916Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8748124Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8748323Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8748559Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8748850Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8749049Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8749280Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8749572Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8749793Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8749999Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8750218Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8750445Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8750650Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8750843Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8751073Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8751278Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8751474Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8751668Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8751899Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8752194Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8752424Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8752718Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8752937Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8753144Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8753374Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8753583Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8753782Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8754013Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8754319Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8754550Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8754854Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8755098Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8755405Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8755637Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8755931Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8756164Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8756457Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8756688Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8756979Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8757209Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8757501Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8757731Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8758024Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8758257Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8758548Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8758791Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8759093Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8759325Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8759628Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8759857Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8760057Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8760252Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8760548Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8760781Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8761072Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8761305Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8761596Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8761828Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8762119Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8762351Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8762642Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8762878Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8763183Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8763415Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8763633Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8763842Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8764063Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8764262Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8764497Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8764789Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8764986Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8765184Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8765379Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8765575Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8765804Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8766097Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8766330Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8766623Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8766820Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8767026Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8767249Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8767485Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8767793Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8768024Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8768226Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8768438Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8768639Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8768934Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8769168Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8769465Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8769700Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8769996Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8770229Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8770523Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8770759Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8771052Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8771251Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8771447Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8771681Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8771885Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8772098Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8772311Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8772603Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8772853Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8773147Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8773416Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8773711Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8773945Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8774240Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8774476Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8774771Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8774992Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8775194Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8775394Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8775586Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8775797Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8776011Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8776305Z E1204 11:16:48.203000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8776538Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8776752Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8776968Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8777167Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8777462Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8777695Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8777989Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8778223Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8778515Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8778749Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8779043Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8779279Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8779572Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8779806Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8780101Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8780344Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8780637Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8780884Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8781193Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8781446Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8781738Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8781971Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8782262Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8782460Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8782656Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8782893Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8783187Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8783457Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8783755Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8783987Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8784283Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8784515Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8784827Z E1204 
11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8785060Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8785364Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8785573Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8785819Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8786116Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8786349Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8786642Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8786857Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8787060Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8787259Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8787459Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8789469Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8789689Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8790575Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8790779Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8790979Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8791298Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8791520Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8791726Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8791924Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8792127Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8792277Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8792485Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8792708Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8792916Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8793114Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8793380Z E1204 
11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8793587Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8793782Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8794004Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8794214Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8794473Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8794696Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8794920Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8795120Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8795319Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8795535Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8795736Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8795935Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8796136Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8796444Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8796676Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8796877Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8797078Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8797270Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8797470Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8797686Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8797888Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8798089Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8798289Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8798605Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8798820Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8799037Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8799237Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8799437Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8799740Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8799955Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8800160Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8800358Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8800577Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8800876Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8801083Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8801285Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8801474Z E1204 11:16:48.203000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8801674Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8801891Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8802100Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8802296Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8802487Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8802670Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8802853Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8802982Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8803085Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8803223Z E1204 11:16:48.203000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8803416Z [W1204 11:16:48.472066221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8803419Z 2025-12-04T11:45:24.8803566Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8803861Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8804164Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8804300Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8804782Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8805066Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8805294Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8805503Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8805702Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8805999Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8806236Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8806533Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8806769Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8807076Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8807313Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8807622Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8807852Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8808145Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8808369Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8808575Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8808777Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8808986Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8809199Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8809445Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8809738Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8809936Z E1204 
11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8810168Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8810461Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8810682Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8810881Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8811104Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8811312Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8811521Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8811715Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8811947Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8812152Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8812351Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8812547Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8812780Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8813075Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8813342Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8813648Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8813882Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8814088Z 
E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8814284Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8814495Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8814696Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8814929Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8815224Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8815456Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8815768Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8816001Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8816312Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8816543Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8816837Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8817069Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8817361Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8817595Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8817902Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8818144Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8818437Z E1204 
11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8818667Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8818963Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8819196Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8819487Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8819721Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8820012Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8820259Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8820562Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8820785Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8820985Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8821182Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8821476Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8821707Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8821999Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8822242Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8822547Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8822781Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8823072Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8823335Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8823629Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8823861Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8824151Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8824348Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8824558Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8824753Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8824977Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8825176Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8825410Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8825700Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8825897Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8826094Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8826288Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8826499Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8826744Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8827040Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8827272Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8827566Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8827765Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8827973Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8828177Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8828411Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8828716Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8828938Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8829154Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8829355Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8829557Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8829853Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8830086Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8830380Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8830613Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8830918Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8831162Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8831455Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8831690Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8831986Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8832184Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8832380Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8832602Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8832803Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8833023Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8833225Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8833567Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8833801Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8834097Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8834333Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8834627Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8834858Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8835152Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8835398Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8835705Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8835925Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8836127Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8836328Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8836521Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8836734Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8836933Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8837226Z E1204 11:16:48.205000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8837461Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8837664Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8837874Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8838074Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8838367Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8838602Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8838897Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8839129Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8839422Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8839673Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8839978Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8840212Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8840505Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8840741Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8841036Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8841270Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8841562Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8841806Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8842099Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8842343Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8842635Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8842870Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8843163Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8843391Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8843587Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8843820Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8844126Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8844374Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8844667Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8844899Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8845193Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8845426Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8845723Z E1204 
11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8845955Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8846265Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8846463Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8846709Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8847004Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8847238Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8847533Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8847748Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8847951Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8848154Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8848367Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8848679Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8848892Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8849093Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8849292Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8849492Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8849786Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8850007Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8850209Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8850410Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8850614Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8850761Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8850971Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8851199Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8851408Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8851609Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8851828Z E1204 
11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8852038Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8852232Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8852466Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8852684Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8852882Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8853103Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8853327Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8853525Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8853720Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8853934Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8854135Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8854333Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8854556Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8854853Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8855079Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8855281Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8855482Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8855675Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8855872Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8856085Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8856286Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8856484Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8856698Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8857008Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8857222Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8857426Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8857625Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8857830Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8858127Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8858339Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8858543Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8858753Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8858955Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8859263Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8859461Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8859664Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8859855Z E1204 11:16:48.205000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8860055Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8860268Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8860475Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8860671Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8860875Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8861065Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8861240Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8861368Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8861471Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8861600Z E1204 11:16:48.205000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8861755Z [W1204 11:16:48.474220199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8861760Z 2025-12-04T11:45:24.8861906Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8862199Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8862497Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8862628Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8863137Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8863443Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 2025-12-04T11:45:24.8863672Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8863881Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8864080Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8864376Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8864615Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8864905Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8865164Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8865467Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8865701Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8865994Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8866229Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8866520Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8866739Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8866946Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8867142Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8867366Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8867564Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8867808Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8868102Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8868301Z E1204 
11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8868537Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8868829Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8869050Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8869254Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8869474Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8869691Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8869887Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8870084Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8870303Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8870512Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8870707Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8870903Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8871134Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8871431Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8871676Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8871978Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8872198Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8872401Z 
E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8872601Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8872810Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8873010Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8873241Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8873579Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8873829Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8874121Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8874353Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8874643Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8874877Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8875171Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8875404Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8875697Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8875943Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8876238Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8876483Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8876775Z E1204 
11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8877011Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8877304Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8877539Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8877828Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8878077Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8878380Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8878611Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8878904Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8879122Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8879326Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8879520Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8879816Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8880047Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8880355Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8880589Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8880900Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8881131Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8881424Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8881654Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8881947Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8882181Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8882486Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8882692Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8882888Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8883083Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8883333Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8883538Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8883769Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8884062Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8884259Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8884460Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8884669Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8884865Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8885109Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8885400Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8885635Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8885927Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8886126Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8886341Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8886546Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8886797Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8887105Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8887327Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8887527Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8887728Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8887929Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8888223Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8888458Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8888750Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8889000Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8889295Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8889543Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8889835Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8890071Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8890367Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8890564Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8890761Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8890993Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8891200Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8891411Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8891613Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8891909Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8892144Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8892440Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8892673Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8892967Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8893204Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8893545Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8893797Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8894090Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8894313Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8894516Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8894717Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8894912Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8895123Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8895338Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8895632Z E1204 11:16:48.207000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8895869Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8896071Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8896271Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8896471Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8896768Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8897003Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8897296Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8897534Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8897842Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8898088Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8898383Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8898617Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8898913Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8899146Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8899443Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8899678Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8899982Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8900227Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8900523Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8900760Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8901053Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8901290Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8901585Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8901783Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8901983Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8902229Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8902538Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8902772Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8903069Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8903333Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8903627Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8903860Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8904152Z E1204 
11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8904402Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8904710Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8904910Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8905145Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8905442Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8905676Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8905968Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8906184Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8906387Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8906600Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8906803Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8907111Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8907324Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8907528Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8907728Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8907929Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8908224Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8908456Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8908657Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8908867Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8909058Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8909208Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8909405Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8909630Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8909839Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8910038Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8910258Z E1204 
11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8910465Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8910671Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8910891Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8911107Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.8911302Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8911526Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8911735Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8911934Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8912130Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8912341Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8912553Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8912751Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8912969Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8913299Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8913512Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8913714Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8913913Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8914106Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.8914302Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8914515Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8914716Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8914929Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8915129Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8915436Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8915650Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8915852Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8916050Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8916251Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8916550Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8916779Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8916981Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8917192Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8917392Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8917686Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8917882Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.8918084Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.8918273Z E1204 11:16:48.207000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.8918470Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.8918688Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.8918894Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.8919104Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.8919292Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.8919485Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.8919656Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.8919782Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.8919885Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.8920013Z E1204 11:16:48.207000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.8920171Z [W1204 11:16:48.516654261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.8920173Z 2025-12-04T11:45:24.8920318Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.8920612Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.8920923Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.8921068Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.8921548Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.8921803Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.8922032Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.8922239Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.8922438Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8922732Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8922969Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8923306Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8923556Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8923849Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8924080Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8924373Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8924606Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8924900Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8925120Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8925341Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8925551Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8925760Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8925962Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8926192Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8926486Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8926681Z E1204 
11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8926915Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8927207Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8927439Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8927637Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8927868Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8928074Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8928269Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8928466Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8928686Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8928891Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8929088Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8929281Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8929533Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8929835Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8930069Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8930361Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8930580Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8930786Z 
E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.8930980Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8931189Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8931386Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8931632Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8931925Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8932171Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8932471Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8932702Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.8932996Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8933228Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8933547Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8933795Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8934085Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8934332Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8934627Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8934860Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8935152Z E1204 
11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8935384Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8935675Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8935906Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8936213Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8936445Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8936750Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8936983Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8937276Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8937497Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8937696Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8937894Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.8938186Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8938432Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8938734Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8938965Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8939257Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8939490Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8939788Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8940019Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8940310Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8940557Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8940848Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8941055Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8941250Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.8941447Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8941655Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8941858Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8942089Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8942380Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8942587Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8942793Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8942989Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8943184Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8943439Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8944035Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8944628Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8945199Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8945726Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8946171Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8946642Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8947123Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8947708Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8948268Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8948805Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8949358Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8949803Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8950331Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8950917Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8951485Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8952059Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8952630Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8953195Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8953826Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8954394Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8954959Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8958644Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8959119Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8959578Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8960054Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8960493Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8960928Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8961460Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8962029Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8962594Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8963158Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8963771Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8964347Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8964911Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8965471Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8966036Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8966586Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8967046Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8967486Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8967917Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8968371Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.8968818Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8969361Z E1204 11:16:48.250000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8969910Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8970368Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8970806Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8971241Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8971768Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8972331Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8972909Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8973524Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8974089Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8974648Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8975209Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8975769Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8976329Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8976892Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8977459Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8978035Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8978610Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8979170Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8979734Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8980296Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8980855Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8981415Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8981974Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8982519Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.8982964Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8983466Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8984027Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8984590Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8985150Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8985712Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8986272Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8986835Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8987423Z E1204 
11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8987999Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8988562Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8989085Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8989550Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8990112Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8990672Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.8991234Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8991788Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8992258Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8992696Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8993130Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8993694Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8994240Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.8994691Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8995127Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8995560Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.8996107Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.8996659Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.8997132Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.8997569Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.8997993Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.8998369Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.8998750Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.8999202Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.8999667Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9000108Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9000581Z E1204 
11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9001057Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9001496Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9001948Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9002410Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9002847Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9003315Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9003778Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9004221Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9004650Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9005110Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9005561Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9006011Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9006447Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9006980Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9007522Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9007971Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9008406Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9008830Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.9009266Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9009711Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9010180Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9010615Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9011048Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9011579Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9012122Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9012572Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9013009Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9013487Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9014028Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9014578Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9015051Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9015487Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9015920Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9016457Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9016986Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9017424Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9017851Z E1204 11:16:48.250000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9018287Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9018727Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9019205Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9019645Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9020067Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9020473Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9020862Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9021207Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9021486Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9021760Z E1204 11:16:48.250000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.9022091Z [W1204 11:16:48.518831759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9022286Z 2025-12-04T11:45:24.9022440Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9022945Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9023605Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9024082Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9024735Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9025510Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9026037Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9026512Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.9026954Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9027507Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9028095Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9028669Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9029240Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9029803Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9030369Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9030941Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9031511Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9032079Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9032648Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9033121Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9033620Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9034071Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9034522Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9034992Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9035558Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9036084Z E1204 
11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9036542Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9037117Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9037677Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9038128Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9038575Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9039035Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9039471Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9039896Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9040346Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9040809Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9041252Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9041689Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9042154Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9042726Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9043323Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9043885Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9044431Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9044895Z 
E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9045445Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9045901Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9046347Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9046827Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9047388Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9047945Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9048505Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9049065Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.9049623Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9050184Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9050758Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9051321Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9051890Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9052446Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9053009Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9053598Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9054155Z E1204 
11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9054712Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9055293Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9055872Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9056430Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9056986Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9057542Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9058101Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9058663Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9059209Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9059666Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9060102Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9060639Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9061210Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9061774Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9062335Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9062895Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9063493Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9064050Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9064625Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9065183Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9065754Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9066312Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9066835Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9067263Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.9067691Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9068129Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9068572Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9069037Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9069614Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9070137Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9070579Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9071002Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9071425Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9071885Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9072445Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9073001Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9073605Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9074140Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9074591Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9075039Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9075508Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9076071Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9076624Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9077082Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9077518Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9077952Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9078497Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9079058Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9079635Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9080197Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9080758Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9081320Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9081881Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9082446Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9083020Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9083584Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9084014Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9084468Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9084926Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9085363Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9085802Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9086335Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9086896Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9087456Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9088032Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9088596Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9089179Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9089743Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9090305Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9090866Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9091419Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9091881Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9092332Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9092771Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9093208Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9093680Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9094209Z E1204 11:16:48.252000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9094765Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9095223Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9095660Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9096095Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9096626Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9097207Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9097772Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9098354Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9098918Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9099482Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9100048Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9100617Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9101179Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9101757Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9102333Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9102963Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9103555Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9104120Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9104687Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9105250Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9105810Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9106376Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9106957Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9107497Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9107926Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9108390Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9108954Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9109520Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9110083Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9110646Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9111221Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9111797Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9112363Z E1204 
11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9112925Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9113524Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9114050Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9114516Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9115083Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9115645Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9116221Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9116766Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9117230Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9117667Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9118104Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9118633Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9119175Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9119626Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9120063Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9120513Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9121057Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9121606Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9122063Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9122498Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9122925Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9123336Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9123716Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9124171Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9124637Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9125094Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9125548Z E1204 
11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9126024Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9126461Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9126913Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9127380Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9127821Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9128274Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9128735Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9129189Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9129618Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9130075Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9130526Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9130960Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9131401Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9131939Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9132479Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9132932Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9133404Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9133853Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.9134278Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9134720Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9135181Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9135616Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9136049Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9136581Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9137124Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9137573Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9138022Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9138456Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9138999Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9139539Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9139990Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9140424Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9140858Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9141387Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9141912Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9142345Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9142784Z E1204 11:16:48.252000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9143208Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9143685Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9144155Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9144593Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9145014Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9145420Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9145805Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9146138Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9146410Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9146674Z E1204 11:16:48.252000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.9147008Z [W1204 11:16:48.520985026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9147199Z 2025-12-04T11:45:24.9147359Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9147834Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9148459Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9148915Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9149563Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9150324Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9150835Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9151300Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:24.9151753Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9152283Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9152856Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9153458Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9154021Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9154581Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9155145Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9155704Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9156277Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9156853Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9157399Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9157862Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9158300Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9158746Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9159190Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9159656Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9160216Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9160738Z E1204 
11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9161215Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9161790Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9162337Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9162787Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9163238Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9163736Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9164175Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9164603Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9165053Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9165528Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9165977Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9166403Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9166863Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9167426Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9167991Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9168556Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9169102Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9169563Z 
E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9170013Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9170454Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9170910Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9171375Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9171933Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9172495Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9173055Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9173651Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.9174207Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9174786Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9175358Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9175917Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9176477Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9177038Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9177598Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9178162Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9178721Z E1204 
11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9179282Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9179855Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9180426Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9180984Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9181542Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9182104Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9182664Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9183223Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9183815Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9184272Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9184718Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9185247Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9185806Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9186365Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9186924Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9187484Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9188045Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9188614Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9189171Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9189739Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9190298Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9190855Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9191378Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9191804Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.9192232Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9192670Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9193132Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9193647Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9194209Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9194733Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9195159Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9195587Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9196014Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9196476Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9197040Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9197601Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9198178Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9198700Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9199150Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9199596Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9200068Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9200633Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9201184Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9201390Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9201593Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9201807Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9202113Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9202346Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9202639Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9202871Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9203166Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9203437Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9203730Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9203970Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9204275Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9204486Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9204682Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9204904Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9205110Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9205309Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9205512Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9205803Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9206052Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9206365Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9206601Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9206894Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9207128Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9207423Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9207656Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9207950Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9208170Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9208399Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9208602Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9208810Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9209020Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9209219Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9209513Z E1204 11:16:48.254000 
805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9209733Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9209936Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9210135Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9210346Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9210655Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9210890Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9211186Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9211419Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9211718Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9211951Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9212244Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9212476Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9212780Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9213016Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9213366Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9213601Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9213896Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9214130Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9214424Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9214656Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9214964Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9215212Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9215507Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9215706Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9215903Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9216138Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9216433Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9216666Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9216959Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9217206Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9217500Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9217743Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9218037Z E1204 
11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9218271Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9218563Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9218760Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9218992Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9219300Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9219543Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9219836Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9220050Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9220255Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9220455Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9220655Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9220949Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9221161Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9221363Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9221573Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9221773Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9222075Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9222296Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9222503Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9222702Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9222894Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9223040Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9223237Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9223509Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9223731Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9223927Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9224149Z E1204 
11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9224358Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9224556Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9224779Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9224987Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9225181Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9225402Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9225620Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9225819Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9226025Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9226239Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9226439Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9226638Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9226841Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9227135Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9227352Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9227564Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9227765Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9227968Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:24.9228166Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9228380Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9228580Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9228779Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9228980Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9229276Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9229489Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9229693Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9229908Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9230111Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9230415Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9230628Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9230833Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9231031Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9231232Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9231525Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9231734Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9231937Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9232137Z E1204 11:16:48.254000 805382 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9232333Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9232544Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9232750Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9232947Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9233137Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9233351Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9233521Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9233647Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9233752Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9233891Z E1204 11:16:48.254000 805382 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.9233934Z FAILED [1.4996s] [100%] 2025-12-04T11:45:24.9233937Z 2025-12-04T11:45:24.9233996Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9234143Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9234204Z Traceback (most recent call last): 2025-12-04T11:45:24.9234371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9234417Z method(*args, **kwargs) 2025-12-04T11:45:24.9234570Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9234612Z method(*args, **kwargs) 2025-12-04T11:45:24.9234763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9234802Z with policy(): 2025-12-04T11:45:24.9234955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9234996Z raise RuntimeError(msg) 2025-12-04T11:45:24.9235396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1973420032. 
2025-12-04T11:45:24.9235400Z 2025-12-04T11:45:24.9235478Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9235759Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9235776Z 2025-12-04T11:45:24.9235866Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9235948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9235992Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9236053Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9236610Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9236712Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9236753Z graph_break [] 2025-12-04T11:45:24.9236823Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9236899Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9237392Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9237441Z current_size = base.storage().size() 2025-12-04T11:45:24.9237481Z Autotune Choices Stats: 2025-12-04T11:45:24.9237870Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009759999811649323, "best_triton_pos": 0} 2025-12-04T11:45:24.9237940Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9237996Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9238120Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9238369Z triton_mm_29 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9238412Z _scaled_mm 0.0104 ms 93.5% 2025-12-04T11:45:24.9238648Z triton_mm_33 0.0108 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9238878Z triton_mm_34 0.0108 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9239107Z triton_mm_21 0.0110 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9239333Z 
triton_mm_30 0.0114 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9239566Z triton_mm_22 0.0115 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9239803Z triton_mm_16 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9240031Z triton_mm_23 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9240257Z triton_mm_25 0.0128 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9240390Z SingleProcess AUTOTUNE benchmarking takes 0.1638 seconds and 1.4771 seconds precompiling for 33 choices 2025-12-04T11:45:24.9240539Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9240587Z Traceback (most recent call last): 2025-12-04T11:45:24.9240748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9240788Z method(*args, **kwargs) 2025-12-04T11:45:24.9240940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9240983Z method(*args, **kwargs) 
2025-12-04T11:45:24.9241134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9241172Z with policy(): 2025-12-04T11:45:24.9241324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9241369Z raise RuntimeError(msg) 2025-12-04T11:45:24.9241775Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1973420032 and is now 2940207104. 2025-12-04T11:45:24.9241779Z 2025-12-04T11:45:24.9241855Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9242128Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9242131Z 2025-12-04T11:45:24.9242221Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9242296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9242344Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9242401Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9242957Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 
2025-12-04T11:45:24.9243060Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9243097Z graph_break [] 2025-12-04T11:45:24.9243165Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9243238Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9243770Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9243841Z current_size = base.storage().size() 2025-12-04T11:45:24.9243882Z Autotune Choices Stats: 2025-12-04T11:45:24.9244250Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009759999811649323, "best_triton_pos": 0} 2025-12-04T11:45:24.9244319Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9244370Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9244493Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9244724Z triton_mm_29 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9244767Z _scaled_mm 0.0104 ms 93.5% 2025-12-04T11:45:24.9244997Z triton_mm_33 0.0108 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9245225Z triton_mm_34 0.0108 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9245466Z triton_mm_21 0.0110 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9245691Z triton_mm_30 0.0114 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9245929Z triton_mm_22 0.0115 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9246151Z triton_mm_16 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9246382Z triton_mm_23 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9246609Z triton_mm_25 0.0128 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9246739Z SingleProcess AUTOTUNE benchmarking 
takes 0.1638 seconds and 1.4771 seconds precompiling for 33 choices 2025-12-04T11:45:24.9246815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9246857Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9246914Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9247013Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9247518Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9247566Z graph_break [] 2025-12-04T11:45:24.9247630Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9247705Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9247746Z Autotune Choices Stats: 2025-12-04T11:45:24.9248108Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.01023900043219328, "best_triton_pos": 0} 2025-12-04T11:45:24.9248176Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9248231Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9248352Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9248588Z triton_mm_67 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9248630Z _scaled_mm 0.0104 ms 98.8% 2025-12-04T11:45:24.9248861Z triton_mm_72 0.0107 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9249084Z triton_mm_59 0.0107 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9249322Z triton_mm_71 0.0109 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9249555Z triton_mm_68 0.0112 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9249783Z triton_mm_60 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9250008Z triton_mm_54 0.0118 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9250239Z triton_mm_61 0.0119 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9250465Z 
triton_mm_63 0.0129 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9250593Z SingleProcess AUTOTUNE benchmarking takes 0.2423 seconds and 0.8553 seconds precompiling for 39 choices 2025-12-04T11:45:24.9250648Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9250807Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9250855Z Traceback (most recent call last): 2025-12-04T11:45:24.9251013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9251067Z method(*args, **kwargs) 2025-12-04T11:45:24.9251219Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9251259Z method(*args, **kwargs) 2025-12-04T11:45:24.9251410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9251448Z with policy(): 2025-12-04T11:45:24.9251600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9251640Z raise RuntimeError(msg) 2025-12-04T11:45:24.9252036Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2940207104 and is now 3906994176. 
2025-12-04T11:45:24.9252039Z 2025-12-04T11:45:24.9252113Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9252376Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9252378Z 2025-12-04T11:45:24.9252466Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9252542Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9252585Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9252643Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9253203Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9253337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9253388Z graph_break [] 2025-12-04T11:45:24.9253454Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9253526Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9254011Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9254061Z current_size = base.storage().size() 2025-12-04T11:45:24.9254100Z Autotune Choices Stats: 2025-12-04T11:45:24.9254469Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009759999811649323, "best_triton_pos": 0} 2025-12-04T11:45:24.9254534Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9254585Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9254722Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9254955Z triton_mm_29 0.0098 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9255009Z _scaled_mm 0.0104 ms 93.5% 2025-12-04T11:45:24.9255237Z triton_mm_33 0.0108 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9255463Z triton_mm_34 0.0108 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9255691Z triton_mm_21 0.0110 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9255917Z 
triton_mm_30 0.0114 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9256142Z triton_mm_22 0.0115 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9256366Z triton_mm_16 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9256593Z triton_mm_23 0.0119 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9256834Z triton_mm_25 0.0128 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9256965Z SingleProcess AUTOTUNE benchmarking takes 0.1638 seconds and 1.4771 seconds precompiling for 33 choices 2025-12-04T11:45:24.9257039Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9257093Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9257151Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9257251Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9257740Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 
38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9257781Z graph_break [] 2025-12-04T11:45:24.9257844Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9257919Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9257960Z Autotune Choices Stats: 2025-12-04T11:45:24.9258322Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.01023900043219328, "best_triton_pos": 0} 2025-12-04T11:45:24.9258405Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9258455Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9258576Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9258822Z triton_mm_67 0.0102 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9258864Z _scaled_mm 0.0104 ms 98.8% 2025-12-04T11:45:24.9259090Z triton_mm_72 0.0107 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9259312Z triton_mm_59 0.0107 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.9259541Z triton_mm_71 0.0109 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9259768Z triton_mm_68 0.0112 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9259993Z triton_mm_60 0.0114 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9260216Z triton_mm_54 0.0118 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9260451Z triton_mm_61 0.0119 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9260677Z triton_mm_63 0.0129 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9260817Z SingleProcess AUTOTUNE benchmarking takes 0.2423 seconds and 0.8553 seconds precompiling for 39 choices 2025-12-04T11:45:24.9260890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9260936Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9260992Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9261094Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9261577Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9261614Z graph_break [] 2025-12-04T11:45:24.9261677Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:24.9261752Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9261792Z Autotune Choices Stats: 2025-12-04T11:45:24.9262152Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.009999999776482582, "best_triton_pos": 0} 2025-12-04T11:45:24.9262229Z AUTOTUNE scaled_mm(257x1024, 1024x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9262289Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9262412Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9262647Z triton_mm_105 0.0100 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9262688Z _scaled_mm 0.0105 ms 95.1% 2025-12-04T11:45:24.9262918Z triton_mm_109 0.0106 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9263147Z triton_mm_110 0.0108 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9263402Z triton_mm_97 0.0110 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9263626Z triton_mm_98 0.0116 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9263850Z triton_mm_106 0.0116 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9264086Z triton_mm_92 0.0118 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9264315Z triton_mm_99 0.0118 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9264555Z triton_mm_101 0.0128 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9264688Z SingleProcess AUTOTUNE benchmarking takes 0.2613 seconds and 0.7182 seconds precompiling for 39 choices 2025-12-04T11:45:24.9264881Z - 
generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4e1b69be281ad4ef.xml - 2025-12-04T11:45:24.9264944Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9265541Z FAILED [1.4996s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2940207104 and is now 3906994176. 2025-12-04T11:45:24.9265544Z 2025-12-04T11:45:24.9265617Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9265878Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9265893Z 2025-12-04T11:45:24.9265980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9266044Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.9266127Z ================== 1 failed, 187 deselected, 2 rerun in 6.90s ================== 2025-12-04T11:45:24.9266166Z Got exit code 1 2025-12-04T11:45:24.9266372Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9266502Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.9266647Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20912fa775f19e94.xml 2025-12-04T11:45:24.9266706Z ============================= test session starts ============================== 2025-12-04T11:45:24.9266820Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9266862Z cachedir: .pytest_cache 2025-12-04T11:45:24.9267021Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9267071Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9267111Z configfile: pytest.ini 2025-12-04T11:45:24.9267279Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9267355Z collecting ... collected 188 items / 91 deselected / 97 selected 2025-12-04T11:45:24.9267410Z stepcurrent: skipping 91 already run items. 
2025-12-04T11:45:24.9267454Z Running 97 items in this shard 2025-12-04T11:45:24.9267457Z 2025-12-04T11:45:24.9267670Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7196s] [ 1%] 2025-12-04T11:45:24.9267892Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3919s] [ 1%] 2025-12-04T11:45:24.9268078Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3211s] [ 1%] 2025-12-04T11:45:24.9268080Z 2025-12-04T11:45:24.9268134Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9268286Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9271414Z Traceback (most recent call last): 2025-12-04T11:45:24.9271593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9271636Z method(*args, **kwargs) 2025-12-04T11:45:24.9271791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9271832Z method(*args, **kwargs) 2025-12-04T11:45:24.9271983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9272023Z with policy(): 2025-12-04T11:45:24.9272178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9272219Z raise RuntimeError(msg) 2025-12-04T11:45:24.9272619Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.9272623Z 2025-12-04T11:45:24.9272720Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9272982Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9272998Z 2025-12-04T11:45:24.9273087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9273163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9273206Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9273304Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9273373Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9273476Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9273512Z graph_break [] 2025-12-04T11:45:24.9273576Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9273717Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9273765Z Traceback (most recent call last): 2025-12-04T11:45:24.9273922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9273965Z method(*args, **kwargs) 2025-12-04T11:45:24.9274115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9274155Z method(*args, **kwargs) 2025-12-04T11:45:24.9274306Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9274344Z with policy(): 2025-12-04T11:45:24.9274497Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9274539Z raise RuntimeError(msg) 2025-12-04T11:45:24.9274953Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.9274957Z 2025-12-04T11:45:24.9275032Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9275302Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9275306Z 2025-12-04T11:45:24.9275393Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9275467Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9275509Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9275567Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9275634Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9275736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9275772Z graph_break [] 2025-12-04T11:45:24.9275833Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9275906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9275947Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9276002Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9276102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9276166Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9276202Z graph_break [] 2025-12-04T11:45:24.9276260Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9276329Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9276469Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9276530Z Traceback (most recent call last): 2025-12-04T11:45:24.9276684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9276724Z method(*args, **kwargs) 2025-12-04T11:45:24.9276874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9276915Z method(*args, **kwargs) 2025-12-04T11:45:24.9277064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9277100Z with policy(): 2025-12-04T11:45:24.9277252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9277294Z raise RuntimeError(msg) 2025-12-04T11:45:24.9277676Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.9277679Z 2025-12-04T11:45:24.9277753Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9278006Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9278008Z 2025-12-04T11:45:24.9278095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9278168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9278211Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9278268Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9278334Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9278444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9278480Z graph_break [] 2025-12-04T11:45:24.9278538Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9278613Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9278665Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9278721Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9278817Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9278881Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9278918Z graph_break [] 2025-12-04T11:45:24.9278976Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9279048Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9279089Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9279147Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9279242Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9279307Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9279343Z graph_break [] 2025-12-04T11:45:24.9279401Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9279594Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-20912fa775f19e94.xml - 2025-12-04T11:45:24.9279655Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9280228Z FAILED [0.3211s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.9280252Z 2025-12-04T11:45:24.9280327Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9280582Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9280584Z 2025-12-04T11:45:24.9280672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9280735Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9280804Z ================== 1 failed, 91 deselected, 2 rerun in 2.45s =================== 2025-12-04T11:45:24.9280841Z Got exit code 1 2025-12-04T11:45:24.9280882Z Retrying single test... 
2025-12-04T11:45:24.9281027Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3197c0f09daf575c.xml 2025-12-04T11:45:24.9281085Z ============================= test session starts ============================== 2025-12-04T11:45:24.9281197Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9281238Z cachedir: .pytest_cache 2025-12-04T11:45:24.9281397Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9281442Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9281483Z configfile: pytest.ini 2025-12-04T11:45:24.9281646Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9281722Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9281984Z stepcurrent: skipping 91 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9282029Z Running 1 items in this shard 2025-12-04T11:45:24.9282031Z 2025-12-04T11:45:24.9282242Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6950s] [100%] 2025-12-04T11:45:24.9282462Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3526s] [100%] 2025-12-04T11:45:24.9282646Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3197s] [100%] 2025-12-04T11:45:24.9282649Z 2025-12-04T11:45:24.9282702Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9282843Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9282892Z Traceback (most recent call last): 2025-12-04T11:45:24.9283048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9283089Z method(*args, **kwargs) 2025-12-04T11:45:24.9283240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9283306Z method(*args, **kwargs) 2025-12-04T11:45:24.9283456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9283510Z with policy(): 2025-12-04T11:45:24.9283662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9283704Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9284090Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.9284106Z 2025-12-04T11:45:24.9284179Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9284433Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9284435Z 2025-12-04T11:45:24.9284520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9284597Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9284638Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9284695Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9284761Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9284860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9284895Z graph_break [] 2025-12-04T11:45:24.9284956Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9285096Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9285142Z Traceback (most recent call last): 2025-12-04T11:45:24.9285294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9285335Z method(*args, **kwargs) 2025-12-04T11:45:24.9285483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.9285523Z method(*args, **kwargs) 2025-12-04T11:45:24.9285700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9285738Z with policy(): 2025-12-04T11:45:24.9285889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9285931Z raise RuntimeError(msg) 2025-12-04T11:45:24.9286327Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.9286333Z 2025-12-04T11:45:24.9286405Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9286660Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9286663Z 2025-12-04T11:45:24.9286749Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9286824Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9286866Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9286923Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9286989Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9287087Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9287122Z graph_break [] 2025-12-04T11:45:24.9287181Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9287266Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.9287307Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9287362Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9287475Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9287539Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9287575Z graph_break [] 2025-12-04T11:45:24.9287632Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9287687Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9287826Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9287872Z Traceback (most recent call last): 2025-12-04T11:45:24.9288027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9288072Z method(*args, **kwargs) 2025-12-04T11:45:24.9288222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9288264Z method(*args, **kwargs) 2025-12-04T11:45:24.9288415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9288451Z with policy(): 2025-12-04T11:45:24.9288603Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9288645Z raise RuntimeError(msg) 2025-12-04T11:45:24.9289028Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.9289033Z 2025-12-04T11:45:24.9289106Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9289373Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9289377Z 2025-12-04T11:45:24.9289463Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9289536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9289589Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9289647Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9289712Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9289810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9289846Z graph_break [] 2025-12-04T11:45:24.9289905Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9289976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9290020Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9290074Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9290170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9290233Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9290269Z graph_break [] 2025-12-04T11:45:24.9290327Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9290400Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9290440Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9290494Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9290589Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9290664Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9290700Z graph_break [] 2025-12-04T11:45:24.9290760Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9290961Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3197c0f09daf575c.xml - 2025-12-04T11:45:24.9291022Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9291593Z FAILED [0.3197s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.9291598Z 2025-12-04T11:45:24.9291669Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9291923Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9291926Z 2025-12-04T11:45:24.9292010Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9292073Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9292142Z ================== 1 failed, 187 deselected, 2 rerun in 2.39s ================== 2025-12-04T11:45:24.9292179Z Got exit code 1 2025-12-04T11:45:24.9292219Z Retrying single test... 
2025-12-04T11:45:24.9292364Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d1eb13f077c7655c.xml 2025-12-04T11:45:24.9292421Z ============================= test session starts ============================== 2025-12-04T11:45:24.9292533Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9292573Z cachedir: .pytest_cache 2025-12-04T11:45:24.9292747Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9292793Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9292833Z configfile: pytest.ini 2025-12-04T11:45:24.9293005Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9293081Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9293362Z stepcurrent: skipping 91 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9293407Z Running 1 items in this shard 2025-12-04T11:45:24.9293409Z 2025-12-04T11:45:24.9293619Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6177s] [100%] 2025-12-04T11:45:24.9293828Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2729s] [100%] 2025-12-04T11:45:24.9294117Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2317s] [100%] 2025-12-04T11:45:24.9294121Z 2025-12-04T11:45:24.9294173Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9294313Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9294358Z Traceback (most recent call last): 2025-12-04T11:45:24.9294533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9294573Z method(*args, **kwargs) 2025-12-04T11:45:24.9294729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9294782Z method(*args, **kwargs) 2025-12-04T11:45:24.9294932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9294968Z with policy(): 2025-12-04T11:45:24.9295120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9295160Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9295546Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:24.9295549Z 2025-12-04T11:45:24.9295621Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9295881Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9295883Z 2025-12-04T11:45:24.9295968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9296044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9296086Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9296141Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9296206Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9296305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9296340Z graph_break [] 2025-12-04T11:45:24.9296400Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9296554Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9296601Z Traceback (most recent call last): 2025-12-04T11:45:24.9296755Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9296794Z method(*args, **kwargs) 2025-12-04T11:45:24.9296958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.9296997Z method(*args, **kwargs) 2025-12-04T11:45:24.9297148Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9297185Z with policy(): 2025-12-04T11:45:24.9297337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9297376Z raise RuntimeError(msg) 2025-12-04T11:45:24.9297758Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:24.9297761Z 2025-12-04T11:45:24.9297834Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9298090Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9298092Z 2025-12-04T11:45:24.9298178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9298264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9298306Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9298364Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9298445Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9298544Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9298579Z graph_break [] 2025-12-04T11:45:24.9298638Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9298710Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.9298752Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9298805Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9298901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9298966Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9299004Z graph_break [] 2025-12-04T11:45:24.9299061Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9299116Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9299256Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9299301Z Traceback (most recent call last): 2025-12-04T11:45:24.9299455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9299497Z method(*args, **kwargs) 2025-12-04T11:45:24.9299648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9299688Z method(*args, **kwargs) 2025-12-04T11:45:24.9299840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9299877Z with policy(): 2025-12-04T11:45:24.9300030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9300081Z raise RuntimeError(msg) 2025-12-04T11:45:24.9300464Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:24.9300476Z 2025-12-04T11:45:24.9300550Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9300804Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9300808Z 2025-12-04T11:45:24.9300893Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9300966Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9301009Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9301065Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9301130Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9301228Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9301263Z graph_break [] 2025-12-04T11:45:24.9301323Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9301395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9301437Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9301491Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9301598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9301661Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9301697Z graph_break [] 2025-12-04T11:45:24.9301767Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9301842Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9301882Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9301937Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9302032Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9302098Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9302133Z graph_break [] 2025-12-04T11:45:24.9302191Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:24.9302381Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d1eb13f077c7655c.xml - 2025-12-04T11:45:24.9302442Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9303017Z FAILED [0.2317s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:24.9303021Z 2025-12-04T11:45:24.9303093Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9303379Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9303382Z 2025-12-04T11:45:24.9303468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9303530Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.9303614Z ================== 1 failed, 187 deselected, 2 rerun in 2.14s ================== 2025-12-04T11:45:24.9303652Z Got exit code 1 2025-12-04T11:45:24.9303854Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9303997Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.9304141Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6870e3d2000e4d84.xml 2025-12-04T11:45:24.9304200Z ============================= test session starts ============================== 2025-12-04T11:45:24.9304310Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9304353Z cachedir: .pytest_cache 2025-12-04T11:45:24.9304513Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9304561Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9304601Z configfile: pytest.ini 2025-12-04T11:45:24.9304762Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9304836Z collecting ... collected 188 items / 92 deselected / 96 selected 2025-12-04T11:45:24.9304890Z stepcurrent: skipping 92 already run items. 
2025-12-04T11:45:24.9304933Z Running 96 items in this shard 2025-12-04T11:45:24.9304936Z 2025-12-04T11:45:24.9305150Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5965s] [ 1%] 2025-12-04T11:45:24.9305374Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2684s] [ 1%] 2025-12-04T11:45:24.9305563Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2242s] [ 1%] 2025-12-04T11:45:24.9305578Z 2025-12-04T11:45:24.9305629Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9305770Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9305817Z Traceback (most recent call last): 2025-12-04T11:45:24.9305973Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9306013Z method(*args, **kwargs) 2025-12-04T11:45:24.9306163Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9306205Z method(*args, **kwargs) 2025-12-04T11:45:24.9306358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9306396Z with policy(): 2025-12-04T11:45:24.9306549Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9306591Z raise RuntimeError(msg) 2025-12-04T11:45:24.9306977Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.9306981Z 2025-12-04T11:45:24.9307053Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9307313Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9307317Z 2025-12-04T11:45:24.9307414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9307490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9307531Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9307587Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9307663Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9307763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9307799Z graph_break [] 2025-12-04T11:45:24.9307861Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9308002Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9308048Z Traceback (most recent call last): 2025-12-04T11:45:24.9308202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9308244Z method(*args, **kwargs) 2025-12-04T11:45:24.9308394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9308434Z method(*args, **kwargs) 2025-12-04T11:45:24.9308584Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9308620Z with policy(): 2025-12-04T11:45:24.9308772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9308813Z raise RuntimeError(msg) 2025-12-04T11:45:24.9309212Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.9309224Z 2025-12-04T11:45:24.9309299Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9309557Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9309559Z 2025-12-04T11:45:24.9309645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9309719Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9309760Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9309816Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9309882Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9309980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9310017Z graph_break [] 2025-12-04T11:45:24.9310079Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9310152Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9310193Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:24.9310248Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9310346Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9310409Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9310446Z graph_break [] 2025-12-04T11:45:24.9310504Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9310558Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9310699Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9310745Z Traceback (most recent call last): 2025-12-04T11:45:24.9310910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9310951Z method(*args, **kwargs) 2025-12-04T11:45:24.9311100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9311139Z method(*args, **kwargs) 2025-12-04T11:45:24.9311299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9311336Z with policy(): 2025-12-04T11:45:24.9311487Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9311528Z raise RuntimeError(msg) 2025-12-04T11:45:24.9311911Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.9311915Z 2025-12-04T11:45:24.9311989Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9312246Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9312249Z 2025-12-04T11:45:24.9312334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9312407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9312465Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9312521Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9312586Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9312697Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9312732Z graph_break [] 2025-12-04T11:45:24.9312792Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9312864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9312905Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9312961Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9313057Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9313121Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9313157Z graph_break [] 2025-12-04T11:45:24.9313216Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9313315Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9313355Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9313412Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9313507Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9313570Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9313605Z graph_break [] 2025-12-04T11:45:24.9313665Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9313854Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6870e3d2000e4d84.xml - 2025-12-04T11:45:24.9313914Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9314510Z FAILED [0.2242s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.9314514Z 2025-12-04T11:45:24.9314588Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9314858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9314864Z 2025-12-04T11:45:24.9314950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9315013Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9315079Z ================== 1 failed, 92 deselected, 2 rerun in 2.11s =================== 2025-12-04T11:45:24.9315116Z Got exit code 1 2025-12-04T11:45:24.9315156Z Retrying single test... 
2025-12-04T11:45:24.9315301Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f72e4b9d9d172fbf.xml 2025-12-04T11:45:24.9315358Z ============================= test session starts ============================== 2025-12-04T11:45:24.9315468Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9315508Z cachedir: .pytest_cache 2025-12-04T11:45:24.9315666Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9315711Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9315751Z configfile: pytest.ini 2025-12-04T11:45:24.9315911Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9315999Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9316253Z stepcurrent: skipping 92 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9316310Z Running 1 items in this shard 2025-12-04T11:45:24.9316312Z 2025-12-04T11:45:24.9316526Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6131s] [100%] 2025-12-04T11:45:24.9316737Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2685s] [100%] 2025-12-04T11:45:24.9316926Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2262s] [100%] 2025-12-04T11:45:24.9316929Z 2025-12-04T11:45:24.9316981Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9317122Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9317169Z Traceback (most recent call last): 2025-12-04T11:45:24.9317326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9317366Z method(*args, **kwargs) 2025-12-04T11:45:24.9317517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9317556Z method(*args, **kwargs) 2025-12-04T11:45:24.9317706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9317742Z with policy(): 2025-12-04T11:45:24.9317896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9317936Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9318332Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.9318335Z 2025-12-04T11:45:24.9318408Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9318675Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9318677Z 2025-12-04T11:45:24.9318763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9318838Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9318878Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9318935Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9319002Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9319101Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9319136Z graph_break [] 2025-12-04T11:45:24.9319197Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9319336Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9319382Z Traceback (most recent call last): 2025-12-04T11:45:24.9319535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9319575Z method(*args, **kwargs) 2025-12-04T11:45:24.9319736Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.9319775Z method(*args, **kwargs) 2025-12-04T11:45:24.9319927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9319975Z with policy(): 2025-12-04T11:45:24.9320127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9320167Z raise RuntimeError(msg) 2025-12-04T11:45:24.9320552Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.9320554Z 2025-12-04T11:45:24.9320627Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9320884Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9320887Z 2025-12-04T11:45:24.9320971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9321045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9321086Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9321142Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9321207Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9321305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9321340Z graph_break [] 2025-12-04T11:45:24.9321402Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9321476Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.9321518Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9321572Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9321680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9321744Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9321781Z graph_break [] 2025-12-04T11:45:24.9321839Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9321902Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9322043Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9322090Z Traceback (most recent call last): 2025-12-04T11:45:24.9322243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9322284Z method(*args, **kwargs) 2025-12-04T11:45:24.9322435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9322478Z method(*args, **kwargs) 2025-12-04T11:45:24.9322626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9322662Z with policy(): 2025-12-04T11:45:24.9322813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9322854Z raise RuntimeError(msg) 2025-12-04T11:45:24.9323239Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.9323286Z 2025-12-04T11:45:24.9323359Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9323617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9323635Z 2025-12-04T11:45:24.9323720Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9323793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9323834Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9323891Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9323955Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9324053Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9324089Z graph_break [] 2025-12-04T11:45:24.9324149Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9324222Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9324265Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9324321Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9324418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9324480Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9324516Z graph_break [] 2025-12-04T11:45:24.9324574Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9324647Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9324687Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9324742Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9324836Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9324900Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9324935Z graph_break [] 2025-12-04T11:45:24.9325006Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9325198Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f72e4b9d9d172fbf.xml - 2025-12-04T11:45:24.9325259Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9325858Z FAILED [0.2262s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.9325863Z 2025-12-04T11:45:24.9325934Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9326190Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9326192Z 2025-12-04T11:45:24.9326277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9326339Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9326406Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:24.9326444Z Got exit code 1 2025-12-04T11:45:24.9326483Z Retrying single test... 
2025-12-04T11:45:24.9326627Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2c87236ee69cc1e9.xml 2025-12-04T11:45:24.9326697Z ============================= test session starts ============================== 2025-12-04T11:45:24.9326806Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9326847Z cachedir: .pytest_cache 2025-12-04T11:45:24.9327016Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9327061Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9327101Z configfile: pytest.ini 2025-12-04T11:45:24.9327263Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9327337Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9327593Z stepcurrent: skipping 92 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9327637Z Running 1 items in this shard 2025-12-04T11:45:24.9327640Z 2025-12-04T11:45:24.9327852Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6879s] [100%] 2025-12-04T11:45:24.9328064Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3638s] [100%] 2025-12-04T11:45:24.9328251Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3775s] [100%] 2025-12-04T11:45:24.9328255Z 2025-12-04T11:45:24.9328305Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9328446Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9328490Z Traceback (most recent call last): 2025-12-04T11:45:24.9328648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9328687Z method(*args, **kwargs) 2025-12-04T11:45:24.9328852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9328891Z method(*args, **kwargs) 2025-12-04T11:45:24.9329042Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9329078Z with policy(): 2025-12-04T11:45:24.9329241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9329282Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9329669Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1111490560. 2025-12-04T11:45:24.9329672Z 2025-12-04T11:45:24.9329745Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9330003Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9330005Z 2025-12-04T11:45:24.9330092Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9330165Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9330207Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9330263Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9330329Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9330436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9330473Z graph_break [] 2025-12-04T11:45:24.9330533Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9330686Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9330730Z Traceback (most recent call last): 2025-12-04T11:45:24.9330882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9330921Z method(*args, **kwargs) 2025-12-04T11:45:24.9331072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:24.9331111Z method(*args, **kwargs) 2025-12-04T11:45:24.9331260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9331297Z with policy(): 2025-12-04T11:45:24.9331450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9331489Z raise RuntimeError(msg) 2025-12-04T11:45:24.9331877Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1111490560 and is now 1136656384. 2025-12-04T11:45:24.9331880Z 2025-12-04T11:45:24.9331953Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9332209Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9332211Z 2025-12-04T11:45:24.9332298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9332370Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9332412Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9332481Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9332546Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9332643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9332679Z graph_break [] 2025-12-04T11:45:24.9332739Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9332824Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:24.9332865Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9332920Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9333015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9333080Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9333115Z graph_break [] 2025-12-04T11:45:24.9333174Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9333228Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9333401Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9333446Z Traceback (most recent call last): 2025-12-04T11:45:24.9333602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9333642Z method(*args, **kwargs) 2025-12-04T11:45:24.9333794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9333833Z method(*args, **kwargs) 2025-12-04T11:45:24.9333999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9334035Z with policy(): 2025-12-04T11:45:24.9334189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9334243Z raise RuntimeError(msg) 2025-12-04T11:45:24.9334627Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 
2025-12-04T11:45:24.9334629Z 2025-12-04T11:45:24.9334701Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9334956Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9334959Z 2025-12-04T11:45:24.9335045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9335120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9335162Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9335218Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9335281Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9335379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9335415Z graph_break [] 2025-12-04T11:45:24.9335476Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9335550Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9335591Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9335646Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9335742Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9335807Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9335844Z graph_break [] 2025-12-04T11:45:24.9335916Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9335990Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9336032Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9336085Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9336197Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9336261Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:24.9336298Z graph_break [] 2025-12-04T11:45:24.9336355Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:24.9336548Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2c87236ee69cc1e9.xml - 2025-12-04T11:45:24.9336608Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9337185Z FAILED [0.3775s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1161822208. 2025-12-04T11:45:24.9337189Z 2025-12-04T11:45:24.9337261Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9337517Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9337531Z 2025-12-04T11:45:24.9337616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9337680Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:24.9337757Z ================== 1 failed, 187 deselected, 2 rerun in 2.45s ================== 2025-12-04T11:45:24.9337795Z Got exit code 1 2025-12-04T11:45:24.9338001Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9338131Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.9338278Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-957439c6e5eea945.xml 2025-12-04T11:45:24.9338334Z ============================= test session starts ============================== 2025-12-04T11:45:24.9338445Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9338485Z cachedir: .pytest_cache 2025-12-04T11:45:24.9338647Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9338694Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9338734Z configfile: pytest.ini 2025-12-04T11:45:24.9338893Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9338969Z collecting ... collected 188 items / 93 deselected / 95 selected 2025-12-04T11:45:24.9339022Z stepcurrent: skipping 93 already run items. 
2025-12-04T11:45:24.9339065Z Running 95 items in this shard 2025-12-04T11:45:24.9339067Z 2025-12-04T11:45:24.9339280Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9924s] [ 1%] 2025-12-04T11:45:24.9339490Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7462s] [ 1%] 2025-12-04T11:45:24.9339692Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6314s] [ 1%] 2025-12-04T11:45:24.9339694Z 2025-12-04T11:45:24.9339746Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9339895Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9339942Z Traceback (most recent call last): 2025-12-04T11:45:24.9340098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9340139Z method(*args, **kwargs) 2025-12-04T11:45:24.9340292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9340332Z method(*args, **kwargs) 2025-12-04T11:45:24.9340484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9340521Z with policy(): 2025-12-04T11:45:24.9340673Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9340713Z raise RuntimeError(msg) 2025-12-04T11:45:24.9341099Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.9341102Z 2025-12-04T11:45:24.9341174Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9341441Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9341455Z 2025-12-04T11:45:24.9341540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9341615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9341656Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9341712Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9342199Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9342298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9342335Z graph_break [] 2025-12-04T11:45:24.9342398Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9342473Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9342963Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9343012Z current_size = base.storage().size() 2025-12-04T11:45:24.9343051Z Autotune Choices Stats: 2025-12-04T11:45:24.9343485Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.9343544Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9343593Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9343714Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9343971Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9344198Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9344429Z triton_mm_0 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9344659Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9344882Z triton_mm_5 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9345104Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9345340Z triton_mm_7 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9345579Z triton_mm_2 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9345622Z _scaled_mm 0.0246 ms 24.6% 2025-12-04T11:45:24.9345751Z SingleProcess AUTOTUNE benchmarking takes 0.0399 seconds and 0.1955 seconds precompiling for 9 choices 2025-12-04T11:45:24.9345893Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9345938Z Traceback (most recent call last): 2025-12-04T11:45:24.9346096Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9346136Z method(*args, **kwargs) 2025-12-04T11:45:24.9346288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9346328Z method(*args, **kwargs) 2025-12-04T11:45:24.9346481Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9346516Z with policy(): 2025-12-04T11:45:24.9346684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9346724Z raise RuntimeError(msg) 2025-12-04T11:45:24.9347112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.9347115Z 2025-12-04T11:45:24.9347190Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9347459Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9347462Z 2025-12-04T11:45:24.9347548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9347633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9347675Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9347732Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9348218Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9348318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), 
('ok', 1)] 2025-12-04T11:45:24.9348356Z graph_break [] 2025-12-04T11:45:24.9348416Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9348489Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9348979Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9349040Z current_size = base.storage().size() 2025-12-04T11:45:24.9349080Z Autotune Choices Stats: 2025-12-04T11:45:24.9349448Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.9349513Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9349564Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9349684Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9349918Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9350143Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9350370Z triton_mm_0 0.0061 
ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9350597Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9350821Z triton_mm_5 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9351057Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9351279Z triton_mm_7 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9351513Z triton_mm_2 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9351556Z _scaled_mm 0.0246 ms 24.6% 2025-12-04T11:45:24.9351684Z SingleProcess AUTOTUNE benchmarking takes 0.0399 seconds and 0.1955 seconds precompiling for 9 choices 2025-12-04T11:45:24.9351759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9351800Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9351857Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9351958Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9352442Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9352478Z graph_break [] 2025-12-04T11:45:24.9352540Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9352613Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9352667Z Autotune Choices Stats: 2025-12-04T11:45:24.9353030Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.9353103Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9353151Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9353301Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9353534Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9353761Z triton_mm_8 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9353992Z 
triton_mm_12 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9354218Z triton_mm_13 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9354442Z triton_mm_11 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9354667Z triton_mm_15 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9354910Z triton_mm_14 0.0066 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9355147Z triton_mm_10 0.0066 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9355189Z _scaled_mm 0.0242 ms 25.6% 2025-12-04T11:45:24.9355317Z SingleProcess AUTOTUNE benchmarking takes 0.0381 seconds and 0.1204 seconds precompiling for 9 choices 2025-12-04T11:45:24.9355370Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9355513Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9355558Z Traceback (most recent call 
last): 2025-12-04T11:45:24.9355717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9355757Z method(*args, **kwargs) 2025-12-04T11:45:24.9355911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9355949Z method(*args, **kwargs) 2025-12-04T11:45:24.9356101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9356137Z with policy(): 2025-12-04T11:45:24.9356289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9356799Z raise RuntimeError(msg) 2025-12-04T11:45:24.9357186Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9357200Z 2025-12-04T11:45:24.9357273Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9357537Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9357539Z 2025-12-04T11:45:24.9357626Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9357698Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9357741Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9357797Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9358280Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9358378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9358416Z graph_break [] 2025-12-04T11:45:24.9358475Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9358548Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9359041Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9359091Z current_size = base.storage().size() 2025-12-04T11:45:24.9359130Z Autotune Choices Stats: 2025-12-04T11:45:24.9359507Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:24.9359562Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9359610Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9359731Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9359964Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9360190Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9360418Z triton_mm_0 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9360644Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9360878Z triton_mm_5 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9361109Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9361331Z triton_mm_7 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9361556Z triton_mm_2 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9361599Z _scaled_mm 0.0246 ms 24.6% 2025-12-04T11:45:24.9361726Z SingleProcess AUTOTUNE benchmarking takes 0.0399 seconds and 0.1955 seconds precompiling for 9 choices 2025-12-04T11:45:24.9361801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9361843Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9361900Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9361998Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9362482Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9362520Z graph_break [] 2025-12-04T11:45:24.9362580Z 
aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9362665Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9362707Z Autotune Choices Stats: 2025-12-04T11:45:24.9363078Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:24.9363132Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9363181Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9363324Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9363558Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9363786Z triton_mm_8 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9364015Z triton_mm_12 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9364239Z triton_mm_13 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9364481Z triton_mm_11 0.0065 ms 95.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9364716Z triton_mm_15 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9364937Z triton_mm_14 0.0066 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9365162Z triton_mm_10 0.0066 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9365204Z _scaled_mm 0.0242 ms 25.6% 2025-12-04T11:45:24.9365333Z SingleProcess AUTOTUNE benchmarking takes 0.0381 seconds and 0.1204 seconds precompiling for 9 choices 2025-12-04T11:45:24.9365407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9365451Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9365507Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9365606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9366087Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9366126Z graph_break [] 
2025-12-04T11:45:24.9366186Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9366258Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9366299Z Autotune Choices Stats: 2025-12-04T11:45:24.9366673Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:24.9366740Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9366788Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9366908Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9367137Z triton_mm_19 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9367364Z triton_mm_21 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9367591Z triton_mm_23 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9367819Z triton_mm_16 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9368044Z triton_mm_20 0.0063 ms 97.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9368284Z triton_mm_22 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9368519Z triton_mm_17 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9368746Z triton_mm_18 0.0071 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9368787Z _scaled_mm 0.0239 ms 25.6% 2025-12-04T11:45:24.9368915Z SingleProcess AUTOTUNE benchmarking takes 0.0538 seconds and 0.2162 seconds precompiling for 9 choices 2025-12-04T11:45:24.9369105Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-957439c6e5eea945.xml - 2025-12-04T11:45:24.9369167Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9369755Z FAILED [0.6314s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9369757Z 2025-12-04T11:45:24.9369832Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9370092Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9370094Z 2025-12-04T11:45:24.9370192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9370255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9370323Z ================== 1 failed, 93 deselected, 2 rerun in 3.39s =================== 2025-12-04T11:45:24.9370360Z Got exit code 1 2025-12-04T11:45:24.9370401Z Retrying single test... 2025-12-04T11:45:24.9370553Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26f2b41cae2a3068.xml 2025-12-04T11:45:24.9370611Z ============================= test session starts ============================== 2025-12-04T11:45:24.9370720Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9370762Z cachedir: .pytest_cache 2025-12-04T11:45:24.9370920Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9370968Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9371008Z configfile: pytest.ini 2025-12-04T11:45:24.9371170Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9371244Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9371497Z stepcurrent: skipping 93 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9371540Z Running 1 items in this shard 2025-12-04T11:45:24.9371544Z 2025-12-04T11:45:24.9371755Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9973s] [100%] 2025-12-04T11:45:24.9371975Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7573s] [100%] 2025-12-04T11:45:24.9372169Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6441s] [100%] 2025-12-04T11:45:24.9372171Z 2025-12-04T11:45:24.9372223Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9372362Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9372408Z Traceback (most recent call last): 2025-12-04T11:45:24.9372564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9372606Z method(*args, **kwargs) 2025-12-04T11:45:24.9372757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9372797Z method(*args, **kwargs) 2025-12-04T11:45:24.9372950Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9372987Z with policy(): 2025-12-04T11:45:24.9373138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9373179Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9373593Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.9373598Z 2025-12-04T11:45:24.9373671Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9373941Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9373945Z 2025-12-04T11:45:24.9374030Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9374104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9374146Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9374217Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9374696Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9374798Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9374837Z graph_break [] 2025-12-04T11:45:24.9374899Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9374971Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9375458Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9375504Z current_size = base.storage().size() 2025-12-04T11:45:24.9375545Z Autotune Choices Stats: 2025-12-04T11:45:24.9375925Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.9375992Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9376041Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9376161Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9376396Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9376622Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9376850Z triton_mm_0 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9377076Z triton_mm_4 0.0061 ms 96.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9377304Z triton_mm_1 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9377525Z triton_mm_7 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9377757Z triton_mm_6 0.0062 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9377985Z triton_mm_2 0.0065 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9378038Z _scaled_mm 0.0226 ms 26.0% 2025-12-04T11:45:24.9378169Z SingleProcess AUTOTUNE benchmarking takes 0.0391 seconds and 0.1961 seconds precompiling for 9 choices 2025-12-04T11:45:24.9378309Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9378356Z Traceback (most recent call last): 2025-12-04T11:45:24.9378511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9378551Z method(*args, **kwargs) 2025-12-04T11:45:24.9378705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9378744Z 
method(*args, **kwargs) 2025-12-04T11:45:24.9378894Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9378933Z with policy(): 2025-12-04T11:45:24.9379086Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9379129Z raise RuntimeError(msg) 2025-12-04T11:45:24.9379513Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.9379525Z 2025-12-04T11:45:24.9379609Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9379864Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9379867Z 2025-12-04T11:45:24.9379952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9380026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9380069Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9380126Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9380612Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9380714Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9380751Z graph_break [] 2025-12-04T11:45:24.9380811Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9380885Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9381373Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9381421Z current_size = base.storage().size() 2025-12-04T11:45:24.9381460Z Autotune Choices Stats: 2025-12-04T11:45:24.9381836Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.9381891Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9381954Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9382074Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9382305Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9382534Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:24.9382761Z triton_mm_0 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9382987Z triton_mm_4 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9383213Z triton_mm_1 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9383477Z triton_mm_7 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9383711Z triton_mm_6 0.0062 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9383937Z triton_mm_2 0.0065 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9383977Z _scaled_mm 0.0226 ms 26.0% 2025-12-04T11:45:24.9384104Z SingleProcess AUTOTUNE benchmarking takes 0.0391 seconds and 0.1961 seconds precompiling for 9 choices 2025-12-04T11:45:24.9384178Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9384220Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9384278Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.9384378Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9384856Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9384893Z graph_break [] 2025-12-04T11:45:24.9384953Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9385027Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9385074Z Autotune Choices Stats: 2025-12-04T11:45:24.9385450Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:24.9385508Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9385555Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9385686Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9385921Z triton_mm_12 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9386150Z triton_mm_9 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9386379Z triton_mm_11 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9386604Z triton_mm_8 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9386828Z triton_mm_15 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9387065Z triton_mm_10 0.0066 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9387299Z triton_mm_13 0.0066 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9387522Z triton_mm_14 0.0066 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9387565Z _scaled_mm 0.0066 ms 89.2% 2025-12-04T11:45:24.9387692Z SingleProcess AUTOTUNE benchmarking takes 0.0375 seconds and 0.1102 seconds precompiling for 9 choices 2025-12-04T11:45:24.9387747Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9387888Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9387935Z Traceback (most recent call last): 2025-12-04T11:45:24.9388093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9388133Z method(*args, **kwargs) 2025-12-04T11:45:24.9388288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9388328Z method(*args, **kwargs) 2025-12-04T11:45:24.9388480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9388516Z with policy(): 2025-12-04T11:45:24.9388670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9388710Z raise RuntimeError(msg) 2025-12-04T11:45:24.9389108Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9389111Z 2025-12-04T11:45:24.9389184Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9389448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9389450Z 2025-12-04T11:45:24.9389536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9389609Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9389652Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9389708Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9390193Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9390294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9390333Z graph_break [] 2025-12-04T11:45:24.9390391Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9390468Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9390957Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9391024Z current_size = base.storage().size() 2025-12-04T11:45:24.9391064Z Autotune Choices Stats: 2025-12-04T11:45:24.9391428Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:24.9391481Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9391530Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9391652Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9391888Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9392115Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9392340Z triton_mm_0 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9392566Z triton_mm_4 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9392803Z triton_mm_1 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9393027Z triton_mm_7 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9393298Z triton_mm_6 0.0062 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9393526Z triton_mm_2 0.0065 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9393569Z _scaled_mm 0.0226 ms 26.0% 2025-12-04T11:45:24.9393696Z SingleProcess AUTOTUNE benchmarking takes 0.0391 seconds and 0.1961 seconds precompiling for 9 choices 2025-12-04T11:45:24.9393772Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9393814Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9393872Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9393970Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9394533Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9394586Z graph_break [] 2025-12-04T11:45:24.9394647Z 
aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9394734Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9394775Z Autotune Choices Stats: 2025-12-04T11:45:24.9395141Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_12", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:24.9395196Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9395243Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9395363Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9395597Z triton_mm_12 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9395826Z triton_mm_9 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9396052Z triton_mm_11 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9396274Z triton_mm_8 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9396524Z triton_mm_15 0.0065 ms 90.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9396753Z triton_mm_10 0.0066 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9396987Z triton_mm_13 0.0066 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9397210Z triton_mm_14 0.0066 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9397251Z _scaled_mm 0.0066 ms 89.2% 2025-12-04T11:45:24.9397379Z SingleProcess AUTOTUNE benchmarking takes 0.0375 seconds and 0.1102 seconds precompiling for 9 choices 2025-12-04T11:45:24.9397454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9397498Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9397555Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9397654Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9398134Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9398184Z graph_break [] 
2025-12-04T11:45:24.9398243Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9398316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9398367Z Autotune Choices Stats: 2025-12-04T11:45:24.9398729Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_21", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:24.9398783Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9398832Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9398949Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9399180Z triton_mm_21 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9399411Z triton_mm_16 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9399639Z triton_mm_20 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9399862Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9400088Z triton_mm_17 0.0062 ms 98.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9400321Z triton_mm_19 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9400554Z triton_mm_23 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9400780Z triton_mm_18 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9400822Z _scaled_mm 0.0233 ms 26.1% 2025-12-04T11:45:24.9400950Z SingleProcess AUTOTUNE benchmarking takes 0.0535 seconds and 0.2176 seconds precompiling for 9 choices 2025-12-04T11:45:24.9401143Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26f2b41cae2a3068.xml - 2025-12-04T11:45:24.9401204Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9401780Z FAILED [0.6441s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9401782Z 2025-12-04T11:45:24.9401855Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9402126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9402140Z 2025-12-04T11:45:24.9402229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9402291Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9402358Z ================== 1 failed, 187 deselected, 2 rerun in 3.42s ================== 2025-12-04T11:45:24.9402395Z Got exit code 1 2025-12-04T11:45:24.9402436Z Retrying single test... 2025-12-04T11:45:24.9402582Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f4f16a2d60ff42cb.xml 2025-12-04T11:45:24.9402639Z ============================= test session starts ============================== 2025-12-04T11:45:24.9402750Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9402791Z cachedir: .pytest_cache 2025-12-04T11:45:24.9402951Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9403000Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9403040Z configfile: pytest.ini 2025-12-04T11:45:24.9403201Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9403308Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9403561Z stepcurrent: skipping 93 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9403602Z Running 1 items in this shard 2025-12-04T11:45:24.9403604Z 2025-12-04T11:45:24.9403815Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1589s] [100%] 2025-12-04T11:45:24.9404042Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7965s] [100%] 2025-12-04T11:45:24.9404227Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.7443s] [100%] 2025-12-04T11:45:24.9404229Z 2025-12-04T11:45:24.9404280Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9404434Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9404481Z Traceback (most recent call last): 2025-12-04T11:45:24.9404638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9404680Z method(*args, **kwargs) 2025-12-04T11:45:24.9404831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9404874Z method(*args, **kwargs) 2025-12-04T11:45:24.9405026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9405063Z with policy(): 2025-12-04T11:45:24.9405217Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9405259Z raise RuntimeError(msg) 
2025-12-04T11:45:24.9405645Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:24.9405660Z 2025-12-04T11:45:24.9405737Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9405994Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9406008Z 2025-12-04T11:45:24.9406095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9406169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9406214Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9406272Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9406755Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9406856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9406893Z graph_break [] 2025-12-04T11:45:24.9406953Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9407026Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9407517Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9407564Z current_size = base.storage().size() 2025-12-04T11:45:24.9407608Z Autotune Choices Stats: 2025-12-04T11:45:24.9407986Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.9408044Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9408092Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9408223Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9408459Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9408688Z triton_mm_4 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9408916Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9409140Z triton_mm_3 0.0059 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9409365Z triton_mm_7 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9409586Z triton_mm_6 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9409821Z triton_mm_1 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9410064Z triton_mm_2 0.0066 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9410105Z _scaled_mm 0.0229 ms 25.5% 2025-12-04T11:45:24.9410231Z SingleProcess AUTOTUNE benchmarking takes 0.0396 seconds and 0.2002 seconds precompiling for 9 choices 2025-12-04T11:45:24.9410371Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9410418Z Traceback (most recent call last): 2025-12-04T11:45:24.9410574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9410617Z method(*args, **kwargs) 2025-12-04T11:45:24.9410769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9410809Z 
method(*args, **kwargs) 2025-12-04T11:45:24.9410961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9411000Z with policy(): 2025-12-04T11:45:24.9411153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9411194Z raise RuntimeError(msg) 2025-12-04T11:45:24.9411582Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:24.9411586Z 2025-12-04T11:45:24.9411669Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9411924Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9411928Z 2025-12-04T11:45:24.9412022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9412097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9412138Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9412195Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9412681Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9412781Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9412816Z graph_break [] 2025-12-04T11:45:24.9412879Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9412953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9413539Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9413600Z current_size = base.storage().size() 2025-12-04T11:45:24.9413640Z Autotune Choices Stats: 2025-12-04T11:45:24.9414025Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.9414081Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9414129Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9414250Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9414483Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9414715Z triton_mm_4 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:24.9414940Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9415161Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9415382Z triton_mm_7 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9415619Z triton_mm_6 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9415845Z triton_mm_1 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9416082Z triton_mm_2 0.0066 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9416122Z _scaled_mm 0.0229 ms 25.5% 2025-12-04T11:45:24.9416251Z SingleProcess AUTOTUNE benchmarking takes 0.0396 seconds and 0.2002 seconds precompiling for 9 choices 2025-12-04T11:45:24.9416324Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9416366Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9416424Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:24.9416524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9417008Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9417045Z graph_break [] 2025-12-04T11:45:24.9417104Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9417190Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9417229Z Autotune Choices Stats: 2025-12-04T11:45:24.9417592Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:24.9417656Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9417705Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9417827Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9418059Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9418287Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9418515Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9418740Z triton_mm_13 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9418963Z triton_mm_15 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9419198Z triton_mm_14 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9419425Z triton_mm_9 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9419665Z triton_mm_10 0.0066 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9419707Z _scaled_mm 0.0214 ms 27.8% 2025-12-04T11:45:24.9419833Z SingleProcess AUTOTUNE benchmarking takes 0.0534 seconds and 0.2198 seconds precompiling for 9 choices 2025-12-04T11:45:24.9419889Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9420028Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9420077Z Traceback (most recent call last): 2025-12-04T11:45:24.9420234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9420277Z method(*args, **kwargs) 2025-12-04T11:45:24.9420431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9420472Z method(*args, **kwargs) 2025-12-04T11:45:24.9420623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9420662Z with policy(): 2025-12-04T11:45:24.9420816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9420870Z raise RuntimeError(msg) 2025-12-04T11:45:24.9421255Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9421269Z 2025-12-04T11:45:24.9421342Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9421598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9421600Z 2025-12-04T11:45:24.9421686Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9421763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9421804Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9421861Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9422349Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9422451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9422488Z graph_break [] 2025-12-04T11:45:24.9422548Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9422621Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9423118Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9423167Z current_size = base.storage().size() 2025-12-04T11:45:24.9423206Z Autotune Choices Stats: 2025-12-04T11:45:24.9423680Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.9423735Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9423785Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9423906Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9424141Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9424372Z triton_mm_4 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9424597Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9424821Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9425063Z triton_mm_7 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9425298Z triton_mm_6 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9425522Z triton_mm_1 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9425746Z triton_mm_2 0.0066 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9425787Z _scaled_mm 0.0229 ms 25.5% 2025-12-04T11:45:24.9425923Z SingleProcess AUTOTUNE benchmarking takes 0.0396 seconds and 0.2002 seconds precompiling for 9 choices 2025-12-04T11:45:24.9425997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9426040Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9426097Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9426200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9426683Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9426721Z graph_break [] 2025-12-04T11:45:24.9426781Z 
aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9426869Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9426910Z Autotune Choices Stats: 2025-12-04T11:45:24.9427279Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:24.9427334Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9427381Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9427504Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9427738Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9427966Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9428197Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9428423Z triton_mm_13 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9428661Z triton_mm_15 0.0060 ms 99.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9428894Z triton_mm_14 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9429121Z triton_mm_9 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9429345Z triton_mm_10 0.0066 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9429388Z _scaled_mm 0.0214 ms 27.8% 2025-12-04T11:45:24.9429516Z SingleProcess AUTOTUNE benchmarking takes 0.0534 seconds and 0.2198 seconds precompiling for 9 choices 2025-12-04T11:45:24.9429591Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9429633Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9429689Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9429787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9430271Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9430311Z graph_break [] 
2025-12-04T11:45:24.9430370Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:24.9430455Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9430497Z Autotune Choices Stats: 2025-12-04T11:45:24.9430869Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:24.9430923Z AUTOTUNE scaled_mm(257x32, 32x16, 257x1, 1x16, 16) 2025-12-04T11:45:24.9430973Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9431091Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9431324Z triton_mm_22 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:24.9431552Z triton_mm_23 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9431779Z triton_mm_21 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9432009Z triton_mm_16 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9432243Z triton_mm_19 0.0059 ms 98.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9432480Z triton_mm_20 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9432707Z triton_mm_17 0.0063 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:24.9432748Z _scaled_mm 0.0064 ms 91.3% 2025-12-04T11:45:24.9432971Z triton_mm_18 0.0065 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9433101Z SingleProcess AUTOTUNE benchmarking takes 0.0538 seconds and 0.2171 seconds precompiling for 9 choices 2025-12-04T11:45:24.9433324Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f4f16a2d60ff42cb.xml - 2025-12-04T11:45:24.9433386Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9433969Z FAILED [0.7443s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:24.9433972Z 2025-12-04T11:45:24.9434046Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9434320Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9434323Z 2025-12-04T11:45:24.9434409Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9434472Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9434539Z ================== 1 failed, 187 deselected, 2 rerun in 3.72s ================== 2025-12-04T11:45:24.9434589Z Got exit code 1 2025-12-04T11:45:24.9434794Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:24.9434923Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:24.9435070Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a115de231ae56a42.xml 2025-12-04T11:45:24.9435127Z ============================= test session starts ============================== 2025-12-04T11:45:24.9435243Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9435283Z cachedir: .pytest_cache 2025-12-04T11:45:24.9435443Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9435488Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9435529Z configfile: pytest.ini 2025-12-04T11:45:24.9435691Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9435769Z collecting ... 
collected 188 items / 94 deselected / 94 selected 2025-12-04T11:45:24.9435834Z stepcurrent: skipping 94 already run items. 2025-12-04T11:45:24.9435881Z Running 94 items in this shard 2025-12-04T11:45:24.9435883Z 2025-12-04T11:45:24.9436792Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmp5bhv41_5/6e/c6eanht4rb54oimg6mp2gh4utfw73vfy2waao2vvr7no7oh5u3uo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9436959Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9437184Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9437345Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9437492Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9437782Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9437918Z E1204 11:18:20.657000 825256 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9438179Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9438329Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9438586Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9438761Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9439034Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9439169Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9439446Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9439641Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9439962Z E1204 11:18:20.657000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9440692Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmp5bhv41_5/c7/cc7w47w5kffhezsr6uxbhlttcdh2a457snd3tbuhehtlnmgurngz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9440861Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9441081Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9441235Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9441379Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9441666Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9441798Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9442054Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9442190Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9442444Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9442608Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9442879Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9443012Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9443335Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9443530Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9443847Z E1204 11:18:20.664000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9444571Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmp5bhv41_5/bj/cbjxbag7j66nd7yspeletejj7ht5iv6ck7i6xezu2ugg5krurumw.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9444728Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9444941Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9445107Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9445252Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9445538Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9447569Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9447834Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9447972Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9448228Z E1204 11:18:20.667000 825256 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9448384Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9448653Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9448786Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9449084Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9449288Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9449603Z E1204 11:18:20.667000 825256 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9449658Z ('RERUN', {'yellow': True}) [2.5625s] [ 1%] 2025-12-04T11:45:24.9449984Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda E1204 11:18:21.974000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9450281Z E1204 11:18:21.974000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9450409Z E1204 11:18:21.974000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9450553Z E1204 11:18:21.976000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9450846Z E1204 11:18:21.976000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9450985Z E1204 11:18:21.976000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9451138Z E1204 11:18:21.978000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9451430Z E1204 11:18:21.978000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9451555Z E1204 11:18:21.978000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9451603Z ('RERUN', {'yellow': True}) [1.1207s] [ 1%] 2025-12-04T11:45:24.9451920Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda E1204 11:18:22.911000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9452217Z E1204 11:18:22.911000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9452344Z E1204 11:18:22.911000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9452486Z E1204 11:18:22.913000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9452779Z E1204 11:18:22.913000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9452907Z E1204 11:18:22.913000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9453058Z E1204 11:18:22.915000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9453377Z E1204 11:18:22.915000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:24.9453517Z E1204 11:18:22.915000 825256 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.9453558Z FAILED [0.9282s] [ 1%] 2025-12-04T11:45:24.9453561Z 2025-12-04T11:45:24.9453616Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9453758Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9453806Z Traceback (most recent call last): 2025-12-04T11:45:24.9453966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9454008Z method(*args, **kwargs) 2025-12-04T11:45:24.9454162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9454202Z method(*args, **kwargs) 2025-12-04T11:45:24.9454355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9454392Z with policy(): 2025-12-04T11:45:24.9454547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9454587Z raise RuntimeError(msg) 2025-12-04T11:45:24.9454981Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:24.9455009Z 2025-12-04T11:45:24.9455085Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9455346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9455348Z 2025-12-04T11:45:24.9455438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9455515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9455559Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9455616Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9456178Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9456280Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9456318Z graph_break [] 2025-12-04T11:45:24.9456383Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9456459Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9456952Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9457031Z current_size = base.storage().size() 2025-12-04T11:45:24.9457073Z Autotune Choices Stats: 2025-12-04T11:45:24.9457460Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0062790000811219215, "best_triton_pos": 0} 2025-12-04T11:45:24.9457528Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9457577Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9457699Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9457939Z triton_mm_10 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9458168Z triton_mm_17 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9458397Z triton_mm_11 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9458623Z triton_mm_13 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9458868Z triton_mm_9 0.0066 ms 95.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9459102Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9459326Z triton_mm_18 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9459548Z triton_mm_8 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9459776Z triton_mm_12 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9460003Z triton_mm_15 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9460133Z SingleProcess AUTOTUNE benchmarking takes 0.0783 seconds and 0.5904 seconds precompiling for 18 choices 2025-12-04T11:45:24.9460277Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9460323Z Traceback (most recent call last): 2025-12-04T11:45:24.9460481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9460522Z method(*args, **kwargs) 
2025-12-04T11:45:24.9460676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9460728Z method(*args, **kwargs) 2025-12-04T11:45:24.9460880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9460917Z with policy(): 2025-12-04T11:45:24.9461071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9461111Z raise RuntimeError(msg) 2025-12-04T11:45:24.9461516Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:24.9461520Z 2025-12-04T11:45:24.9461595Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9461858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9461861Z 2025-12-04T11:45:24.9461950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9462025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9462068Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9462126Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9462680Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9462804Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9462842Z graph_break [] 2025-12-04T11:45:24.9462906Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9462982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9463502Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9463552Z current_size = base.storage().size() 2025-12-04T11:45:24.9463591Z Autotune Choices Stats: 2025-12-04T11:45:24.9463959Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0062790000811219215, "best_triton_pos": 0} 2025-12-04T11:45:24.9464025Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9464073Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9464197Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9464429Z triton_mm_10 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:24.9464658Z triton_mm_17 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9464897Z triton_mm_11 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9465136Z triton_mm_13 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9465359Z triton_mm_9 0.0066 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9465584Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9465809Z triton_mm_18 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9466031Z triton_mm_8 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9466257Z triton_mm_12 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9466496Z triton_mm_15 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9466638Z SingleProcess AUTOTUNE benchmarking takes 0.0783 seconds and 0.5904 seconds precompiling for 18 choices 2025-12-04T11:45:24.9466712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9466755Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9466811Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9466912Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9467404Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9467441Z graph_break [] 2025-12-04T11:45:24.9467507Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9467580Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9467620Z Autotune Choices Stats: 2025-12-04T11:45:24.9467983Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:24.9468045Z AUTOTUNE 
scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9468093Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9468216Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9468459Z triton_mm_33 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9468688Z triton_mm_31 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9468922Z triton_mm_38 0.0065 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9469150Z triton_mm_29 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9469375Z triton_mm_30 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9469597Z triton_mm_34 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9469820Z triton_mm_37 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9470044Z triton_mm_27 0.0069 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9470280Z triton_mm_32 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9470521Z triton_mm_35 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9470650Z SingleProcess AUTOTUNE benchmarking takes 0.1161 seconds and 0.5017 seconds precompiling for 21 choices 2025-12-04T11:45:24.9470705Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9470845Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9470893Z Traceback (most recent call last): 2025-12-04T11:45:24.9471050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9471092Z method(*args, **kwargs) 2025-12-04T11:45:24.9471245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9471285Z method(*args, **kwargs) 2025-12-04T11:45:24.9471435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9471473Z with policy(): 2025-12-04T11:45:24.9471626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 2705, in __exit__ 2025-12-04T11:45:24.9471667Z raise RuntimeError(msg) 2025-12-04T11:45:24.9472058Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:24.9472071Z 2025-12-04T11:45:24.9472147Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9472407Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9472409Z 2025-12-04T11:45:24.9472506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9472580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9472622Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9472679Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9473243Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9473377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9473413Z graph_break [] 2025-12-04T11:45:24.9473476Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9473549Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:24.9474039Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9474113Z current_size = base.storage().size() 2025-12-04T11:45:24.9474154Z Autotune Choices Stats: 2025-12-04T11:45:24.9474520Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0062790000811219215, "best_triton_pos": 0} 2025-12-04T11:45:24.9474584Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9474632Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9474751Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9474984Z triton_mm_10 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9475211Z triton_mm_17 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9475441Z triton_mm_11 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.9475664Z triton_mm_13 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9475889Z triton_mm_9 0.0066 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9476128Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9476363Z triton_mm_18 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9476587Z triton_mm_8 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9476813Z triton_mm_12 0.0069 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9477039Z triton_mm_15 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9477167Z SingleProcess AUTOTUNE benchmarking takes 0.0783 seconds and 0.5904 seconds precompiling for 18 choices 2025-12-04T11:45:24.9477243Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.9477285Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9477342Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9477442Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9477941Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9477988Z graph_break [] 2025-12-04T11:45:24.9478051Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9478124Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9478164Z Autotune Choices Stats: 2025-12-04T11:45:24.9478525Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:24.9478587Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9478636Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9478757Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9478989Z triton_mm_33 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9479218Z 
triton_mm_31 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9479444Z triton_mm_38 0.0065 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9479677Z triton_mm_29 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9479901Z triton_mm_30 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9480134Z triton_mm_34 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9480356Z triton_mm_37 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9480583Z triton_mm_27 0.0069 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9480808Z triton_mm_32 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:24.9481033Z triton_mm_35 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9481162Z SingleProcess AUTOTUNE benchmarking takes 0.1161 seconds and 0.5017 seconds precompiling for 21 choices 2025-12-04T11:45:24.9481245Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9481290Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9481346Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9481455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9481937Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9481974Z graph_break [] 2025-12-04T11:45:24.9482035Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9482110Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9482150Z Autotune Choices Stats: 2025-12-04T11:45:24.9482513Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:24.9482578Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 
2025-12-04T11:45:24.9482626Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9482748Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9482978Z triton_mm_54 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9483211Z triton_mm_51 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9483484Z triton_mm_57 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9483721Z triton_mm_48 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9483943Z triton_mm_50 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9484168Z triton_mm_49 0.0067 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9484393Z triton_mm_53 0.0067 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.9484619Z triton_mm_47 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9484841Z triton_mm_58 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9485089Z triton_mm_52 0.0073 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9485234Z SingleProcess AUTOTUNE benchmarking takes 0.1342 seconds and 0.3627 seconds precompiling for 21 choices 2025-12-04T11:45:24.9485426Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a115de231ae56a42.xml - 2025-12-04T11:45:24.9485487Z =========================== short test summary info ============================ 2025-12-04T11:45:24.9486079Z FAILED [0.9282s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 
2025-12-04T11:45:24.9486082Z 2025-12-04T11:45:24.9486157Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9486416Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9486418Z 2025-12-04T11:45:24.9486505Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9486568Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:24.9486635Z ================== 1 failed, 94 deselected, 2 rerun in 4.63s =================== 2025-12-04T11:45:24.9486673Z Got exit code 1 2025-12-04T11:45:24.9486711Z Retrying single test... 2025-12-04T11:45:24.9486859Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-41de2e9bfc78920a.xml 2025-12-04T11:45:24.9486918Z ============================= test session starts ============================== 2025-12-04T11:45:24.9487041Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:24.9487083Z cachedir: .pytest_cache 2025-12-04T11:45:24.9487243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:24.9487289Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:24.9487330Z configfile: pytest.ini 2025-12-04T11:45:24.9487504Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:24.9487580Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:24.9487833Z stepcurrent: skipping 94 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9487878Z Running 1 items in this shard 2025-12-04T11:45:24.9487880Z 2025-12-04T11:45:24.9488213Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:31.887312376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9488215Z 2025-12-04T11:45:24.9488372Z [W1204 11:18:31.185323245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9488374Z 2025-12-04T11:45:24.9488527Z [W1204 11:18:31.200651717 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9488529Z 2025-12-04T11:45:24.9488845Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9489155Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9489300Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9489788Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9490048Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9490276Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9490485Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9490687Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9490983Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9491229Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9491524Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9491770Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9492067Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9492301Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9492594Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9492826Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9493117Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9493382Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9493688Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9493921Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9494213Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9494413Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9494647Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9494940Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9495135Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9495367Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9495672Z E1204 11:18:31.911000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9495905Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9496209Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9496430Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9496640Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9496842Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9497054Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9497222Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9497403Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9497953Z E1204 11:18:31.911000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp3x2x5d9x/c7/cc7w47w5kffhezsr6uxbhlttcdh2a457snd3tbuhehtlnmgurngz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9498111Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9498329Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9498484Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9498631Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9498921Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9499057Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9499316Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9499456Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9499715Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9499879Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9500149Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9500302Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9500580Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9500774Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9501091Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9501391Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9501520Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9502004Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9502280Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9502506Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9502712Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9502913Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9503207Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9503477Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9503772Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9504005Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9504313Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9504546Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9504849Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9505081Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9505375Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9505608Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9505903Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9506134Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9506440Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9506648Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9506880Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9507171Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9507368Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9507601Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9507893Z E1204 11:18:31.911000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9508129Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9508420Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9508653Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9508861Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9509074Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9509284Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9509452Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9509633Z E1204 11:18:31.911000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9509735Z E1204 11:18:31.911000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.9510045Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9510340Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9510482Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9510963Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9511226Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9511451Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9511656Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9511857Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9512150Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9512384Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9512680Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9512924Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9513217Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9513498Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9513789Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9514024Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9514316Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9514549Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9514840Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9515087Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9515391Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9515587Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9515819Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9516110Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9516308Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9516538Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9516829Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9517059Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9517371Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9517593Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9517810Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9518013Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9518224Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9518391Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9518568Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9519094Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp3x2x5d9x/6e/c6eanht4rb54oimg6mp2gh4utfw73vfy2waao2vvr7no7oh5u3uo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9519252Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9519470Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9519647Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9519796Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9520083Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9520214Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9520474Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9520612Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9520868Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9521022Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9521290Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9521426Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9521714Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9521922Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9522236Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9522529Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9522661Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9523139Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9523434Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9523676Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9523895Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9524098Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9524391Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9524625Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9524919Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9525152Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9525444Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9525674Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9525978Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9526211Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9526518Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9526750Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9527042Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9527273Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9527565Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9527761Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9528001Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9528303Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9528498Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9528732Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9529022Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9529256Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9529547Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9529766Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9529973Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9530174Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9530400Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9530566Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9530754Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9530857Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.9531166Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9531463Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9531592Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9532069Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9532332Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9532576Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9532782Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9532981Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9533301Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9533536Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9533829Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9534062Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9534352Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9534598Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9534890Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9535134Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9535424Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9535660Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9535954Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9536185Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9536475Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9536686Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9536931Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9537222Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9537419Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9537650Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9537945Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9538178Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9538469Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9538690Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9538906Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9539107Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9539328Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9539494Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9539671Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9540195Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp3x2x5d9x/bj/cbjxbag7j66nd7yspeletejj7ht5iv6ck7i6xezu2ugg5krurumw.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9540344Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9540559Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9540715Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9540873Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9541160Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9541302Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9541562Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9541700Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9541956Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9542112Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9542384Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9542519Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9542797Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9542990Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9543347Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9543652Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9543781Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9544259Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9544512Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9544740Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9544946Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9545159Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9545451Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9545698Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9545991Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9546222Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9546514Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9546745Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9547039Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9547271Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9547573Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9547806Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9548106Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9548338Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9548630Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9548828Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9549060Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9549359Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9549571Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9549802Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9550105Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9550337Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9550629Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9550851Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9551058Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9551260Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9551469Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9551637Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9551824Z E1204 11:18:31.936000 830772 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9551929Z E1204 11:18:31.936000 830772 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.9551979Z ('RERUN', {'yellow': True}) [2.7885s] [100%] 2025-12-04T11:45:24.9552320Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:33.442703391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9552322Z 2025-12-04T11:45:24.9552469Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9552764Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9553059Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9553187Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9553691Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9553960Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9554197Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) 
[clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9554403Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9554602Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9554896Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9555131Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9555424Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9555657Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9555948Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9556196Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9556490Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9556733Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9557023Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9557244Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9557451Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9557647Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9557855Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9558063Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9558295Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9558597Z E1204 
11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9558795Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9559026Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9559319Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9559539Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9559734Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9559953Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9560166Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9560374Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9560573Z E1204 
11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9560804Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9561009Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9561204Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9561399Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9561630Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9561923Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9562154Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9562457Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9562687Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9562893Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9563093Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9563339Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9563541Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9563772Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9564065Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9564299Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9564605Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9564837Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9565146Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9565382Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9565676Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9565910Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9566203Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9566433Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9566723Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9566969Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9567272Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9567504Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9567797Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9568031Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9568322Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9568555Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9568846Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9569088Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9569380Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9569608Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9569809Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9570007Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9570304Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9570537Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9570828Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9571060Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9571361Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9571602Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9571893Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9572124Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9572417Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9572650Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9572942Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9573137Z E1204 
11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9573362Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9573570Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9573778Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9573987Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9574219Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9574513Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9574709Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9574904Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9575098Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 
_PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9575291Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9575536Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9575841Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9576073Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9576363Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9576559Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9576769Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9576973Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9577206Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9577500Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9577731Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9577935Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9578148Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9578349Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9578643Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9578877Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9579174Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9579407Z E1204 
11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9579699Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9579945Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9580248Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9580484Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9580776Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9580976Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9581173Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9581395Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9581597Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9581796Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9582012Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9582305Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9582548Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9582841Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9583076Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9583403Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9583639Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9583938Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9584185Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9584491Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9584712Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9584915Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9585117Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9585310Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9585522Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9585722Z E1204 
11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9586019Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9586242Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9586456Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9586655Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9586868Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9587161Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9587398Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9587692Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9587927Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9588221Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9588468Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9588769Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9589003Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9589297Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9589530Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9589824Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9590057Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9590349Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9590584Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9590888Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9591136Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9591428Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9591661Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9591955Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9592155Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9592352Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9592586Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9592893Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9593141Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9593465Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9593697Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9593992Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9594227Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9594523Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9594757Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9595050Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9595263Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9595497Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9595804Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9596038Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9596332Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9596547Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9596749Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9596950Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9597164Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9597458Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9597691Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9597893Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9598092Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9598294Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9598587Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9598808Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9599010Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9599208Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9599410Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9599559Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9599767Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9599992Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9600198Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9600396Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9600617Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9600825Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9601020Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9601240Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9601458Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9601662Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9601884Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9602090Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9602290Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9602485Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9602699Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9602901Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9603098Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9603330Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9603638Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9603852Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9604066Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9604266Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9604461Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9604660Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9604874Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9605075Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9605273Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9605488Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9605795Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9606008Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9606211Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9606412Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9606615Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9606911Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9607123Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9607325Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9607522Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9607732Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9608028Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9608233Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:24.9608435Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9608625Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9608824Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9609038Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9609243Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9609441Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9609639Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9609821Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9610002Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9610129Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9610232Z E1204 11:18:33.181000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9610358Z E1204 11:18:33.181000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9610516Z [W1204 11:18:33.450991348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9610518Z 2025-12-04T11:45:24.9610664Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9610961Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9611259Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9611389Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9611877Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9612132Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9612365Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:24.9612572Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9612774Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9613067Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9613338Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9613630Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9614729Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9615035Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9615268Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9615560Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9615791Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9616085Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9616305Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9616514Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9616710Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9616930Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9617130Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9617373Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9617665Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9617861Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9618095Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9618387Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9618609Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9618804Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9619031Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9619248Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9619443Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9619638Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9619857Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9620062Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9620260Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9620455Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9620688Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9620982Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9621226Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9621519Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9621749Z E1204 
11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9621954Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9622149Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9622358Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9622557Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9622790Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9623082Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9623362Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9623668Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9623900Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9624192Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9624425Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9624719Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9624950Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9625244Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9625479Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9625782Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9626025Z E1204 
11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9626316Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9626548Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9626840Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9627073Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9627363Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9627593Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9627901Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9628143Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9628436Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9628656Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9628858Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9629054Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9629346Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9629577Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9629870Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9630115Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9630425Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9630657Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9630949Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9631180Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9631472Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9631703Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9631994Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9632202Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9632407Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9632605Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9632810Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9633010Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9633241Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9633572Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9633769Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9633963Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9634160Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9634370Z E1204 
11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9634603Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9634907Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9635140Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9635434Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9635630Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9635838Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9636038Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9636283Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.9636578Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9636811Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9637014Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9637214Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9637417Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9637711Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9637946Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9638237Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9638472Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9638775Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9639018Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9639314Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9639547Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9639842Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9640039Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9640236Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9640457Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:24.9640668Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9640876Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9641075Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9641369Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9641604Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9641903Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9642138Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9642432Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9642665Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9642968Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9643202Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9643533Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9643756Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9643961Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9644160Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9644352Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9644564Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9644764Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9645070Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9645302Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9645505Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9645704Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9645904Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9646201Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9646436Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9646729Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9646963Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9647275Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9647508Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9647811Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9648043Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9648338Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9648572Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9648867Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9649099Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9649402Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9649644Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9649936Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9650169Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9650462Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9650695Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9650991Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9651188Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9651477Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9651721Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9652016Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9652259Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9652552Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9652787Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9653080Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9653342Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9653635Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9653887Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9654195Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9654391Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9654624Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9654917Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9655153Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9655446Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9655662Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9655865Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9656079Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9656282Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9656586Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9656801Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9657003Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9657204Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9657406Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9657705Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9657930Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9658141Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9658350Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9658542Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9658692Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9658888Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9659109Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9659317Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9659513Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9659740Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9659947Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9660144Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9660376Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9660582Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9660790Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9661010Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9661216Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9661415Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9661611Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9661823Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9662028Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9662237Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9662456Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9662751Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9662964Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9663166Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9663429Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9663621Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9663817Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9664030Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9664234Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9664452Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9664654Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9664959Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9665174Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9665376Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9665574Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9665776Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9666071Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9666286Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9666507Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9666723Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9666922Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9667217Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9667412Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:24.9667616Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9667806Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9668000Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9668215Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9668420Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9668630Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9668822Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9669016Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9669187Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9669314Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9669420Z E1204 11:18:33.184000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9669546Z E1204 11:18:33.184000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9669704Z [W1204 11:18:33.453199455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9669706Z 2025-12-04T11:45:24.9669850Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9670146Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9670442Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9670585Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9671079Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9671334Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9671562Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:24.9671769Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9671969Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9672263Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9672498Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9672801Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9673034Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9673391Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9673623Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9673917Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9674150Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9674442Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9674666Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9674883Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9675080Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9675299Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9675499Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9675732Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9676026Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9676222Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9676454Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9676744Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9676963Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9677170Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9677390Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9677605Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9677802Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9678000Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9678220Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9678425Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9678622Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9678815Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9679067Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9679360Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9679601Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9679894Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9680112Z E1204 
11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9680322Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9680517Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9680727Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9680925Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9681156Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9681458Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9681690Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9681993Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9682224Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9682522Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9682754Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9683045Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9683308Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9683611Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9683856Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9684148Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9684380Z E1204 
11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9684672Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9684905Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9685199Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9685429Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9685720Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9685966Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9686267Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9686499Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9686790Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9687012Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9687215Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9687413Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9687705Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9687947Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9688248Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9688481Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9688772Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9689003Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9689297Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9689530Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9689822Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9690055Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9690356Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9690553Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9690756Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9690952Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9691163Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9691363Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9691597Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9691890Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9692098Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9692293Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9692498Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9692693Z E1204 
11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9692923Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9693217Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9693478Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9693772Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9693968Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9694177Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9694398Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9694633Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:24.9694947Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9695170Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9695374Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9695573Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9695775Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9696072Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9696306Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9696614Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9696860Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9697155Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9697388Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9697683Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9697919Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9698213Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9698412Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9698612Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9698844Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:24.9699047Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9699255Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9699458Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9699753Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9699991Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9700285Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9700518Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9700820Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9701066Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9701363Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9701596Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9701889Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9702111Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9702317Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9702515Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9702708Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9702920Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9703133Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9703450Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9703685Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9703888Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9704087Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9704290Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9704585Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9704819Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9705113Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9705360Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9705673Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9705906Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9706201Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9706436Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9706729Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9706962Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9707254Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9707501Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9707798Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9708042Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9708336Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9708573Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9708867Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9709100Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9709395Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9709607Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9709815Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9710049Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9710346Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9710580Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9710875Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9711111Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9711405Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:24.9711637Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9711947Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9712180Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9712483Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9712682Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9712917Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9713216Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9713483Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9713778Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9714007Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9714225Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9714425Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9714627Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9714923Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9715140Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9715343Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9715541Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9715742Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9716035Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9716270Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9716473Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9716681Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9716873Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9717022Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9717220Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9717440Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9717648Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9717843Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9718064Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9718281Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9718484Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9718705Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9718912Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9719109Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9719330Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9719537Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9719735Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9719929Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9720143Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9720355Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9720554Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9720763Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9721059Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9721274Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9721477Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9721676Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9721868Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9722063Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9722287Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9722500Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9722699Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9722899Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9723193Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9723441Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9723645Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9723845Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9724045Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9724339Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9724566Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9724771Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9724981Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9725181Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9725475Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9725672Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:24.9725875Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9726068Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9726264Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9726493Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9726718Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9726915Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9727108Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9727287Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9727459Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9727585Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9727689Z E1204 11:18:33.186000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9727815Z E1204 11:18:33.186000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9727866Z ('RERUN', {'yellow': True}) [1.1385s] [100%] 2025-12-04T11:45:24.9728202Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:34.399685674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9728205Z 2025-12-04T11:45:24.9728350Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9728654Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9728951Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9729098Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9729580Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9729835Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9730063Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9730268Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9730470Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9730772Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9731017Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9731310Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9731543Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9731835Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9732067Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9732361Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9732594Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9732899Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9733121Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9733379Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9733576Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9733783Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9733984Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9734215Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9734510Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9734706Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9734951Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9735247Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9735479Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9735676Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9735894Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9736100Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9736296Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9736490Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9736712Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9736916Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9737123Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.9737320Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9737562Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9737856Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9738089Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9738382Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9738600Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9738805Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9739000Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9739218Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9739430Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9739667Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9739961Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9740193Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9740485Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9740716Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9741008Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9741239Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9744951Z 
E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9745197Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9745504Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9745736Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9746028Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9746259Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9746551Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9746781Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9747091Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9747333Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9747627Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9747857Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9748149Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9748381Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9748672Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9748892Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9749094Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9749303Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9749594Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9749837Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9750130Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9750363Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9750653Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9750883Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9751174Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9751415Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9751715Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9751947Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9752238Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9752437Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9752633Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9752829Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9753038Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9753236Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9753507Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9753811Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9754007Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9754214Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9754409Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9754605Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9754839Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9755129Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9755359Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9755660Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9755869Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9756075Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9756276Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9756508Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9756806Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9757027Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9757230Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9757427Z 
E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9757627Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9757929Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9758163Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9758464Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9758697Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9758994Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9759230Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9759522Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9759753Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9760057Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9760265Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9760461Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9760681Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9760881Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9761081Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9761282Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9761576Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9761808Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9762099Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9762341Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9762648Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9762881Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9763173Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9763437Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9763733Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9763954Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9764155Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9764368Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9764574Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9764785Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9764985Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9765277Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9765498Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9765702Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9765902Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9766102Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9766394Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9766640Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9766945Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9767177Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9767471Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9767704Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9767997Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9768230Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9768522Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9768765Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9769066Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9769299Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9769589Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9769823Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9770115Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9770347Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9770642Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9770885Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9771177Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9771386Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9771582Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9771815Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9772106Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9772341Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9772632Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9772866Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9773170Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9773445Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9773741Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9773972Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9774265Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9774461Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9774694Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9774987Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9775234Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9775529Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9775754Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9775956Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9776154Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9776356Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9776649Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9776861Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9777062Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9777280Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9777501Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9777793Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9778013Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9778215Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9778415Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9778609Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9778756Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9778955Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9779173Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9779379Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9779593Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9779817Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9780034Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9780229Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9780450Z E1204 
11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9780657Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9780853Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9781073Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9781278Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9781487Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9781691Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9781905Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9782106Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9782306Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9782508Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9782801Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9783013Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9783214Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9783451Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9783652Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9783849Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9784073Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9784277Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9784475Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9784676Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9784970Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9785181Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9785383Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9785592Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9785805Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9786098Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9786311Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.9786513Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9786712Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9786911Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9787203Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9787397Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9787598Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9787798Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9787994Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9788218Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9788424Z E1204 11:18:34.133000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9788623Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9788814Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9788994Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9789165Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9789292Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9789397Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9789534Z E1204 11:18:34.133000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9789692Z [W1204 11:18:34.402108428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9789705Z 2025-12-04T11:45:24.9789852Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9790148Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9790448Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9790580Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9791068Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9791322Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9791549Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9791759Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9791975Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9792281Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9792514Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9792808Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9793043Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9793357Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9793589Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9793879Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9794129Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9794436Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9794655Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9794861Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9795058Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9795268Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9795467Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9795699Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9795988Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9796195Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9796426Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9796728Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9796947Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9797141Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9797360Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9797566Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9797761Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9797955Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9798184Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9798389Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9798593Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.9798788Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9799018Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9799311Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9799544Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9799838Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9800057Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9800263Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9800467Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9800674Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9800882Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9801112Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9801403Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9801636Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9801926Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9802158Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9802458Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9802689Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9802990Z 
E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9803222Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9803541Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9803773Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9804064Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9804294Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9804587Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9804839Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9805130Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9805372Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9805661Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9805893Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9806183Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9806414Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9806706Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9806939Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9807152Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9807351Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9807642Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9807872Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9808168Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9808400Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9808689Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9808919Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9809221Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9809454Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9809751Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9809983Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9810276Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9810471Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9810667Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9810861Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9811070Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9811293Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9811534Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9811825Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9812020Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9812217Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9812411Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9812606Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9812838Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9813129Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9813393Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9813702Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9813909Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9814115Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9814316Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9814550Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9814845Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9815066Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9815266Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9815477Z 
E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9815689Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9815987Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9816222Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9816515Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9816749Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9817041Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9817274Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9817564Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9817808Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9818100Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9818308Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9818505Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9818726Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9818928Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9819125Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9819327Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9819618Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9819863Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9820165Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9820398Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9820692Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9820923Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9821216Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9821447Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9821739Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9821960Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9822172Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9822371Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9822573Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9822785Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9822986Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9823307Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9823529Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9823729Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9823928Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9824143Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9824447Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9824679Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9824972Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9825207Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9825501Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9825733Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9826023Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9826257Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9826562Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9826813Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9827106Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9827339Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9827634Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9827866Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9828159Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9828391Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9828692Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9828933Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9829227Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9829424Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9829622Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9829857Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9830149Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9830380Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9830672Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9830915Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9831219Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9831451Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9831743Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9831977Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9832272Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9832468Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9832698Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9833000Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9833242Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9833562Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9833777Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9833978Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9834178Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9834379Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9834671Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9834883Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9835097Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9835297Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9835509Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9835801Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9836020Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9836222Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9836420Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9836614Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9836763Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9836976Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9837198Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9837416Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9837613Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9837831Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9838039Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9838235Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9838455Z E1204 
11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9838661Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9838857Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9839078Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9839291Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9839489Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9839693Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9839906Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9840108Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9840305Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9840506Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9840798Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9841013Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9841228Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9841437Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9841628Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9841823Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9842036Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9842236Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9842433Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9842632Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9842928Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9843141Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9843387Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9843585Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9843796Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9844089Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9844301Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.9844504Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9844700Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9844901Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9845192Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9845400Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9845614Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9845802Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9845999Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9846210Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9846416Z E1204 11:18:34.135000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9846612Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9846799Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9846979Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9847149Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9847275Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9847389Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9847517Z E1204 11:18:34.135000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:24.9847673Z [W1204 11:18:34.404273966 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9847676Z 2025-12-04T11:45:24.9847830Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:24.9848123Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:24.9848423Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9848555Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9849035Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9849289Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9849524Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9849742Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9849943Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9850234Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9850469Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9850761Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9850994Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9851284Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9851516Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9851816Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9852051Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9852355Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9852573Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9852780Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9852975Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9853183Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9853415Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9853646Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9853951Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9854158Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9854393Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9854685Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9854906Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9855102Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9855320Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9855525Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9855719Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9855926Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9856143Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9856358Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9856554Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:24.9856752Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9856985Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9857276Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9857508Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9857802Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9858031Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9858251Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9858446Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9858653Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9858853Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9859088Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9859379Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9859610Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9859900Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9860131Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9860433Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9860672Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9860963Z 
E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9861194Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9861488Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9861719Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9862008Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9862249Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9862539Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9862781Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9863071Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9863332Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9863626Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9863856Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9864145Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9864374Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9864683Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9864901Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9865115Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9865311Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:24.9865601Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9865836Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9866127Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9866357Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9866647Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9866891Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9867192Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9867423Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9867712Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9867943Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9868238Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9868435Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9868629Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9868825Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9869043Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9869241Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9869480Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9869771Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9869967Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9870162Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9870358Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9870552Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9870783Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9871083Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9871324Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9871616Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9871811Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9872018Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9872220Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9872453Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9872747Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9872967Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9873180Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9873410Z 
E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9873626Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9873917Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9874150Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9874442Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9874674Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9874967Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9875218Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9875511Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9875754Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9876046Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9876244Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9876442Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9876663Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9876864Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9877062Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9877263Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9877568Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9877801Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9878102Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9878335Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9878628Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9878860Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9879152Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9879384Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9879689Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9879919Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9880122Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9880322Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9880515Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9880726Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:24.9880926Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9881217Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9881436Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9881638Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9881846Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9882049Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9882350Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9882583Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9882876Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9883109Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9883433Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9883664Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9883970Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9884218Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9884511Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9884742Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9885036Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9885269Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9885561Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9885796Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9886090Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9886335Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9886644Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9886876Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9887170Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9887367Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9887565Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9887797Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9888088Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9888332Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9888634Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9888871Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9889164Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9889397Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9889691Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9889924Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9890216Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9890412Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9890653Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9890945Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9891197Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9891488Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9891703Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9891905Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9892103Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9892303Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9892604Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9892827Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9893028Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9893229Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9893456Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9893749Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9893970Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9894171Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9894368Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9894560Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9894710Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:24.9894919Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9895140Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9895358Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9895554Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9895777Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9895982Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9896178Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9896396Z E1204 
11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9896601Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9896811Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9897042Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:24.9897247Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:24.9897443Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:24.9897637Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9897852Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9898054Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9898251Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9898451Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9898745Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9898967Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9899171Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9899379Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9899571Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:24.9899766Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9899983Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9900184Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:24.9900380Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9900581Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9900883Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9901106Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:24.9901306Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9901504Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9901703Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9901997Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9902210Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:24.9902411Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:24.9902610Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:24.9902809Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9903112Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9903336Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:24.9903549Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:24.9903738Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:24.9903934Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:24.9904149Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:24.9904355Z E1204 11:18:34.137000 830772 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:24.9904553Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:24.9904740Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:24.9904933Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:24.9905105Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:24.9905243Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:24.9905345Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:24.9905470Z E1204 11:18:34.137000 830772 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:24.9905511Z FAILED [0.9970s] [100%] 2025-12-04T11:45:24.9905513Z 2025-12-04T11:45:24.9905569Z ==================================== RERUNS ==================================== 2025-12-04T11:45:24.9905713Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9905761Z Traceback (most recent call last): 2025-12-04T11:45:24.9905928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9905973Z method(*args, **kwargs) 2025-12-04T11:45:24.9906127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9906167Z method(*args, **kwargs) 2025-12-04T11:45:24.9906317Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9906355Z with policy(): 2025-12-04T11:45:24.9906509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9906551Z raise RuntimeError(msg) 2025-12-04T11:45:24.9906967Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:24.9906971Z 2025-12-04T11:45:24.9907048Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9907310Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9907312Z 2025-12-04T11:45:24.9907413Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9907490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9907534Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9907594Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9908149Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9908250Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9908288Z graph_break [] 2025-12-04T11:45:24.9908352Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9908430Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9908921Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9908993Z current_size = base.storage().size() 2025-12-04T11:45:24.9909034Z Autotune Choices Stats: 2025-12-04T11:45:24.9909408Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.9909474Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9909523Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9909646Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9909884Z triton_mm_13 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9910114Z triton_mm_11 0.0064 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9910340Z triton_mm_10 0.0064 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9910565Z triton_mm_9 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9910804Z triton_mm_17 0.0068 ms 92.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9911030Z triton_mm_14 0.0068 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9911261Z triton_mm_18 0.0069 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9911484Z triton_mm_8 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9911714Z triton_mm_12 0.0070 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9911940Z triton_mm_15 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9912071Z SingleProcess AUTOTUNE benchmarking takes 0.0851 seconds and 0.7748 seconds precompiling for 18 choices 2025-12-04T11:45:24.9912214Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9912260Z Traceback (most recent call last): 2025-12-04T11:45:24.9912418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9912469Z method(*args, **kwargs) 
2025-12-04T11:45:24.9912624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9912678Z method(*args, **kwargs) 2025-12-04T11:45:24.9912832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9912870Z with policy(): 2025-12-04T11:45:24.9913023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:24.9913064Z raise RuntimeError(msg) 2025-12-04T11:45:24.9913482Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:24.9913486Z 2025-12-04T11:45:24.9913559Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9913822Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9913825Z 2025-12-04T11:45:24.9913913Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9913987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9914029Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9914088Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9914639Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9914755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9914794Z graph_break [] 2025-12-04T11:45:24.9914856Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9914930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9915429Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9915479Z current_size = base.storage().size() 2025-12-04T11:45:24.9915519Z Autotune Choices Stats: 2025-12-04T11:45:24.9915885Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.9915950Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9915999Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9916122Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9916356Z triton_mm_13 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:24.9916598Z triton_mm_11 0.0064 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9916834Z triton_mm_10 0.0064 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9917060Z triton_mm_9 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9917285Z triton_mm_17 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9917510Z triton_mm_14 0.0068 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9917735Z triton_mm_18 0.0069 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9917959Z triton_mm_8 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9918185Z triton_mm_12 0.0070 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9918420Z triton_mm_15 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9918551Z SingleProcess AUTOTUNE benchmarking takes 0.0851 seconds and 0.7748 seconds precompiling for 18 choices 2025-12-04T11:45:24.9918625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9918668Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9918724Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9918834Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9919325Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9919364Z graph_break [] 2025-12-04T11:45:24.9919428Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9919501Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9919542Z Autotune Choices Stats: 2025-12-04T11:45:24.9919904Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.9919966Z AUTOTUNE 
scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9920025Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9920145Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9920376Z triton_mm_33 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9920616Z triton_mm_30 0.0064 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9920841Z triton_mm_31 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9921063Z triton_mm_29 0.0066 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9921288Z triton_mm_34 0.0066 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9921510Z triton_mm_37 0.0066 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9921733Z triton_mm_38 0.0071 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9921958Z triton_mm_27 0.0072 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9922191Z triton_mm_28 0.0073 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9922416Z triton_mm_32 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9922556Z SingleProcess AUTOTUNE benchmarking takes 0.1488 seconds and 0.5024 seconds precompiling for 21 choices 2025-12-04T11:45:24.9922616Z =================================== FAILURES =================================== 2025-12-04T11:45:24.9922756Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:24.9922804Z Traceback (most recent call last): 2025-12-04T11:45:24.9922961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9923005Z method(*args, **kwargs) 2025-12-04T11:45:24.9923160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:24.9923201Z method(*args, **kwargs) 2025-12-04T11:45:24.9923377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:24.9923415Z with policy(): 2025-12-04T11:45:24.9923568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T11:45:24.9923609Z raise RuntimeError(msg) 2025-12-04T11:45:24.9923998Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:24.9924025Z 2025-12-04T11:45:24.9924100Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:24.9924356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9924359Z 2025-12-04T11:45:24.9924448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:24.9924520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9924563Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9924620Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9925170Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9925270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9925306Z graph_break [] 2025-12-04T11:45:24.9925370Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9925442Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:24.9925929Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:24.9925992Z current_size = base.storage().size() 2025-12-04T11:45:24.9926033Z Autotune Choices Stats: 2025-12-04T11:45:24.9926409Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.9926473Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9926521Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9926642Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9926876Z triton_mm_13 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9927105Z triton_mm_11 0.0064 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9927332Z triton_mm_10 0.0064 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:24.9927557Z triton_mm_9 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9927793Z triton_mm_17 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9928032Z triton_mm_14 0.0068 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9928256Z triton_mm_18 0.0069 ms 91.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9928479Z triton_mm_8 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9928704Z triton_mm_12 0.0070 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9928929Z triton_mm_15 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9929058Z SingleProcess AUTOTUNE benchmarking takes 0.0851 seconds and 0.7748 seconds precompiling for 18 choices 2025-12-04T11:45:24.9929133Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:24.9929174Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9929231Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9929330Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9929831Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9929870Z graph_break [] 2025-12-04T11:45:24.9929931Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9930004Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9930054Z Autotune Choices Stats: 2025-12-04T11:45:24.9930415Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:24.9930477Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:24.9930527Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9930649Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9930877Z triton_mm_33 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9931100Z 
triton_mm_30 0.0064 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9931328Z triton_mm_31 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9931565Z triton_mm_29 0.0066 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9931797Z triton_mm_34 0.0066 ms 95.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9932022Z triton_mm_37 0.0066 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9932244Z triton_mm_38 0.0071 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9932474Z triton_mm_27 0.0072 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9932697Z triton_mm_28 0.0073 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:24.9932924Z triton_mm_32 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9933052Z SingleProcess AUTOTUNE benchmarking takes 0.1488 seconds and 0.5024 seconds precompiling for 21 choices 2025-12-04T11:45:24.9933126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:24.9933169Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:24.9933235Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:24.9933362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:24.9933858Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:24.9933897Z graph_break [] 2025-12-04T11:45:24.9933957Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:24.9934032Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:24.9934072Z Autotune Choices Stats: 2025-12-04T11:45:24.9934435Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_51", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:24.9934502Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 
2025-12-04T11:45:24.9934550Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:24.9934671Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:24.9934905Z triton_mm_51 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9935162Z triton_mm_57 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:24.9935399Z triton_mm_53 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9935624Z triton_mm_50 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9935846Z triton_mm_49 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9936071Z triton_mm_54 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:24.9936295Z triton_mm_58 0.0066 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:24.9936520Z triton_mm_52 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4
2025-12-04T11:45:24.9936744Z triton_mm_48 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2
2025-12-04T11:45:24.9936983Z triton_mm_55 0.0070 ms 88.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4
2025-12-04T11:45:24.9937113Z SingleProcess AUTOTUNE benchmarking takes 0.1234 seconds and 0.3610 seconds precompiling for 21 choices
2025-12-04T11:45:24.9937306Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-41de2e9bfc78920a.xml -
2025-12-04T11:45:24.9937381Z =========================== short test summary info ============================
2025-12-04T11:45:24.9937969Z FAILED [0.9970s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008.
2025-12-04T11:45:24.9937973Z
2025-12-04T11:45:24.9938047Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:24.9938309Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:24.9938311Z
2025-12-04T11:45:24.9938399Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:24.9938463Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:45:24.9938531Z ================== 1 failed, 187 deselected, 2 rerun in 4.94s ==================
2025-12-04T11:45:24.9938569Z Got exit code 1
2025-12-04T11:45:24.9938609Z Retrying single test...
2025-12-04T11:45:24.9938766Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4bf8fcbbdea18b91.xml
2025-12-04T11:45:24.9938823Z ============================= test session starts ==============================
2025-12-04T11:45:24.9938947Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:24.9938988Z cachedir: .pytest_cache
2025-12-04T11:45:24.9939148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:24.9939195Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:24.9939236Z configfile: pytest.ini
2025-12-04T11:45:24.9939400Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:24.9939475Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T11:45:24.9939727Z stepcurrent: skipping 94 already run items.
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:24.9939772Z Running 1 items in this shard 2025-12-04T11:45:24.9939774Z 2025-12-04T11:45:24.9940107Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:43.705229257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9940109Z 2025-12-04T11:45:24.9940427Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9940725Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9940859Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9941356Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9941622Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9941848Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9942063Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9942264Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9942559Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9942793Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9943094Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9943370Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9943660Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9943891Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9944184Z E1204 11:18:43.756000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9944418Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9944712Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9944941Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9945230Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9945478Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9945769Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9945977Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9946208Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9946502Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9946699Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9946931Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9947221Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9947464Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9947768Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9947988Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9948194Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:24.9948392Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9948604Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9948772Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9948951Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9949469Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp_ohz0p6/6e/c6eanht4rb54oimg6mp2gh4utfw73vfy2waao2vvr7no7oh5u3uo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9949619Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9949850Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9950006Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9950169Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9950455Z E1204 11:18:43.756000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9950587Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9950846Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9950987Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9951243Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9951399Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9951677Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9951824Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9952101Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9952297Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9952612Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9952907Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9953037Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9953547Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9953799Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9954039Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9954245Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9954458Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9954751Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9954986Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9955277Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9955508Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9955799Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9956044Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9956347Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9956579Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9956868Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9957101Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9957392Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9957624Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9957916Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9958118Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9958362Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9958651Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9958857Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.9959088Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9959379Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9959611Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9959905Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9960126Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9960344Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9960545Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9960767Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9960933Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.9961110Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9961213Z E1204 11:18:43.756000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.9961374Z [W1204 11:18:43.054482430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9961378Z 2025-12-04T11:45:24.9961685Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9961979Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9962110Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9962601Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9962854Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9963089Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9963320Z E1204 11:18:43.788000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9963520Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9963812Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9964045Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9964335Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9964583Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9964876Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9965120Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9965410Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9965643Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9965935Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9966167Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9966458Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9966687Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9966996Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9967193Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9967435Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9967724Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9967919Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9968154Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9968446Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9968676Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9968964Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9969199Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9969415Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9969617Z E1204 11:18:43.788000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9969826Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9969992Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9970170Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9970685Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp_ohz0p6/bj/cbjxbag7j66nd7yspeletejj7ht5iv6ck7i6xezu2ugg5krurumw.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9970835Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9971050Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9971217Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9971363Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9971661Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9971794Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9972049Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9972190Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9972443Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9972599Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9972867Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9973001Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9973309Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9973513Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9973825Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9974118Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9974248Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9974728Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9974980Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9975205Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9975422Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9975623Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9975927Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9976162Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9976455Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9976689Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9976981Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9977210Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9977500Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9977743Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9978042Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9978275Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9978566Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9978803Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9979093Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9979293Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9979523Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9979825Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9980021Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:24.9980264Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9980553Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9980786Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9981078Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9981297Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9981503Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9981703Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9981923Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9982107Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:24.9982282Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9982385Z E1204 11:18:43.788000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:24.9982540Z [W1204 11:18:43.058005017 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:24.9982543Z 2025-12-04T11:45:24.9982850Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9983148Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9983307Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9983784Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9984048Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9984275Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9984490Z E1204 11:18:43.791000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9984689Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9984979Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9985214Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9985509Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9985741Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9986032Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9986275Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9986576Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9986807Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9987095Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9987329Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9987619Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9987853Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9988143Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9988338Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9988581Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9988880Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9989077Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:24.9989306Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9989600Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9989833Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9990127Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9990348Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:24.9990564Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:24.9990774Z E1204 11:18:43.791000 836028 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:24.9990983Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:24.9991149Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:24.9991327Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:24.9991848Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp_ohz0p6/c7/cc7w47w5kffhezsr6uxbhlttcdh2a457snd3tbuhehtlnmgurngz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:24.9991998Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:24.9992217Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:24.9992372Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:24.9992518Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:24.9992815Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:24.9992949Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:24.9993221Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:24.9993377Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:24.9993632Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:24.9993790Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:24.9994058Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:24.9994194Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:24.9994472Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:24.9994681Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:24.9995007Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:24.9995300Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:24.9995431Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:24.9995906Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:24.9996159Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:24.9996385Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:24.9996592Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:24.9996793Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:24.9997097Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9997333Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9997639Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9997871Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9998163Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9998393Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9998684Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9998915Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9999225Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9999467Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:24.9999758Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:24.9999993Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0000285Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0000483Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0000713Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0001002Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0001198Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0001448Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0001754Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0001984Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0002276Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0002496Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0002704Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0002906Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0003115Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0003319Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.0003499Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0003614Z E1204 11:18:43.791000 836028 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.0003666Z ('RERUN', {'yellow': True}) [2.9779s] [100%] 2025-12-04T11:45:25.0003999Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:45.501630846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0004002Z 2025-12-04T11:45:25.0004148Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0004444Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0004741Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0004871Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0005348Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0005615Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0005842Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0006060Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0006258Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0006552Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0006784Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0007075Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0007304Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0007598Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0007843Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0008143Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0008375Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0008665Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0008885Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0009090Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0009288Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0009495Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0009693Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0009936Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0010227Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0010434Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0010665Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0010958Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0011179Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0011374Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0011592Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0011795Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0012098Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0012305Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0012526Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0012732Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0012929Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0013127Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0013377Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0013670Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0013899Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0014205Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0014424Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0014647Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0014843Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0015051Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0015253Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0015485Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0015777Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0016006Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0016313Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0016558Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0016846Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0017076Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0017368Z 
E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0017601Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0017894Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0018124Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0018414Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0018654Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0018947Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0019186Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0019477Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0019710Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0020002Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0020234Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0020522Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0020764Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0021063Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0021282Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0021482Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0021678Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0021972Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0022204Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0022494Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0022722Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0023024Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0023284Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0023586Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0023816Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0024107Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0024341Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0024634Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0024827Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0025036Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0025244Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0025451Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0025648Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0025879Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0026170Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0026365Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0026562Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0026757Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0026952Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0027196Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0027489Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0027733Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0028024Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0028221Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0028426Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0028628Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0028862Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0029158Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0029390Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0029607Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0029807Z 
E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0030007Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0030300Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0030534Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0030827Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0031061Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0031360Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0031615Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0031907Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0032151Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0032442Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0032643Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0032841Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0033063Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0033293Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0033505Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0033706Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0034009Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0034244Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0034536Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0034769Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0035061Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0035292Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0035587Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0035833Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0036129Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0036361Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0036562Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0036762Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0036954Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0037164Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0037363Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0037657Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0037889Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0039863Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0040070Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0040275Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0040571Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0040808Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0041101Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0041334Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0041626Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0041876Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0042172Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0042415Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0042706Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0042940Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0043233Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0043490Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0043783Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0044032Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0044342Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0044575Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0044867Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0045100Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0045391Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0045589Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0045786Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0046019Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0046324Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0046559Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0046866Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0047098Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0047392Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0047624Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0047995Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0048228Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0048537Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0048755Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0048986Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0049281Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0049512Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0049806Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0050024Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0050224Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0050422Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0050622Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0050925Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0051137Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0051351Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0051553Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0051755Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0052051Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0052271Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0052473Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0052680Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0052872Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0053030Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0053228Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0053475Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0053681Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0053880Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0054099Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0054312Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0054508Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0054729Z E1204 
11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0054947Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0056226Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0056462Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0056667Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0056864Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0057059Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0057285Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0057487Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0057685Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0057896Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0058191Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0058404Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0058605Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0058803Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0058995Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0059192Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0059406Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0059609Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0059807Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0060008Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0060319Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0060594Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0060794Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0060992Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0061194Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0061489Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0061705Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0061904Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0062113Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0062313Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0062607Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0062804Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0063004Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0063195Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0063435Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0063650Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0063856Z E1204 11:18:45.240000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0064055Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0064244Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0064437Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0064630Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0064780Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0064885Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0065011Z E1204 11:18:45.240000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0065171Z [W1204 11:18:45.509599498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0065174Z 2025-12-04T11:45:25.0065321Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0065620Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0065922Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0066054Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0066559Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0066815Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0067043Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0067250Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0067451Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0067743Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0067979Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0068273Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0068505Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0068806Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0069059Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0069350Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0069582Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0069872Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0070092Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0070298Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0070499Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0070720Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0070918Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0071151Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0071441Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0071638Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0071869Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0072161Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0072379Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0072574Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0072795Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0073019Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0073225Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0073463Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0073682Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0073886Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0074082Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0074275Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0074507Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0074799Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0075045Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0075337Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0075555Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0075759Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0075955Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0076163Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0076363Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0076594Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0076885Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0077128Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0077422Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0077676Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0077967Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0078200Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0078491Z 
E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0078724Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0079012Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0079254Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0079546Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0079782Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0080072Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0080302Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0080593Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0080823Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0081113Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0081343Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0081642Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0081890Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0082191Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0082409Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0082611Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0082807Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0083101Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0083365Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0083667Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0083898Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0084192Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0084423Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0084713Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0084946Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0085237Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0085467Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0085757Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0085967Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0086173Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0086381Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0086589Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0086787Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0087018Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0087308Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0087503Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0087696Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0087902Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0088097Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0088327Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0088618Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0088849Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0089142Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0089336Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0089543Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0089745Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0089978Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0090282Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0090522Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0090724Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0090923Z 
E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0091128Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0091422Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0091655Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0091946Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0092191Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0092483Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0092714Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0093007Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0093242Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0093569Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0093768Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0093964Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0094184Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0094403Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0094601Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0094827Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0095121Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0095354Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0095649Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0095883Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0096174Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0096407Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0096714Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0096946Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0097238Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0097458Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0097660Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0097859Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0098056Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0098267Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0098466Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0098777Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0099006Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0099217Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0099414Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0099615Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0099909Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0100143Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0100439Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0100670Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0100974Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0101205Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0101499Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0101732Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0102025Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0102258Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0102551Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0102784Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0103088Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0103355Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0103674Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0103905Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0104198Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0104430Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0104724Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0104922Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0105118Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0105363Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0105655Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0105888Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0106179Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0106412Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0106704Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0106937Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0107232Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0107479Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0107771Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0107988Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0108220Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0108513Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0108746Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0109041Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0109254Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0109458Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0109674Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0109874Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0110169Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0110381Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0110584Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0110781Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0110983Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0111276Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0111495Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0111699Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0111908Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0112108Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0112265Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0112461Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0112681Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0112887Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0113084Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0113329Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0113534Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0113748Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0113969Z E1204 
11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0114175Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0114369Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0114595Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0114802Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0115001Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0115196Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0115409Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0115609Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0115810Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0116026Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0116349Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0116561Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0116760Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0116960Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0117149Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0117348Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0117561Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0117760Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0117973Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0118172Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0118469Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0118681Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0118882Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0119080Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0119280Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0119574Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0119786Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0119997Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0120193Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0120403Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0120706Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0120901Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0121104Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0121294Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0121491Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0121704Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0121909Z E1204 11:18:45.243000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0122118Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0122306Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0122489Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0122660Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0122787Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0122890Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0123016Z E1204 11:18:45.243000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0123171Z [W1204 11:18:45.511784375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0123174Z 2025-12-04T11:45:25.0123346Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0123640Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0123935Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0124079Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0124569Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0124835Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0125062Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0125270Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0125470Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0125761Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0125995Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0126308Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0126547Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0126841Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0127074Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0127389Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0127622Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0127914Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0128132Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0128338Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0128547Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0128771Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0128990Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0129221Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0129513Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0129711Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0129948Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0130238Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0130468Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0130664Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0130881Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0131090Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0131288Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0131490Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0131712Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0131918Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0132128Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0132331Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0132566Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0132867Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0133118Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0133463Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0133684Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0133890Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0134086Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0134302Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0134500Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0134734Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0135043Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0135277Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0135568Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0135797Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0136092Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0136322Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0136615Z 
E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0136844Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0137149Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0137392Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0137689Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0137920Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0138211Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0138441Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0138734Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0138965Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0139266Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0139496Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0139788Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0140017Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0140307Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0140525Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0140725Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0140921Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0141216Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0141458Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0141747Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0141998Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0142290Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0142520Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0142810Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0143042Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0143366Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0143612Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0143905Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0144103Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0144297Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0144492Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0144699Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0144897Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0145128Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0145417Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0145613Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0145819Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0146037Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0146241Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0146471Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0146760Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0146991Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0147282Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0147476Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0147683Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0147895Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0148131Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0148423Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0148643Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0148845Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0149043Z 
E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0149243Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0149535Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0149767Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0150069Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0150313Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0150616Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0150847Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0151140Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0151372Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0151665Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0151860Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0152066Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0152287Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0152489Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0152690Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0152890Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0153183Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0153436Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0153731Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0153963Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0154254Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0154499Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0156494Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0156728Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0157019Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0157240Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0157443Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0157641Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0157832Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0158056Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0158256Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0158549Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0158771Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0158972Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0159170Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0159372Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0159664Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0159897Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0160188Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0160429Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0160739Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0160971Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0161265Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0161499Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0161791Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0162023Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0162313Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0162555Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0162846Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0163079Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0163407Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0163642Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0163933Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0164165Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0164456Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0164652Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0164869Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0165111Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0165414Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0165648Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0165941Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0166175Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0166466Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0166697Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0166999Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0167233Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0167525Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0167721Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0167957Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0168249Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0168484Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0168775Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0168989Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0169202Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0169410Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0169619Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0169911Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0170126Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0170328Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0170526Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0170727Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0171017Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0171250Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0171451Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0171649Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0171841Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0171989Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0172185Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0172405Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0172613Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0172807Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0173026Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0173241Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0173482Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0173714Z E1204 
11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0173918Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0174114Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0174334Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0174541Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0174738Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0174932Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0175155Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0175356Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0175554Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0175753Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0176045Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0176257Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0176460Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0176660Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0176855Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0177050Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0177261Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0177474Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0177685Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0177893Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0178186Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0178401Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0178600Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0178800Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0179000Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0179290Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0179519Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0179721Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0179919Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0180117Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0180411Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0180605Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0180806Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0180995Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0181192Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0181407Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0181620Z E1204 11:18:45.245000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0181825Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0182022Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0182202Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0182373Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0182498Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0182603Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0182728Z E1204 11:18:45.245000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0182780Z ('RERUN', {'yellow': True}) [1.2289s] [100%] 2025-12-04T11:45:25.0183113Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:18:46.545499974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.0183124Z 2025-12-04T11:45:25.0183307Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0183601Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0183899Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0184030Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0184507Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0184761Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0184986Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0185193Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0185391Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0185702Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0185948Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0186249Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0186481Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0186772Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0187003Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0187297Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0187526Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0187831Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0188050Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0188257Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0188451Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0188658Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0188858Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0189088Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0189380Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0189574Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0189808Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0190111Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0190355Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0190551Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0190767Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0190973Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0191167Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0191365Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0191581Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0191784Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0191993Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0192194Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0192429Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0192721Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0192954Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0193246Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0193490Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0193694Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0193889Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0194097Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0194307Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0194568Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0194860Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0195091Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0195384Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0195613Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0195906Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0196136Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0196440Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0196671Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0196962Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0197195Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0197486Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0197717Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0198006Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0198236Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0198534Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0198765Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0199075Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0199306Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0199599Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0199829Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0200119Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0200338Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0200537Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0200745Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0201035Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0201266Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0201557Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0201791Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0202080Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0202312Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0202602Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0202833Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0203131Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0203405Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0203695Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0203893Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0204090Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0204285Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0204491Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0204690Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0204933Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0205224Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0205419Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0205614Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0205808Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0206002Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0206236Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0206527Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0206758Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0207049Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0207255Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0207473Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0207689Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0207922Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0208215Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0208437Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0208640Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0208840Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0209041Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0209343Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0209576Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0209868Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0210100Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0210395Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0210625Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0210922Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0211154Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0211457Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0211653Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0211885Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0212108Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0212308Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0212508Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0212707Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0213003Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0213236Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0213564Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0213798Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0214090Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0214323Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0214614Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0214846Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0215138Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0215359Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0215562Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0215773Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0215965Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0216198Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0216398Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0216691Z E1204 11:18:46.279000 
836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0216913Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0217116Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0217314Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0217515Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0217823Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0218057Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0218350Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0218581Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0218872Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0219104Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0219396Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0219627Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0219919Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0220160Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0220464Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0220706Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0220996Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0221230Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0221521Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0221754Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0222046Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0222289Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0222583Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0222781Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0222977Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0223210Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0223559Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0223792Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0224084Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0224316Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0224624Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0224869Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0225173Z E1204 
11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0225405Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0225698Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0225894Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0226128Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0226418Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0226664Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0226958Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0227177Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0227378Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0227577Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0227779Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0228070Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0228285Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0228486Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0228685Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0228903Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0229205Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0229435Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0229637Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0229837Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0230027Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0230175Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0230372Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0230590Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0230805Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0231001Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0231222Z E1204 
11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0231428Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0231624Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0231845Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0232052Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0232249Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0232468Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0232672Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0232879Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0233073Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0233340Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0233544Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0233744Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0233946Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0234239Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0234452Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0234654Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0234864Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0235056Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.0235251Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0235464Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0235665Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0235864Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0236066Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0236358Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0236571Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0236770Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0236969Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0237182Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0237496Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0237710Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0237909Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0238110Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0238308Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0238602Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0238797Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0239006Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0239195Z E1204 11:18:46.279000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0239389Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0239603Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0239806Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0240004Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0240192Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0240374Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0240546Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0240670Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0240772Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0240897Z E1204 11:18:46.279000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0241067Z [W1204 11:18:46.547904078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0241070Z 2025-12-04T11:45:25.0241213Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0241524Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0241818Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0241949Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0242431Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0242685Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0242910Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0243130Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.0243361Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0243655Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0243889Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0244180Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0244413Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0244707Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0244939Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0245228Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0245474Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0245775Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0246006Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0246212Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0246409Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0246616Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0246815Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0247048Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0247338Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0247549Z E1204 
11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0247778Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0248071Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0248288Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0248484Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0248703Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0248907Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0249103Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0249298Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0249518Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0249729Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0249933Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0250136Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0250368Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0250661Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0250891Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0251184Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0251403Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0251628Z 
E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0251823Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0252032Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0252231Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0252463Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0252757Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0252987Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0253304Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0253533Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.0253831Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0254075Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0254391Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0254622Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0254911Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0255142Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0255433Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0255664Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0255953Z E1204 
11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0256199Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0256492Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0256721Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0257010Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0257242Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0257531Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0257765Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0258055Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0258274Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0258486Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0258693Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0258999Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0259229Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0259521Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0259751Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0260041Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0260272Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0260573Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0260806Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0261095Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0261325Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0261616Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0261813Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0262009Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0262203Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0262411Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0262617Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0262851Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0263163Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0263394Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0263589Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0263785Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0263980Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0264210Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0264501Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0264746Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0265036Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0265233Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0265440Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0265641Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0265875Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0266168Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0266388Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0266589Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0266787Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0266998Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0267318Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0267554Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0267849Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0268083Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0268376Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0268608Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0268900Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0269143Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0269434Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0269634Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0269831Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0270055Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0270256Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0270458Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0270659Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0270950Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0271194Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0271486Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0271738Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0272031Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0272267Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0272560Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0272792Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0273086Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0273338Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0273542Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0273744Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0273936Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0274147Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0274350Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0274650Z E1204 11:18:46.281000 
836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0274871Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0275071Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0275269Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0275483Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0275777Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0276039Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0276334Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0276566Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0276861Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0277095Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0277386Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0277631Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0277922Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0278155Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0278446Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0278680Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0278975Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0279208Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0279501Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0279732Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0280040Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0280282Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0280588Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0280785Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0280983Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0281216Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0281512Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0281744Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0282044Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0282278Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0282573Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0282804Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0283096Z E1204 
11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0283363Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0283658Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0283857Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0284091Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0284399Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0284642Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0284948Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0285162Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0285364Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0285564Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0285764Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0286060Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0286273Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0286489Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0286686Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0286888Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0287181Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0287401Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0287602Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0287800Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0287991Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0288137Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0288335Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0288566Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0288783Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0288989Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0289208Z E1204 
11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0289415Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0289610Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0289830Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0290034Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0290232Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0290467Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0290673Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0290871Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0291065Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0291279Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0291480Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0291679Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0291880Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0292172Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0292384Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0292604Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0292804Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0293006Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.0293211Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0293447Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0293652Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0293851Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0294051Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0294343Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0294554Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0294769Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0294969Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0295169Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0295459Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0295672Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0295874Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0296072Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0296273Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0296564Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0296771Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0296972Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0297179Z E1204 11:18:46.281000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0297388Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0297601Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0297808Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0298004Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0298194Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0298375Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0298545Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0298680Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0298784Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0298912Z E1204 11:18:46.281000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0299066Z [W1204 11:18:46.550065726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0299068Z 2025-12-04T11:45:25.0299214Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0299506Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0299803Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0299931Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0300409Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0300665Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0300900Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0301116Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.0301324Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0301621Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0301857Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0302151Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0302386Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0302676Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0302920Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0303209Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0303474Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0303767Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0303989Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0304196Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0304391Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0304600Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0304799Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0305031Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0305333Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0305540Z E1204 
11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0305783Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0306077Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0306298Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0306493Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0306714Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0306918Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0307114Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0307331Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0307548Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0307754Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0307948Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0308144Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0308377Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0308669Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0308901Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0309191Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0309420Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0309624Z 
E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0309839Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0310045Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0310244Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0310479Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0310771Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0311006Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0311294Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0311532Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.0311822Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0312054Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0312347Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0312577Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0312870Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0313101Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0313418Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0313649Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0313950Z E1204 
11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0314195Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0314497Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0314728Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0315019Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0315254Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0315545Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0315774Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0316076Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0316296Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0316500Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0316699Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0316992Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0317226Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0317520Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0317752Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0318041Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0318283Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0318584Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0318824Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0319117Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0319349Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0319643Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0319840Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0320035Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0320239Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0320446Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0320645Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0320875Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0321166Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0321362Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0321556Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0321751Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0321946Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0322179Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0322482Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0322722Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0323025Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0323220Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0323457Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0323659Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0323893Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0324186Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0324408Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0324628Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0324828Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0325029Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0325322Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0325556Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0325848Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0326083Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0326374Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0326610Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0326920Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0327164Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0327469Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0327666Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0327865Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0328085Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0328287Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0328485Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0328687Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0328991Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0329224Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0329517Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0329749Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0330045Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0330278Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0330574Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0330808Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0331112Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0331333Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0331556Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0331758Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0331948Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0332159Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0332358Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0332651Z E1204 11:18:46.283000 
836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0332874Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0333085Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0333317Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0333518Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0333813Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0334046Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0334338Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0334571Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0334864Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0335097Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0335402Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0335635Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0335964Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0336196Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0336492Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0336723Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0337016Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0337248Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0337552Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0337785Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0338079Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0338313Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0338605Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0338804Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0339001Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0339235Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0339529Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0339772Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0340065Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0340330Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0340624Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0340860Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0341153Z E1204 
11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0341387Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0341677Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0341884Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0342116Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0342410Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0342641Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0342933Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0343149Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0343369Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0343568Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0343767Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0344060Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0344296Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0344509Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0344718Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0344919Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0345215Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0345436Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0345639Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0345836Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0346025Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0346186Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0346380Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0348874Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0349097Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0349302Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0349526Z E1204 
11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0349735Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0349935Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0350156Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0350360Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0350578Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0350798Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0351024Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0351221Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0351417Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0351632Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0351835Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0352035Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0352235Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0352529Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0352752Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0352952Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0353151Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0353380Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.0353576Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0353789Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0353992Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0354192Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0354392Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0354687Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0354917Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0355129Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0355338Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0355537Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0355831Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0356043Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0356247Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0356446Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0356646Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0356952Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0357147Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0357349Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0357537Z E1204 11:18:46.283000 836028 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0357733Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0357945Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0358150Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0358348Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0358538Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0358719Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0358905Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0359044Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0359146Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0359284Z E1204 11:18:46.283000 836028 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0359326Z FAILED [0.9983s] [100%] 2025-12-04T11:45:25.0359328Z 2025-12-04T11:45:25.0359386Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.0359529Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0359579Z Traceback (most recent call last): 2025-12-04T11:45:25.0359743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0359786Z method(*args, **kwargs) 2025-12-04T11:45:25.0359939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0359980Z method(*args, **kwargs) 2025-12-04T11:45:25.0360131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0360169Z with policy(): 2025-12-04T11:45:25.0360321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0360363Z raise RuntimeError(msg) 2025-12-04T11:45:25.0360769Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1075838976. 
2025-12-04T11:45:25.0360772Z 2025-12-04T11:45:25.0360850Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0361113Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0361116Z 2025-12-04T11:45:25.0361206Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0361285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0361328Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0361387Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0361943Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0362044Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0362083Z graph_break [] 2025-12-04T11:45:25.0362148Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0362224Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0362717Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0362782Z current_size = base.storage().size() 2025-12-04T11:45:25.0362823Z Autotune Choices Stats: 2025-12-04T11:45:25.0363223Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0363323Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:25.0363374Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0363495Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0363737Z triton_mm_13 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0363969Z triton_mm_11 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0364197Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0364423Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0364659Z triton_mm_17 0.0063 ms 96.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0364886Z triton_mm_8 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0365115Z triton_mm_14 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0365337Z triton_mm_18 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0365563Z triton_mm_15 0.0070 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0365787Z triton_mm_12 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0365920Z SingleProcess AUTOTUNE benchmarking takes 0.0830 seconds and 0.8748 seconds precompiling for 18 choices 2025-12-04T11:45:25.0366062Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0366110Z Traceback (most recent call last): 2025-12-04T11:45:25.0366267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0366309Z method(*args, **kwargs) 
2025-12-04T11:45:25.0366461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0366514Z method(*args, **kwargs) 2025-12-04T11:45:25.0366666Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0366717Z with policy(): 2025-12-04T11:45:25.0366871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0366927Z raise RuntimeError(msg) 2025-12-04T11:45:25.0367320Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1145044992. 2025-12-04T11:45:25.0367325Z 2025-12-04T11:45:25.0367398Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0367659Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0367662Z 2025-12-04T11:45:25.0367750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0367826Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0367869Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0367926Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0368474Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0368585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0368623Z graph_break [] 2025-12-04T11:45:25.0368688Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0368761Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0369252Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0369301Z current_size = base.storage().size() 2025-12-04T11:45:25.0369341Z Autotune Choices Stats: 2025-12-04T11:45:25.0369712Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0369776Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:25.0369826Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0369947Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0370180Z triton_mm_13 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:25.0370418Z triton_mm_11 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0370644Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0370887Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0371114Z triton_mm_17 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0371339Z triton_mm_8 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0371561Z triton_mm_14 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0371787Z triton_mm_18 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0372014Z triton_mm_15 0.0070 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0372255Z triton_mm_12 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0372384Z SingleProcess AUTOTUNE benchmarking takes 0.0830 seconds and 0.8748 seconds precompiling for 18 choices 2025-12-04T11:45:25.0372460Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0372502Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0372558Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0372660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0373144Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0373183Z graph_break [] 2025-12-04T11:45:25.0373246Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0373351Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0373391Z Autotune Choices Stats: 2025-12-04T11:45:25.0373757Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_30", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.0373820Z AUTOTUNE 
scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:25.0373870Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0373990Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0374237Z triton_mm_30 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0374477Z triton_mm_31 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0374712Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0374935Z triton_mm_34 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0375161Z triton_mm_37 0.0065 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0375388Z triton_mm_28 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0375609Z triton_mm_38 0.0067 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0375832Z triton_mm_29 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0376071Z triton_mm_35 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0376297Z triton_mm_27 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0376428Z SingleProcess AUTOTUNE benchmarking takes 0.1409 seconds and 0.4995 seconds precompiling for 21 choices 2025-12-04T11:45:25.0376481Z =================================== FAILURES =================================== 2025-12-04T11:45:25.0376624Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0376672Z Traceback (most recent call last): 2025-12-04T11:45:25.0376829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0376870Z method(*args, **kwargs) 2025-12-04T11:45:25.0377023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0377065Z method(*args, **kwargs) 2025-12-04T11:45:25.0377216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0377253Z with policy(): 2025-12-04T11:45:25.0377408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T11:45:25.0377448Z raise RuntimeError(msg) 2025-12-04T11:45:25.0377838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 2025-12-04T11:45:25.0377852Z 2025-12-04T11:45:25.0377929Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0378198Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0378201Z 2025-12-04T11:45:25.0378298Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0378372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0378416Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0378472Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0379025Z inductor [('triton_bundler_save_kernel', 168), ('generated_module_cache_miss', 20), ('benchmarking.InductorBenchmarker.benchmark_gpu', 18), ('select_algorithm_num_precompiles', 17), ('select_algorithm_num_precompilation_exceptions', 3), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0379124Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0379161Z graph_break [] 2025-12-04T11:45:25.0379224Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0379299Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:25.0379790Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0379850Z current_size = base.storage().size() 2025-12-04T11:45:25.0379892Z Autotune Choices Stats: 2025-12-04T11:45:25.0380261Z {"num_choices": 18, "num_triton_choices": 17, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0380324Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:25.0380371Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0380490Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0380722Z triton_mm_13 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0380951Z triton_mm_11 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0381176Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.0381401Z triton_mm_9 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0381627Z triton_mm_17 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0381859Z triton_mm_8 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0382102Z triton_mm_14 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0382325Z triton_mm_18 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0382551Z triton_mm_15 0.0070 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0382776Z triton_mm_12 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0382904Z SingleProcess AUTOTUNE benchmarking takes 0.0830 seconds and 0.8748 seconds precompiling for 18 choices 2025-12-04T11:45:25.0382979Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:25.0383020Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0383077Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0383176Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0383712Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0383750Z graph_break [] 2025-12-04T11:45:25.0383816Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0383890Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0383930Z Autotune Choices Stats: 2025-12-04T11:45:25.0384289Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_30", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.0384352Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 2025-12-04T11:45:25.0384400Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0384521Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0384753Z triton_mm_30 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0384981Z 
triton_mm_31 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0385207Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0385449Z triton_mm_34 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0385686Z triton_mm_37 0.0065 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0385926Z triton_mm_28 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0386150Z triton_mm_38 0.0067 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0386376Z triton_mm_29 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0386601Z triton_mm_35 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:25.0386828Z triton_mm_27 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0386957Z SingleProcess AUTOTUNE benchmarking takes 0.1409 seconds and 0.4995 seconds precompiling for 21 choices 2025-12-04T11:45:25.0387044Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0387086Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0387144Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0387242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0387731Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0387769Z graph_break [] 2025-12-04T11:45:25.0387829Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:25.0387904Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0387944Z Autotune Choices Stats: 2025-12-04T11:45:25.0388304Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_49", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.0388366Z AUTOTUNE scaled_mm(257x32, 32x2048, 257x1, 1x2048, 2048) 
2025-12-04T11:45:25.0388416Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0388535Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0388766Z triton_mm_49 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0389008Z triton_mm_51 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0389232Z triton_mm_53 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0389475Z triton_mm_50 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0389701Z triton_mm_54 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0389925Z triton_mm_57 0.0066 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0390147Z triton_mm_48 0.0068 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.0390370Z triton_mm_58 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0390594Z triton_mm_52 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0390829Z triton_mm_55 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0390958Z SingleProcess AUTOTUNE benchmarking takes 0.1195 seconds and 0.3569 seconds precompiling for 21 choices 2025-12-04T11:45:25.0391152Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4bf8fcbbdea18b91.xml - 2025-12-04T11:45:25.0391216Z =========================== short test summary info ============================ 2025-12-04T11:45:25.0391802Z FAILED [0.9983s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1145044992 and is now 1214251008. 
2025-12-04T11:45:25.0391805Z 2025-12-04T11:45:25.0391880Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0392139Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0392144Z 2025-12-04T11:45:25.0392231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0392294Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.0392363Z ================== 1 failed, 187 deselected, 2 rerun in 5.22s ================== 2025-12-04T11:45:25.0392402Z Got exit code 1 2025-12-04T11:45:25.0392607Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0392735Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.0392891Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b5a5c1f5e97b303d.xml 2025-12-04T11:45:25.0392962Z ============================= test session starts ============================== 2025-12-04T11:45:25.0393076Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.0393118Z cachedir: .pytest_cache 2025-12-04T11:45:25.0393322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.0393372Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.0393412Z configfile: pytest.ini 2025-12-04T11:45:25.0393575Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.0393651Z collecting ... 
collected 188 items / 95 deselected / 93 selected 2025-12-04T11:45:25.0393704Z stepcurrent: skipping 95 already run items. 2025-12-04T11:45:25.0393749Z Running 93 items in this shard 2025-12-04T11:45:25.0393751Z 2025-12-04T11:45:25.0393969Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0998s] [ 1%] 2025-12-04T11:45:25.0394180Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8190s] [ 1%] 2025-12-04T11:45:25.0394480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7064s] [ 1%] 2025-12-04T11:45:25.0394483Z 2025-12-04T11:45:25.0394548Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.0394688Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0394733Z Traceback (most recent call last): 2025-12-04T11:45:25.0394895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0394938Z method(*args, **kwargs) 2025-12-04T11:45:25.0395091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0395138Z method(*args, **kwargs) 2025-12-04T11:45:25.0395289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0395327Z with policy(): 2025-12-04T11:45:25.0395478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0395519Z raise RuntimeError(msg) 2025-12-04T11:45:25.0395905Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1035993088. 2025-12-04T11:45:25.0395909Z 2025-12-04T11:45:25.0395982Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0396240Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0396242Z 2025-12-04T11:45:25.0396329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0396402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0396446Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0396502Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0397000Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0397122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0397159Z graph_break [] 2025-12-04T11:45:25.0397222Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0397294Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0397781Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: 
TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0397829Z current_size = base.storage().size() 2025-12-04T11:45:25.0397871Z Autotune Choices Stats: 2025-12-04T11:45:25.0398247Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0398307Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0398357Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0398490Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0398726Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0398954Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0399181Z triton_mm_2 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0399404Z triton_mm_6 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0399627Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0399850Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0400075Z triton_mm_7 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0400300Z triton_mm_5 0.0080 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0400531Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0400771Z triton_mm_0 0.0116 ms 52.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0400908Z SingleProcess AUTOTUNE benchmarking takes 0.0504 seconds and 0.2300 seconds precompiling for 11 choices 2025-12-04T11:45:25.0401050Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0401095Z Traceback (most recent call last): 2025-12-04T11:45:25.0401250Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0401291Z method(*args, **kwargs) 2025-12-04T11:45:25.0401444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0401484Z method(*args, **kwargs) 2025-12-04T11:45:25.0401635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0401672Z with policy(): 2025-12-04T11:45:25.0401825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0401866Z raise RuntimeError(msg) 2025-12-04T11:45:25.0402252Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1035993088 and is now 1084227584. 
2025-12-04T11:45:25.0402264Z 2025-12-04T11:45:25.0402339Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0402594Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0402597Z 2025-12-04T11:45:25.0402685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0402759Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0402805Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0402861Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0403387Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0403488Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0403526Z graph_break [] 2025-12-04T11:45:25.0403588Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0403661Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0404145Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0404194Z current_size = base.storage().size() 2025-12-04T11:45:25.0404235Z Autotune Choices Stats: 2025-12-04T11:45:25.0404618Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0404690Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0404740Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0404873Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0405108Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0405334Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0405561Z triton_mm_2 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0405785Z triton_mm_6 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0406006Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0406240Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0406465Z triton_mm_7 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0406688Z triton_mm_5 0.0080 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0406909Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0407131Z triton_mm_0 0.0116 ms 52.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0407260Z SingleProcess AUTOTUNE benchmarking takes 0.0504 seconds and 0.2300 seconds precompiling for 11 choices 2025-12-04T11:45:25.0407335Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0407377Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0407433Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0407532Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0408010Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0408047Z graph_break [] 2025-12-04T11:45:25.0408118Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0408191Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0408244Z Autotune Choices Stats: 2025-12-04T11:45:25.0408621Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.0408679Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0408728Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0408848Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0409082Z triton_mm_19 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0409309Z triton_mm_16 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0409536Z triton_mm_18 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0409761Z triton_mm_12 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0409996Z triton_mm_13 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0410220Z triton_mm_14 0.0076 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0410445Z triton_mm_17 0.0077 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0410667Z triton_mm_15 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0410890Z triton_mm_11 0.0086 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0411114Z triton_mm_10 0.0117 ms 52.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0411242Z SingleProcess AUTOTUNE benchmarking takes 0.0543 
seconds and 0.2238 seconds precompiling for 11 choices 2025-12-04T11:45:25.0411295Z =================================== FAILURES =================================== 2025-12-04T11:45:25.0411434Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0411481Z Traceback (most recent call last): 2025-12-04T11:45:25.0411636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0411687Z method(*args, **kwargs) 2025-12-04T11:45:25.0411839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0411891Z method(*args, **kwargs) 2025-12-04T11:45:25.0412041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0412089Z with policy(): 2025-12-04T11:45:25.0412243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0412284Z raise RuntimeError(msg) 2025-12-04T11:45:25.0412671Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 
2025-12-04T11:45:25.0412676Z 2025-12-04T11:45:25.0412749Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0413009Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0413011Z 2025-12-04T11:45:25.0413099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0413172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0413213Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0413387Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0413887Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0413987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0414023Z graph_break [] 2025-12-04T11:45:25.0414085Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0414158Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0414643Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0414691Z current_size = base.storage().size() 2025-12-04T11:45:25.0414731Z Autotune Choices Stats: 2025-12-04T11:45:25.0415103Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0415163Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0415213Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0415332Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0415565Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0415810Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0416063Z triton_mm_2 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0416287Z triton_mm_6 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0416508Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0416735Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0416959Z triton_mm_7 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0417182Z triton_mm_5 0.0080 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0417402Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0417634Z triton_mm_0 0.0116 ms 52.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0417763Z SingleProcess AUTOTUNE benchmarking takes 0.0504 seconds and 0.2300 seconds precompiling for 11 choices 2025-12-04T11:45:25.0417837Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0417879Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0417934Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0418033Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0418514Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0418553Z graph_break [] 2025-12-04T11:45:25.0418613Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0418687Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0418726Z Autotune Choices Stats: 2025-12-04T11:45:25.0419092Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.0419149Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0419199Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0419327Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0419561Z triton_mm_19 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0419807Z triton_mm_16 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0420030Z triton_mm_18 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0420259Z triton_mm_12 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0420481Z triton_mm_13 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0420709Z triton_mm_14 0.0076 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0420931Z triton_mm_17 0.0077 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0421164Z triton_mm_15 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0421386Z triton_mm_11 0.0086 ms 71.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0421611Z triton_mm_10 0.0117 ms 52.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0421738Z SingleProcess AUTOTUNE benchmarking takes 0.0543 
seconds and 0.2238 seconds precompiling for 11 choices 2025-12-04T11:45:25.0421812Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0421854Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0421909Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0422009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0422492Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0422531Z graph_break [] 2025-12-04T11:45:25.0422592Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0422665Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0422707Z Autotune Choices Stats: 2025-12-04T11:45:25.0423079Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:25.0423146Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0423195Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0423364Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0423596Z triton_mm_22 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0423824Z triton_mm_29 0.0064 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0424046Z triton_mm_26 0.0067 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0424271Z triton_mm_28 0.0069 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0424495Z triton_mm_23 0.0072 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0424732Z triton_mm_24 0.0081 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0424955Z triton_mm_25 0.0082 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0425181Z triton_mm_27 0.0082 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0425404Z triton_mm_21 0.0089 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0425627Z triton_mm_20 0.0115 ms 55.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0425756Z SingleProcess AUTOTUNE benchmarking takes 0.0673 seconds and 0.2262 seconds precompiling for 11 choices 2025-12-04T11:45:25.0425948Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b5a5c1f5e97b303d.xml - 2025-12-04T11:45:25.0426008Z =========================== short test summary info ============================ 2025-12-04T11:45:25.0426591Z FAILED [0.7064s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 2025-12-04T11:45:25.0426595Z 2025-12-04T11:45:25.0426668Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0426943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0426958Z 2025-12-04T11:45:25.0427045Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0427118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.0427188Z ================== 1 failed, 95 deselected, 2 rerun in 3.64s =================== 2025-12-04T11:45:25.0427226Z Got exit code 1 2025-12-04T11:45:25.0427265Z Retrying single test... 2025-12-04T11:45:25.0427412Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8d3df8d6a78ff3dc.xml 2025-12-04T11:45:25.0427470Z ============================= test session starts ============================== 2025-12-04T11:45:25.0427580Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.0427622Z cachedir: .pytest_cache 2025-12-04T11:45:25.0427781Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.0427828Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.0427868Z configfile: pytest.ini 2025-12-04T11:45:25.0428032Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.0428106Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.0428357Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0428419Z Running 1 items in this shard 2025-12-04T11:45:25.0428421Z 2025-12-04T11:45:25.0428634Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1013s] [100%] 2025-12-04T11:45:25.0428844Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8103s] [100%] 2025-12-04T11:45:25.0429033Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6974s] [100%] 2025-12-04T11:45:25.0429035Z 2025-12-04T11:45:25.0429086Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.0429225Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0429271Z Traceback (most recent call last): 2025-12-04T11:45:25.0429429Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0429470Z method(*args, **kwargs) 2025-12-04T11:45:25.0429624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0429664Z method(*args, **kwargs) 2025-12-04T11:45:25.0429816Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0429853Z with policy(): 2025-12-04T11:45:25.0430005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0430046Z raise RuntimeError(msg) 
2025-12-04T11:45:25.0430428Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1035993088. 2025-12-04T11:45:25.0430432Z 2025-12-04T11:45:25.0430515Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0430772Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0430784Z 2025-12-04T11:45:25.0430884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0430959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0431002Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0431059Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0431544Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0431644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0431680Z graph_break [] 2025-12-04T11:45:25.0431741Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0431814Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0432302Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0432360Z current_size = base.storage().size() 2025-12-04T11:45:25.0432400Z Autotune Choices Stats: 2025-12-04T11:45:25.0432769Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0432830Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0432879Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0433000Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0433237Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0433496Z triton_mm_6 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0433720Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0433947Z triton_mm_2 0.0065 ms 93.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0434169Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0434405Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0434640Z triton_mm_7 0.0077 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0434872Z triton_mm_5 0.0079 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0435096Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0435320Z triton_mm_0 0.0117 ms 52.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0435449Z SingleProcess AUTOTUNE benchmarking takes 0.0465 seconds and 0.2388 seconds precompiling for 11 choices 2025-12-04T11:45:25.0435590Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 
2025-12-04T11:45:25.0435636Z Traceback (most recent call last): 2025-12-04T11:45:25.0435791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0435831Z method(*args, **kwargs) 2025-12-04T11:45:25.0435983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0436036Z method(*args, **kwargs) 2025-12-04T11:45:25.0436188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0436224Z with policy(): 2025-12-04T11:45:25.0436380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0436419Z raise RuntimeError(msg) 2025-12-04T11:45:25.0436808Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1035993088 and is now 1084227584. 
2025-12-04T11:45:25.0436810Z 2025-12-04T11:45:25.0436885Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0437143Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0437146Z 2025-12-04T11:45:25.0437234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0437308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0437351Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0437407Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0437893Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0437992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0438028Z graph_break [] 2025-12-04T11:45:25.0438101Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0438175Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0438681Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0438729Z current_size = base.storage().size() 2025-12-04T11:45:25.0438769Z Autotune Choices Stats: 2025-12-04T11:45:25.0439138Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0439197Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0439248Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0439368Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0439603Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0439826Z triton_mm_6 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0440059Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0440285Z triton_mm_2 0.0065 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0440507Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0440727Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0440950Z triton_mm_7 0.0077 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0441169Z triton_mm_5 0.0079 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0441393Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0441615Z triton_mm_0 0.0117 ms 52.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0441745Z SingleProcess AUTOTUNE benchmarking takes 0.0465 seconds and 0.2388 seconds precompiling for 11 choices 2025-12-04T11:45:25.0441829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0441871Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0441938Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0442036Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0442534Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0442572Z graph_break [] 2025-12-04T11:45:25.0442634Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0442706Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0442748Z Autotune Choices Stats: 2025-12-04T11:45:25.0443111Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.0443170Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0443219Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0443361Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0443593Z triton_mm_19 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0443833Z triton_mm_16 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0444061Z triton_mm_18 0.0064 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0444285Z triton_mm_12 0.0065 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0444329Z _scaled_mm 0.0067 ms 92.2% 2025-12-04T11:45:25.0444550Z triton_mm_13 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0444773Z triton_mm_14 0.0076 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0444996Z triton_mm_17 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0445219Z triton_mm_15 0.0079 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0445445Z triton_mm_11 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0445585Z SingleProcess AUTOTUNE benchmarking takes 0.0541 seconds and 0.2221 seconds precompiling for 11 choices 2025-12-04T11:45:25.0445652Z =================================== FAILURES =================================== 2025-12-04T11:45:25.0445790Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0445849Z Traceback (most recent call last): 2025-12-04T11:45:25.0446006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0446047Z method(*args, **kwargs) 2025-12-04T11:45:25.0446199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0446240Z method(*args, **kwargs) 2025-12-04T11:45:25.0446390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0446428Z with policy(): 2025-12-04T11:45:25.0446580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0446622Z raise RuntimeError(msg) 2025-12-04T11:45:25.0447010Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 
2025-12-04T11:45:25.0447013Z 2025-12-04T11:45:25.0447087Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0447357Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0447360Z 2025-12-04T11:45:25.0447448Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0447522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0447565Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0447622Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0448105Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0448204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0448241Z graph_break [] 2025-12-04T11:45:25.0448304Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0448378Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0448871Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0448917Z current_size = base.storage().size() 2025-12-04T11:45:25.0448958Z Autotune Choices Stats: 2025-12-04T11:45:25.0449323Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.0449395Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0449456Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0449576Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0449822Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0450048Z triton_mm_6 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0450274Z triton_mm_8 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0450499Z triton_mm_2 0.0065 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0450723Z triton_mm_3 0.0072 ms 84.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0450944Z triton_mm_4 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0451176Z triton_mm_7 0.0077 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0451397Z triton_mm_5 0.0079 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0451698Z triton_mm_1 0.0085 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0451921Z triton_mm_0 0.0117 ms 52.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0452051Z SingleProcess AUTOTUNE benchmarking takes 0.0465 seconds and 0.2388 seconds precompiling for 11 choices 2025-12-04T11:45:25.0452128Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0452170Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0452228Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0452326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0452808Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0452847Z graph_break [] 2025-12-04T11:45:25.0452907Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0452981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0453020Z Autotune Choices Stats: 2025-12-04T11:45:25.0453439Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.0453522Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0453574Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0453692Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0453924Z triton_mm_19 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0454151Z triton_mm_16 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0454376Z triton_mm_18 0.0064 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0454606Z triton_mm_12 0.0065 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0454647Z _scaled_mm 0.0067 ms 92.2% 2025-12-04T11:45:25.0454886Z triton_mm_13 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0455108Z triton_mm_14 0.0076 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0455333Z triton_mm_17 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0455556Z triton_mm_15 0.0079 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0455781Z triton_mm_11 0.0084 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0455911Z SingleProcess AUTOTUNE benchmarking takes 0.0541 seconds and 0.2221 seconds precompiling for 11 choices 2025-12-04T11:45:25.0455984Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0456028Z frames 
[('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0456084Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0456184Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0456667Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0456705Z graph_break [] 2025-12-04T11:45:25.0456776Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0456850Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0456905Z Autotune Choices Stats: 2025-12-04T11:45:25.0457277Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:25.0457333Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0457385Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0457504Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0457738Z triton_mm_29 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0457963Z triton_mm_26 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0458260Z triton_mm_22 0.0066 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0458485Z triton_mm_28 0.0068 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0458721Z triton_mm_23 0.0071 ms 83.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0458944Z triton_mm_25 0.0078 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0459169Z triton_mm_24 0.0080 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0459392Z triton_mm_27 0.0082 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0459622Z triton_mm_21 0.0088 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0459848Z triton_mm_20 0.0115 ms 
51.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0459977Z SingleProcess AUTOTUNE benchmarking takes 0.0646 seconds and 0.2312 seconds precompiling for 11 choices 2025-12-04T11:45:25.0460167Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8d3df8d6a78ff3dc.xml - 2025-12-04T11:45:25.0460227Z =========================== short test summary info ============================ 2025-12-04T11:45:25.0460816Z FAILED [0.6974s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 2025-12-04T11:45:25.0460832Z 2025-12-04T11:45:25.0460905Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0461170Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0461173Z 2025-12-04T11:45:25.0461259Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0461323Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.0461391Z ================== 1 failed, 187 deselected, 2 rerun in 3.63s ================== 2025-12-04T11:45:25.0461429Z Got exit code 1 2025-12-04T11:45:25.0461469Z Retrying single test... 
2025-12-04T11:45:25.0461615Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6733bb76e51a4652.xml 2025-12-04T11:45:25.0461673Z ============================= test session starts ============================== 2025-12-04T11:45:25.0461783Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.0461824Z cachedir: .pytest_cache 2025-12-04T11:45:25.0461984Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.0462031Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.0462073Z configfile: pytest.ini 2025-12-04T11:45:25.0462243Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.0462318Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.0462571Z stepcurrent: skipping 95 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0462617Z Running 1 items in this shard 2025-12-04T11:45:25.0462619Z 2025-12-04T11:45:25.0462831Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1058s] [100%] 2025-12-04T11:45:25.0463042Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8131s] [100%] 2025-12-04T11:45:25.0463230Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7214s] [100%] 2025-12-04T11:45:25.0463232Z 2025-12-04T11:45:25.0463309Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.0463448Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0463495Z Traceback (most recent call last): 2025-12-04T11:45:25.0463652Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0463693Z method(*args, **kwargs) 2025-12-04T11:45:25.0463848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0463888Z method(*args, **kwargs) 2025-12-04T11:45:25.0464041Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0464078Z with policy(): 2025-12-04T11:45:25.0464232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0464288Z raise RuntimeError(msg) 
2025-12-04T11:45:25.0464676Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1035993088. 2025-12-04T11:45:25.0464690Z 2025-12-04T11:45:25.0464776Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0465033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0465036Z 2025-12-04T11:45:25.0465123Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0465196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0465239Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0465298Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0465784Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0465881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0465918Z graph_break [] 2025-12-04T11:45:25.0465978Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0466065Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0466553Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0466602Z current_size = base.storage().size() 2025-12-04T11:45:25.0466642Z Autotune Choices Stats: 2025-12-04T11:45:25.0467017Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.0467076Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0467126Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0467247Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0467484Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0467715Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0467939Z triton_mm_8 0.0064 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0468178Z triton_mm_2 0.0064 ms 95.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0468403Z triton_mm_3 0.0072 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0468647Z triton_mm_4 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0468873Z triton_mm_7 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0469096Z triton_mm_5 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0469319Z triton_mm_1 0.0085 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0469544Z triton_mm_0 0.0115 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0469673Z SingleProcess AUTOTUNE benchmarking takes 0.0510 seconds and 0.2375 seconds precompiling for 11 choices 2025-12-04T11:45:25.0469815Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 
2025-12-04T11:45:25.0469874Z Traceback (most recent call last): 2025-12-04T11:45:25.0470031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0470071Z method(*args, **kwargs) 2025-12-04T11:45:25.0470226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0470267Z method(*args, **kwargs) 2025-12-04T11:45:25.0470421Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0470457Z with policy(): 2025-12-04T11:45:25.0470612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0470652Z raise RuntimeError(msg) 2025-12-04T11:45:25.0471042Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1035993088 and is now 1084227584. 
2025-12-04T11:45:25.0471045Z 2025-12-04T11:45:25.0471117Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0471374Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0471378Z 2025-12-04T11:45:25.0471464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0471538Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0471580Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0471638Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0472138Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0472247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0472284Z graph_break [] 2025-12-04T11:45:25.0472355Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0472431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0472917Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0472967Z current_size = base.storage().size() 2025-12-04T11:45:25.0473006Z Autotune Choices Stats: 2025-12-04T11:45:25.0473404Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.0473463Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0473513Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0473632Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0473883Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0474110Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0474335Z triton_mm_8 0.0064 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0474565Z triton_mm_2 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0474789Z triton_mm_3 0.0072 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0475012Z triton_mm_4 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0475234Z triton_mm_7 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0475456Z triton_mm_5 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0475678Z triton_mm_1 0.0085 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0475911Z triton_mm_0 0.0115 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0476054Z SingleProcess AUTOTUNE benchmarking takes 0.0510 seconds and 0.2375 seconds precompiling for 11 choices 2025-12-04T11:45:25.0476139Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0476182Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0476238Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0476339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0476827Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0476864Z graph_break [] 2025-12-04T11:45:25.0476925Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0477000Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0477039Z Autotune Choices Stats: 2025-12-04T11:45:25.0477401Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:25.0477469Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0477518Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0477638Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0477869Z triton_mm_16 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0478098Z triton_mm_19 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0478321Z triton_mm_18 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0478554Z triton_mm_12 0.0065 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0478780Z triton_mm_13 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0479003Z triton_mm_14 0.0076 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0479227Z triton_mm_17 0.0077 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0479460Z triton_mm_15 0.0079 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0479692Z triton_mm_11 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0479924Z triton_mm_10 0.0116 ms 54.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0480053Z SingleProcess AUTOTUNE benchmarking takes 0.0549 
seconds and 0.2173 seconds precompiling for 11 choices 2025-12-04T11:45:25.0480107Z =================================== FAILURES =================================== 2025-12-04T11:45:25.0480245Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0480293Z Traceback (most recent call last): 2025-12-04T11:45:25.0480450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0480492Z method(*args, **kwargs) 2025-12-04T11:45:25.0480646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0480688Z method(*args, **kwargs) 2025-12-04T11:45:25.0480838Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0480874Z with policy(): 2025-12-04T11:45:25.0481025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0481076Z raise RuntimeError(msg) 2025-12-04T11:45:25.0481460Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 
2025-12-04T11:45:25.0481464Z 2025-12-04T11:45:25.0481537Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0481793Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0481796Z 2025-12-04T11:45:25.0481884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0481956Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0482000Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0482055Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0482536Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0482636Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0482672Z graph_break [] 2025-12-04T11:45:25.0482734Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0482806Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0483337Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0483396Z current_size = base.storage().size() 2025-12-04T11:45:25.0483436Z Autotune Choices Stats: 2025-12-04T11:45:25.0483813Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.0483871Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0483921Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0484042Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0484275Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0484500Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0484723Z triton_mm_8 0.0064 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0484951Z triton_mm_2 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0485194Z triton_mm_3 0.0072 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0485413Z triton_mm_4 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0485634Z triton_mm_7 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0485855Z triton_mm_5 0.0079 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0486080Z triton_mm_1 0.0085 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0486303Z triton_mm_0 0.0115 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0486431Z SingleProcess AUTOTUNE benchmarking takes 0.0510 seconds and 0.2375 seconds precompiling for 11 choices 2025-12-04T11:45:25.0486506Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0486547Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0486605Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0486703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0487195Z inductor 
[('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 6), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0487242Z graph_break [] 2025-12-04T11:45:25.0487312Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0487385Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0487427Z Autotune Choices Stats: 2025-12-04T11:45:25.0487785Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:25.0487843Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0487893Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0488012Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0488242Z triton_mm_16 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0488470Z triton_mm_19 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0488709Z triton_mm_18 0.0064 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0488935Z triton_mm_12 0.0065 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0489160Z triton_mm_13 0.0072 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0489383Z triton_mm_14 0.0076 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0489606Z triton_mm_17 0.0077 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0489830Z triton_mm_15 0.0079 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0490055Z triton_mm_11 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0490279Z triton_mm_10 0.0116 ms 54.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0490409Z SingleProcess AUTOTUNE benchmarking takes 0.0549 
seconds and 0.2173 seconds precompiling for 11 choices 2025-12-04T11:45:25.0490494Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0490538Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0490594Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0490705Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0491200Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0491240Z graph_break [] 2025-12-04T11:45:25.0491300Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:25.0491373Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0491413Z Autotune Choices Stats: 2025-12-04T11:45:25.0491776Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:25.0491833Z AUTOTUNE scaled_mm(33x1024, 1024x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.0491882Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0491999Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0492232Z triton_mm_29 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0492468Z triton_mm_26 0.0067 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0492695Z triton_mm_23 0.0068 ms 87.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0492925Z triton_mm_22 0.0069 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0493147Z triton_mm_28 0.0069 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0493402Z triton_mm_27 0.0076 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0493627Z triton_mm_24 0.0077 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0493851Z triton_mm_25 0.0082 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.0494074Z triton_mm_21 0.0092 ms 65.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0494116Z _scaled_mm 0.0111 ms 53.8% 2025-12-04T11:45:25.0494257Z SingleProcess AUTOTUNE benchmarking takes 0.0662 seconds and 0.2300 seconds precompiling for 11 choices 2025-12-04T11:45:25.0494461Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6733bb76e51a4652.xml - 2025-12-04T11:45:25.0494521Z =========================== short test summary info ============================ 2025-12-04T11:45:25.0495115Z FAILED [0.7214s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1084227584 and is now 1132462080. 2025-12-04T11:45:25.0495119Z 2025-12-04T11:45:25.0495193Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0495451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0495454Z 2025-12-04T11:45:25.0495540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0495605Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.0495673Z ================== 1 failed, 187 deselected, 2 rerun in 3.66s ================== 2025-12-04T11:45:25.0495711Z Got exit code 1 2025-12-04T11:45:25.0495915Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.0496053Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.0496196Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5d343086abeeb2e.xml 2025-12-04T11:45:25.0496253Z ============================= test session starts ============================== 2025-12-04T11:45:25.0496364Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.0496405Z cachedir: .pytest_cache 2025-12-04T11:45:25.0496563Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.0496610Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.0496651Z configfile: pytest.ini 2025-12-04T11:45:25.0496815Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.0496889Z collecting ... collected 188 items / 96 deselected / 92 selected 2025-12-04T11:45:25.0496942Z stepcurrent: skipping 96 already run items. 2025-12-04T11:45:25.0496985Z Running 92 items in this shard 2025-12-04T11:45:25.0496988Z 2025-12-04T11:45:25.0497904Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpf_kx330v/yc/cyc6kxtypptzk2fnikrodt4aztwqzfhd4sudsvvqxxoiuf53jz5u.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.0498057Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0498287Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0498455Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0498602Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0498907Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0499040Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0499301Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0499440Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0499695Z E1204 11:19:25.927000 850620 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0499851Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0500117Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0500262Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0500537Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0500734Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0501050Z E1204 11:19:25.927000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0501778Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpf_kx330v/xe/cxedvxvlos3gemo4tn3u42sugw35sjfm2g2kzzhrrmbc33koddnm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.0501926Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0502138Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0502292Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0502436Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0502729Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0502871Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0503135Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0503303Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0503558Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0503713Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0503982Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0504117Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0504390Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0504597Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0504910Z E1204 11:19:25.953000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0505630Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpf_kx330v/xq/cxqbbei3bvcfcosi7eyccrrtxl4y2o2d5vveoambtjqx7xltq7z3.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.0505778Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0505990Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0506144Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0506291Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0506573Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0506703Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0506967Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0507116Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0507380Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0507536Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0507804Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0507937Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0508212Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0508405Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0508717Z E1204 11:19:25.956000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0509449Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpf_kx330v/yi/cyi6bjjhzh6rc4qdflakznwdwskipruplqvwjvuai6pd6agmgki2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.0509595Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0509807Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0509959Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0510103Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0510393Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0510523Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0510779Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0510918Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0511183Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0511347Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0511625Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0511757Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0512034Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0512226Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0512541Z E1204 11:19:25.958000 850620 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0512592Z ('RERUN', {'yellow': True}) [3.0003s] [ 1%] 2025-12-04T11:45:25.0512913Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:19:27.328000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0513221Z E1204 11:19:27.328000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0513382Z E1204 11:19:27.328000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0513527Z E1204 11:19:27.344000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0513822Z E1204 11:19:27.344000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0513948Z E1204 11:19:27.344000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0514090Z E1204 11:19:27.375000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0514382Z E1204 11:19:27.375000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0514510Z E1204 11:19:27.375000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0514653Z E1204 11:19:27.529000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0514946Z E1204 11:19:27.529000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0515072Z E1204 11:19:27.529000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0515122Z ('RERUN', {'yellow': True}) [1.2971s] [ 1%] 2025-12-04T11:45:25.0515455Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda E1204 11:19:28.426000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0515781Z E1204 11:19:28.426000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0515907Z E1204 11:19:28.426000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0516050Z E1204 11:19:28.441000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0516343Z E1204 11:19:28.441000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0516467Z E1204 11:19:28.441000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0516610Z E1204 11:19:28.472000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0516901Z E1204 11:19:28.472000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0517027Z E1204 11:19:28.472000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0517182Z E1204 11:19:28.608000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0517477Z E1204 11:19:28.608000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.0517602Z E1204 11:19:28.608000 850620 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0517642Z FAILED [1.0700s] [ 1%] 2025-12-04T11:45:25.0517645Z 2025-12-04T11:45:25.0517698Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.0517841Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0517887Z Traceback (most recent call last): 2025-12-04T11:45:25.0518045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0518087Z method(*args, **kwargs) 2025-12-04T11:45:25.0518240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0518281Z method(*args, **kwargs) 2025-12-04T11:45:25.0518432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0518469Z with policy(): 2025-12-04T11:45:25.0518622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0518664Z raise RuntimeError(msg) 2025-12-04T11:45:25.0519055Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1086324736. 2025-12-04T11:45:25.0519059Z 2025-12-04T11:45:25.0519135Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0519403Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0519415Z 2025-12-04T11:45:25.0519504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0519590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0519634Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0519691Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0520244Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0520345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0520382Z graph_break [] 2025-12-04T11:45:25.0520447Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0520523Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0521008Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0521073Z current_size = base.storage().size() 2025-12-04T11:45:25.0521113Z Autotune Choices Stats: 2025-12-04T11:45:25.0521487Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839000154286623, "best_triton_pos": 0} 2025-12-04T11:45:25.0521554Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0521605Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0521727Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0521966Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0522196Z triton_mm_19 0.0074 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0522423Z triton_mm_32 0.0078 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0522467Z _scaled_mm 0.0080 ms 85.9% 2025-12-04T11:45:25.0522690Z triton_mm_27 0.0090 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0522912Z triton_mm_15 0.0091 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0523147Z triton_mm_20 0.0092 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0523434Z triton_mm_13 0.0097 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0523659Z triton_mm_23 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0523887Z triton_mm_8 0.0102 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0524018Z SingleProcess AUTOTUNE benchmarking takes 0.1315 seconds and 0.7275 seconds precompiling for 31 choices 2025-12-04T11:45:25.0524161Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0524207Z Traceback (most recent call last): 2025-12-04T11:45:25.0524365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0524405Z method(*args, **kwargs) 2025-12-04T11:45:25.0524558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:25.0524597Z method(*args, **kwargs) 2025-12-04T11:45:25.0524748Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0524799Z with policy(): 2025-12-04T11:45:25.0524954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0524993Z raise RuntimeError(msg) 2025-12-04T11:45:25.0525386Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1086324736 and is now 1184890880. 2025-12-04T11:45:25.0525388Z 2025-12-04T11:45:25.0525462Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0525721Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0525724Z 2025-12-04T11:45:25.0525812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0525887Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0525931Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0525989Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0526542Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0526640Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0526679Z graph_break [] 2025-12-04T11:45:25.0526742Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0526818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0527317Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0527390Z current_size = base.storage().size() 2025-12-04T11:45:25.0527430Z Autotune Choices Stats: 2025-12-04T11:45:25.0527799Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839000154286623, "best_triton_pos": 0} 2025-12-04T11:45:25.0527867Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0527919Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0528040Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0528278Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0528504Z triton_mm_19 0.0074 ms 91.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0528729Z triton_mm_32 0.0078 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0528782Z _scaled_mm 0.0080 ms 85.9% 2025-12-04T11:45:25.0529005Z triton_mm_27 0.0090 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0529230Z triton_mm_15 0.0091 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0529453Z triton_mm_20 0.0092 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0529675Z triton_mm_13 0.0097 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0529900Z triton_mm_23 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0530129Z triton_mm_8 0.0102 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0530260Z SingleProcess AUTOTUNE benchmarking takes 0.1315 seconds and 0.7275 seconds precompiling for 31 choices 2025-12-04T11:45:25.0530333Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0530376Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0530433Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0530534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0531048Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0531096Z graph_break [] 2025-12-04T11:45:25.0531159Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0531232Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0531272Z Autotune Choices Stats: 2025-12-04T11:45:25.0531638Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-12-04T11:45:25.0531703Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0531752Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0531873Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0532110Z 
triton_mm_65 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0532340Z triton_mm_53 0.0075 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0532578Z triton_mm_66 0.0076 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0532623Z _scaled_mm 0.0080 ms 82.0% 2025-12-04T11:45:25.0532849Z triton_mm_61 0.0082 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0533071Z triton_mm_49 0.0092 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0533329Z triton_mm_54 0.0092 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0533556Z triton_mm_42 0.0097 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0533783Z triton_mm_47 0.0097 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0534009Z triton_mm_57 0.0099 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0534140Z SingleProcess AUTOTUNE benchmarking takes 0.2137 seconds and 0.4329 seconds precompiling for 35 choices 2025-12-04T11:45:25.0534193Z =================================== FAILURES =================================== 2025-12-04T11:45:25.0534361Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.0534408Z Traceback (most recent call last): 2025-12-04T11:45:25.0534578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0534619Z method(*args, **kwargs) 2025-12-04T11:45:25.0534789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.0534830Z method(*args, **kwargs) 2025-12-04T11:45:25.0534980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.0535020Z with policy(): 2025-12-04T11:45:25.0535173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.0535216Z raise RuntimeError(msg) 2025-12-04T11:45:25.0535604Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 
2025-12-04T11:45:25.0535607Z 2025-12-04T11:45:25.0535681Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0535940Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0535942Z 2025-12-04T11:45:25.0536032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0536119Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0536161Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0536218Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0536767Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0536867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0536903Z graph_break [] 2025-12-04T11:45:25.0536967Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0537041Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0537528Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.0537575Z current_size = base.storage().size() 2025-12-04T11:45:25.0537617Z Autotune Choices Stats: 2025-12-04T11:45:25.0537984Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839000154286623, "best_triton_pos": 0} 2025-12-04T11:45:25.0538049Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0538099Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0538229Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0538466Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0538711Z triton_mm_19 0.0074 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0538939Z triton_mm_32 0.0078 ms 88.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0538980Z _scaled_mm 0.0080 ms 85.9% 2025-12-04T11:45:25.0539205Z triton_mm_27 0.0090 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0539428Z 
triton_mm_15 0.0091 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0539654Z triton_mm_20 0.0092 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0539878Z triton_mm_13 0.0097 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0540112Z triton_mm_23 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0540344Z triton_mm_8 0.0102 ms 67.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0540473Z SingleProcess AUTOTUNE benchmarking takes 0.1315 seconds and 0.7275 seconds precompiling for 31 choices 2025-12-04T11:45:25.0540547Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0540588Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0540649Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0540748Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0541233Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 
34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0541271Z graph_break [] 2025-12-04T11:45:25.0541333Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0541408Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0541447Z Autotune Choices Stats: 2025-12-04T11:45:25.0541811Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-12-04T11:45:25.0541884Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0541935Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0542066Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0542312Z triton_mm_65 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0542537Z triton_mm_53 0.0075 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0542765Z triton_mm_66 0.0076 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0542808Z _scaled_mm 0.0080 ms 82.0% 
2025-12-04T11:45:25.0543032Z triton_mm_61 0.0082 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0543279Z triton_mm_49 0.0092 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0543504Z triton_mm_54 0.0092 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0543758Z triton_mm_42 0.0097 ms 67.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0543983Z triton_mm_47 0.0097 ms 67.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0544208Z triton_mm_57 0.0099 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0544338Z SingleProcess AUTOTUNE benchmarking takes 0.2137 seconds and 0.4329 seconds precompiling for 35 choices 2025-12-04T11:45:25.0544414Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.0544456Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.0544512Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.0544613Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.0545100Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.0545138Z graph_break [] 2025-12-04T11:45:25.0545199Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.0545273Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.0545315Z Autotune Choices Stats: 2025-12-04T11:45:25.0545688Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839999929070473, "best_triton_pos": 0} 2025-12-04T11:45:25.0545762Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.0545813Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.0545947Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.0546180Z triton_mm_99 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0546405Z triton_mm_87 0.0075 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0546634Z 
triton_mm_100 0.0078 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0546677Z _scaled_mm 0.0081 ms 84.2% 2025-12-04T11:45:25.0546900Z triton_mm_95 0.0082 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0547124Z triton_mm_83 0.0091 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0547357Z triton_mm_88 0.0093 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0547580Z triton_mm_81 0.0095 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0547803Z triton_mm_91 0.0100 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.0548033Z triton_mm_76 0.0102 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.0548163Z SingleProcess AUTOTUNE benchmarking takes 0.1948 seconds and 0.2731 seconds precompiling for 35 choices 2025-12-04T11:45:25.0548359Z - generated xml file: 
/var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d5d343086abeeb2e.xml - 2025-12-04T11:45:25.0548420Z =========================== short test summary info ============================ 2025-12-04T11:45:25.0549012Z FAILED [1.0700s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 2025-12-04T11:45:25.0549014Z 2025-12-04T11:45:25.0549091Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.0549351Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0549364Z 2025-12-04T11:45:25.0549452Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.0549528Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.0549594Z ================== 1 failed, 96 deselected, 2 rerun in 5.39s =================== 2025-12-04T11:45:25.0549633Z Got exit code 1 2025-12-04T11:45:25.0549683Z Retrying single test... 
2025-12-04T11:45:25.0549828Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26279b2ffe9336fe.xml 2025-12-04T11:45:25.0549884Z ============================= test session starts ============================== 2025-12-04T11:45:25.0549996Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.0550036Z cachedir: .pytest_cache 2025-12-04T11:45:25.0550197Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.0550243Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.0550286Z configfile: pytest.ini 2025-12-04T11:45:25.0550449Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.0550524Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.0550777Z stepcurrent: skipping 96 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.0550820Z Running 1 items in this shard 2025-12-04T11:45:25.0550822Z 2025-12-04T11:45:25.0551164Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:38.503558243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0551168Z 2025-12-04T11:45:25.0551484Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0551787Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0551920Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0552406Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0552664Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0552895Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0553103Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0553356Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0553650Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0553910Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0554202Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0554435Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0554726Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0554961Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0555253Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0555496Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0555785Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0556016Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0556307Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0556539Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0556834Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0557031Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0557263Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0557555Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0557763Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0557993Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0558311Z E1204 11:19:38.544000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0558542Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0558832Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0559053Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0559261Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0559465Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0559675Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0559854Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0560033Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0560561Z E1204 11:19:38.544000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpe2y4guvz/yc/cyc6kxtypptzk2fnikrodt4aztwqzfhd4sudsvvqxxoiuf53jz5u.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.0560714Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0560932Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0561087Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0561232Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0561519Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0561651Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0561911Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0562063Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0562316Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0562491Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0562760Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0562895Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0563169Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0563383Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0563700Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0563996Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0564144Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0564626Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0564882Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0565109Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0565317Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0565519Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0565810Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0566045Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0566350Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0566587Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0566903Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0567133Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0567425Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0567655Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0567949Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0568181Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0568472Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0568715Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0569007Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0569204Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0569433Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0569725Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0569920Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0570151Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0570443Z E1204 11:19:38.544000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0570675Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0570975Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0571218Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0571425Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0571625Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0571836Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0572004Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0572184Z E1204 11:19:38.544000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0572287Z E1204 11:19:38.544000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.0572443Z [W1204 11:19:38.867622590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0572456Z 2025-12-04T11:45:25.0574648Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0574947Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0575083Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0575568Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0575821Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0576048Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0576254Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0576454Z E1204 11:19:38.601000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0576746Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0577004Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0577329Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0577560Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0577852Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0578084Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0578374Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0578605Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0578893Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0579146Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0579434Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0579666Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0579955Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0580154Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0580387Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0580678Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0580874Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0581104Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0581404Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0581654Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0581944Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0582165Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0582373Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0582574Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0582784Z E1204 11:19:38.601000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0582951Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0583127Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0583715Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpe2y4guvz/xe/cxedvxvlos3gemo4tn3u42sugw35sjfm2g2kzzhrrmbc33koddnm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.0583864Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0584080Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0584236Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0584384Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0584674Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0584806Z E1204 11:19:38.601000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0585063Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0585204Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0585459Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0585626Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0585906Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0586056Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0586331Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0586526Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0586842Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0587137Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0587267Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0587742Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0588008Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0588234Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0588440Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0588641Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0588934Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0589171Z E1204 
11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0589462Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0589694Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0589994Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0590250Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0590540Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0590770Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0591060Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0591291Z E1204 11:19:38.601000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0591584Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0591814Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0592115Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0592311Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0592543Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0592832Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0593026Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0593291Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0593584Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0593816Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0594107Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0594356Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0594574Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0594786Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0594998Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0595163Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0595341Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0595445Z E1204 11:19:38.601000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.0595600Z [W1204 11:19:38.871147638 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0595604Z 2025-12-04T11:45:25.0595915Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0596221Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0596351Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0596829Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0597082Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0597309Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0597513Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0597714Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0598004Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0598238Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0598541Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0598782Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0599080Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0599310Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0599601Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0599831Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0600121Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0600351Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0600653Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0600885Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0601175Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0601370Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0601601Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0601891Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0602088Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0602318Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0602609Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0602851Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0603151Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0603418Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0603624Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0603825Z E1204 11:19:38.604000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0604034Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0604200Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0604376Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0604900Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpe2y4guvz/xq/cxqbbei3bvcfcosi7eyccrrtxl4y2o2d5vveoambtjqx7xltq7z3.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.0605060Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0605280Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0605437Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0605581Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0605865Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0605998Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0606256Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0606394Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0606648Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0606803Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0607084Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0607233Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0607520Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0607712Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0608026Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0608320Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0608451Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0608929Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0609198Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0609423Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0609632Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0609831Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0610122Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0610357Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0610650Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0610881Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0611171Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0611413Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0611712Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0611956Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0612246Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0612478Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0612768Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0612999Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0613320Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0613530Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0613759Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0614052Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0614249Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0614483Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0614773Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0615007Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0615300Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0615520Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0615739Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0615939Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0616173Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0616338Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.0616518Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0616620Z E1204 11:19:38.604000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.0616776Z [W1204 11:19:38.874115994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0616779Z 2025-12-04T11:45:25.0617087Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0617380Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0617510Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0617995Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0618248Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0618472Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0618679Z E1204 11:19:38.607000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0618881Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0619172Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0619406Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0619696Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0619938Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0620236Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0620475Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0620767Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0621001Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0621292Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0621524Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0621815Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0622056Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0622345Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0622544Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0622773Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0623063Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0623298Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0623530Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0623821Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0624050Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0624357Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0624587Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0624813Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0625013Z E1204 11:19:38.607000 856451 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0625223Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0625389Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.0625568Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0626097Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpe2y4guvz/yi/cyi6bjjhzh6rc4qdflakznwdwskipruplqvwjvuai6pd6agmgki2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.0626253Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.0626470Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.0626624Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.0626771Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.0627056Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.0627188Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.0627445Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.0627583Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.0627838Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.0627994Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.0628261Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.0628397Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.0628681Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.0628883Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.0629205Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0629498Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0629627Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0630105Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0630358Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0630591Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0630797Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0630998Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0631290Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0631524Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0631816Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0632047Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0632339Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0632571Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0632869Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0633111Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0633445Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0633675Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0633968Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0634197Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0634489Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0634685Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0634930Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0635221Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0635415Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0635645Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0635934Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0636167Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0636457Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0636677Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0636882Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0637084Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.0637306Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.0637483Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.0637671Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.0637772Z E1204 11:19:38.607000 856451 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.0637824Z ('RERUN', {'yellow': True}) [3.2475s] [100%] 2025-12-04T11:45:25.0638161Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:39.161718455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0638165Z 2025-12-04T11:45:25.0638310Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0638605Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0638897Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0639037Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0639517Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0639770Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0639997Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0640202Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0640403Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0640696Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0640930Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0641219Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0641463Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0641776Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0642016Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0642307Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0642541Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0642832Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0643050Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0643279Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0643486Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0643693Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0643896Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0644128Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0644419Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0644615Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0644848Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0645139Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0645357Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0645552Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0645784Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0646003Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0646211Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0646406Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0646625Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0646830Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0647025Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0647219Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0647451Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0647755Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0647986Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0648281Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0648502Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0648707Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0648903Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0649110Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0649310Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0649541Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0649832Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0650074Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0650384Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0650615Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0650907Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0651138Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0651429Z 
E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0651659Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0651948Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0652190Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0652480Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0652712Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0653002Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0653234Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0653548Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0653778Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0654068Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0654311Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0654601Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0654859Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0655152Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0655373Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0655574Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0655770Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0656062Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0656292Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0656594Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0656824Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0657113Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0657343Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0657636Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0657867Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0658157Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0658387Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0658693Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0658890Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0659106Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0659301Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0659506Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0659715Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0659947Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0660238Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0660432Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0660636Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0660832Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0661025Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0661257Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0661547Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0661778Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0662071Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0662267Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0662475Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0662677Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0662922Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0663223Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0663497Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0663699Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0663898Z 
E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0664099Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0664395Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0664628Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0664920Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0665167Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0665461Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0665695Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0665987Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0666222Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0666515Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0666715Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0666912Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0667135Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0667348Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0667561Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0667770Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0668063Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0668298Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0668589Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0668823Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0669117Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0669359Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0669650Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0669883Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0670174Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0670395Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0670597Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0670795Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0670988Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0671198Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0671399Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0671700Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0671940Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0672141Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0672339Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0672543Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0672833Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0673066Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0673390Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0673644Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0673937Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0674170Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0674460Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0674693Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0674986Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0675220Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0675511Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0675744Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0676051Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0676295Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0676598Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0676830Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0677124Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0677355Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0677648Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0677844Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0678053Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0678290Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0678587Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0678818Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0679111Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0679344Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0679637Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0679869Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0680161Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0680403Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0680716Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0680913Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0681144Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0681437Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0681670Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0681963Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0682176Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0682388Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0682586Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0682788Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0683080Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0683322Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0683525Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0683722Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0683923Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0684214Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0684435Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0684649Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0684858Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0685065Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0685215Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0685412Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0685633Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0685842Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0686037Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0686258Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0686474Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0686670Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0686890Z E1204 
11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0687098Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0687295Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0687516Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0687722Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0687919Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0688115Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0688327Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0688529Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0688737Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0688946Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0689257Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0689469Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0689672Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0689869Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0690061Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0690258Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0690470Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0690682Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0690878Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0691078Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0691371Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0691584Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0691789Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0691986Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0692187Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0692477Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0692690Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0692900Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0693108Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0693339Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0693630Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0693828Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0694029Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0694219Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0694414Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0694628Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0694845Z E1204 11:19:39.900000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0695042Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0695232Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0695413Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0695584Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0695710Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0695815Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0695940Z E1204 11:19:39.900000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0696100Z [W1204 11:19:39.182761563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0696102Z 2025-12-04T11:45:25.0696246Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0696541Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0696849Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0696979Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0697480Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0697733Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0697961Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0698169Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0698369Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0698661Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0698904Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0699195Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0699428Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0699719Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0699952Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0700241Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0700474Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0700766Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0700985Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0701200Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0701396Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0701622Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0701821Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0702051Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0702341Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0702536Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0702767Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0703060Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0703325Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0703518Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0703739Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0703942Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0704140Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0704335Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0704553Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0704757Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0704955Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0705151Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0705397Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0705691Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0705951Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0706242Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0706461Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0706667Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0706864Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0707069Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0707268Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0707515Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0707806Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0708039Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0708329Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0708562Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0708851Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0709083Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0709371Z 
E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0709603Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0709905Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0710155Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0710444Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0710674Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0710965Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0711195Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0711485Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0711715Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0712017Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0712248Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0712537Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0712767Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0713057Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0713303Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0713508Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0713703Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0713994Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0714240Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0714559Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0714790Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0715079Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0715311Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0715600Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0715832Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0716120Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0716364Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0716656Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0716852Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0717047Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0717242Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0717450Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0717648Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0717879Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0718168Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0718375Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0718570Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0718785Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0718980Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0719208Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0719499Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0719729Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0720021Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0720216Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0720432Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0720633Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0720867Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0721161Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0721381Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0721583Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0721781Z 
E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0721983Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0722275Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0722507Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0722822Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0723078Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0723390Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0723625Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0723917Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0724151Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0724445Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0724643Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0724857Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0725077Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0725282Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0725481Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0725682Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0725979Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0726212Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0726505Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0726737Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0727042Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0727285Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0727594Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0727825Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0728121Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0728345Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0728548Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0728746Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0728936Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0729157Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0729357Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0729652Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0729871Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0730073Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0730275Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0730478Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0730770Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0731002Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0731308Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0731548Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0731850Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0732083Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0732375Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0732611Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0732903Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0733134Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0733474Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0733705Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0733998Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0734229Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0734522Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0734752Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0735049Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0735282Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0735573Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0735785Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0735993Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0736239Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0736529Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0736763Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0737055Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0737290Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0737582Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0737831Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0738124Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0738358Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0738648Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0738845Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0739076Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0739370Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0739603Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0739894Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0740119Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0740330Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0740539Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0740739Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0741032Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0741244Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0741446Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0741645Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0741844Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0742149Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0742371Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0742574Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0742771Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0742962Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0743110Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0743345Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0743566Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0743771Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0743967Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0744200Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0744405Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0744623Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0744842Z E1204 
11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0745048Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0745243Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0745463Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0745669Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0745865Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0746059Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0746289Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0746490Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0746689Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0746889Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0747182Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0747396Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0747597Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0747795Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0747985Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0748180Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0748403Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0748614Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0748822Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0749022Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0749319Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0749532Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0749734Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0749931Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0750131Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0750434Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0750648Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0750852Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0751048Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0751248Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0751542Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0751738Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0751939Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0752127Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0752322Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0752544Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0752760Z E1204 11:19:39.916000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0752974Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0753164Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0753372Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0753543Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0753670Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0753772Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0753899Z E1204 11:19:39.916000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0754053Z [W1204 11:19:39.213770895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0754055Z 2025-12-04T11:45:25.0754198Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0754505Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.0754801Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0754931Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0755417Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0755672Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0755897Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0756104Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0756302Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0756608Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0756842Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0757160Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0757394Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0757686Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0757917Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0758207Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0758438Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0758738Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0758958Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0759165Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0759361Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0759568Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0759768Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0760001Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0760294Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0760487Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0760717Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0761018Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0761246Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0761453Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0761671Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0761878Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0762074Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0762268Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0762485Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0762689Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0762894Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0763089Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0763352Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0763642Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0763873Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0764169Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0764389Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0764593Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0764788Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0764997Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0765211Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0765456Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0765758Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0765989Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0766282Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0766515Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0766807Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0767037Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0767339Z 
E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0767569Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0767861Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0768091Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0768382Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0768613Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0768908Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0769138Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0769427Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0769667Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0769982Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0770213Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0770504Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0770736Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0771028Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0771247Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0771447Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0771653Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0771943Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0772176Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0772466Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0772697Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0772987Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0773220Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0773551Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0773781Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0774083Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0774323Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0774625Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0774820Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0775016Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0775212Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0775419Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0775619Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0775850Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0776155Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0776350Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0776545Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0776742Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0776936Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0777168Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0777457Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0777688Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0777979Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0778189Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0778405Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0778615Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0778848Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0779141Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0779362Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0779563Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0779763Z 
E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0779963Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0780271Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0780504Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0780800Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0781032Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0781325Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0781558Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0781852Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0782084Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0782380Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0782590Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0782797Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0783029Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0783230Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0783467Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0783668Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0783963Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0784194Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0784486Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0784734Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0785031Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0785266Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0785558Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0785791Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0786083Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0786303Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0786503Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0786703Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0786913Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0787137Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0787352Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0787643Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0787865Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0788065Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0788265Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0788466Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0788757Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0789002Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0789297Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0789531Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0789821Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0790053Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0790345Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0790576Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0790868Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0791110Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0791402Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0791656Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0791950Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0792184Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0792474Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0792707Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0792998Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0793242Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0793550Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0793747Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0793944Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0794176Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0794470Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0794701Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0794995Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0795227Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0795532Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0795777Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0796081Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0796316Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0796610Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0796806Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0797041Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0797331Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0797575Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0797868Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0798083Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0798283Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0798481Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0798690Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0798983Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0799198Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0799398Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0799597Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0799806Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0800109Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0800339Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0800540Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0800740Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0800933Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0801080Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0801276Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0801499Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0801724Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0801920Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0802141Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0802346Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0802541Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0802761Z E1204 
11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0802967Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0803164Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0803412Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0803617Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0803813Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0804031Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0804255Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0804469Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0804667Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0804868Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0805160Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0805375Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0805575Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0805772Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0805978Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0806172Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0806386Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0806586Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0806783Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0806988Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0807280Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0807497Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0807698Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0807900Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0808116Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0808420Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0808665Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0808865Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0809063Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0809263Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0809558Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0809754Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0809955Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0810157Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0810350Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0810566Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0810770Z E1204 11:19:39.947000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78
2025-12-04T11:45:25.0810968Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360
2025-12-04T11:45:25.0811157Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767
2025-12-04T11:45:25.0811339Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58
2025-12-04T11:45:25.0811510Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360
2025-12-04T11:45:25.0811637Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0
2025-12-04T11:45:25.0811738Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] .
2025-12-04T11:45:25.0811865Z E1204 11:19:39.947000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice.
2025-12-04T11:45:25.0812022Z [W1204 11:19:40.347267581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:25.0812024Z
2025-12-04T11:45:25.0812177Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:25.0812479Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:25.0812783Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0812914Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0813407Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0813662Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0813887Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0814108Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0814309Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0814599Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0814834Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0815126Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0815359Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0815650Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0815882Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0816174Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0816419Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0816712Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0816962Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0817166Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0817363Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0817570Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0817769Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0818000Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0818291Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0818496Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0818729Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0819022Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0819240Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0819434Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0819652Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0819856Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0820053Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0820246Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0820464Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0820679Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0820886Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.0821092Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0821325Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0821615Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0821846Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0822138Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0822354Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0822558Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0822762Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0822968Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0823170Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0823428Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0823719Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0823950Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0824242Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0824471Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0824761Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0825006Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0825311Z 
E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0825554Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0825844Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0826076Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0826367Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0826598Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0826891Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0827134Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0827423Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0827654Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0827945Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0828175Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0828466Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0828699Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0828988Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0829207Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0829416Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0829621Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0829920Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0830153Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0830448Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0830678Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0830969Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0831198Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0831499Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0831730Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0832021Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0832251Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0832544Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0832741Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0832937Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0833134Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0833353Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0833554Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0833798Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0834120Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0834315Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0834508Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0834705Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0834900Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0835131Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0835422Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0835665Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0835956Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0836152Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0836358Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0836558Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0836793Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0837089Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0837310Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0837511Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0837709Z 
E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0837918Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0838218Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0838462Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0838757Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0838990Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0839286Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0839520Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0839813Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0840057Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0840348Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0840549Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0840744Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0840964Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0841168Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0841366Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0841568Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0841864Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0842097Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0842400Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0842652Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0842943Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0843176Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0843530Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0843764Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0844062Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0844281Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0844498Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0844695Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0844892Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0845100Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0845301Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0845595Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0845816Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0846018Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0846219Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0846422Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0846727Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0846986Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0847279Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0847513Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0847807Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0848039Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0848332Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0848566Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0848869Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0849102Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0849393Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0849627Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0849920Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0850153Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0850445Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0850677Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0850988Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0851230Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0851531Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0851727Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0851925Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0852158Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0852451Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0852683Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0852974Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0853222Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0853537Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0853768Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0854061Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0854294Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0854586Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0854786Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0855022Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0855334Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0855568Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0855886Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0856099Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0856303Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0856501Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0856703Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0856996Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0857208Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0857422Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0857619Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0857822Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0858114Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0858334Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0858537Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0858735Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0858926Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0859072Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0859268Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0859498Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0859706Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0859924Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0860147Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0860352Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0860549Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0860769Z E1204 
11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0860974Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0861168Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0861389Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0861606Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0861803Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0861998Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0862212Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0862412Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0862612Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0862813Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0863106Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0863339Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0863543Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0863758Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0863965Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0864173Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0864387Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0864588Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.0864785Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0864986Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0865281Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0865492Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0865714Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0865911Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0866113Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0866404Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0866619Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.0866820Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0867019Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0867219Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0867512Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0867708Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0867918Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0868120Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0868324Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0868539Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0868748Z E1204 11:19:40.080000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0868946Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0869135Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0869314Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0869485Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0869620Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0869724Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0869850Z E1204 11:19:40.080000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.0869903Z ('RERUN', {'yellow': True}) [1.1826s] [100%] 2025-12-04T11:45:25.0870235Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:40.133423348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.0870239Z 2025-12-04T11:45:25.0870382Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0870673Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0870971Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0871102Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0871581Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0871845Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0872069Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0872294Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.0872494Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0872784Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0873019Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0873340Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0873573Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0873865Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0874108Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0874398Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0874629Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0874917Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0875137Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0875341Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0875541Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0875749Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0875949Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0876191Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0876484Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0876702Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0876934Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0877225Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0877442Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0877637Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0877856Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0878061Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0878266Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0878461Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0878679Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0878883Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0879078Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0879271Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0879504Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0879795Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0880028Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0880321Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0880549Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0880764Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0880974Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0881180Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0881378Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0881609Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0881901Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0882130Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0882423Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0882663Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0884445Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0884680Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0884971Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0885206Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0885496Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0885729Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0886021Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0886274Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0886562Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0886821Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0887111Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0887342Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0887632Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0887862Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0888151Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0888396Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0888690Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0888910Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0889110Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0889306Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0889598Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0889831Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0890121Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0890351Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0890653Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0890885Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0891195Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0891427Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0891718Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0891948Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0892239Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0892434Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0892639Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0892835Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0893042Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0893243Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0893510Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0893802Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0893998Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0894192Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0894387Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0894581Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0894813Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0895116Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0895374Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0895664Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0895860Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0896067Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0896268Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0896502Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0896793Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0897027Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0897229Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0897428Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0897633Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0897925Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0898159Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0898451Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0898686Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0898978Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0899220Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0899512Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0899773Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0900067Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0900265Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0900462Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0900683Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0900883Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0901082Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0901293Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0901584Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0901818Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0902111Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0902346Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0902637Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0902870Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0903160Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0903431Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0903736Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0903967Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0904181Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0904380Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0904575Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0904784Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0904985Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0905279Z E1204 11:19:40.866000 
856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0905615Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0905833Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0906031Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0906231Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0906522Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0906757Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0907051Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0907284Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0907575Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0907806Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0908109Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0908352Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0908654Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0908888Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0909182Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0909415Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0909706Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0909938Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0910239Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0910472Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0910765Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0910996Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0911289Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0911486Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0911687Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0911919Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0912211Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0912453Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0912767Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0912999Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0913323Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0913557Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0913851Z E1204 
11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0914084Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0914375Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0914587Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0914821Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0915112Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0915347Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0915641Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0915855Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0916060Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0916259Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0916459Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0916768Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0916992Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0917205Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0917402Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0917604Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0917897Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0918119Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0918321Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0918522Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0918724Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0918873Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0919069Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0919291Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0919496Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0919692Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0919916Z E1204 
11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0920121Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0920318Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0920539Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0920745Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0920949Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0921176Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0921392Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0921588Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0921783Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0921996Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0922197Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0922394Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0922592Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0922899Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0923112Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0923333Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0923530Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0923721Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.0923918Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0924129Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0924332Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0924529Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0924729Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0925039Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0925263Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0925476Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0925674Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0925875Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0926166Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0926380Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0926581Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0926778Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0926992Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0927286Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0927484Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0927684Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0927872Z E1204 11:19:40.866000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0928067Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0928281Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0928486Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0928683Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0928872Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0929063Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0929234Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0929371Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0929484Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0929610Z E1204 11:19:40.866000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0929766Z [W1204 11:19:40.147899974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.0929770Z 2025-12-04T11:45:25.0929913Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.0930207Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.0930505Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.0930635Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.0931118Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.0931387Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.0931617Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.0931823Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.0932024Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0932316Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0932551Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0932841Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0933072Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0933411Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0933653Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0933964Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0934198Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0934488Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0934708Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0934914Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0935110Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0935413Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0935612Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0935844Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0936135Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0936331Z E1204 
11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0936563Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0936853Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0937073Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0937267Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0937484Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0937700Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0937905Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0938107Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0938325Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0938530Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0938728Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0938923Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0939153Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0939443Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0939684Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0939976Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0940195Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0940398Z 
E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0940593Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0940801Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0940999Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0941230Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0941520Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0941751Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0942050Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0942300Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.0942588Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0942819Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0943110Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0943368Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0943659Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0943891Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0944198Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0944429Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0944719Z E1204 
11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0944948Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0945239Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0945471Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0945763Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0945993Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0946294Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0946525Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0946846Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0947065Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0947267Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0947461Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.0947754Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0947984Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0948274Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0948516Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0948806Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0949037Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0949325Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0949558Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0949848Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0950082Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0950372Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0950578Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0950774Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.0950978Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0951195Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0951393Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0951624Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0951913Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0952110Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0952305Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0952499Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0952705Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0952934Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0953226Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0953497Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0953788Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0953982Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0954189Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0954390Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0954625Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0954933Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0955165Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0955379Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0955578Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0955779Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0956071Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0956304Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0956598Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0956831Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0957137Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0957370Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0957661Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0957893Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0958185Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0958383Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0958579Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0958799Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0959002Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0959211Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0959421Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0959720Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0959952Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0960245Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0960477Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0960771Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0961001Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0961307Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0961540Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0961834Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0962053Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0962257Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0962458Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0962650Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0962861Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0963060Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0963382Z E1204 11:19:40.881000 
856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0963615Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0963836Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0964046Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0964245Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0964541Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0964852Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0965147Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0965378Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0965686Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0965918Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0966213Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0966444Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0966736Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0966968Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0967261Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0967492Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0967784Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0968024Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0968339Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0968572Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0968863Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0969098Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0969389Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0969586Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0969782Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0970025Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0970315Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0970550Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0970846Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0971078Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0971370Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0971602Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0971894Z E1204 
11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0972127Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0972426Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0972632Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0972873Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0973168Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0973432Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0973724Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0973938Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0974139Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0974352Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0974551Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0974845Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0975057Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0975259Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0975460Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0975658Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0975952Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0976170Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0976372Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0976580Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0976784Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.0976944Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.0977140Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0977359Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0977567Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0977762Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0977984Z E1204 
11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0978189Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0978382Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0978613Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0978818Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.0979012Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0979232Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.0979436Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0979634Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0979829Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0980042Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0980242Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0980440Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0980657Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0980960Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0981183Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0981383Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0981581Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0981774Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.0981971Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0982184Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0982384Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0982592Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0982791Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0983084Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0983316Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0983516Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0983714Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0983915Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0984211Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0984423Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.0984625Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.0984835Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.0985047Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0985352Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0985548Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.0985750Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.0985937Z E1204 11:19:40.881000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.0986133Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.0986347Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.0986552Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.0986762Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.0986951Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.0987132Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.0987301Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.0987428Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.0987530Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.0987657Z E1204 11:19:40.881000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.0987811Z [W1204 11:19:40.178647409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:25.0987814Z
2025-12-04T11:45:25.0987959Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:25.0988252Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:25.0988550Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first):
2025-12-04T11:45:25.0988681Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback:
2025-12-04T11:45:25.0989170Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:45:25.0989441Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0
2025-12-04T11:45:25.0989666Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0
2025-12-04T11:45:25.0989873Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548
2025-12-04T11:45:25.0990070Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0990363Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0990597Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0990900Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0991133Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0991424Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0991655Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0991944Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0992176Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0992467Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0992685Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0992890Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0993098Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0993334Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0993561Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0993793Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0994084Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0994279Z E1204 
11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0994510Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0994800Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0995019Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0995236Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0995457Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0995662Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0995856Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0996048Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0996269Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0996472Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0996666Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0996860Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.0997091Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0997394Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0997636Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0997935Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0998153Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.0998358Z 
E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.0998554Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.0998763Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.0998962Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.0999192Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.0999496Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.0999728Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1000022Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1000252Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.1000545Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1000774Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1001065Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1001295Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1001586Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1001825Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1002139Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1002370Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1002661Z E1204 
11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1002892Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1003182Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1003445Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1003738Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1003982Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1004273Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1004504Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1004797Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1005016Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1005217Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1005413Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1005703Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1005935Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1006237Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1006493Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1006787Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1007022Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1007314Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1007543Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1007833Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1008063Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1008363Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1008559Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1008754Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1008949Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1009157Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1009356Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1009586Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1009877Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1010072Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1010267Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1010472Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1010675Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1010921Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1011214Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1011447Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1011737Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1011933Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1012138Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1012350Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1012583Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1012877Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1013096Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1013337Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1013541Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1013743Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1014038Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1014271Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1014563Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1014807Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1015123Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1015355Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1015648Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1015881Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1016176Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1016373Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1016570Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1016802Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1017003Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1017204Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1017404Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1017696Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1017929Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1018223Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1018455Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1018746Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1018990Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1019290Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1019533Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1019824Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1020046Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1020247Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1020448Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1020640Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1020849Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1021061Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1021351Z E1204 11:19:40.912000 
856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1021572Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1021771Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1021970Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1022171Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1022463Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1022699Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1022990Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1023234Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1023569Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1023816Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1024108Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1024342Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1024634Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1024866Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1025161Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1025405Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1025698Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1025930Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1026223Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1026455Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1026749Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1026982Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1027278Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1027476Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1027685Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1027937Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1028240Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1028472Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1028765Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1028997Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1029290Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1029522Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1029828Z E1204 
11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1030059Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1030352Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1030548Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1030782Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1031073Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1031306Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1031598Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1031811Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1032026Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1032236Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1032445Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1032736Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1032953Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1033156Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1033383Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1033582Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1033873Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1034108Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1034311Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1034511Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1034702Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1034850Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1035046Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1035266Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1035473Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1035668Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1035888Z E1204 
11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1036093Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1036301Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1036531Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1036749Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1036943Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1037165Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1037370Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1037568Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1037763Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1037976Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1038189Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1038387Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1038589Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1038883Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1039096Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1039298Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1039496Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1039689Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1039884Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1040097Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1040307Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1040516Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1040724Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1041016Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1041229Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1041431Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1041631Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1041833Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1042128Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1042353Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1042553Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1042753Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1042951Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1043245Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1043479Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1043682Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1043875Z E1204 11:19:40.912000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1044069Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1044281Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1044506Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1044715Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1044919Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1045101Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1045270Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1045396Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1045498Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1045627Z E1204 11:19:40.912000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1045784Z [W1204 11:19:41.331629527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1045787Z 2025-12-04T11:45:25.1045929Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1046223Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1046534Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1046667Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1047145Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1047400Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1047629Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1047833Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.1048035Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1048326Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1048572Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1048863Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1049118Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1049409Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1049642Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1049935Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1050168Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1050458Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1050687Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1050894Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1051089Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1051297Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1051497Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1051730Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1052020Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1052217Z E1204 
11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1052448Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1052738Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1052967Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1053173Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1053428Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1053633Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1053830Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1054024Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1054243Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1054447Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1054642Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1054851Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1055084Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1055377Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1055611Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1055901Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1056122Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1056329Z 
E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1056524Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1056732Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1056929Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1057172Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1057477Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1057717Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1058007Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1058239Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.1058530Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1058761Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1059054Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1059303Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1059594Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1059827Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1060117Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1060348Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1060638Z E1204 
11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1060869Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1061159Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1061392Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1061693Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1061944Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1062234Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1062466Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1062756Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1062977Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1063178Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1063393Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1063701Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1063932Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1064228Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1064458Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1064751Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1064981Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1065275Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1065506Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1065802Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1066048Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1066365Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1066562Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1066758Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1066956Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1067161Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1067365Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1067601Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1067892Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1068102Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1068298Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1068494Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1068687Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1068920Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1069215Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1069450Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1069744Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1069939Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1070164Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1070366Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1070623Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1070918Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1071139Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1071343Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1071543Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1071747Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1072038Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1072286Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1072580Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1072817Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1073110Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1073360Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1073658Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1073891Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1074188Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1074387Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1074600Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1074837Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1075058Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1075261Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1075463Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1075759Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1075997Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1076291Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1076541Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1076833Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1077069Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1077369Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1077604Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1077904Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1078125Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1078331Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1078533Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1078730Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1078953Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1079168Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1079480Z E1204 11:19:41.065000 
856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1079701Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1079913Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1080115Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1080326Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1080621Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1080874Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1081173Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1081411Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1081714Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1081949Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1082254Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1082500Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1082796Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1083038Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1083411Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1083669Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1083996Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1084237Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1084540Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1084776Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1085078Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1085313Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1085642Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1085844Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1086052Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1086294Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1086590Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1086838Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1087136Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1087375Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1087676Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1087927Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1088244Z E1204 
11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1088492Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1088795Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1088995Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1089238Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1089541Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1089778Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1090100Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1090352Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1090578Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1090795Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1093297Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1093619Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1093841Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1094055Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1094258Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1094503Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1094836Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1095086Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1095292Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1095501Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1095699Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1095863Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1096069Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1096294Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1096507Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1096724Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1096953Z E1204 
11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1097163Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1097370Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1097594Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1097914Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1098119Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1098343Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1098560Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1098761Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1098968Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1099198Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1099422Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1099631Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1099834Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1100139Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1100358Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1100568Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1100770Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1100972Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1101184Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1101407Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1101619Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1101820Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1102065Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1102365Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1102585Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1102791Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1102998Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1103209Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1103595Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1103829Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1104035Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1104243Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1104451Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1104750Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1104957Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1105161Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1105359Z E1204 11:19:41.065000 856451 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1105578Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1105800Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1106008Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1106213Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1106424Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1106615Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1106794Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1106922Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1107034Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1107163Z E1204 11:19:41.065000 856451 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1107215Z FAILED [1.0041s] [100%]
2025-12-04T11:45:25.1107219Z
2025-12-04T11:45:25.1107280Z ==================================== RERUNS ====================================
2025-12-04T11:45:25.1107435Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _
2025-12-04T11:45:25.1107488Z Traceback (most recent call last):
2025-12-04T11:45:25.1107678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.1107738Z method(*args, **kwargs)
2025-12-04T11:45:25.1107901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.1107946Z method(*args, **kwargs)
2025-12-04T11:45:25.1108110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:25.1108151Z with policy():
2025-12-04T11:45:25.1108311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:25.1108356Z raise RuntimeError(msg)
2025-12-04T11:45:25.1108762Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1086324736.
2025-12-04T11:45:25.1108766Z 2025-12-04T11:45:25.1108853Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1109121Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1109123Z 2025-12-04T11:45:25.1109223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1109310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1109377Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1109441Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1110013Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1110119Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1110167Z graph_break [] 2025-12-04T11:45:25.1110236Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1110320Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1110835Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1110893Z current_size = base.storage().size() 2025-12-04T11:45:25.1110944Z Autotune Choices Stats: 2025-12-04T11:45:25.1111323Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006399000063538551, "best_triton_pos": 0} 2025-12-04T11:45:25.1111397Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1111453Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1111585Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1111846Z triton_mm_31 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1112096Z triton_mm_19 0.0079 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1112328Z triton_mm_32 0.0081 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1112560Z triton_mm_27 0.0086 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1112800Z triton_mm_15 0.0091 ms 70.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1113026Z triton_mm_20 0.0092 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1113283Z triton_mm_13 0.0098 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1113510Z triton_mm_23 0.0099 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1113770Z triton_mm_28 0.0101 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1114002Z triton_mm_8 0.0102 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1114142Z SingleProcess AUTOTUNE benchmarking takes 0.1155 seconds and 1.0546 seconds precompiling for 31 choices 2025-12-04T11:45:25.1114311Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1114360Z Traceback (most recent call last): 2025-12-04T11:45:25.1114552Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1114599Z method(*args, **kwargs) 
2025-12-04T11:45:25.1114760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1114805Z method(*args, **kwargs) 2025-12-04T11:45:25.1114966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1115006Z with policy(): 2025-12-04T11:45:25.1115167Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1115211Z raise RuntimeError(msg) 2025-12-04T11:45:25.1115612Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1086324736 and is now 1184890880. 2025-12-04T11:45:25.1115615Z 2025-12-04T11:45:25.1115692Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1115975Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1115992Z 2025-12-04T11:45:25.1116083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1116169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1116217Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1116281Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1116839Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1116943Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1116992Z graph_break [] 2025-12-04T11:45:25.1117058Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1117137Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1117632Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1117704Z current_size = base.storage().size() 2025-12-04T11:45:25.1117750Z Autotune Choices Stats: 2025-12-04T11:45:25.1118134Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006399000063538551, "best_triton_pos": 0} 2025-12-04T11:45:25.1118203Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1118262Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1118391Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1118648Z triton_mm_31 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:25.1118885Z triton_mm_19 0.0079 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1119115Z triton_mm_32 0.0081 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1119346Z triton_mm_27 0.0086 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1119574Z triton_mm_15 0.0091 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1119825Z triton_mm_20 0.0092 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1120071Z triton_mm_13 0.0098 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1120297Z triton_mm_23 0.0099 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1120532Z triton_mm_28 0.0101 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1120762Z triton_mm_8 0.0102 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1120902Z SingleProcess AUTOTUNE benchmarking takes 0.1155 seconds and 1.0546 seconds precompiling for 31 choices 2025-12-04T11:45:25.1120980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1121032Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1121093Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1121199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1121693Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1121755Z graph_break [] 2025-12-04T11:45:25.1121828Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1121905Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1121954Z Autotune Choices Stats: 2025-12-04T11:45:25.1122318Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_53", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-12-04T11:45:25.1122407Z AUTOTUNE 
scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1122461Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1122590Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1122825Z triton_mm_53 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1123068Z triton_mm_65 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1123331Z triton_mm_66 0.0082 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1123585Z triton_mm_61 0.0085 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1123817Z triton_mm_49 0.0092 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1124058Z triton_mm_54 0.0093 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1124292Z triton_mm_47 0.0095 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1124521Z triton_mm_57 0.0101 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1124751Z triton_mm_48 0.0104 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1124984Z triton_mm_62 0.0106 ms 66.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1125118Z SingleProcess AUTOTUNE benchmarking takes 0.1975 seconds and 0.4584 seconds precompiling for 35 choices 2025-12-04T11:45:25.1125180Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1125359Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1125415Z Traceback (most recent call last): 2025-12-04T11:45:25.1125578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1125629Z method(*args, **kwargs) 2025-12-04T11:45:25.1125785Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1125831Z method(*args, **kwargs) 2025-12-04T11:45:25.1125985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1126029Z with policy(): 2025-12-04T11:45:25.1126185Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1126233Z raise RuntimeError(msg) 2025-12-04T11:45:25.1126645Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 2025-12-04T11:45:25.1126648Z 2025-12-04T11:45:25.1126729Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1126994Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1127001Z 2025-12-04T11:45:25.1127090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1127172Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1127217Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1127283Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1127848Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1127964Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1128003Z graph_break [] 2025-12-04T11:45:25.1128074Z aten_mm_info 
[('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1128150Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1128648Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1128698Z current_size = base.storage().size() 2025-12-04T11:45:25.1128747Z Autotune Choices Stats: 2025-12-04T11:45:25.1129115Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006399000063538551, "best_triton_pos": 0} 2025-12-04T11:45:25.1129185Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1129244Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1129392Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1129634Z triton_mm_31 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1129865Z triton_mm_19 0.0079 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1130099Z triton_mm_32 0.0081 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1130339Z triton_mm_27 0.0086 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1130573Z triton_mm_15 0.0091 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1130802Z triton_mm_20 0.0092 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1131029Z triton_mm_13 0.0098 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1131259Z triton_mm_23 0.0099 ms 64.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1131499Z triton_mm_28 0.0101 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1131744Z triton_mm_8 0.0102 ms 62.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1131873Z SingleProcess AUTOTUNE benchmarking takes 0.1155 
seconds and 1.0546 seconds precompiling for 31 choices 2025-12-04T11:45:25.1131952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1131997Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1132058Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1132161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1132654Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1132700Z graph_break [] 2025-12-04T11:45:25.1132764Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1132841Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1132886Z Autotune Choices Stats: 2025-12-04T11:45:25.1133278Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_53", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.007040000054985285, "best_triton_pos": 0} 2025-12-04T11:45:25.1133371Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1133424Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1133546Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1133781Z triton_mm_53 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1134011Z triton_mm_65 0.0070 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1134268Z triton_mm_66 0.0082 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1134499Z triton_mm_61 0.0085 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1134725Z triton_mm_49 0.0092 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1134954Z triton_mm_54 0.0093 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1135193Z triton_mm_47 0.0095 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1135422Z triton_mm_57 0.0101 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1135673Z triton_mm_48 0.0104 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1135899Z triton_mm_62 0.0106 ms 66.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1136036Z SingleProcess AUTOTUNE benchmarking takes 0.1975 seconds and 0.4584 seconds precompiling for 35 choices 2025-12-04T11:45:25.1136111Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1136161Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1136220Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1136327Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1136813Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1136872Z graph_break [] 2025-12-04T11:45:25.1136937Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1137017Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1137061Z Autotune Choices Stats: 2025-12-04T11:45:25.1137432Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 
0.006839999929070473, "best_triton_pos": 0} 2025-12-04T11:45:25.1137503Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1137556Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1137682Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1137929Z triton_mm_99 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1138164Z triton_mm_87 0.0075 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1138395Z triton_mm_100 0.0082 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1138442Z _scaled_mm 0.0084 ms 81.8% 2025-12-04T11:45:25.1138667Z triton_mm_95 0.0085 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1138900Z triton_mm_83 0.0092 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1139140Z triton_mm_88 0.0094 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1139393Z triton_mm_81 0.0098 ms 70.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1139624Z triton_mm_76 0.0102 ms 67.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1139851Z triton_mm_82 0.0103 ms 66.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1139988Z SingleProcess AUTOTUNE benchmarking takes 0.2102 seconds and 0.2847 seconds precompiling for 35 choices 2025-12-04T11:45:25.1140183Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26279b2ffe9336fe.xml - 2025-12-04T11:45:25.1140249Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1140859Z FAILED [1.0041s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 
2025-12-04T11:45:25.1140874Z 2025-12-04T11:45:25.1140949Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1141216Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1141219Z 2025-12-04T11:45:25.1141309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1141377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1141448Z ================== 1 failed, 187 deselected, 2 rerun in 5.45s ================== 2025-12-04T11:45:25.1141493Z Got exit code 1 2025-12-04T11:45:25.1141535Z Retrying single test... 2025-12-04T11:45:25.1141688Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a44036a8173ee72.xml 2025-12-04T11:45:25.1141761Z ============================= test session starts ============================== 2025-12-04T11:45:25.1141884Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1141928Z cachedir: .pytest_cache 2025-12-04T11:45:25.1142094Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1142143Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1142192Z configfile: pytest.ini 2025-12-04T11:45:25.1142362Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1142441Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1142701Z stepcurrent: skipping 96 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1142752Z Running 1 items in this shard 2025-12-04T11:45:25.1142754Z 2025-12-04T11:45:25.1143105Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:50.342566732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1143120Z 2025-12-04T11:45:25.1143471Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1143774Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1143909Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1144400Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1144663Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1144892Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1145136Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1145339Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1145641Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1145879Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1146193Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1146431Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1146727Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1146965Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1147257Z E1204 11:19:50.397000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1147508Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1147815Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1148052Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1148350Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1148585Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1148878Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1149074Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1149309Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1149619Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1149815Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1150055Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1150347Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1150599Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1150893Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1151118Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1151330Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:25.1151543Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1151759Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1151942Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1152141Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1152674Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp8zy13t5/yc/cyc6kxtypptzk2fnikrodt4aztwqzfhd4sudsvvqxxoiuf53jz5u.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.1152831Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1153056Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1153215Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1153390Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1153680Z E1204 11:19:50.397000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1153836Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1154097Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1154243Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1154501Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1154664Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1154956Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1155094Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1155377Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1155573Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1155894Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1156210Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1156349Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1156851Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1157105Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1157339Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1157548Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1157754Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1158049Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1158300Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1158601Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1158839Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1159134Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1159383Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1159680Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1159918Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1160213Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1160449Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1160758Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1161008Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1161300Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1161503Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1161739Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1162033Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1162236Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1162469Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1162779Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1163013Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1163337Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1163563Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1163819Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1164026Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1164238Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1164411Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1164592Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1164702Z E1204 11:19:50.397000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1164865Z [W1204 11:19:50.777568556 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1164871Z 2025-12-04T11:45:25.1165197Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1165510Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1165645Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1166128Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1166383Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1166614Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1166830Z E1204 11:19:50.511000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1167048Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1167346Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1167581Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1167878Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1168126Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1168420Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1168658Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1168951Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1169189Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1169507Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1169760Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1170054Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1170288Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1170584Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1170782Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1171023Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1171315Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1171532Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1171770Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1172062Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1172300Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1172608Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1172834Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1173043Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1173246Z E1204 11:19:50.511000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1173487Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1173656Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1173858Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1174382Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp8zy13t5/xq/cxqbbei3bvcfcosi7eyccrrtxl4y2o2d5vveoambtjqx7xltq7z3.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.1174547Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1174767Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1174929Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1175077Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1175362Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1175496Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1175756Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1175914Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1176171Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1176336Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1176605Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1176744Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1177042Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1177237Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1177557Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1177851Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1177988Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1178477Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1178747Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1178978Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1179186Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1179392Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1179686Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1179923Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1180221Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1180466Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1180764Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1180998Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1181308Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1185704Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1186012Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1186251Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1186544Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1186813Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1187105Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1187323Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1187555Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1187853Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1188055Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1188288Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1188581Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1188814Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1189126Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1189351Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1189561Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1189766Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1189993Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1190167Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1190349Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1190455Z E1204 11:19:50.511000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1190612Z [W1204 11:19:50.787253083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1190615Z 2025-12-04T11:45:25.1190930Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1191248Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1191392Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1191872Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1192127Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1192357Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1192563Z E1204 11:19:50.520000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1192766Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1193064Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1193347Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1193649Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1193880Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1194193Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1194432Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1194722Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1194956Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1195248Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1195494Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1195784Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1196032Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1196333Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1196531Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1196762Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1197053Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1197251Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1197481Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1197790Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1198022Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1198311Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1198541Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1198747Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1198948Z E1204 11:19:50.520000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1199159Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1199328Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1199505Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1200044Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp8zy13t5/xe/cxedvxvlos3gemo4tn3u42sugw35sjfm2g2kzzhrrmbc33koddnm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.1200204Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1200420Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1200578Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1200724Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1201010Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1201144Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1201399Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1201538Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1201792Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1201963Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1202230Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1202366Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1202640Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1202849Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1203172Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1203500Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1203631Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1204126Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1204381Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1204622Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1204827Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1205033Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1205327Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1205563Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1205854Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1206089Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1206396Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1206629Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1206919Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1207163Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1207458Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1207689Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1207984Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1208215Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1208516Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1208713Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1208961Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1209255Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1209452Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1209683Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1209974Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1210204Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1210495Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1210727Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1210934Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1211139Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1211347Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1211524Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1211703Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1211808Z E1204 11:19:50.520000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1211965Z [W1204 11:19:50.788781080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1211967Z 2025-12-04T11:45:25.1212277Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1212570Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1212713Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1213190Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1213487Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1213713Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1213917Z E1204 11:19:50.522000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1214119Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1214410Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1214643Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1214951Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1215183Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1215478Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1215707Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1216015Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1216249Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1216538Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1216768Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1217060Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1217308Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1217610Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1217809Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1218044Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1218335Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1218532Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1218762Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1219058Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1219305Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1219596Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1219817Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1220021Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1220232Z E1204 11:19:50.522000 862287 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1220444Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1220612Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1220787Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1221321Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpp8zy13t5/yi/cyi6bjjhzh6rc4qdflakznwdwskipruplqvwjvuai6pd6agmgki2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.1221479Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1221693Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1225554Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1225698Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1225986Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1226117Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1226375Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1226516Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1226772Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1226931Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1227215Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1227351Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1227626Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1227820Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1228149Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1228443Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1228574Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1229053Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1229310Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1229555Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1229776Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1229977Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1230267Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1230503Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1230798Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1231034Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1231325Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1231569Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1231861Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1232092Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1232384Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1232625Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1232921Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1233155Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1233475Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1233673Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1233915Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1234218Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1234411Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1234649Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1234943Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1235174Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1235466Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1235684Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1235904Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1236104Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1236314Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1236480Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1236657Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1236775Z E1204 11:19:50.522000 862287 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1236829Z ('RERUN', {'yellow': True}) [3.2693s] [100%] 2025-12-04T11:45:25.1237168Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:51.092032807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1237171Z 2025-12-04T11:45:25.1237318Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1237614Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.1237907Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1238049Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1238527Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1238794Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1239023Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1239227Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1239432Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1239723Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1239955Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1240260Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1240491Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1240785Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1241028Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1241323Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1241557Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1241846Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1242066Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1242271Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1242481Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1242701Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1242902Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1243134Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1243443Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1243641Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1243871Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1244161Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1244397Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1244601Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1244819Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1245024Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1245219Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1245433Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1245652Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1245859Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1246057Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.1246250Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1246484Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1246797Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1247039Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1247331Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1247552Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1247759Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1247954Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1248164Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1248363Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1248612Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1248908Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1249140Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1249435Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1249674Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1249967Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1250199Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1250490Z 
E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1250725Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1251025Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1251269Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1251560Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1251789Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1252083Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1252311Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1252607Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1252838Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1253141Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1253407Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1253697Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1253930Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1254234Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1254457Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1254662Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1254861Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1255155Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1255396Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1255701Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1255931Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1256223Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1256457Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1256748Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1256982Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1257272Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1257519Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1257809Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1258005Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1258202Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1258407Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1258620Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1258819Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1259052Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1259341Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1259542Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1259745Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1259951Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1260148Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1260378Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1260673Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1260904Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1261196Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1261390Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1261601Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1261818Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1262050Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1262343Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1262564Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1262778Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1262981Z 
E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1263184Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1263552Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1263785Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1264100Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1264345Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1264640Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1264873Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1265168Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1265403Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1265695Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1265892Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1266103Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1266326Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1266529Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1266730Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1266932Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1267239Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1267474Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1267767Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1268002Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1268305Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1268548Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1268860Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1269094Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1269397Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1269621Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1269831Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1270036Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1270229Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1270456Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1270662Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1270963Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1271185Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1271403Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1271610Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1271811Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1272109Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1272342Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1272644Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1272890Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1273197Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1273483Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1273777Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1274017Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1274311Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1274550Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1274848Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1275101Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1275398Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1275632Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1275944Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1276182Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1276480Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1276718Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1277014Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1277232Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1277429Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1277700Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1277995Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1278236Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1278535Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1278771Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1279071Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1279306Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1279618Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1279852Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1280151Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1280366Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1280603Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1280902Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1281138Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1281438Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1281659Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1281874Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1282089Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1282290Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1282589Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1282806Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1283011Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1283213Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1283448Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1283749Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1283987Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1284195Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1284394Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1284591Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1284755Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1284959Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1285187Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1285394Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1285594Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1285818Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1286044Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1286256Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1286483Z E1204 
11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1286690Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1286892Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1287122Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1287330Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1287533Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1287729Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1287962Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1288166Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1288369Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1288573Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1288880Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1289100Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1289303Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1289508Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1289700Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1289899Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1290123Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1290328Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.1290538Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1290744Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1291042Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1291257Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1291460Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1291658Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1291859Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1292165Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1292379Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.1292584Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1292783Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1292984Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1293324Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1293523Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1293729Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1293918Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1294117Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1294352Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1294561Z E1204 11:19:51.830000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1294771Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1294962Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1295145Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1295319Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1295449Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1295553Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1295682Z E1204 11:19:51.830000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1295839Z [W1204 11:19:51.113071296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1295842Z 2025-12-04T11:45:25.1295988Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1296296Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.1296597Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1296727Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1297222Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1297479Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1297707Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1297919Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1298118Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1298418Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1298665Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1298968Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1299204Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1299495Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1299732Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1300024Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1300257Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1300553Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1300785Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1300994Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1301189Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1301400Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1301610Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1301846Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1302142Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1302340Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1302577Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1302879Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1303101Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1303336Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1303558Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1303766Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1303963Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1304160Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1304381Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1304588Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1304784Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.1304995Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1305226Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1305520Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1305753Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1306059Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1306282Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1306491Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1306688Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1306898Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1307112Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1307347Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1307654Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1307888Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1308183Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1308420Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1308712Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1308947Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1309241Z 
E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1309484Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1309780Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1310012Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1310333Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1310572Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1310865Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1311103Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1311396Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1311647Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1311950Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1312188Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1312488Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1312722Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1313019Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1313240Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1313468Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1313683Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1313981Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1314220Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1314516Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1314767Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1315063Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1315300Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1315596Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1315829Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1316147Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1316392Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1316690Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1316888Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1317090Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1317292Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1317501Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1317705Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1317937Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1318246Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1318445Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1318650Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1318846Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1319056Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1319296Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1319589Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1319828Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1320120Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1320322Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1320542Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1320758Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1320996Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1321291Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1321520Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1321725Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1321929Z 
E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1322130Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1322430Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1322679Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1322974Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1323211Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1323549Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1323790Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1324084Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1324322Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1324621Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1324834Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1325036Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1325274Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1325484Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1325683Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1325891Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1326188Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1326424Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1326727Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1326982Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1327280Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1327520Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1327814Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1328064Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1328361Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1328591Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1328797Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1329004Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1329214Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1329427Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1329643Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1329938Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1330167Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1330373Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1330579Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1330781Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1331082Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1331332Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1331627Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1331867Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1332160Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1332411Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1332713Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1332949Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1333246Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1333500Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1333815Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1334061Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1334358Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1334602Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1334901Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1335142Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1335436Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1335678Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1335985Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1336187Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1336389Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1336624Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1336941Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1337177Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1337476Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1337714Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1338009Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1338258Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1338562Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1338801Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1339097Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1339298Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1339538Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1339832Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1340071Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1340376Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1340598Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1340802Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1341008Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1341226Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1341521Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1341741Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1341944Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1342148Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1342351Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1342658Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1342901Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1343105Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1343334Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1343528Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1343680Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1343878Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1344104Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1344312Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1344531Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1344758Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1344966Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1345167Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1345401Z E1204 
11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1345613Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1345809Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1346036Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1346247Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1346451Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1346678Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1346894Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1347114Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1347313Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1347521Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1347818Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1348037Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1348246Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1348447Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1348661Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1348859Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1349078Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1349281Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.1349483Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1349700Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1349997Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1350217Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1350420Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1350627Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1350830Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1351141Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1351371Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.1351574Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1351777Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1351981Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1352279Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1352476Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1352683Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1352884Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1353084Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1353337Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1353542Z E1204 11:19:51.846000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1353746Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1353950Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1354136Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1354308Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1354439Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1354541Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1354672Z E1204 11:19:51.846000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1354828Z [W1204 11:19:51.144854597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1354834Z 2025-12-04T11:45:25.1354992Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1355288Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.1355596Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1355728Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1356212Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1356470Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1356700Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1356910Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1357125Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1357417Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1357656Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1357948Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1358196Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1358493Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1358725Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1359019Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1359253Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1359563Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1359795Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1360000Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1360198Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1360407Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1360609Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1360841Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1361137Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1361335Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1361583Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1361881Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1362101Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1362302Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1362534Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1362746Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1362943Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1363143Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1363393Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1363602Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1363819Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.1364027Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1364267Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1364563Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1364802Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1365099Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1365321Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1365528Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1365736Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1365946Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1366147Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1366384Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1366693Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1366925Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1367219Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1367450Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1367743Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1367977Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1368290Z 
E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1368535Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1368825Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1369059Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1369350Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1369585Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1369875Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1370121Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1370416Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1370649Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1370943Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1371185Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1371482Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1371717Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1372008Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1372230Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1372443Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1372642Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1372946Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1373179Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1373501Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1373733Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1374026Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1374257Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1374553Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1374800Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1375093Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1375332Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1375641Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1375842Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1376038Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1376235Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1376444Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1376647Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1376893Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1377199Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1377399Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1377595Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1377793Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1377987Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1378222Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1378521Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1378756Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1379066Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1379263Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1379473Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1379676Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1379938Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1380235Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1380460Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1380669Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1380870Z 
E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1381078Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1381383Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1381631Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1381928Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1382164Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1382461Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1382698Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1382997Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1383243Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1383574Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1383778Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1383975Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1384203Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1384428Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1384632Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1384836Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1385133Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1385372Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1385687Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1385938Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1386232Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1386473Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1386769Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1387009Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1387309Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1387533Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1387752Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1387952Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1388148Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1388357Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1388562Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1388869Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1389090Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1389294Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1389493Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1389695Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1389999Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1390244Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1390542Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1390777Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1391075Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1391311Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1391609Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1391844Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1392156Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1392394Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1392692Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1392927Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1393233Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1393505Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1393801Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1394033Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1394330Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1394578Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1394886Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1395083Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1395285Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1395521Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1395817Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1396054Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1396347Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1396597Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1396891Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1397127Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1397422Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1397670Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1397966Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1398164Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1398399Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1398697Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1398947Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1399252Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1399465Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1399670Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1399871Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1400075Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1400367Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1400583Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1400802Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1401002Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1401206Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1401498Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1401720Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1401935Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1402137Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1402332Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1402480Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1402682Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1402905Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1403126Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1403357Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1403582Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1403789Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1403988Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1404211Z E1204 
11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1404419Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1404620Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1404841Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1405072Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1405272Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1405471Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1405688Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1405890Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1406105Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1406308Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1406608Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1406821Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1407027Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1407366Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1407558Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1407796Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1408009Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1408212Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.1408412Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1408618Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1408913Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1409132Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1409354Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1409554Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1409762Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1410057Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1410276Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.1410493Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1410698Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1410906Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1411202Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1411404Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1411624Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1411820Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1412030Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1412249Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1412462Z E1204 11:19:51.878000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1412664Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1412860Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1413045Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1413226Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1413484Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1413631Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1413758Z E1204 11:19:51.878000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1413920Z [W1204 11:19:52.280506783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1413923Z 2025-12-04T11:45:25.1414070Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1414372Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.1414699Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1414833Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1415324Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1415581Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1415814Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1416051Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1416280Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1416580Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1416818Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1417120Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1417354Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1417652Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1417889Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1418196Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1418433Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1418728Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1418953Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1419171Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1419375Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1419589Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1419790Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1420026Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1420319Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1420534Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1420781Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1421078Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1421303Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1421502Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1421725Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1421932Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1422133Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1422329Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1422567Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1422774Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1422977Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.1423178Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1423470Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1423771Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1424004Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1424301Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1424524Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1424757Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1424958Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1425183Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1425389Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1425622Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1425921Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1426155Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1426455Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1426693Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1427005Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1427243Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1427538Z 
E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1427775Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1428083Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1428316Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1428619Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1428851Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1429163Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1429395Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1429700Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1429935Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1430226Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1430461Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1430759Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1430992Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1431284Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1431517Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1431723Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1431918Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1432223Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1432457Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1432754Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1432985Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1433306Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1433541Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1433847Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1434094Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1434384Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1434619Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1434914Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1435113Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1435310Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1435506Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1435731Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1435930Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1436165Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1436458Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1436667Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1436869Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1437063Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1437261Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1437491Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1437786Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1438030Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1438340Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1438541Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1438752Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1438962Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1439197Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1439498Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1439720Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1439940Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1440144Z 
E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1440348Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1440649Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1440884Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1441203Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1441437Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1441736Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1441974Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1442279Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1442518Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1442824Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1443028Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1443230Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1443483Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1443691Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1443891Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1444098Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1444410Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1444651Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1444948Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1445184Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1445494Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1445730Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1446029Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1446265Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1446561Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1446802Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1447020Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1447226Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1447420Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1447636Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1447838Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1448138Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1448364Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1448569Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1448787Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1448988Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1449290Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1449525Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1449835Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1450074Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1450371Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1450609Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1450907Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1451157Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1451460Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1451699Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1451998Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1452234Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1452533Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1452768Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1453071Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1453352Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1453649Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1453888Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1454182Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1454403Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1454603Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1454842Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1455143Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1455379Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1455698Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1455943Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1456241Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1456475Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1456773Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1457013Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1457309Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1457513Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1457766Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1458065Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1458300Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1458598Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1458829Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1459034Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1459242Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1459443Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1459741Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1459966Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1460173Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1460389Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1460589Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1460887Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1461111Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1461319Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1461520Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1461715Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1461870Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1462080Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1462306Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1462514Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1462715Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1462951Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1463166Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1463392Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1463621Z E1204 
11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1463832Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1464030Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1464270Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1464488Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1464691Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1464888Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1465110Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1465315Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1465520Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1465726Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1466020Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1466253Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1466457Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1466661Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1466855Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1467055Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1467285Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1467490Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.1467694Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1467895Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1468195Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1468421Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1468629Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1468845Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1469047Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1469346Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1469562Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.1469771Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1469971Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1470176Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1470480Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1470684Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1470894Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1471086Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1471288Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1471515Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1471728Z E1204 11:19:52.013000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1471927Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1472122Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1472307Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1472481Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1472622Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1472727Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1472875Z E1204 11:19:52.013000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1472929Z ('RERUN', {'yellow': True}) [1.2834s] [100%] 2025-12-04T11:45:25.1473284Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda [W1204 11:19:52.181587643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.1473287Z 2025-12-04T11:45:25.1473436Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1473736Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1474036Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1474169Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1474655Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1474924Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1475155Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1475362Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1475579Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1475880Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1476117Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1476414Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1476650Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1476961Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1477194Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1477502Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1477740Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1478035Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1478260Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1478468Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1478673Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1478881Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1479097Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1479334Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1479627Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1479827Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1480070Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1480367Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1480591Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1480790Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1481013Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1481219Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1481428Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1481635Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1481858Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1482066Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1482266Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1482466Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1482700Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1482998Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1483232Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1483583Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1483804Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1484015Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1484217Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1484444Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1484647Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1484882Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1485180Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1485414Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1485727Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1485978Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1486270Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1486509Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1486804Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1487042Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1487335Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1487572Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1487885Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1488117Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1488416Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1488650Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1488957Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1489194Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1489485Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1489722Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1490015Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1490268Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1490573Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1490799Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1491008Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1491204Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1491499Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1491732Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1492029Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1492274Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1492570Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1492809Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1493102Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1493370Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1493664Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1493900Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1494193Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1494395Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1494609Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1494823Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1495038Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1495238Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1495477Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1495770Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1495972Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1496172Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1496369Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1496586Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1496821Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1497117Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1497350Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1497657Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1497859Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1498068Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1498273Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1498507Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1498818Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1499042Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1499258Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1499462Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1499665Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1499966Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1500201Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1500498Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1500734Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1501046Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1501285Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1501580Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1501818Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1502124Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1505168Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1505374Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1505599Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1505805Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1506045Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1506249Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1506564Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1506800Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1507096Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1507331Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1507628Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1507860Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1508156Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1508410Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1508705Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1508924Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1509150Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1509351Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1509544Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1509759Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1509958Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1510255Z E1204 11:19:52.915000 
862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1510486Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1510703Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1510902Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1511102Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1511399Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1511636Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1511934Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1512164Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1512457Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1512700Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1512991Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1513225Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1513565Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1513803Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1514095Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1514329Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1514621Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1514867Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1515160Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1515408Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1515701Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1515937Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1516227Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1516426Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1516622Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1516856Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1517165Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1517398Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1517690Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1517936Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1518231Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1518465Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1518758Z E1204 
11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1518993Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1519299Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1519508Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1519743Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1520042Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1520277Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1520572Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1520787Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1520991Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1521194Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1521406Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1521698Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1521911Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1522115Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1522326Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1522531Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1522828Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1523049Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1523278Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1523479Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1523687Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1523853Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1524050Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1524270Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1524476Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1524674Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1524895Z E1204 
11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1525102Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1525298Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1525539Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1525746Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1525944Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1526167Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1526386Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1526586Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1526781Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1526998Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1527198Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1527400Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1527603Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1527916Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1528141Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1528343Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1528543Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1528736Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1528933Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1529146Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1529350Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1529556Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1529767Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1530062Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1530274Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1530476Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1530686Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1530889Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1531185Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1531401Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1531608Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1531816Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1532019Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1532321Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1532518Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1532721Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1532914Z E1204 11:19:52.915000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1533109Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1533353Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1533563Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1533762Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1533971Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1534153Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1534324Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1534453Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1534557Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1534700Z E1204 11:19:52.915000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1534857Z [W1204 11:19:52.196449463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1534860Z 2025-12-04T11:45:25.1535009Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1535305Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1535603Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1535735Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1536234Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1536505Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1536731Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1536941Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.1537143Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1537437Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1537676Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1537967Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1538214Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1538506Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1538738Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1539039Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1539273Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1539565Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1539786Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1539992Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1540190Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1540408Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1540622Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1540854Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1541150Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1541347Z E1204 
11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1541581Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1541873Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1542092Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1542297Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1542518Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1542723Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1542921Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1543120Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1543387Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1543596Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1543792Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1543989Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1544220Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1544518Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1544763Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1545069Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1545291Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1545495Z 
E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1545694Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1545902Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1546100Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1546334Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1546638Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1546872Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1547166Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1547399Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.1547699Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1547932Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1548224Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1548453Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1548745Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1548987Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1549290Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1549522Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1549812Z E1204 
11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1550045Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1550333Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1550564Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1550855Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1551103Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1551394Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1551625Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1551916Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1552147Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1552351Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1552548Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1552841Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1553076Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1553410Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1553658Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1553948Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1554182Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1554472Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1554703Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1554998Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1555229Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1555537Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1555731Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1555927Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1556122Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1556342Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1556544Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1556773Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1557069Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1557264Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1557462Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1557674Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1557878Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1558111Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1558400Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1558634Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1558926Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1559123Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1559329Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1559536Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1559780Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1560072Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1560295Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1560498Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1560710Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1560915Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1561212Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1561446Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1561737Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1561983Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1562290Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1562527Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1562819Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1563057Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1563375Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1563573Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1563770Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1564009Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1564214Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1564415Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1564617Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1564914Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1565162Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1565456Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1565689Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1565984Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1566218Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1566525Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1566770Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1567066Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1567289Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1567492Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1567692Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1567882Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1568095Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1568308Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1568601Z E1204 11:19:52.929000 
862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1568823Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1569027Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1569237Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1569439Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1569735Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1569969Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1570262Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1570497Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1570798Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1571045Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1571342Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1571575Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1571869Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1572100Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1572395Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1572628Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1572935Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1573172Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1573485Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1573741Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1574035Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1574269Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1574561Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1574759Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1574958Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1575205Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1575512Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1575902Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1576198Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1576430Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1576724Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1576956Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1577250Z E1204 
11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1577504Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1577797Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1577995Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1578240Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1578533Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1578767Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1579058Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1579274Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1579477Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1579690Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1579902Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1580194Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1580413Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1580618Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1580818Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1581022Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1581320Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1581553Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1581754Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1581956Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1582148Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1582297Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1582502Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1582729Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1582935Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1583136Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1583394Z E1204 
11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1583602Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1583812Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1584046Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1584252Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1584448Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1584672Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1584878Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1585079Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1585277Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1585490Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1585707Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1585905Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1586108Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1586401Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1586617Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1586835Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1587035Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1587230Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1587425Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1587638Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1587839Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1588048Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1588259Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1588551Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1588766Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1588971Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1589169Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1589369Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1589662Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1589893Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1590096Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1590298Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1590497Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1590793Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1591000Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1591202Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1591390Z E1204 11:19:52.929000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1591588Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1591801Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1592007Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1592216Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1592414Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1592599Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1592771Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1592900Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1593002Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1593129Z E1204 11:19:52.929000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1593320Z [W1204 11:19:52.227949688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1593325Z 2025-12-04T11:45:25.1593470Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1593763Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1594074Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1594205Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1594694Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1594961Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 2025-12-04T11:45:25.1595190Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1595394Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548
2025-12-04T11:45:25.1595596Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1595888Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1596124Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1596431Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1596679Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1596972Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1597205Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1597500Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1597730Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1598021Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1598246Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1598463Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1598661Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1598869Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1599070Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1599313Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1599609Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1599805Z E1204 
11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1600039Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1600335Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1600567Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1600766Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1600996Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1601204Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1601402Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1601599Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1601818Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1602022Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1602219Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1602415Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1602659Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1602949Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1603181Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1603494Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1603730Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1603936Z 
E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1604131Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1604341Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1604543Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1604788Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1605079Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1605331Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1605621Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1605852Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.1606143Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1606376Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1606667Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1606912Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1607202Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1607434Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1607723Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1607967Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1608262Z E1204 
11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1608500Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1608793Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1609023Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1609326Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1609567Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1609857Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1610089Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1610382Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1610602Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1610803Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1610998Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1611289Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1611533Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1611824Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1612054Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1612355Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1612589Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1612883Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1613113Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1613436Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1613681Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1613983Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1614178Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1614376Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1614577Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1614782Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1614983Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1615215Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1615507Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1615717Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1615911Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1616106Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1616301Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1616547Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1616839Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1617069Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1617361Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1617556Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1617764Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1617976Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1618222Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1618519Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1618742Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1618950Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1619150Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1619353Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1619645Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1619893Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1620187Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1620422Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1620716Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1620959Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1621254Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1621489Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1621782Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1621984Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1622200Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1622425Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1622639Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1622842Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1623042Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1623373Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1623607Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1623898Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1624131Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1624450Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1624686Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1624982Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1625216Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1625522Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1625744Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1625947Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1626145Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1626341Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1626566Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1626767Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1627072Z E1204 11:19:52.961000 
862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1627291Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1627494Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1627691Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1627891Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1628183Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1628420Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1628724Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1628956Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1629249Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1629482Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1629787Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1630018Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1630315Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1630548Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1630840Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1631083Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1631384Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1631618Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1631916Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1632149Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1632443Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1632675Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1632967Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1633173Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1633399Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1633632Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1633923Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1634173Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1634467Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1634702Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1634992Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1635228Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1635536Z E1204 
11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1635780Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1636074Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1636272Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1636508Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1636801Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1637034Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1637326Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1637554Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1637758Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1637956Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1638156Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1638467Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1638683Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1638885Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1639082Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1639282Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1639586Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1639809Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1640020Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1640219Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1640415Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1640562Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1640761Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1640981Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1641188Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1641382Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1641613Z E1204 
11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1641818Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1642014Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1642235Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1642451Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1642648Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1642867Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1643073Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1643294Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1643490Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1643718Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1643918Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1644131Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1644334Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1644632Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1644846Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1645050Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1645251Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1645443Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1645654Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1645869Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1646075Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1646272Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1646476Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1646785Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1647003Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1647208Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1647405Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1647606Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1647920Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1648137Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1648351Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1648550Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1648754Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1649050Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1649249Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1649448Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1649638Z E1204 11:19:52.961000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1649846Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1650066Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1650273Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1650475Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1650666Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1650859Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1651033Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1651161Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1651265Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1651392Z E1204 11:19:52.961000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1651551Z [W1204 11:19:53.384403187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1651554Z 2025-12-04T11:45:25.1651701Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1652009Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1652306Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1652449Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1652932Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1653187Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1653446Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1653656Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.1653854Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1654170Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1654404Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1654699Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1654930Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1655239Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1655471Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1655767Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1656002Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1656297Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1656531Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1656749Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1656949Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1657157Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1657357Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1657594Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1657887Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1658083Z E1204 
11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1658314Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1658622Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1658845Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1659041Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1659264Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1659493Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1659693Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1659890Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1660113Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1660318Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1660516Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1660727Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1660970Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1661264Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1661496Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1661793Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1662013Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1662219Z 
E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1662417Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1662637Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1662839Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1663072Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1663404Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1663652Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1663947Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1664180Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.1664472Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1664706Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1665010Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1665244Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1665551Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1665784Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1666077Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1666308Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1666603Z E1204 
11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1666832Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1667141Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1667374Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1667667Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1667904Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1668321Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1668559Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1668851Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1669072Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1669277Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1669473Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.1669782Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1670032Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1670325Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1670559Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1670854Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1671087Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1671376Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1671621Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1671914Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1672150Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1672442Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1672648Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1672847Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1673042Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1673284Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1673486Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1673722Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1674028Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1674240Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1674436Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1674629Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1674828Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1675059Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1675355Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1675587Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1675882Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1676093Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1676300Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1676504Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1676739Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1677047Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1677270Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1677477Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1677677Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1677880Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1678189Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1678423Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1678732Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1678962Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1679257Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1679496Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1679789Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1680029Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1680332Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1680533Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1680730Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1680952Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1681154Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1681362Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1681563Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1681857Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1682091Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1682384Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1682627Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1682931Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1683164Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1683493Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1683726Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1684021Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1684242Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1684444Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1684663Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1684856Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1685067Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1685264Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1685569Z E1204 11:19:53.117000 
862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1685792Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1685995Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1686196Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1686394Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1686690Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1686943Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1687249Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1687481Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1687779Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1688018Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1688310Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1688544Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1688838Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1689083Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1689376Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1689610Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1689907Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1690152Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1690447Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1690679Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1690971Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1691207Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1691511Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1691718Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1691915Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1692149Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1692442Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1692678Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1692971Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1693203Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1693544Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1693775Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1694071Z E1204 
11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1694304Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1694612Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1694809Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1695041Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1695332Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1695565Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1695872Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1696101Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1696302Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1696503Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1696704Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1697000Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1697214Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1697418Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1697618Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1697830Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1698125Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1698344Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1698546Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1698753Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1698947Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.1699093Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.1699291Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1699512Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1699722Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1699930Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1700160Z E1204 
11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1700366Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1700559Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1700781Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1700987Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1701183Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1701409Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.1701614Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1701834Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1702028Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1702243Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1702444Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1702646Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1702872Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1703166Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1703419Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1703621Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1703825Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1704034Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.1704233Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1704469Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1704669Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1704872Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1705075Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1705373Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1705588Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1705790Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1706006Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1706207Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1706504Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1706715Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.1706920Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.1707132Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.1707334Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1707629Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1707827Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.1708033Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.1708233Z E1204 11:19:53.117000 862287 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.1708432Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.1708654Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.1708863Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.1709061Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.1709253Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.1709439Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.1709610Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.1709740Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.1709843Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.1709972Z E1204 11:19:53.117000 862287 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.1710025Z FAILED [1.0975s] [100%] 2025-12-04T11:45:25.1710027Z 2025-12-04T11:45:25.1710088Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1710233Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1710285Z Traceback (most recent call last): 2025-12-04T11:45:25.1710452Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1710499Z method(*args, **kwargs) 2025-12-04T11:45:25.1710653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1710697Z method(*args, **kwargs) 2025-12-04T11:45:25.1710846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1710889Z with policy(): 2025-12-04T11:45:25.1711053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1711101Z raise RuntimeError(msg) 2025-12-04T11:45:25.1711499Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1086324736. 
2025-12-04T11:45:25.1711505Z 2025-12-04T11:45:25.1711582Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1711847Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1711850Z 2025-12-04T11:45:25.1711941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1712020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1712066Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1712138Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1712696Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1712812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1712852Z graph_break [] 2025-12-04T11:45:25.1712922Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1712999Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1713522Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1713575Z current_size = base.storage().size() 2025-12-04T11:45:25.1713617Z Autotune Choices Stats: 2025-12-04T11:45:25.1713999Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839999929070473, "best_triton_pos": 0} 2025-12-04T11:45:25.1714083Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1714140Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1714264Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1714510Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1714739Z triton_mm_19 0.0078 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1714982Z triton_mm_32 0.0079 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1715210Z triton_mm_27 0.0085 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1715436Z triton_mm_20 0.0092 ms 74.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1715665Z triton_mm_15 0.0094 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1715889Z triton_mm_23 0.0096 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1716132Z triton_mm_13 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1716360Z triton_mm_8 0.0100 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1716597Z triton_mm_14 0.0105 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1716730Z SingleProcess AUTOTUNE benchmarking takes 0.1283 seconds and 1.0736 seconds precompiling for 31 choices 2025-12-04T11:45:25.1716875Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1716927Z Traceback (most recent call last): 2025-12-04T11:45:25.1717085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1717132Z method(*args, **kwargs) 
2025-12-04T11:45:25.1717288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1717332Z method(*args, **kwargs) 2025-12-04T11:45:25.1717483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1717524Z with policy(): 2025-12-04T11:45:25.1717678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1717723Z raise RuntimeError(msg) 2025-12-04T11:45:25.1718144Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1086324736 and is now 1184890880. 2025-12-04T11:45:25.1718147Z 2025-12-04T11:45:25.1718227Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1718488Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1718491Z 2025-12-04T11:45:25.1718582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1718657Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1718703Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1718773Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1719325Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1719432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1719470Z graph_break [] 2025-12-04T11:45:25.1719537Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1719611Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1720116Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1720166Z current_size = base.storage().size() 2025-12-04T11:45:25.1720223Z Autotune Choices Stats: 2025-12-04T11:45:25.1720595Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839999929070473, "best_triton_pos": 0} 2025-12-04T11:45:25.1720665Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1720715Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1720842Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1721082Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:25.1721311Z triton_mm_19 0.0078 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1721545Z triton_mm_32 0.0079 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1721771Z triton_mm_27 0.0085 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1722010Z triton_mm_20 0.0092 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1722233Z triton_mm_15 0.0094 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1722465Z triton_mm_23 0.0096 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1722701Z triton_mm_13 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1722930Z triton_mm_8 0.0100 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1723161Z triton_mm_14 0.0105 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1723314Z SingleProcess AUTOTUNE benchmarking takes 0.1283 seconds and 1.0736 seconds precompiling for 31 choices 2025-12-04T11:45:25.1723393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1723437Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1723498Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1723599Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1724110Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1724172Z graph_break [] 2025-12-04T11:45:25.1724234Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1724312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1724352Z Autotune Choices Stats: 2025-12-04T11:45:25.1724718Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-12-04T11:45:25.1724783Z AUTOTUNE 
scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1724838Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1724960Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1725194Z triton_mm_65 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1725419Z triton_mm_53 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1725665Z triton_mm_66 0.0078 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1725890Z triton_mm_61 0.0082 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1726116Z triton_mm_49 0.0089 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1726342Z triton_mm_54 0.0091 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1726582Z triton_mm_42 0.0096 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1726811Z triton_mm_47 0.0100 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1727036Z triton_mm_57 0.0101 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1727261Z triton_mm_48 0.0101 ms 68.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1727394Z SingleProcess AUTOTUNE benchmarking takes 0.2008 seconds and 0.4416 seconds precompiling for 35 choices 2025-12-04T11:45:25.1727449Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1727607Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1727666Z Traceback (most recent call last): 2025-12-04T11:45:25.1727826Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1727870Z method(*args, **kwargs) 2025-12-04T11:45:25.1728028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1728068Z method(*args, **kwargs) 2025-12-04T11:45:25.1728225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1728266Z with policy(): 2025-12-04T11:45:25.1728421Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1728464Z raise RuntimeError(msg) 2025-12-04T11:45:25.1728856Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 2025-12-04T11:45:25.1728859Z 2025-12-04T11:45:25.1728934Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1729196Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1729198Z 2025-12-04T11:45:25.1729301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1729375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1729421Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1729480Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1730031Z inductor [('triton_bundler_save_kernel', 280), ('generated_module_cache_miss', 34), ('benchmarking.InductorBenchmarker.benchmark_gpu', 31), ('select_algorithm_num_precompiles', 30), ('select_algorithm_num_precompilation_exceptions', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1730133Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1730174Z graph_break [] 2025-12-04T11:45:25.1730238Z aten_mm_info 
[('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1730325Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1730811Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1730859Z current_size = base.storage().size() 2025-12-04T11:45:25.1730900Z Autotune Choices Stats: 2025-12-04T11:45:25.1731273Z {"num_choices": 31, "num_triton_choices": 30, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006839999929070473, "best_triton_pos": 0} 2025-12-04T11:45:25.1731338Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1731389Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1731524Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1731764Z triton_mm_31 0.0068 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1732008Z triton_mm_19 0.0078 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1732236Z triton_mm_32 0.0079 ms 86.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1732463Z triton_mm_27 0.0085 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1732687Z triton_mm_20 0.0092 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1732912Z triton_mm_15 0.0094 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1733137Z triton_mm_23 0.0096 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1733409Z triton_mm_13 0.0100 ms 68.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1733643Z triton_mm_8 0.0100 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1733867Z triton_mm_14 0.0105 ms 65.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1733996Z SingleProcess AUTOTUNE benchmarking takes 0.1283 
seconds and 1.0736 seconds precompiling for 31 choices 2025-12-04T11:45:25.1734086Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1734131Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1734186Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1734288Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1734781Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1734818Z graph_break [] 2025-12-04T11:45:25.1734880Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1734953Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1734998Z Autotune Choices Stats: 2025-12-04T11:45:25.1735380Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00687999976798892, "best_triton_pos": 0} 2025-12-04T11:45:25.1735460Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1735512Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1735631Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1735864Z triton_mm_65 0.0069 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1736090Z triton_mm_53 0.0072 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1736316Z triton_mm_66 0.0078 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1736540Z triton_mm_61 0.0082 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1736763Z triton_mm_49 0.0089 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1737001Z triton_mm_54 0.0091 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1737230Z triton_mm_42 0.0096 ms 71.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1737453Z triton_mm_47 0.0100 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1737678Z triton_mm_57 0.0101 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1737913Z triton_mm_48 0.0101 ms 68.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1738042Z SingleProcess AUTOTUNE benchmarking takes 0.2008 seconds and 0.4416 seconds precompiling for 35 choices 2025-12-04T11:45:25.1738116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1738158Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1738216Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1738315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1738801Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1738840Z graph_break [] 2025-12-04T11:45:25.1738914Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:25.1738987Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1739039Z Autotune Choices Stats: 2025-12-04T11:45:25.1739398Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 
0.006399999838322401, "best_triton_pos": 0} 2025-12-04T11:45:25.1739463Z AUTOTUNE scaled_mm(33x1024, 1024x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1739516Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1739636Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1739871Z triton_mm_99 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1740099Z triton_mm_87 0.0076 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1740332Z triton_mm_100 0.0080 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1740557Z triton_mm_95 0.0082 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1740793Z triton_mm_83 0.0091 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1741017Z triton_mm_88 0.0091 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1741241Z triton_mm_81 0.0096 ms 66.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1741475Z triton_mm_91 0.0100 ms 64.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1741704Z triton_mm_76 0.0101 ms 63.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1741932Z triton_mm_82 0.0103 ms 62.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1742061Z SingleProcess AUTOTUNE benchmarking takes 0.2150 seconds and 0.2854 seconds precompiling for 35 choices 2025-12-04T11:45:25.1742254Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9a44036a8173ee72.xml - 2025-12-04T11:45:25.1742313Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1742919Z FAILED [1.0975s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1184890880 and is now 1283457024. 
2025-12-04T11:45:25.1742931Z 2025-12-04T11:45:25.1743005Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1743309Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1743311Z 2025-12-04T11:45:25.1743403Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1743467Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1743536Z ================== 1 failed, 187 deselected, 2 rerun in 5.67s ================== 2025-12-04T11:45:25.1743575Z Got exit code 1 2025-12-04T11:45:25.1743786Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1743913Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.1744060Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4eda93d4ef86457.xml 2025-12-04T11:45:25.1744118Z ============================= test session starts ============================== 2025-12-04T11:45:25.1744230Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1744274Z cachedir: .pytest_cache 2025-12-04T11:45:25.1744451Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1744499Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1744539Z configfile: pytest.ini 2025-12-04T11:45:25.1744706Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1744781Z collecting ... 
collected 188 items / 97 deselected / 91 selected 2025-12-04T11:45:25.1744835Z stepcurrent: skipping 97 already run items. 2025-12-04T11:45:25.1744880Z Running 91 items in this shard 2025-12-04T11:45:25.1744882Z 2025-12-04T11:45:25.1745098Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6089s] [ 1%] 2025-12-04T11:45:25.1745324Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2708s] [ 1%] 2025-12-04T11:45:25.1745513Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2293s] [ 1%] 2025-12-04T11:45:25.1745516Z 2025-12-04T11:45:25.1745567Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1745707Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1745753Z Traceback (most recent call last): 2025-12-04T11:45:25.1745914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1745955Z method(*args, **kwargs) 2025-12-04T11:45:25.1746110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1746150Z method(*args, **kwargs) 2025-12-04T11:45:25.1746305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1746342Z with policy(): 2025-12-04T11:45:25.1746518Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1746572Z raise RuntimeError(msg) 2025-12-04T11:45:25.1746956Z RuntimeError: 
CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1746959Z 2025-12-04T11:45:25.1747035Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1747288Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1747291Z 2025-12-04T11:45:25.1747380Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1747454Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1747499Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1747556Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1747623Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1747722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1747760Z graph_break [] 2025-12-04T11:45:25.1747820Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1747961Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1748006Z Traceback (most recent call last): 2025-12-04T11:45:25.1748176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1748215Z method(*args, **kwargs) 2025-12-04T11:45:25.1748368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1748409Z method(*args, **kwargs) 
2025-12-04T11:45:25.1748560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1748598Z with policy(): 2025-12-04T11:45:25.1748751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1748791Z raise RuntimeError(msg) 2025-12-04T11:45:25.1749180Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1749184Z 2025-12-04T11:45:25.1749261Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1749513Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1749516Z 2025-12-04T11:45:25.1749603Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1749677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1749723Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1749780Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1749846Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1749946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1749983Z graph_break [] 2025-12-04T11:45:25.1750042Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1750128Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1750169Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:25.1750238Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1750335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1750402Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1750439Z graph_break [] 2025-12-04T11:45:25.1750499Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1750551Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1750692Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1750738Z Traceback (most recent call last): 2025-12-04T11:45:25.1750893Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1750935Z method(*args, **kwargs) 2025-12-04T11:45:25.1751090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1751131Z method(*args, **kwargs) 2025-12-04T11:45:25.1751281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1751318Z with policy(): 2025-12-04T11:45:25.1751474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1751515Z raise RuntimeError(msg) 2025-12-04T11:45:25.1751896Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1751911Z 2025-12-04T11:45:25.1751984Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1752239Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1752241Z 2025-12-04T11:45:25.1752330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1752404Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1752448Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1752503Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1752581Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1752680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1752718Z graph_break [] 2025-12-04T11:45:25.1752777Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1752853Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1752895Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1752952Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1753047Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1753114Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1753152Z graph_break [] 2025-12-04T11:45:25.1753212Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1753322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1753367Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1753422Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1753519Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1753598Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1753649Z graph_break [] 2025-12-04T11:45:25.1753706Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1753899Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4eda93d4ef86457.xml - 2025-12-04T11:45:25.1753958Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1754532Z FAILED [0.2293s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1754535Z 2025-12-04T11:45:25.1754607Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1754863Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1754865Z 2025-12-04T11:45:25.1754954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1755016Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1755086Z ================== 1 failed, 97 deselected, 2 rerun in 2.13s =================== 2025-12-04T11:45:25.1755123Z Got exit code 1 2025-12-04T11:45:25.1755178Z Retrying single test... 
2025-12-04T11:45:25.1755322Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69c39936f8b969fa.xml 2025-12-04T11:45:25.1755380Z ============================= test session starts ============================== 2025-12-04T11:45:25.1755493Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1755537Z cachedir: .pytest_cache 2025-12-04T11:45:25.1755694Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1755742Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1755783Z configfile: pytest.ini 2025-12-04T11:45:25.1755946Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1756022Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1756284Z stepcurrent: skipping 97 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1756328Z Running 1 items in this shard 2025-12-04T11:45:25.1756330Z 2025-12-04T11:45:25.1756546Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5964s] [100%] 2025-12-04T11:45:25.1756752Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2711s] [100%] 2025-12-04T11:45:25.1756939Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2265s] [100%] 2025-12-04T11:45:25.1756941Z 2025-12-04T11:45:25.1756995Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1757136Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1757185Z Traceback (most recent call last): 2025-12-04T11:45:25.1757350Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1757395Z method(*args, **kwargs) 2025-12-04T11:45:25.1757563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1757607Z method(*args, **kwargs) 2025-12-04T11:45:25.1757759Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1757796Z with policy(): 2025-12-04T11:45:25.1757948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1757991Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1758374Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1758377Z 2025-12-04T11:45:25.1758452Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1758705Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1758707Z 2025-12-04T11:45:25.1758796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1758869Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1758914Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1758980Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1759051Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1759148Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1759185Z graph_break [] 2025-12-04T11:45:25.1759245Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1759387Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1759432Z Traceback (most recent call last): 2025-12-04T11:45:25.1759586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1759627Z method(*args, **kwargs) 2025-12-04T11:45:25.1759781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.1759822Z method(*args, **kwargs) 2025-12-04T11:45:25.1759993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1760033Z with policy(): 2025-12-04T11:45:25.1760186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1760230Z raise RuntimeError(msg) 2025-12-04T11:45:25.1760612Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1760614Z 2025-12-04T11:45:25.1760688Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1760944Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1760947Z 2025-12-04T11:45:25.1761035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1761120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1761166Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1761232Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1761301Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1761397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1761436Z graph_break [] 2025-12-04T11:45:25.1761495Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1761570Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.1761611Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1761672Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1761771Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1761839Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1761876Z graph_break [] 2025-12-04T11:45:25.1761936Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1761990Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1762131Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1762176Z Traceback (most recent call last): 2025-12-04T11:45:25.1762333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1762373Z method(*args, **kwargs) 2025-12-04T11:45:25.1762525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1762579Z method(*args, **kwargs) 2025-12-04T11:45:25.1762734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1762771Z with policy(): 2025-12-04T11:45:25.1762923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1762965Z raise RuntimeError(msg) 2025-12-04T11:45:25.1763381Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1763384Z 2025-12-04T11:45:25.1763459Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1763728Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1763730Z 2025-12-04T11:45:25.1763819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1763894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1763937Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1763993Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1764059Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1764154Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1764192Z graph_break [] 2025-12-04T11:45:25.1764251Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1764328Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1764372Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1764429Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1764525Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1764604Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1764652Z graph_break [] 2025-12-04T11:45:25.1764711Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1764785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1764826Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1764880Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1764979Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1765043Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1765080Z graph_break [] 2025-12-04T11:45:25.1765138Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1765332Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-69c39936f8b969fa.xml - 2025-12-04T11:45:25.1765392Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1765963Z FAILED [0.2265s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1765966Z 2025-12-04T11:45:25.1766040Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1766293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1766308Z 2025-12-04T11:45:25.1766401Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1766463Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1766535Z ================== 1 failed, 187 deselected, 2 rerun in 2.11s ================== 2025-12-04T11:45:25.1766573Z Got exit code 1 2025-12-04T11:45:25.1766615Z Retrying single test... 
2025-12-04T11:45:25.1766760Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f29cf08f563511e.xml 2025-12-04T11:45:25.1766818Z ============================= test session starts ============================== 2025-12-04T11:45:25.1766928Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1766971Z cachedir: .pytest_cache 2025-12-04T11:45:25.1767137Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1767184Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1767227Z configfile: pytest.ini 2025-12-04T11:45:25.1767389Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1767464Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1767714Z stepcurrent: skipping 97 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1767757Z Running 1 items in this shard 2025-12-04T11:45:25.1767759Z 2025-12-04T11:45:25.1767972Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5870s] [100%] 2025-12-04T11:45:25.1768182Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2628s] [100%] 2025-12-04T11:45:25.1768376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2153s] [100%] 2025-12-04T11:45:25.1768388Z 2025-12-04T11:45:25.1768442Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1768582Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1771073Z Traceback (most recent call last): 2025-12-04T11:45:25.1771237Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1771279Z method(*args, **kwargs) 2025-12-04T11:45:25.1771436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1771477Z method(*args, **kwargs) 2025-12-04T11:45:25.1771629Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1771668Z with policy(): 2025-12-04T11:45:25.1771821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1771863Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1772243Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1772246Z 2025-12-04T11:45:25.1772320Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1772595Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1772599Z 2025-12-04T11:45:25.1772687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1772764Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1772806Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1772864Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1772930Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1773032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1773068Z graph_break [] 2025-12-04T11:45:25.1773127Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1773309Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1773356Z Traceback (most recent call last): 2025-12-04T11:45:25.1773509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1773549Z method(*args, **kwargs) 2025-12-04T11:45:25.1773700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.1773740Z method(*args, **kwargs) 2025-12-04T11:45:25.1773891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1773929Z with policy(): 2025-12-04T11:45:25.1774079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1774121Z raise RuntimeError(msg) 2025-12-04T11:45:25.1774518Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1774521Z 2025-12-04T11:45:25.1774607Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1774859Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1774861Z 2025-12-04T11:45:25.1774947Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1775021Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1775065Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1775121Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1775191Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1775289Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1775328Z graph_break [] 2025-12-04T11:45:25.1775386Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1775461Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.1775503Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1775558Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1775654Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1775718Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1775754Z graph_break [] 2025-12-04T11:45:25.1775811Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1775886Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1776024Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1776070Z Traceback (most recent call last): 2025-12-04T11:45:25.1776224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1776267Z method(*args, **kwargs) 2025-12-04T11:45:25.1776417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1776457Z method(*args, **kwargs) 2025-12-04T11:45:25.1776605Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1776643Z with policy(): 2025-12-04T11:45:25.1776795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1776848Z raise RuntimeError(msg) 2025-12-04T11:45:25.1777231Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1777234Z 2025-12-04T11:45:25.1777307Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1777560Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1777562Z 2025-12-04T11:45:25.1777648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1777720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1777764Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1777820Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1777885Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1777991Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1778041Z graph_break [] 2025-12-04T11:45:25.1778100Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1778173Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1778214Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1778269Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1778363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1778429Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1778466Z graph_break [] 2025-12-04T11:45:25.1778527Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1778601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1778644Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1778699Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1778795Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1778859Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1778895Z graph_break [] 2025-12-04T11:45:25.1778952Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:25.1779144Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2f29cf08f563511e.xml - 2025-12-04T11:45:25.1779204Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1779770Z FAILED [0.2153s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1779784Z 2025-12-04T11:45:25.1779857Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1780109Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1780111Z 2025-12-04T11:45:25.1780197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1780258Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.1780338Z ================== 1 failed, 187 deselected, 2 rerun in 2.08s ================== 2025-12-04T11:45:25.1780374Z Got exit code 1 2025-12-04T11:45:25.1780579Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1780706Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.1780853Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09e8ef8267de6b3c.xml 2025-12-04T11:45:25.1780911Z ============================= test session starts ============================== 2025-12-04T11:45:25.1781023Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1781064Z cachedir: .pytest_cache 2025-12-04T11:45:25.1781222Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1781271Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1781311Z configfile: pytest.ini 2025-12-04T11:45:25.1781484Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1781559Z collecting ... collected 188 items / 98 deselected / 90 selected 2025-12-04T11:45:25.1781623Z stepcurrent: skipping 98 already run items. 
2025-12-04T11:45:25.1781667Z Running 90 items in this shard 2025-12-04T11:45:25.1781669Z 2025-12-04T11:45:25.1781884Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6057s] [ 1%] 2025-12-04T11:45:25.1782095Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2658s] [ 1%] 2025-12-04T11:45:25.1782281Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2199s] [ 1%] 2025-12-04T11:45:25.1782284Z 2025-12-04T11:45:25.1782334Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1782474Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1782519Z Traceback (most recent call last): 2025-12-04T11:45:25.1782677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1782717Z method(*args, **kwargs) 2025-12-04T11:45:25.1782870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1782909Z method(*args, **kwargs) 2025-12-04T11:45:25.1783060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1783108Z with policy(): 2025-12-04T11:45:25.1783288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1783330Z raise RuntimeError(msg) 2025-12-04T11:45:25.1783715Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1783718Z 2025-12-04T11:45:25.1783791Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1784047Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1784050Z 2025-12-04T11:45:25.1784152Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1784225Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1784269Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1784326Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1784393Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1784490Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1784528Z graph_break [] 2025-12-04T11:45:25.1784589Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1784728Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1784772Z Traceback (most recent call last): 2025-12-04T11:45:25.1784926Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1784966Z method(*args, **kwargs) 2025-12-04T11:45:25.1785117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1785170Z method(*args, **kwargs) 2025-12-04T11:45:25.1785324Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1785374Z with policy(): 2025-12-04T11:45:25.1785527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1785567Z raise RuntimeError(msg) 2025-12-04T11:45:25.1785948Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1785952Z 2025-12-04T11:45:25.1786024Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1786281Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1786284Z 2025-12-04T11:45:25.1786371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1786443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1786487Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1786543Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1786608Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1786704Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1786754Z graph_break [] 2025-12-04T11:45:25.1786815Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1786890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1786930Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1786986Z 
stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1787081Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1787146Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1787182Z graph_break [] 2025-12-04T11:45:25.1787242Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1787293Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1787433Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1787478Z Traceback (most recent call last): 2025-12-04T11:45:25.1787646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1787686Z method(*args, **kwargs) 2025-12-04T11:45:25.1787842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1787882Z method(*args, **kwargs) 2025-12-04T11:45:25.1788031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1788067Z with policy(): 2025-12-04T11:45:25.1788222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1788263Z raise RuntimeError(msg) 2025-12-04T11:45:25.1788645Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1788648Z 2025-12-04T11:45:25.1788720Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1788989Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1789007Z 2025-12-04T11:45:25.1789095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1789167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1789210Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1789265Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1789330Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1789429Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1789467Z graph_break [] 2025-12-04T11:45:25.1789526Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1789601Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1789643Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1789700Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1789795Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1789860Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1789896Z graph_break [] 2025-12-04T11:45:25.1789956Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1790027Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1790071Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1790139Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1790238Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1790301Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1790340Z graph_break [] 2025-12-04T11:45:25.1790398Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1790593Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-09e8ef8267de6b3c.xml - 2025-12-04T11:45:25.1790651Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1791240Z FAILED [0.2199s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1791244Z 2025-12-04T11:45:25.1791315Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1791570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1791573Z 2025-12-04T11:45:25.1791658Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1791720Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1791788Z ================== 1 failed, 98 deselected, 2 rerun in 2.11s =================== 2025-12-04T11:45:25.1791825Z Got exit code 1 2025-12-04T11:45:25.1791866Z Retrying single test... 
2025-12-04T11:45:25.1792011Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e9b49f9a0a5af5ad.xml 2025-12-04T11:45:25.1792071Z ============================= test session starts ============================== 2025-12-04T11:45:25.1792192Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1792233Z cachedir: .pytest_cache 2025-12-04T11:45:25.1792391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1792449Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1792490Z configfile: pytest.ini 2025-12-04T11:45:25.1792651Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1792725Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1792977Z stepcurrent: skipping 98 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1793021Z Running 1 items in this shard 2025-12-04T11:45:25.1793023Z 2025-12-04T11:45:25.1793238Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5940s] [100%] 2025-12-04T11:45:25.1793477Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2687s] [100%] 2025-12-04T11:45:25.1793664Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2214s] [100%] 2025-12-04T11:45:25.1793666Z 2025-12-04T11:45:25.1793716Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1793856Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1793918Z Traceback (most recent call last): 2025-12-04T11:45:25.1794074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1794116Z method(*args, **kwargs) 2025-12-04T11:45:25.1794268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1794310Z method(*args, **kwargs) 2025-12-04T11:45:25.1794460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1794497Z with policy(): 2025-12-04T11:45:25.1794650Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1794694Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1795092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1795096Z 2025-12-04T11:45:25.1795169Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1795425Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1795427Z 2025-12-04T11:45:25.1795513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1795585Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1795629Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1795684Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1795752Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1795850Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1795887Z graph_break [] 2025-12-04T11:45:25.1795959Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1796098Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1796155Z Traceback (most recent call last): 2025-12-04T11:45:25.1796308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1796349Z method(*args, **kwargs) 2025-12-04T11:45:25.1796499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.1796540Z method(*args, **kwargs) 2025-12-04T11:45:25.1796690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1796727Z with policy(): 2025-12-04T11:45:25.1796879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1796922Z raise RuntimeError(msg) 2025-12-04T11:45:25.1797302Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1797306Z 2025-12-04T11:45:25.1797378Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1797634Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1797647Z 2025-12-04T11:45:25.1797734Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1797807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1797854Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1797909Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1797975Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1798074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1798112Z graph_break [] 2025-12-04T11:45:25.1798171Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1798246Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.1798286Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1798343Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1798451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1798516Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1798551Z graph_break [] 2025-12-04T11:45:25.1798611Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1798664Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1798805Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1798850Z Traceback (most recent call last): 2025-12-04T11:45:25.1799004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1799042Z method(*args, **kwargs) 2025-12-04T11:45:25.1799194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1799235Z method(*args, **kwargs) 2025-12-04T11:45:25.1799386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1799422Z with policy(): 2025-12-04T11:45:25.1799585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1799642Z raise RuntimeError(msg) 2025-12-04T11:45:25.1800024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1800027Z 2025-12-04T11:45:25.1800099Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1800355Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1800358Z 2025-12-04T11:45:25.1800444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1800518Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1800562Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1800618Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1800683Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1800778Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1800816Z graph_break [] 2025-12-04T11:45:25.1800875Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1800948Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1800989Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1801057Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1801152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1801220Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1801255Z graph_break [] 2025-12-04T11:45:25.1801315Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1801388Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1801430Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1801484Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1801579Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1801642Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1801680Z graph_break [] 2025-12-04T11:45:25.1801738Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1801942Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e9b49f9a0a5af5ad.xml - 2025-12-04T11:45:25.1802001Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1802576Z FAILED [0.2214s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1802579Z 2025-12-04T11:45:25.1802651Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1802904Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1802907Z 2025-12-04T11:45:25.1802993Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1803072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1803142Z ================== 1 failed, 187 deselected, 2 rerun in 2.10s ================== 2025-12-04T11:45:25.1803190Z Got exit code 1 2025-12-04T11:45:25.1803230Z Retrying single test... 
2025-12-04T11:45:25.1803411Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1cda6f75e146f70.xml 2025-12-04T11:45:25.1803468Z ============================= test session starts ============================== 2025-12-04T11:45:25.1803578Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1803618Z cachedir: .pytest_cache 2025-12-04T11:45:25.1803777Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1803824Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1803864Z configfile: pytest.ini 2025-12-04T11:45:25.1804025Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1804100Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1804350Z stepcurrent: skipping 98 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1804394Z Running 1 items in this shard 2025-12-04T11:45:25.1804398Z 2025-12-04T11:45:25.1804608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5939s] [100%] 2025-12-04T11:45:25.1804840Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2593s] [100%] 2025-12-04T11:45:25.1805026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2174s] [100%] 2025-12-04T11:45:25.1805029Z 2025-12-04T11:45:25.1805079Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1805218Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1805265Z Traceback (most recent call last): 2025-12-04T11:45:25.1805420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1805461Z method(*args, **kwargs) 2025-12-04T11:45:25.1805627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1805669Z method(*args, **kwargs) 2025-12-04T11:45:25.1805820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1805861Z with policy(): 2025-12-04T11:45:25.1806012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1806061Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1806442Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.1806447Z 2025-12-04T11:45:25.1806521Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1806780Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1806782Z 2025-12-04T11:45:25.1806881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1806967Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1807011Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1807070Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1807134Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1807232Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1807269Z graph_break [] 2025-12-04T11:45:25.1807332Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1807472Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1807520Z Traceback (most recent call last): 2025-12-04T11:45:25.1807674Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1807717Z method(*args, **kwargs) 2025-12-04T11:45:25.1807869Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.1807913Z method(*args, **kwargs) 2025-12-04T11:45:25.1808064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1808105Z with policy(): 2025-12-04T11:45:25.1808257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1808298Z raise RuntimeError(msg) 2025-12-04T11:45:25.1808682Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.1808695Z 2025-12-04T11:45:25.1808769Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1809026Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1809029Z 2025-12-04T11:45:25.1809114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1809188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1809230Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1809286Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1809364Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1809467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1809504Z graph_break [] 2025-12-04T11:45:25.1809566Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1809643Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.1809687Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1809743Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1809843Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1809907Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1809948Z graph_break [] 2025-12-04T11:45:25.1810009Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1810063Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1810205Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1810253Z Traceback (most recent call last): 2025-12-04T11:45:25.1810417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1810481Z method(*args, **kwargs) 2025-12-04T11:45:25.1810630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1810671Z method(*args, **kwargs) 2025-12-04T11:45:25.1810820Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1810857Z with policy(): 2025-12-04T11:45:25.1811008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1811053Z raise RuntimeError(msg) 2025-12-04T11:45:25.1811436Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.1811440Z 2025-12-04T11:45:25.1811513Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1811767Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1811770Z 2025-12-04T11:45:25.1811855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1811928Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1811980Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1812040Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1812105Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1812203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1812240Z graph_break [] 2025-12-04T11:45:25.1812302Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1812375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1812417Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1812472Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1812568Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1812631Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1812667Z graph_break [] 2025-12-04T11:45:25.1812725Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1812810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1812851Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1812909Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1813003Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1813070Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.1813106Z graph_break [] 2025-12-04T11:45:25.1813169Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:25.1813395Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f1cda6f75e146f70.xml - 2025-12-04T11:45:25.1813455Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1814045Z FAILED [0.2174s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.1814061Z 2025-12-04T11:45:25.1814136Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1814391Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1814394Z 2025-12-04T11:45:25.1814481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1814543Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.1814611Z ================== 1 failed, 187 deselected, 2 rerun in 2.09s ================== 2025-12-04T11:45:25.1814650Z Got exit code 1 2025-12-04T11:45:25.1814857Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1814986Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.1815132Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-630047a1707181a4.xml 2025-12-04T11:45:25.1815188Z ============================= test session starts ============================== 2025-12-04T11:45:25.1815301Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1815343Z cachedir: .pytest_cache 2025-12-04T11:45:25.1815500Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1815560Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1815601Z configfile: pytest.ini 2025-12-04T11:45:25.1815766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1815839Z collecting ... collected 188 items / 99 deselected / 89 selected 2025-12-04T11:45:25.1815894Z stepcurrent: skipping 99 already run items. 
2025-12-04T11:45:25.1815939Z Running 89 items in this shard 2025-12-04T11:45:25.1815941Z 2025-12-04T11:45:25.1816152Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8899s] [ 1%] 2025-12-04T11:45:25.1816358Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5085s] [ 1%] 2025-12-04T11:45:25.1816562Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5813s] [ 1%] 2025-12-04T11:45:25.1816565Z 2025-12-04T11:45:25.1816616Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1816755Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1816800Z Traceback (most recent call last): 2025-12-04T11:45:25.1816957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1816997Z method(*args, **kwargs) 2025-12-04T11:45:25.1817152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1817191Z method(*args, **kwargs) 2025-12-04T11:45:25.1817343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1817383Z with policy(): 2025-12-04T11:45:25.1817536Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1817577Z raise RuntimeError(msg) 2025-12-04T11:45:25.1817965Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.1817978Z 2025-12-04T11:45:25.1818050Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1818304Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1818306Z 2025-12-04T11:45:25.1818394Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1818466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1818509Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1818565Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1819057Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1819157Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1819193Z graph_break [] 2025-12-04T11:45:25.1819253Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1819340Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1819827Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1819876Z current_size = base.storage().size() 2025-12-04T11:45:25.1819918Z Autotune Choices Stats: 2025-12-04T11:45:25.1820291Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:25.1820359Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1820408Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1820533Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1820775Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1821004Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1821229Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1821454Z triton_mm_1 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1821507Z _scaled_mm 0.0248 ms 24.0% 2025-12-04T11:45:25.1821637Z SingleProcess AUTOTUNE benchmarking takes 0.0241 seconds and 0.1264 seconds precompiling for 5 choices 2025-12-04T11:45:25.1821786Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1821830Z Traceback (most recent call last): 2025-12-04T11:45:25.1821985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1822026Z method(*args, **kwargs) 2025-12-04T11:45:25.1822179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1822220Z method(*args, **kwargs) 2025-12-04T11:45:25.1822372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1822408Z with policy(): 2025-12-04T11:45:25.1822568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1822609Z raise RuntimeError(msg) 2025-12-04T11:45:25.1822992Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.1822994Z 2025-12-04T11:45:25.1823068Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1823348Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1823366Z 2025-12-04T11:45:25.1823457Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1823529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1823574Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1823631Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1824112Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1824225Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1824263Z graph_break [] 2025-12-04T11:45:25.1824322Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1824398Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1824884Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1824932Z current_size = base.storage().size() 2025-12-04T11:45:25.1824975Z Autotune Choices Stats: 2025-12-04T11:45:25.1825340Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:25.1825397Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1825458Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1825594Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1825824Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1826056Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1826279Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1826504Z triton_mm_1 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1826545Z _scaled_mm 0.0248 ms 24.0% 2025-12-04T11:45:25.1826673Z SingleProcess AUTOTUNE 
benchmarking takes 0.0241 seconds and 0.1264 seconds precompiling for 5 choices 2025-12-04T11:45:25.1826746Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1826789Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1826844Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1826943Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1827436Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1827473Z graph_break [] 2025-12-04T11:45:25.1827534Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1827608Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1827650Z Autotune Choices Stats: 2025-12-04T11:45:25.1828022Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:25.1828078Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1828126Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1828248Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1828478Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1828708Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1828935Z triton_mm_7 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1829169Z triton_mm_5 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1829224Z _scaled_mm 0.0232 ms 26.2% 2025-12-04T11:45:25.1829351Z SingleProcess AUTOTUNE benchmarking takes 0.0228 seconds and 0.1058 seconds precompiling for 5 choices 2025-12-04T11:45:25.1829405Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1829542Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1829588Z Traceback (most recent call last): 2025-12-04T11:45:25.1829744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1829789Z method(*args, **kwargs) 2025-12-04T11:45:25.1829943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1829987Z method(*args, **kwargs) 2025-12-04T11:45:25.1830138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1830177Z with policy(): 
2025-12-04T11:45:25.1830333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1830377Z raise RuntimeError(msg) 2025-12-04T11:45:25.1830759Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1830778Z 2025-12-04T11:45:25.1830855Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1831112Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1831118Z 2025-12-04T11:45:25.1831205Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1831280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1831321Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1831380Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1831866Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1831969Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1832006Z graph_break [] 2025-12-04T11:45:25.1832067Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 
2025-12-04T11:45:25.1832141Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1832624Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1832670Z current_size = base.storage().size() 2025-12-04T11:45:25.1832714Z Autotune Choices Stats: 2025-12-04T11:45:25.1833086Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:25.1833151Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1833202Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1833355Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1833588Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1833812Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1834039Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1834263Z triton_mm_1 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1834307Z _scaled_mm 0.0248 ms 24.0% 2025-12-04T11:45:25.1834437Z SingleProcess AUTOTUNE benchmarking takes 0.0241 seconds and 0.1264 seconds precompiling for 5 choices 2025-12-04T11:45:25.1834512Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1834568Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1834627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1834728Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1835214Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1835252Z graph_break [] 2025-12-04T11:45:25.1835310Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1835385Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1835424Z Autotune Choices Stats: 2025-12-04T11:45:25.1835797Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 
2025-12-04T11:45:25.1835852Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1835902Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1836023Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1836253Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1836479Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1836713Z triton_mm_7 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1836949Z triton_mm_5 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1836990Z _scaled_mm 0.0232 ms 26.2% 2025-12-04T11:45:25.1837122Z SingleProcess AUTOTUNE benchmarking takes 0.0228 seconds and 0.1058 seconds precompiling for 5 choices 2025-12-04T11:45:25.1837195Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1837241Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1837298Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1837401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1837879Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1837919Z graph_break [] 2025-12-04T11:45:25.1837978Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1838052Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1838092Z Autotune Choices Stats: 2025-12-04T11:45:25.1838451Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.1838516Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1838570Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1838691Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1838919Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1839145Z triton_mm_8 0.0060 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1839385Z triton_mm_11 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1839612Z triton_mm_10 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1839654Z _scaled_mm 0.0260 ms 23.1% 2025-12-04T11:45:25.1839783Z SingleProcess AUTOTUNE benchmarking takes 0.0331 seconds and 0.1960 seconds precompiling for 5 choices 2025-12-04T11:45:25.1839972Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-630047a1707181a4.xml - 2025-12-04T11:45:25.1840035Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1840623Z FAILED [0.5813s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1840638Z 2025-12-04T11:45:25.1840712Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1840970Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1840972Z 2025-12-04T11:45:25.1841059Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1841123Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.1841193Z ================== 1 failed, 99 deselected, 2 rerun in 3.00s =================== 2025-12-04T11:45:25.1841232Z Got exit code 1 2025-12-04T11:45:25.1841271Z Retrying single test... 2025-12-04T11:45:25.1841416Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-273f558983266f13.xml 2025-12-04T11:45:25.1841473Z ============================= test session starts ============================== 2025-12-04T11:45:25.1841583Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1841622Z cachedir: .pytest_cache 2025-12-04T11:45:25.1841781Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1841826Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1841868Z configfile: pytest.ini 2025-12-04T11:45:25.1842031Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1842119Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1842369Z stepcurrent: skipping 99 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1842414Z Running 1 items in this shard 2025-12-04T11:45:25.1842416Z 2025-12-04T11:45:25.1842627Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9284s] [100%] 2025-12-04T11:45:25.1842835Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5385s] [100%] 2025-12-04T11:45:25.1843031Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5984s] [100%] 2025-12-04T11:45:25.1843034Z 2025-12-04T11:45:25.1843086Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1843227Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1843298Z Traceback (most recent call last): 2025-12-04T11:45:25.1843457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1843500Z method(*args, **kwargs) 2025-12-04T11:45:25.1843656Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1843696Z method(*args, **kwargs) 2025-12-04T11:45:25.1843848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1843887Z with policy(): 2025-12-04T11:45:25.1844046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1844089Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1844490Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.1844511Z 2025-12-04T11:45:25.1844587Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1844842Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1844844Z 2025-12-04T11:45:25.1844930Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1845004Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1845051Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1845107Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1845597Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1845695Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1845733Z graph_break [] 2025-12-04T11:45:25.1845792Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1845881Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1846369Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1846419Z current_size = base.storage().size() 2025-12-04T11:45:25.1846459Z Autotune Choices Stats: 2025-12-04T11:45:25.1846825Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.1846880Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1846942Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1847064Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1847298Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1847528Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1847755Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1847981Z triton_mm_1 0.0061 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1848032Z _scaled_mm 0.0242 ms 25.0% 2025-12-04T11:45:25.1848165Z SingleProcess AUTOTUNE benchmarking takes 0.0261 seconds and 0.1301 seconds precompiling for 5 choices 2025-12-04T11:45:25.1848313Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1848363Z Traceback (most recent call last): 2025-12-04T11:45:25.1848517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1848559Z method(*args, **kwargs) 2025-12-04T11:45:25.1848713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1848755Z method(*args, **kwargs) 2025-12-04T11:45:25.1848909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1848945Z with policy(): 2025-12-04T11:45:25.1849100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1849142Z raise RuntimeError(msg) 2025-12-04T11:45:25.1849533Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.1849536Z 2025-12-04T11:45:25.1849608Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1849866Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1849878Z 2025-12-04T11:45:25.1849965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1850042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1850085Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1850144Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1850624Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1850723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1850773Z graph_break [] 2025-12-04T11:45:25.1850831Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1850909Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1851397Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1851447Z current_size = base.storage().size() 2025-12-04T11:45:25.1851488Z Autotune Choices Stats: 2025-12-04T11:45:25.1851851Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.1851906Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1851967Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1852088Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1852333Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1852558Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1852786Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1853014Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1853055Z _scaled_mm 0.0242 ms 25.0% 2025-12-04T11:45:25.1853185Z SingleProcess AUTOTUNE 
benchmarking takes 0.0261 seconds and 0.1301 seconds precompiling for 5 choices 2025-12-04T11:45:25.1853293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1853336Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1853393Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1853493Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1853989Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1854029Z graph_break [] 2025-12-04T11:45:25.1854089Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1854163Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1854202Z Autotune Choices Stats: 2025-12-04T11:45:25.1854576Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.1854631Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1854680Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1854803Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1855032Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1855256Z triton_mm_4 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1855478Z triton_mm_7 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1855714Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1855770Z _scaled_mm 0.0238 ms 25.8% 2025-12-04T11:45:25.1855898Z SingleProcess AUTOTUNE benchmarking takes 0.0243 seconds and 0.0971 seconds precompiling for 5 choices 2025-12-04T11:45:25.1855950Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1856090Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1856137Z Traceback (most recent call last): 2025-12-04T11:45:25.1856294Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1856335Z method(*args, **kwargs) 2025-12-04T11:45:25.1856490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1856529Z method(*args, **kwargs) 2025-12-04T11:45:25.1856682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1856721Z with policy(): 
2025-12-04T11:45:25.1856873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1856914Z raise RuntimeError(msg) 2025-12-04T11:45:25.1857295Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1857309Z 2025-12-04T11:45:25.1857384Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1857641Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1857644Z 2025-12-04T11:45:25.1857733Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1857805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1857849Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1857906Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1858402Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1858504Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1858542Z graph_break [] 2025-12-04T11:45:25.1858601Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 
2025-12-04T11:45:25.1858677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1859164Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1859211Z current_size = base.storage().size() 2025-12-04T11:45:25.1859254Z Autotune Choices Stats: 2025-12-04T11:45:25.1859636Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.1859705Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1859754Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1859876Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1860106Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1860333Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1860563Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1860788Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1860832Z _scaled_mm 0.0242 ms 25.0% 2025-12-04T11:45:25.1860962Z SingleProcess AUTOTUNE benchmarking takes 0.0261 seconds and 0.1301 seconds precompiling for 5 choices 2025-12-04T11:45:25.1861037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1861089Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1861148Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1861246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1861727Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1861764Z graph_break [] 2025-12-04T11:45:25.1861824Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1861897Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1861937Z Autotune Choices Stats: 2025-12-04T11:45:25.1862305Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 
2025-12-04T11:45:25.1862360Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1862408Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1862528Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1862758Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1862982Z triton_mm_4 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1863216Z triton_mm_7 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1863475Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1863536Z _scaled_mm 0.0238 ms 25.8% 2025-12-04T11:45:25.1863662Z SingleProcess AUTOTUNE benchmarking takes 0.0243 seconds and 0.0971 seconds precompiling for 5 choices 2025-12-04T11:45:25.1863737Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1863778Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1863834Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1863935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1864414Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1864452Z graph_break [] 2025-12-04T11:45:25.1864510Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1864584Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1864623Z Autotune Choices Stats: 2025-12-04T11:45:25.1864980Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005799999926239252, "best_triton_pos": 0} 2025-12-04T11:45:25.1865045Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1865095Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1865214Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1865442Z triton_mm_8 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1865666Z triton_mm_10 0.0061 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1865904Z triton_mm_9 0.0062 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1866131Z triton_mm_11 0.0063 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1866172Z _scaled_mm 0.0232 ms 25.0% 2025-12-04T11:45:25.1866298Z SingleProcess AUTOTUNE benchmarking takes 0.0333 seconds and 0.1964 seconds precompiling for 5 choices 2025-12-04T11:45:25.1866488Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-273f558983266f13.xml - 2025-12-04T11:45:25.1866550Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1867138Z FAILED [0.5984s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1867152Z 2025-12-04T11:45:25.1867228Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1867484Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1867486Z 2025-12-04T11:45:25.1867573Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1867637Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.1867706Z ================== 1 failed, 187 deselected, 2 rerun in 3.08s ================== 2025-12-04T11:45:25.1867743Z Got exit code 1 2025-12-04T11:45:25.1867782Z Retrying single test... 2025-12-04T11:45:25.1867928Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bad8aae5f82545af.xml 2025-12-04T11:45:25.1867985Z ============================= test session starts ============================== 2025-12-04T11:45:25.1868094Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1868134Z cachedir: .pytest_cache 2025-12-04T11:45:25.1868292Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1868338Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1868381Z configfile: pytest.ini 2025-12-04T11:45:25.1868541Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1868627Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1868877Z stepcurrent: skipping 99 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1868923Z Running 1 items in this shard 2025-12-04T11:45:25.1868925Z 2025-12-04T11:45:25.1869134Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8956s] [100%] 2025-12-04T11:45:25.1869343Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5167s] [100%] 2025-12-04T11:45:25.1869526Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5843s] [100%] 2025-12-04T11:45:25.1869539Z 2025-12-04T11:45:25.1869591Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1869729Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1869777Z Traceback (most recent call last): 2025-12-04T11:45:25.1869935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1869976Z method(*args, **kwargs) 2025-12-04T11:45:25.1870127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1870170Z method(*args, **kwargs) 2025-12-04T11:45:25.1870322Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1870358Z with policy(): 2025-12-04T11:45:25.1870515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1870555Z raise RuntimeError(msg) 
2025-12-04T11:45:25.1870951Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.1870963Z 2025-12-04T11:45:25.1871036Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1871293Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1871295Z 2025-12-04T11:45:25.1871382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1871458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1871501Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1871558Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1872039Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1872140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1872177Z graph_break [] 2025-12-04T11:45:25.1872238Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1872311Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1872815Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1872865Z current_size = base.storage().size() 2025-12-04T11:45:25.1872905Z Autotune Choices Stats: 2025-12-04T11:45:25.1873299Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.1873353Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1873429Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1873549Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1873784Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1874011Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1874236Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1874463Z triton_mm_1 0.0061 ms 98.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1874503Z _scaled_mm 0.0136 ms 44.1% 2025-12-04T11:45:25.1874643Z SingleProcess AUTOTUNE benchmarking takes 0.0246 seconds and 0.1257 seconds precompiling for 5 choices 2025-12-04T11:45:25.1874796Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1874843Z Traceback (most recent call last): 2025-12-04T11:45:25.1874998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1875044Z method(*args, **kwargs) 2025-12-04T11:45:25.1875196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1875236Z method(*args, **kwargs) 2025-12-04T11:45:25.1875389Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1875429Z with policy(): 2025-12-04T11:45:25.1875581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1875624Z raise RuntimeError(msg) 2025-12-04T11:45:25.1876009Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.1876012Z 2025-12-04T11:45:25.1876085Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1876344Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1876359Z 2025-12-04T11:45:25.1876446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1876520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1876562Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1876620Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1877101Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1877200Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1877238Z graph_break [] 2025-12-04T11:45:25.1877310Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1877383Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1877870Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1877917Z current_size = base.storage().size() 2025-12-04T11:45:25.1877958Z Autotune Choices Stats: 2025-12-04T11:45:25.1878320Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.1878373Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1878432Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1878554Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1878797Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1879021Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1879245Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1879473Z triton_mm_1 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1879516Z _scaled_mm 0.0136 ms 44.1% 2025-12-04T11:45:25.1879643Z SingleProcess AUTOTUNE 
benchmarking takes 0.0246 seconds and 0.1257 seconds precompiling for 5 choices 2025-12-04T11:45:25.1879717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1879758Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1879815Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1879914Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1880406Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1880444Z graph_break [] 2025-12-04T11:45:25.1880504Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1880580Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1880619Z Autotune Choices Stats: 2025-12-04T11:45:25.1880977Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.1881040Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1881089Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1881210Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1881438Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1881661Z triton_mm_6 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1881884Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1882117Z triton_mm_7 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1882157Z _scaled_mm 0.0266 ms 23.2% 2025-12-04T11:45:25.1882296Z SingleProcess AUTOTUNE benchmarking takes 0.0235 seconds and 0.0702 seconds precompiling for 5 choices 2025-12-04T11:45:25.1882348Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1882486Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1882531Z Traceback (most recent call last): 2025-12-04T11:45:25.1882687Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1882728Z method(*args, **kwargs) 2025-12-04T11:45:25.1882882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1882923Z method(*args, **kwargs) 2025-12-04T11:45:25.1883075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1883112Z with policy(): 
2025-12-04T11:45:25.1883297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1883338Z raise RuntimeError(msg) 2025-12-04T11:45:25.1883722Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1883740Z 2025-12-04T11:45:25.1883814Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1884071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1884073Z 2025-12-04T11:45:25.1884163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1884236Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1884278Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1884335Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1884826Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1884925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1884964Z graph_break [] 2025-12-04T11:45:25.1885023Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 
2025-12-04T11:45:25.1885098Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1885581Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1885631Z current_size = base.storage().size() 2025-12-04T11:45:25.1885670Z Autotune Choices Stats: 2025-12-04T11:45:25.1886049Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.1886115Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1886163Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1886284Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1886514Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1886738Z triton_mm_3 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1886962Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1887187Z triton_mm_1 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1887227Z _scaled_mm 0.0136 ms 44.1% 2025-12-04T11:45:25.1887357Z SingleProcess AUTOTUNE benchmarking takes 0.0246 seconds and 0.1257 seconds precompiling for 5 choices 2025-12-04T11:45:25.1887430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1887475Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1887550Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1887652Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1888132Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1888172Z graph_break [] 2025-12-04T11:45:25.1888233Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1888307Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1888350Z Autotune Choices Stats: 2025-12-04T11:45:25.1888722Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 
2025-12-04T11:45:25.1888780Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1888830Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1888951Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1889178Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1889403Z triton_mm_6 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1889638Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1889862Z triton_mm_7 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1889915Z _scaled_mm 0.0266 ms 23.2% 2025-12-04T11:45:25.1890042Z SingleProcess AUTOTUNE benchmarking takes 0.0235 seconds and 0.0702 seconds precompiling for 5 choices 2025-12-04T11:45:25.1890116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1890158Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1890216Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1890316Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1890796Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1890834Z graph_break [] 2025-12-04T11:45:25.1890895Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:25.1890969Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1891011Z Autotune Choices Stats: 2025-12-04T11:45:25.1891369Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.1891434Z AUTOTUNE scaled_mm(33x32, 32x16, 33x1, 1x16, 16) 2025-12-04T11:45:25.1891484Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1891605Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1891843Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1892067Z triton_mm_9 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1892302Z triton_mm_8 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1892528Z triton_mm_10 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.1892571Z _scaled_mm 0.0210 ms 28.7% 2025-12-04T11:45:25.1892699Z SingleProcess AUTOTUNE benchmarking takes 0.0337 seconds and 0.1972 seconds precompiling for 5 choices 2025-12-04T11:45:25.1892894Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bad8aae5f82545af.xml - 2025-12-04T11:45:25.1892954Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1893571Z FAILED [0.5843s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.1893586Z 2025-12-04T11:45:25.1893662Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1893918Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1893920Z 2025-12-04T11:45:25.1894007Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1894069Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.1894140Z ================== 1 failed, 187 deselected, 2 rerun in 3.01s ================== 2025-12-04T11:45:25.1894178Z Got exit code 1 2025-12-04T11:45:25.1894385Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.1894511Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.1894656Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b2d45033e1b70f7.xml 2025-12-04T11:45:25.1894713Z ============================= test session starts ============================== 2025-12-04T11:45:25.1894826Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1894866Z cachedir: .pytest_cache 2025-12-04T11:45:25.1895023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1895083Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1895126Z configfile: pytest.ini 2025-12-04T11:45:25.1895286Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1895365Z collecting ... collected 188 items / 100 deselected / 88 selected 2025-12-04T11:45:25.1895419Z stepcurrent: skipping 100 already run items. 2025-12-04T11:45:25.1895466Z Running 88 items in this shard 2025-12-04T11:45:25.1895468Z 2025-12-04T11:45:25.1896398Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpfsb7x6n4/us/cusjrl6flaffm4ukvzgsaiztwmfuxs46u7fjn3avl4wrw2kwt5c7.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.1896549Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1896772Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1896930Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1897079Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1897373Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1897522Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1897793Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1897931Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1898188Z E1204 11:21:21.750000 878494 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1898347Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1898618Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1898753Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1899028Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1899223Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1899552Z E1204 11:21:21.750000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1900281Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpfsb7x6n4/lq/clq4m6rrx43n4nmm4mrr7xyryqphd652biolnofogvs3zsxjmxd6.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.1900430Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1900660Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1900819Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1900964Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1901249Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1901379Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1901638Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1901785Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1902040Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1902211Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1902480Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1902616Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1903011Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1903206Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1903544Z E1204 11:21:21.817000 878494 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1903597Z ('RERUN', {'yellow': True}) [2.2900s] [ 1%] 2025-12-04T11:45:25.1903912Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda E1204 11:21:22.757000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1904234Z E1204 11:21:22.757000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.1904364Z E1204 11:21:22.757000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1904508Z E1204 11:21:22.771000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1904800Z E1204 11:21:22.771000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.1904942Z E1204 11:21:22.771000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1904993Z ('RERUN', {'yellow': True}) [0.9049s] [ 1%] 2025-12-04T11:45:25.1905305Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda E1204 11:21:23.619000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1905600Z E1204 11:21:23.619000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.1905726Z E1204 11:21:23.619000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1905869Z E1204 11:21:23.634000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1906178Z E1204 11:21:23.634000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.1906304Z E1204 11:21:23.634000 878494 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.1906359Z FAILED [0.9363s] [ 1%] 2025-12-04T11:45:25.1906361Z 2025-12-04T11:45:25.1906414Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.1906554Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1906599Z Traceback (most recent call last): 2025-12-04T11:45:25.1906757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1906800Z method(*args, **kwargs) 2025-12-04T11:45:25.1906954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1906994Z method(*args, **kwargs) 2025-12-04T11:45:25.1907147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1907184Z with policy(): 2025-12-04T11:45:25.1907339Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1907379Z raise RuntimeError(msg) 2025-12-04T11:45:25.1907775Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1048576000. 
2025-12-04T11:45:25.1907788Z 2025-12-04T11:45:25.1907863Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1908126Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1908129Z 2025-12-04T11:45:25.1908218Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1908292Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1908336Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1908393Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1908969Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1909070Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1909109Z graph_break [] 2025-12-04T11:45:25.1909173Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1909250Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1909737Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1909789Z current_size = base.storage().size() 2025-12-04T11:45:25.1909829Z Autotune Choices Stats: 2025-12-04T11:45:25.1910213Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:25.1910291Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1910340Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1910466Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1910698Z triton_mm_8 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1910931Z triton_mm_9 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1911158Z triton_mm_15 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1911383Z triton_mm_2 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1911607Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1911842Z triton_mm_6 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1912070Z triton_mm_11 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1912292Z triton_mm_12 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1912520Z triton_mm_13 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1912751Z triton_mm_5 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1912883Z SingleProcess AUTOTUNE benchmarking takes 0.0608 seconds and 0.3863 seconds precompiling for 15 choices 2025-12-04T11:45:25.1913026Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1913075Z Traceback (most recent call last): 2025-12-04T11:45:25.1913235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1913312Z method(*args, **kwargs) 
2025-12-04T11:45:25.1913466Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1913507Z method(*args, **kwargs) 2025-12-04T11:45:25.1913662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1913699Z with policy(): 2025-12-04T11:45:25.1913871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.1913925Z raise RuntimeError(msg) 2025-12-04T11:45:25.1914313Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1048576000 and is now 1109393408. 2025-12-04T11:45:25.1914316Z 2025-12-04T11:45:25.1914389Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1914647Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1914650Z 2025-12-04T11:45:25.1914738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1914814Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1914858Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1914918Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1915470Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1915585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1915625Z graph_break [] 2025-12-04T11:45:25.1915688Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1915763Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1916251Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1916302Z current_size = base.storage().size() 2025-12-04T11:45:25.1916342Z Autotune Choices Stats: 2025-12-04T11:45:25.1916728Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:25.1916793Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1916843Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1916964Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1917197Z triton_mm_8 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:25.1917424Z triton_mm_9 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1917652Z triton_mm_15 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1917895Z triton_mm_2 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1918129Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1918352Z triton_mm_6 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1918577Z triton_mm_11 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1918807Z triton_mm_12 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1919036Z triton_mm_13 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1919257Z triton_mm_5 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1919402Z SingleProcess AUTOTUNE benchmarking takes 0.0608 seconds and 0.3863 seconds precompiling for 15 choices 2025-12-04T11:45:25.1919479Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1919526Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1919582Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1919686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1920175Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1920217Z graph_break [] 2025-12-04T11:45:25.1920278Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1920366Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1920407Z Autotune Choices Stats: 2025-12-04T11:45:25.1920771Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.1920832Z AUTOTUNE 
scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1920879Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1921002Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1921232Z triton_mm_22 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1921471Z triton_mm_27 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1921706Z triton_mm_28 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1921932Z triton_mm_21 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1922160Z triton_mm_31 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1922383Z triton_mm_23 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1922611Z triton_mm_25 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1922834Z triton_mm_24 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1923060Z triton_mm_18 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1923323Z triton_mm_26 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1923456Z SingleProcess AUTOTUNE benchmarking takes 0.0912 seconds and 0.3424 seconds precompiling for 17 choices 2025-12-04T11:45:25.1923509Z =================================== FAILURES =================================== 2025-12-04T11:45:25.1923648Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.1923696Z Traceback (most recent call last): 2025-12-04T11:45:25.1923852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1923895Z method(*args, **kwargs) 2025-12-04T11:45:25.1924063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.1924106Z method(*args, **kwargs) 2025-12-04T11:45:25.1924257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.1924296Z with policy(): 2025-12-04T11:45:25.1924450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T11:45:25.1924493Z raise RuntimeError(msg) 2025-12-04T11:45:25.1924881Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 2025-12-04T11:45:25.1924884Z 2025-12-04T11:45:25.1924963Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1925233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1925235Z 2025-12-04T11:45:25.1925338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1925410Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1925457Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1925516Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1926069Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1926170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1926207Z graph_break [] 2025-12-04T11:45:25.1926270Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1926344Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:25.1926831Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.1926878Z current_size = base.storage().size() 2025-12-04T11:45:25.1926932Z Autotune Choices Stats: 2025-12-04T11:45:25.1927295Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:25.1927358Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1927407Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1927531Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1927763Z triton_mm_8 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1928001Z triton_mm_9 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1928230Z triton_mm_15 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.1928456Z triton_mm_2 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1928680Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1928905Z triton_mm_6 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1929143Z triton_mm_11 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1929380Z triton_mm_12 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1929605Z triton_mm_13 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1929831Z triton_mm_5 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1929960Z SingleProcess AUTOTUNE benchmarking takes 0.0608 seconds and 0.3863 seconds precompiling for 15 choices 2025-12-04T11:45:25.1930035Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:25.1930078Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1930136Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1930234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1930723Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1930772Z graph_break [] 2025-12-04T11:45:25.1930832Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1930907Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1930950Z Autotune Choices Stats: 2025-12-04T11:45:25.1931316Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.1931375Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1931425Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1931573Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1931810Z triton_mm_22 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1932036Z triton_mm_27 
0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1932261Z triton_mm_28 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1932484Z triton_mm_21 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1932722Z triton_mm_31 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1932946Z triton_mm_23 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1933185Z triton_mm_25 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1933443Z triton_mm_24 0.0065 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1933668Z triton_mm_18 0.0067 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.1933894Z triton_mm_26 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1934025Z SingleProcess AUTOTUNE benchmarking takes 0.0912 seconds and 0.3424 seconds precompiling for 17 choices 2025-12-04T11:45:25.1934098Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.1934141Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.1934198Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.1934314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.1934799Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.1934838Z graph_break [] 2025-12-04T11:45:25.1934898Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.1934972Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.1935011Z Autotune Choices Stats: 2025-12-04T11:45:25.1935384Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_43", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.1935445Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.1935496Z 
strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.1935615Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.1935847Z triton_mm_43 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1936072Z triton_mm_39 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1936297Z triton_mm_40 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1936534Z triton_mm_47 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1936773Z triton_mm_41 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1936998Z triton_mm_38 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1937221Z triton_mm_44 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.1937450Z 
triton_mm_37 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1937677Z triton_mm_34 0.0065 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1937902Z triton_mm_42 0.0066 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.1938044Z SingleProcess AUTOTUNE benchmarking takes 0.1197 seconds and 0.3409 seconds precompiling for 17 choices 2025-12-04T11:45:25.1938234Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b2d45033e1b70f7.xml - 2025-12-04T11:45:25.1938297Z =========================== short test summary info ============================ 2025-12-04T11:45:25.1938879Z FAILED [0.9363s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 
2025-12-04T11:45:25.1938884Z 2025-12-04T11:45:25.1938958Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.1939229Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1939231Z 2025-12-04T11:45:25.1939318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.1939382Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.1939450Z ================== 1 failed, 100 deselected, 2 rerun in 4.15s ================== 2025-12-04T11:45:25.1939489Z Got exit code 1 2025-12-04T11:45:25.1939530Z Retrying single test... 2025-12-04T11:45:25.1939676Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51093d45c7cf8f34.xml 2025-12-04T11:45:25.1939733Z ============================= test session starts ============================== 2025-12-04T11:45:25.1939847Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.1939890Z cachedir: .pytest_cache 2025-12-04T11:45:25.1940053Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.1940111Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.1940152Z configfile: pytest.ini 2025-12-04T11:45:25.1940325Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.1940403Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.1940652Z stepcurrent: skipping 100 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.1940699Z Running 1 items in this shard 2025-12-04T11:45:25.1940701Z 2025-12-04T11:45:25.1941031Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:32.883149445 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1941034Z 2025-12-04T11:45:25.1941351Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1941650Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1941783Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1942272Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1942539Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1942764Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1942975Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1943187Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1943512Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1943748Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1944046Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1944280Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1944584Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1944831Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1945121Z E1204 11:21:32.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1945354Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1945646Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1945882Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1946175Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1946406Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1946718Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1946915Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1947148Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1947439Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1947649Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1947884Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1948177Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1948410Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1948701Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1948935Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1949158Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:25.1949358Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1949572Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1949743Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1949925Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1950444Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp7j_xcbpm/us/cusjrl6flaffm4ukvzgsaiztwmfuxs46u7fjn3avl4wrw2kwt5c7.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.1950594Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1950812Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1950978Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1951126Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1951415Z E1204 11:21:32.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1951549Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1951818Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1951960Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1952215Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1952372Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1952643Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1952852Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1953147Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1953359Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1953693Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1953988Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1954121Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1954606Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1954858Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1955089Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1955308Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1955510Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1955803Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1956039Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1956345Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1956577Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1956869Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1957098Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1959328Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1959581Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1959887Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1960117Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1960407Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1960641Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1960931Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1961128Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1961359Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1961661Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1961858Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1962090Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1962381Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1962623Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1962916Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1963136Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1963380Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1963581Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1963793Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1963976Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1964174Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1964280Z E1204 11:21:32.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1964438Z [W1204 11:21:32.158980978 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1964441Z 2025-12-04T11:45:25.1964751Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1965047Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1965178Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1965658Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1965923Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1966149Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1966355Z E1204 11:21:32.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1966553Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1966859Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1967092Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1967383Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1967615Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1967909Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1968152Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1968451Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1968682Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1968972Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1969205Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1969493Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1969726Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1970018Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1970224Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1970457Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1970747Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1970942Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1971182Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1971474Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1971705Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1971996Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1972217Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1972432Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1972633Z E1204 11:21:32.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1972854Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1973019Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.1973196Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1973751Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp7j_xcbpm/lq/clq4m6rrx43n4nmm4mrr7xyryqphd652biolnofogvs3zsxjmxd6.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.1973900Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.1974116Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.1974272Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.1974432Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.1974723Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.1974856Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.1975112Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.1975250Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.1975517Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.1975676Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.1975945Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.1976078Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.1976354Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.1976549Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.1976878Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.1977184Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1977313Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1977795Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1978048Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1978274Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1978478Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1978690Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1978983Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1979218Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1979509Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1979751Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1980046Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1980276Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1980566Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1980877Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1981189Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1981432Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1981721Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1981956Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1982248Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1982444Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1982675Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1982966Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1983173Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.1983443Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1983734Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1983965Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1984271Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1984493Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1984702Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.1984902Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.1985110Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.1985280Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.1985470Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.1985573Z E1204 11:21:32.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.1985638Z ('RERUN', {'yellow': True}) [2.5201s] [100%] 2025-12-04T11:45:25.1985970Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:33.135258436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.1985973Z 2025-12-04T11:45:25.1986119Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.1986415Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.1986709Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.1986839Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.1987314Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.1987578Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.1987808Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.1988013Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.1988213Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1988516Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1988749Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1989044Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1989274Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1989565Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1989810Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1990110Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1990340Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1990631Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1990851Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1991057Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1991253Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1991460Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1991671Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1991907Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1992198Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1992393Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1992624Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1992926Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1993146Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1993370Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1993590Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1993797Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1994006Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1994199Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1994429Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1994633Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1994830Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.1995026Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.1995256Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1995552Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1995784Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1996089Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1996307Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.1996511Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.1996707Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.1996935Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.1997137Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.1997366Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1997665Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1997895Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1998190Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1998430Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1998730Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1998959Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1999251Z 
E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.1999481Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.1999776Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2000005Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2000297Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2000536Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2000827Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2001057Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2001358Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2001592Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2001883Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2002113Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2002404Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2002648Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2002938Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2003167Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2003401Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2003598Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2003892Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2004123Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2004415Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2004645Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2004953Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2005185Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2005475Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2005721Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2006012Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2006244Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2006534Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2006728Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2006926Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2007135Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2007355Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2007552Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2007786Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2008080Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2008273Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2008470Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2008664Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2008858Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2009101Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2009391Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2009624Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2009915Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2010121Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2010329Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2010530Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2010762Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2011056Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2011285Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2011499Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2011699Z 
E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2011899Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2012193Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2012426Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2012720Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2012951Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2013243Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2013527Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2013821Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2014054Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2014359Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2014559Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2014756Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2014976Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2015178Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2015376Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2015591Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2015896Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2016127Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2016418Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2016654Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2016945Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2017178Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2017470Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2017718Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2018011Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2018235Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2018435Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2018644Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2018837Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2019049Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2019248Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2019543Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2019766Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2019979Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2020189Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2020390Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2020683Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2020917Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2021208Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2021442Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2021733Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2021980Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2022272Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2022506Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2022796Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2023040Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2023359Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2023592Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2023884Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2024117Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2024423Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2024667Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2024959Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2025192Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2025484Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2025685Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2025881Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2026117Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2026423Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2026655Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2026948Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2027178Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2027484Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2027723Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2028019Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2028252Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2028543Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2028750Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2028997Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2029288Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2029522Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2029818Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2030033Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2030236Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2030434Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2030645Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2030939Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2031152Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2031352Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2031551Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2031764Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2032058Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2032278Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2032480Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2032678Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2032881Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2033030Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2033239Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2033478Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2033685Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2033883Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2034103Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2034310Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2034504Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2034726Z E1204 
11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2034946Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2035140Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2035363Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2035567Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2035777Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2035974Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2036187Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2036387Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2036585Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2036785Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2037091Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2037317Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2037517Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2037716Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2037910Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2038106Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2038319Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2038518Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.2038716Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2038928Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2039226Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2039438Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2039639Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2039847Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2040049Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2040340Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2040551Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.2040751Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2040950Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2041164Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2041468Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2041662Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2041864Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2042053Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2042249Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2042463Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2042668Z E1204 11:21:33.874000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2042864Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2043067Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2043272Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2043443Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2043571Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2043672Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2043798Z E1204 11:21:33.874000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.2043969Z [W1204 11:21:33.158755692 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2043972Z 2025-12-04T11:45:25.2044118Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2044410Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.2044709Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2044839Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2045336Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2045602Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2045826Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2046032Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2046231Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2046523Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2046763Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2047053Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2047298Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2047589Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2047820Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2048111Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2048352Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2048644Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2048864Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2049069Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2049264Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2049474Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2049683Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2049923Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2050214Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2050410Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2050644Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2050938Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2051159Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2051355Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2051586Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2051791Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2051986Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2052180Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2052398Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2052615Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2052811Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.2053007Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2053237Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2053557Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2053804Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2054094Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2054327Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2054532Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2054730Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2054938Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2055135Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2055366Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2055656Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2055902Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2056192Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2056424Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2056716Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2056959Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2057249Z 
E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2057479Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2057769Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2058000Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2058300Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2058543Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2058834Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2059066Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2059356Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2059587Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2059877Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2060107Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2060407Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2060639Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2060928Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2061167Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2061369Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2061564Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2061855Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2062085Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2062375Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2062619Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2062920Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2063153Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2063474Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2063705Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2063996Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2064225Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2064514Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2064729Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2064925Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2065120Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2065324Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2065535Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2065766Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2066058Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2066252Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2066449Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2066646Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2066851Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2067101Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2067391Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2067622Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2067913Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2068109Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2068315Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2068517Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2068762Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2069054Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2069276Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2069477Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2069686Z 
E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2069887Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2070179Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2070417Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2070709Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2070945Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2071247Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2071493Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2071785Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2072017Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2072309Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2072507Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2072703Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2072923Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2073136Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2073364Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2073566Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2073859Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2074103Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2074399Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2074635Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2074927Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2075162Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2075467Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2075714Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2076004Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2076225Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2076430Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2076632Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2076826Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2077034Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2077236Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2077546Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2077768Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2077969Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2078168Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2078378Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2078673Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2078905Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2079196Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2079428Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2079731Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2079973Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2080267Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2080500Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2080793Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2081024Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2081316Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2081548Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2081850Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2082082Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2082377Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2082611Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2082912Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2083145Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2083467Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2083662Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2083859Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2084103Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2084407Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2084640Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2084931Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2085165Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2085455Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2085688Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2085979Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2086226Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2086521Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2086718Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2086951Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2087256Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2087490Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2087781Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2087994Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2088197Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2088406Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2088608Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2088910Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2089123Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2089325Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2089524Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2089723Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2090016Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2090236Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2090450Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2090652Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2090844Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2090992Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2091188Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2091422Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2091630Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2091826Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2092045Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2092250Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2092449Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2092677Z E1204 
11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2092897Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2093091Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2093339Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2093547Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2093743Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2093938Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2094151Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2094353Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2094565Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2094766Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2095061Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2095273Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2095487Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2095686Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2095876Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2096079Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2096293Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2096494Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.2096704Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2096906Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2097211Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2097423Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2097625Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2097823Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2098028Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2098325Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2098537Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.2098751Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2098950Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2099149Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2099444Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2099649Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2099853Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2100045Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2100240Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2100452Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2100655Z E1204 11:21:33.892000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2100868Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2101055Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2101247Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2101417Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2101544Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2101650Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2101777Z E1204 11:21:33.892000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.2101830Z ('RERUN', {'yellow': True}) [1.0268s] [100%] 2025-12-04T11:45:25.2102159Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:34.132068674 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.2102162Z 2025-12-04T11:45:25.2102306Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2102600Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2102906Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2103034Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2103553Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2103830Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2104059Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2104267Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2104466Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2104759Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2104996Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2105300Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2105547Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2105840Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2106073Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2106363Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2106597Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2106887Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2107118Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2107328Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2107524Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2107735Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2107932Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2108176Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2108468Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2108665Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2108896Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2109185Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2109423Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2109628Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2109847Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2110051Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2110248Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2110442Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2110663Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2110866Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2111059Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2111264Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2111500Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2111792Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2112023Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2112323Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2112542Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2112744Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2112940Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2113146Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2113373Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2113622Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2113926Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2114158Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2114451Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2114686Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2114975Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2115207Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2115501Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2115746Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2116038Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2116267Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2116559Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2116804Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2117095Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2117325Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2117618Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2117852Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2118154Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2118396Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2118687Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2118917Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2119209Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2119426Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2119631Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2119825Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2120128Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2120359Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2120648Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2120882Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2121180Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2121414Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2121711Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2121940Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2122233Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2122473Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2122778Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2122971Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2123169Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2123396Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2123605Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2123804Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2124033Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2124331Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2124540Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2124735Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2124930Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2125125Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2125376Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2125671Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2125904Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2126194Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2126389Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2126610Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2126892Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2127141Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2127432Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2127657Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2127859Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2128059Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2128257Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2128549Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2128796Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2129090Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2129324Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2129618Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2129862Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2130154Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2130385Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2130676Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2130874Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2131083Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2131316Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2131520Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2131718Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2131920Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2132212Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2132446Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2132742Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2132974Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2133305Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2133543Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2133838Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2134086Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2134378Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2134602Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2134803Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2135002Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2135194Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2135419Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2135634Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2135925Z E1204 11:21:34.865000 
883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2136144Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2136346Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2136546Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2136746Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2137036Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2137270Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2137578Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2137812Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2138103Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2138346Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2138640Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2138871Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2139163Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2139396Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2139703Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2139934Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2140237Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2140470Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2140765Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2140998Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2141289Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2141523Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2141834Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2142031Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2142228Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2142459Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2142763Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2142997Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2143323Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2143558Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2143850Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2144097Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2144388Z E1204 
11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2144633Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2144923Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2145124Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2145360Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2145652Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2145884Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2146176Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2146402Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2146603Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2146801Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2147002Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2147307Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2147523Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2147725Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2147923Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2148122Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2148425Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2148655Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2148855Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2149053Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2149244Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2149391Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2149587Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2149807Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2150012Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2150209Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2150441Z E1204 
11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2150645Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2150841Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2151059Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2151275Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2151471Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2151695Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2151900Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2152095Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2152291Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2152512Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2152723Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2152920Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2153121Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2153457Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2153671Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2153874Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2154071Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2154262Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.2154471Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2154684Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2154884Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2155082Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2155282Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2155587Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2155802Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2156003Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2156200Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2156399Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2156706Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2156940Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2157142Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2157341Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2157543Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2157837Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2158034Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2158236Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2158425Z E1204 11:21:34.865000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2158633Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2158846Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2159052Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2159250Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2159437Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2159627Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2159800Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2159928Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2160029Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2160156Z E1204 11:21:34.865000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.2160312Z [W1204 11:21:34.147322220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2160314Z 2025-12-04T11:45:25.2160460Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2160767Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2161070Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2161202Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2161678Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2161936Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2162163Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2162368Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.2162569Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2162872Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2163107Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2163428Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2163679Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2163972Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2164201Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2164493Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2164726Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2165034Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2165256Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2165474Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2165670Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2165878Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2166079Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2166308Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2166604Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2166800Z E1204 
11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2167042Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2167334Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2167551Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2167745Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2167972Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2168177Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2168372Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2168571Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2168789Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2168992Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2169190Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2169397Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2169638Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2169928Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2170164Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2170457Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2170679Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2170886Z 
E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2171079Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2171299Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2171498Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2171732Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2172024Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2172263Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2172558Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2172787Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.2173079Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2173330Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2173643Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2173889Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2174184Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2174418Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2174712Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2174946Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2175236Z E1204 
11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2175464Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2175771Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2176001Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2176292Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2176521Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2176830Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2177064Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2177354Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2177572Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2177772Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2177979Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2178268Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2178510Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2178802Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2179033Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2179324Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2179555Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2179846Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2180086Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2180376Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2180608Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2180898Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2181107Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2181303Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2181499Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2181704Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2181904Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2182137Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2182436Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2182642Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2182835Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2183031Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2183225Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2183496Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2183789Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2184019Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2184312Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2184522Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2184735Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2184936Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2185172Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2185480Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2185699Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2185902Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2186100Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2186301Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2186611Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2186856Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2187149Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2187381Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2187677Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2187909Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2188203Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2188437Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2188746Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2188945Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2189142Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2189363Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2189579Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2189781Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2189979Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2190273Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2190510Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2190803Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2191046Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2191349Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2191582Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2191874Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2192110Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2192405Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2192625Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2192828Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2193037Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2193227Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2193471Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2193672Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2193978Z E1204 11:21:34.880000 
883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2194199Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2194401Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2194600Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2194802Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2195094Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2195339Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2195646Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2195877Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2196170Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2196402Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2196700Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2196932Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2197224Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2197470Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2197763Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2197994Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2198295Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2198530Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2198826Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2199059Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2199352Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2199596Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2199888Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2200095Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2200293Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2200527Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2200821Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2201055Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2201346Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2201579Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2201881Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2202114Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2202406Z E1204 
11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2202654Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2202949Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2203145Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2203416Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2203708Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2203941Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2204250Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2204476Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2204679Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2204877Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2205079Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2205372Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2205585Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2205789Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2206011Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2206213Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2206504Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2206726Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2206938Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2207137Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2207328Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2207475Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2207671Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2207889Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2208099Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2208304Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2208536Z E1204 
11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2208741Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2208934Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2209156Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2209360Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2209555Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2209775Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2209979Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2210187Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2210384Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2210598Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2210800Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2211008Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2211210Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2211505Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2211717Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2211919Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2212118Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2212322Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.2212517Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2212744Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2212944Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2213142Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2213386Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2213678Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2213891Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2214092Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2214308Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2214508Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2214801Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2215017Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2215235Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2215434Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2215632Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2215925Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2216125Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2216327Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2216530Z E1204 11:21:34.880000 883922 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2216725Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2216953Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2217159Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2217356Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2217545Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2217724Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2217897Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2218023Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2218127Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2218264Z E1204 11:21:34.880000 883922 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.2218306Z FAILED [0.9488s] [100%] 2025-12-04T11:45:25.2218308Z 2025-12-04T11:45:25.2218365Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2218506Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2218553Z Traceback (most recent call last): 2025-12-04T11:45:25.2218717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2218760Z method(*args, **kwargs) 2025-12-04T11:45:25.2218912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2218952Z method(*args, **kwargs) 2025-12-04T11:45:25.2219104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2219152Z with policy(): 2025-12-04T11:45:25.2219306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2219348Z raise RuntimeError(msg) 2025-12-04T11:45:25.2219742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1048576000. 
2025-12-04T11:45:25.2219746Z 2025-12-04T11:45:25.2219822Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2220081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2220084Z 2025-12-04T11:45:25.2220175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2220251Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2220307Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2220367Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2220943Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2221044Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2221083Z graph_break [] 2025-12-04T11:45:25.2221148Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2221224Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2221714Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2221762Z current_size = base.storage().size() 2025-12-04T11:45:25.2221807Z Autotune Choices Stats: 2025-12-04T11:45:25.2222178Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.2222252Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2222301Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2222424Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2222662Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2222891Z triton_mm_8 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2223133Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2223405Z triton_mm_11 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2223630Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2223856Z triton_mm_10 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2224081Z triton_mm_7 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2224320Z triton_mm_15 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2224558Z triton_mm_12 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2224787Z triton_mm_13 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2224918Z SingleProcess AUTOTUNE benchmarking takes 0.0623 seconds and 0.5978 seconds precompiling for 15 choices 2025-12-04T11:45:25.2225061Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2225107Z Traceback (most recent call last): 2025-12-04T11:45:25.2225267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2225308Z method(*args, **kwargs) 
2025-12-04T11:45:25.2225465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2225507Z method(*args, **kwargs) 2025-12-04T11:45:25.2225658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2225697Z with policy(): 2025-12-04T11:45:25.2225851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2225906Z raise RuntimeError(msg) 2025-12-04T11:45:25.2226294Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1048576000 and is now 1109393408. 2025-12-04T11:45:25.2226297Z 2025-12-04T11:45:25.2226372Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2226633Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2226636Z 2025-12-04T11:45:25.2226725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2226799Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2226844Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2226915Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2227468Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2227569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2227610Z graph_break [] 2025-12-04T11:45:25.2227671Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2227746Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2228248Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2228299Z current_size = base.storage().size() 2025-12-04T11:45:25.2228351Z Autotune Choices Stats: 2025-12-04T11:45:25.2228721Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.2228783Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2228831Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2228956Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2229189Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:25.2229417Z triton_mm_8 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2229642Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2229867Z triton_mm_11 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2230105Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2230331Z triton_mm_10 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2232219Z triton_mm_7 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2232456Z triton_mm_15 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2232682Z triton_mm_12 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2232909Z triton_mm_13 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2233059Z SingleProcess AUTOTUNE benchmarking takes 0.0623 seconds and 0.5978 seconds precompiling for 15 choices 2025-12-04T11:45:25.2233136Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2233179Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2233238Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2233363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2233870Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2233911Z graph_break [] 2025-12-04T11:45:25.2233972Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2234049Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2234089Z Autotune Choices Stats: 2025-12-04T11:45:25.2234459Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006118999794125557, "best_triton_pos": 0} 2025-12-04T11:45:25.2234522Z AUTOTUNE 
scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2234572Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2234693Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2234928Z triton_mm_25 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2235157Z triton_mm_24 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2235399Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2235624Z triton_mm_31 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2235847Z triton_mm_23 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2236135Z triton_mm_27 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2236376Z triton_mm_21 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2236604Z triton_mm_26 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2236830Z triton_mm_28 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2237055Z triton_mm_18 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2237186Z SingleProcess AUTOTUNE benchmarking takes 0.1038 seconds and 0.3368 seconds precompiling for 17 choices 2025-12-04T11:45:25.2237241Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2237391Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2237437Z Traceback (most recent call last): 2025-12-04T11:45:25.2237597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2237639Z method(*args, **kwargs) 2025-12-04T11:45:25.2237793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2237834Z method(*args, **kwargs) 2025-12-04T11:45:25.2237985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2238022Z with policy(): 2025-12-04T11:45:25.2238177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T11:45:25.2238219Z raise RuntimeError(msg) 2025-12-04T11:45:25.2238609Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 2025-12-04T11:45:25.2238612Z 2025-12-04T11:45:25.2238687Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2238947Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2238949Z 2025-12-04T11:45:25.2239052Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2239124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2239168Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2239226Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2239776Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2239889Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2239927Z graph_break [] 2025-12-04T11:45:25.2239990Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2240075Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:25.2240564Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2240613Z current_size = base.storage().size() 2025-12-04T11:45:25.2240655Z Autotune Choices Stats: 2025-12-04T11:45:25.2241018Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:25.2241081Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2241129Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2241262Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2241494Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2241722Z triton_mm_8 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2241951Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.2242181Z triton_mm_11 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2242404Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2242630Z triton_mm_10 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2242854Z triton_mm_7 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2243088Z triton_mm_15 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2243341Z triton_mm_12 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2243566Z triton_mm_13 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2243712Z SingleProcess AUTOTUNE benchmarking takes 0.0623 seconds and 0.5978 seconds precompiling for 15 choices 2025-12-04T11:45:25.2243801Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:25.2243845Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2243901Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2244005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2244488Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2244525Z graph_break [] 2025-12-04T11:45:25.2244587Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2244660Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2244703Z Autotune Choices Stats: 2025-12-04T11:45:25.2245079Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006118999794125557, "best_triton_pos": 0} 2025-12-04T11:45:25.2245140Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2245187Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2245309Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2245542Z triton_mm_25 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2245771Z triton_mm_24 
0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2245995Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2246219Z triton_mm_31 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2246445Z triton_mm_23 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2246682Z triton_mm_27 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2246912Z triton_mm_21 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2247136Z triton_mm_26 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2247372Z triton_mm_28 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.2247607Z triton_mm_18 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2247737Z SingleProcess AUTOTUNE benchmarking takes 0.1038 seconds and 0.3368 seconds precompiling for 17 choices 2025-12-04T11:45:25.2247811Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2247852Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2247909Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2248007Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2248493Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2248532Z graph_break [] 2025-12-04T11:45:25.2248603Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2248676Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2248717Z Autotune Choices Stats: 2025-12-04T11:45:25.2249080Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_41", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2249142Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2249191Z 
strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2249314Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2249549Z triton_mm_41 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2249774Z triton_mm_47 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2250000Z triton_mm_39 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2250225Z triton_mm_40 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2250463Z triton_mm_44 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2250686Z triton_mm_43 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2250909Z triton_mm_38 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2251160Z 
triton_mm_45 0.0066 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2251385Z triton_mm_34 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2251614Z triton_mm_42 0.0068 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2251745Z SingleProcess AUTOTUNE benchmarking takes 0.0963 seconds and 0.3448 seconds precompiling for 17 choices 2025-12-04T11:45:25.2251942Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51093d45c7cf8f34.xml - 2025-12-04T11:45:25.2252001Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2252594Z FAILED [0.9488s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 
2025-12-04T11:45:25.2252597Z 2025-12-04T11:45:25.2252672Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2252932Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2252934Z 2025-12-04T11:45:25.2253023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2253088Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2253158Z ================== 1 failed, 187 deselected, 2 rerun in 4.51s ================== 2025-12-04T11:45:25.2253195Z Got exit code 1 2025-12-04T11:45:25.2253236Z Retrying single test... 2025-12-04T11:45:25.2255488Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6fc843ccce408c60.xml 2025-12-04T11:45:25.2255548Z ============================= test session starts ============================== 2025-12-04T11:45:25.2255662Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2255704Z cachedir: .pytest_cache 2025-12-04T11:45:25.2255863Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2255910Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2255951Z configfile: pytest.ini 2025-12-04T11:45:25.2256143Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2256219Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2256473Z stepcurrent: skipping 100 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2256520Z Running 1 items in this shard 2025-12-04T11:45:25.2256523Z 2025-12-04T11:45:25.2256849Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:44.511414354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2256872Z 2025-12-04T11:45:25.2257200Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2257505Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2257639Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2258122Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2258379Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2258618Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2258833Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2259033Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2259335Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2259575Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2259872Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2260106Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2260405Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2260650Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2260939Z E1204 11:21:44.525000 889355 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2261170Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2261474Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2261728Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2262019Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2262248Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2262538Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2262736Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2262982Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2263316Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2263512Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2263746Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2264036Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2264267Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2264557Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2264778Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2264999Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:25.2265201Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.2265411Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.2265592Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.2265771Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.2266313Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpgd4g1mqi/us/cusjrl6flaffm4ukvzgsaiztwmfuxs46u7fjn3avl4wrw2kwt5c7.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8) 2025-12-04T11:45:25.2266460Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.2266678Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.2266835Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.2266981Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.2267281Z E1204 11:21:44.525000 889355 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.2267415Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.2267673Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.2267815Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.2268069Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.2268227Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.2268495Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.2268629Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.2268904Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.2269097Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.2269432Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2269727Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2269872Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2270363Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2270617Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2270843Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2271049Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2271250Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2271554Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2271789Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2272080Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2272313Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2272606Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2272836Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2273127Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2273392Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2273697Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2273929Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2274218Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2274464Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2274768Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2274965Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2275198Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2275489Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2275688Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2275931Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2276223Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2276452Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2276744Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2276965Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2277171Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2277372Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.2277585Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.2277752Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.2277942Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.2278046Z E1204 11:21:44.525000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.2278202Z [W1204 11:21:44.804355210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2278206Z 2025-12-04T11:45:25.2278512Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2278831Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2278962Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2279442Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2279694Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2279920Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2280138Z E1204 11:21:44.537000 889355 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2280337Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2280627Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2280859Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2281152Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2281384Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2281674Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2281905Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2282205Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2282438Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2282726Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2282969Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2283303Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2283536Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2283827Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2284023Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2284254Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2284561Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2284758Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2284987Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2285276Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2285513Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2285803Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2286023Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2286230Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2286430Z E1204 11:21:44.537000 889355 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.2286653Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.2286820Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.2286998Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.2287542Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpgd4g1mqi/lq/clq4m6rrx43n4nmm4mrr7xyryqphd652biolnofogvs3zsxjmxd6.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.2287692Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.2287907Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.2288063Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.2288209Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.2288495Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.2288628Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.2288892Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.2289031Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.2289285Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.2289442Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.2289710Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.2289845Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.2290120Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.2290311Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.2290626Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2290930Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2291061Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2291539Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2291816Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2292042Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2292247Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2292447Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2292740Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2292984Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2293305Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2293538Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2293831Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2294062Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2294353Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2294583Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2294875Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2295119Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2295411Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2295643Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2295947Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2296158Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2296389Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2296683Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2296880Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2297113Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2297421Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2297651Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2297940Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2298159Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2298368Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2298570Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.2298778Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.2298944Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.2299122Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.2299238Z E1204 11:21:44.537000 889355 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.2299288Z ('RERUN', {'yellow': True}) [2.5502s] [100%] 2025-12-04T11:45:25.2299619Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:45.772901592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2299621Z 2025-12-04T11:45:25.2299766Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2300071Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.2300381Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2300512Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2300990Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2301244Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2301471Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2301687Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2301885Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2302175Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2302407Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2302699Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2302929Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2303221Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2303477Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2303783Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2304014Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2304302Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2304543Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2304761Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2304958Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2305166Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2305365Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2305597Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2305888Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2306096Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2306326Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2306617Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2306837Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2307032Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2307251Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2307453Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2307648Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2307853Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2308072Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2308276Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2308469Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.2308673Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2308916Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2309214Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2309444Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2309735Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2309954Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2310169Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2310365Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2310570Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2310769Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2311002Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2311294Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2311526Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2311817Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2312048Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2312349Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2312579Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2312868Z 
E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2313121Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2313449Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2313678Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2313967Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2314198Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2314505Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2314734Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2315025Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2315257Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2315547Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2315778Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2316066Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2316299Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2316603Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2316822Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2317023Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2317237Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2317540Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2317771Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2318061Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2318292Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2318583Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2318815Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2319117Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2319348Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2319638Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2319872Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2320163Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2320357Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2320552Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2320747Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2320964Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2321164Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2321395Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2321696Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2321900Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2322097Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2322290Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2322485Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2322716Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2323012Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2323280Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2323569Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2323764Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2323971Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2324173Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2324407Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2324700Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2324922Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2325137Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2325337Z 
E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2325536Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2325829Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2326089Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2326383Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2326617Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2326910Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2327144Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2327448Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2327681Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2327970Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2328169Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2328367Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2328587Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2328789Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2328987Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2329188Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2329491Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2329724Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2330017Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2330261Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2330567Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2330801Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2331095Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2331328Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2331620Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2331856Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2332057Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2332257Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2332448Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2332662Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2332866Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2333161Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2333409Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2333610Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2333831Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2334032Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2334327Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2334573Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2334880Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2335119Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2335411Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2335646Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2335938Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2336189Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2336481Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2336717Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2337016Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2337251Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2337542Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2337774Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2338069Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2338312Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2338603Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2338835Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2339138Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2339348Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2339545Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2339777Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2340071Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2340305Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2340609Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2340840Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2341139Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2341373Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2341668Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2341901Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2342193Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2342391Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2342633Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2342928Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2343162Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2343497Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2343728Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2343931Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2344130Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2344330Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2344625Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2344849Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2345052Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2345252Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2345451Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2345744Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2345965Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2346166Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2346363Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2346556Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2346726Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2346924Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2347145Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2347350Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2347560Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2347789Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2347997Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2348190Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2348411Z E1204 
11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2348617Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2348813Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2349052Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2349258Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2349455Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2349649Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2349864Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2350068Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2350266Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2350466Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2350765Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2350992Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2351197Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2351400Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2351601Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2351796Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2352019Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2352221Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.2352419Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2352620Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2352915Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2353137Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2353357Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2353555Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2353754Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2354047Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2354260Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.2354460Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2354658Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2354859Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2355167Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2355362Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2355563Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2355767Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2355962Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2356187Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2356394Z E1204 11:21:45.512000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2356591Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2356781Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2356961Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2357132Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2357272Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2357374Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2357501Z E1204 11:21:45.512000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.2357656Z [W1204 11:21:45.793226994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2357659Z 2025-12-04T11:45:25.2357803Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2358098Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.2358396Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2358526Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2359005Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2359270Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2359496Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2359703Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2359914Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2360217Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2360453Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2360744Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2360976Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2361267Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2361508Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2361800Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2362029Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2362324Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2362543Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2362748Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2362942Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2363149Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2363370Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2363614Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2363904Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2364098Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2364347Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2364656Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2364876Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2365070Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2365288Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2365494Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2365702Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2365895Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2366113Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2366319Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2366513Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.2366708Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2366941Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2367234Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2367470Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2367771Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2367990Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2368196Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2368400Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2368606Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2368815Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2369048Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2369338Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2369569Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2369859Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2370102Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2370394Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2370624Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2370914Z 
E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2371146Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2371436Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2371666Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2371959Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2372207Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2372497Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2372727Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2373045Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2373305Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2373594Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2373823Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2374117Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2374348Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2374654Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2374871Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2375073Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2375270Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2375563Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2375796Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2376088Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2376323Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2376626Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2376856Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2377147Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2377390Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2377696Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2377926Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2378218Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2378415Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2378610Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2378817Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2379023Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2379222Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2379453Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2379748Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2379942Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2380139Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2380335Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2380527Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2380771Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2381061Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2381291Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2381602Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2381810Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2382020Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2382221Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2382456Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2382750Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2382983Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2383183Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2383395Z 
E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2383598Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2383890Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2384128Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2384421Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2384654Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2384947Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2385199Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2385491Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2385724Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2386042Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2386242Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2386439Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2386659Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2386861Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2387062Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2387275Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2387568Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2387799Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2388098Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2388331Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2388626Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2388858Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2389150Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2389396Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2389687Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2389908Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2390122Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2390329Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2390524Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2390732Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2390931Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2391223Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2391445Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2391659Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2391859Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2392060Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2392351Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2392586Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2392878Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2393111Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2393432Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2393677Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2394108Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2394342Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2394658Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2394902Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2395198Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2395432Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2395724Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2395957Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2396265Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2396500Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2396791Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2397024Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2397319Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2397517Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2397714Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2397945Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2398260Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2398494Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2398785Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2399028Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2399328Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2399562Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2399854Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2400087Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2400383Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2400588Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2400822Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2401113Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2401346Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2401639Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2401852Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2402054Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2402255Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2402458Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2402762Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2402974Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2403174Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2403413Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2403629Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2403921Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2404147Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2404356Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2404554Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2404761Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2404915Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2405110Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2405331Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2405537Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2405733Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2405954Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2406162Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2406358Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2406579Z E1204 
11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2406801Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2406998Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2407217Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2407437Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2407645Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2407842Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2408057Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2408258Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2408456Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2408658Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2408961Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2409177Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2409379Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2409577Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2409775Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2409973Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2410188Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2410391Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.2410589Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2410799Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2411093Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2411305Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2411516Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2411715Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2411925Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2412220Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2412432Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.2412631Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2412833Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2413049Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2413461Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2413656Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2413861Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2414054Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2414250Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2414464Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2414670Z E1204 11:21:45.526000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2414870Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2415088Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2415270Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2415439Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2415566Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2415683Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2415808Z E1204 11:21:45.526000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.2415861Z ('RERUN', {'yellow': True}) [0.9150s] [100%] 2025-12-04T11:45:25.2416206Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda [W1204 11:21:46.649761278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.2416210Z 2025-12-04T11:45:25.2416356Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2416650Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2416949Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2417080Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2417570Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2417826Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2418051Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2418258Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.2418457Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2418748Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2418983Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2419277Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2419523Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2419812Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2420054Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2420354Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2420591Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2420883Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2421101Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2421308Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2421506Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2421725Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2421922Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2422153Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2422445Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2422640Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2422871Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2423160Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2423401Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2423614Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2423836Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2424041Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2424234Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2424441Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2424676Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2424883Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2425077Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2425272Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2425506Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2425800Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2426058Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2426347Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2426565Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2426770Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2426968Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2427177Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2427376Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2427609Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2427911Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2428142Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2428433Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2428674Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2428985Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2429216Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2429508Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2429739Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2430032Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2430274Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2430564Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2430795Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2431085Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2431317Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2431608Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2431838Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2432129Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2432376Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2432666Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2432896Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2433199Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2433465Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2433667Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2433861Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2434152Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2434383Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2434686Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2434917Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2435207Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2435442Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2435740Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2435969Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2436258Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2436489Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2436808Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2437002Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2437198Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2437418Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2437638Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2437838Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2438069Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2438359Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2438554Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2438749Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2438955Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2439148Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2439378Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2439672Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2439904Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2440197Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2440393Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2440601Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2440803Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2441053Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2441347Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2441570Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2441785Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2441996Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2442197Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2442491Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2442727Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2443022Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2443313Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2443701Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2443951Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2444246Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2444479Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2444773Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2444968Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2445166Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2445414Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2445619Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2445819Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2446018Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2446351Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2446585Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2446877Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2447108Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2447405Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2447642Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2447946Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2448180Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2448472Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2448695Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2448897Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2449096Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2449289Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2449501Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2449715Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2450009Z E1204 11:21:46.383000 
889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2450228Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2450440Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2450648Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2450850Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2451143Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2451375Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2451670Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2451904Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2452206Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2452438Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2452731Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2452964Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2453292Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2453525Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2453821Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2454079Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2454373Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2454606Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2454919Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2455168Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2455461Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2455695Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2455989Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2456186Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2456395Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2456627Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2456921Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2457153Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2457447Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2457681Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2457973Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2458206Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2458499Z E1204 
11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2458744Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2459035Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2459245Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2459489Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2459784Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2460017Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2460308Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2460524Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2460726Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2461010Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2461211Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2461503Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2461719Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2461922Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2462120Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2462319Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2462612Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2462851Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2463052Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2463275Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2463466Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2463630Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2463838Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2464064Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2464273Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2464467Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2464688Z E1204 
11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2464894Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2465102Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2465322Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2465530Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2465725Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2465946Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2466151Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2466346Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2466542Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2466756Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2466975Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2467173Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2467376Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2467672Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2467906Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2468109Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2468307Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2468499Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.2468694Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2468907Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2469108Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2469316Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2469519Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2469812Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2470024Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2470225Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2470424Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2470625Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2470920Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2471147Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2471350Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2471547Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2471747Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2472066Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2472261Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2472463Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2472653Z E1204 11:21:46.383000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2472848Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2473063Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2473310Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2473523Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2473713Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2473895Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2474063Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2474193Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2474296Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2474423Z E1204 11:21:46.383000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.2474581Z [W1204 11:21:46.665235961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.2474584Z 2025-12-04T11:45:25.2474729Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.2475026Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2475335Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.2475469Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.2475946Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.2476229Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.2476456Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.2476661Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.2476862Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2477155Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2477390Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2477701Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2477932Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2478224Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2478455Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2478748Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2478981Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2479275Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2479497Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2479714Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2479911Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2480116Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2480329Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2480570Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2480867Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2481063Z E1204 
11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2481292Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2481585Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2481815Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2482011Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2482229Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2482437Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2482632Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2482828Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2483047Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2483268Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2483465Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2483663Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2483911Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2484200Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2484431Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2484753Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2484974Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2485178Z 
E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2485373Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2485583Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2485782Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2486028Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2486319Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2486549Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2486844Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2487074Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.2487366Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2487597Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2487890Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2488136Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2488427Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2488659Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2488962Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2489205Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2489497Z E1204 
11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2489726Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2490020Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2490251Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2490553Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2490782Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2491076Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2491308Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2491599Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2491818Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2492018Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2492214Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.2492515Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2492747Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2493043Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2493328Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2493637Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2493869Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2494161Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2494393Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2494685Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2494931Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2495221Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2495418Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2495613Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.2495811Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2496017Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2496217Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2496447Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2496739Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2496950Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2497143Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2497337Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2497543Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2497785Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2498078Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2498308Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2498599Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2498796Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2499009Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2499223Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2499457Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2499752Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2499975Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2500179Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2500376Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2500579Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2500874Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2501118Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2501411Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2501645Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2501949Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2502192Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2502485Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2502716Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2503011Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2503211Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2503453Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2503675Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2503875Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2504076Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2504275Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2504571Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2504807Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2505099Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2505340Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2505645Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2505879Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2506173Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2506417Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2506726Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2506948Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2507152Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2507350Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2507543Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2507764Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.2507968Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2508263Z E1204 11:21:46.398000 
889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2508484Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2508688Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2508886Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2509086Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2509377Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2509612Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2509915Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2510146Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2510438Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2510696Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2511006Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2511239Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2511529Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2511762Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2512056Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2512300Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2512591Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2512829Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2513126Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2513381Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2513673Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2513904Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2514199Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2514415Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2514615Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2514851Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2515173Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2515410Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2515704Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2515936Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2516231Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2516466Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2516771Z E1204 
11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2517005Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2517299Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2517497Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2517731Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2518024Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2518257Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.2518551Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2518776Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2518981Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2519179Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2519394Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2519700Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2519913Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2520116Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2520314Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2520514Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2520818Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2521042Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.2521245Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2521443Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2521635Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.2521784Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.2521980Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2522199Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2522405Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2522600Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2522838Z E1204 
11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2523045Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2523240Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2523488Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2523709Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.2523907Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2524128Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.2524333Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.2524532Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.2524729Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2524955Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2525156Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2525356Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2525555Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2525853Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2526066Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2526268Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2526468Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2526663Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.2526878Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.2527092Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2527292Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2527489Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2527704Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2528012Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2528225Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2528427Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2528626Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2528826Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2529130Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2529345Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.2529546Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.2529746Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.2529947Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.2530239Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.2530439Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.2530640Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.2530831Z E1204 11:21:46.398000 889355 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.2531041Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.2531256Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.2531464Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.2531660Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.2531859Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.2532049Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.2532226Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.2532353Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.2532458Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.2532583Z E1204 11:21:46.398000 889355 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.2532628Z FAILED [0.8611s] [100%] 2025-12-04T11:45:25.2532631Z 2025-12-04T11:45:25.2532687Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2532829Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2532878Z Traceback (most recent call last): 2025-12-04T11:45:25.2533054Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2533098Z method(*args, **kwargs) 2025-12-04T11:45:25.2533291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2533332Z method(*args, **kwargs) 2025-12-04T11:45:25.2533484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2533524Z with policy(): 2025-12-04T11:45:25.2533675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2533717Z raise RuntimeError(msg) 2025-12-04T11:45:25.2534109Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1048576000. 
2025-12-04T11:45:25.2534113Z 2025-12-04T11:45:25.2534190Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2534448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2534451Z 2025-12-04T11:45:25.2534543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2534620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2534664Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2534722Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2535298Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2535401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2535440Z graph_break [] 2025-12-04T11:45:25.2535520Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2535597Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2536102Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2536153Z current_size = base.storage().size() 2025-12-04T11:45:25.2536196Z Autotune Choices Stats: 2025-12-04T11:45:25.2536568Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2536633Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2536682Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2536806Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2537043Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2537282Z triton_mm_7 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2537515Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2537737Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2537963Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2538185Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2538407Z triton_mm_8 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2538633Z triton_mm_12 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2538866Z triton_mm_2 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2539097Z triton_mm_10 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2539229Z SingleProcess AUTOTUNE benchmarking takes 0.0595 seconds and 0.6385 seconds precompiling for 15 choices 2025-12-04T11:45:25.2539385Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2539430Z Traceback (most recent call last): 2025-12-04T11:45:25.2539589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2539641Z method(*args, **kwargs) 
2025-12-04T11:45:25.2539795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2539835Z method(*args, **kwargs) 2025-12-04T11:45:25.2539990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2540028Z with policy(): 2025-12-04T11:45:25.2540184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2540224Z raise RuntimeError(msg) 2025-12-04T11:45:25.2540616Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1048576000 and is now 1109393408. 2025-12-04T11:45:25.2540620Z 2025-12-04T11:45:25.2540695Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2540973Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2540976Z 2025-12-04T11:45:25.2541065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2541140Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2541184Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2541243Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2541797Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2541898Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2541938Z graph_break [] 2025-12-04T11:45:25.2542000Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2542074Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2542557Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2542620Z current_size = base.storage().size() 2025-12-04T11:45:25.2542665Z Autotune Choices Stats: 2025-12-04T11:45:25.2543034Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2543096Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2543146Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2543315Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2543550Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:25.2543794Z triton_mm_7 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2544022Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2544246Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2544477Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2544699Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2544935Z triton_mm_8 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2545160Z triton_mm_12 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2545387Z triton_mm_2 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2545616Z triton_mm_10 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2545746Z SingleProcess AUTOTUNE benchmarking takes 0.0595 seconds and 0.6385 seconds precompiling for 15 choices 2025-12-04T11:45:25.2545821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2545863Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2545922Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2546023Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2546511Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2546561Z graph_break [] 2025-12-04T11:45:25.2546624Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2546698Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2546740Z Autotune Choices Stats: 2025-12-04T11:45:25.2547102Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2547181Z AUTOTUNE 
scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2547228Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2547362Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2547597Z triton_mm_23 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2547829Z triton_mm_27 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2548054Z triton_mm_21 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2548281Z triton_mm_25 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2548518Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2548741Z triton_mm_24 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2548966Z triton_mm_31 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2549191Z triton_mm_28 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2549420Z triton_mm_18 0.0066 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2549646Z triton_mm_29 0.0068 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2549775Z SingleProcess AUTOTUNE benchmarking takes 0.0909 seconds and 0.3511 seconds precompiling for 17 choices 2025-12-04T11:45:25.2549830Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2549969Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2550028Z Traceback (most recent call last): 2025-12-04T11:45:25.2550184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2550229Z method(*args, **kwargs) 2025-12-04T11:45:25.2550382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2550425Z method(*args, **kwargs) 2025-12-04T11:45:25.2550577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2550625Z with policy(): 2025-12-04T11:45:25.2550781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
2705, in __exit__ 2025-12-04T11:45:25.2550823Z raise RuntimeError(msg) 2025-12-04T11:45:25.2551223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 2025-12-04T11:45:25.2551227Z 2025-12-04T11:45:25.2551302Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2551558Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2551561Z 2025-12-04T11:45:25.2551648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2551722Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2551764Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2554344Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2554933Z inductor [('triton_bundler_save_kernel', 136), ('generated_module_cache_miss', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 15), ('select_algorithm_num_precompiles', 14), ('select_algorithm_num_precompilation_exceptions', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2555035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2555072Z graph_break [] 2025-12-04T11:45:25.2555138Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2555216Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:25.2555708Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2555758Z current_size = base.storage().size() 2025-12-04T11:45:25.2555800Z Autotune Choices Stats: 2025-12-04T11:45:25.2556167Z {"num_choices": 15, "num_triton_choices": 14, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2556230Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2556280Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2556402Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2556651Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2556879Z triton_mm_7 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2557109Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.2557348Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2557591Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2557815Z triton_mm_6 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2558037Z triton_mm_8 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2558260Z triton_mm_12 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2558484Z triton_mm_2 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2558726Z triton_mm_10 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2558858Z SingleProcess AUTOTUNE benchmarking takes 0.0595 seconds and 0.6385 seconds precompiling for 15 choices 2025-12-04T11:45:25.2558933Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:25.2558978Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2559035Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2559136Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2559627Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2559665Z graph_break [] 2025-12-04T11:45:25.2559727Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2559801Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2559842Z Autotune Choices Stats: 2025-12-04T11:45:25.2560204Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2560275Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2560324Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2560446Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2560676Z triton_mm_23 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2560902Z triton_mm_27 
0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2561147Z triton_mm_21 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2561375Z triton_mm_25 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2561597Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2561820Z triton_mm_24 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2562047Z triton_mm_31 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2562280Z triton_mm_28 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2562503Z triton_mm_18 0.0066 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.2562726Z triton_mm_29 0.0068 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2562857Z SingleProcess AUTOTUNE benchmarking takes 0.0909 seconds and 0.3511 seconds precompiling for 17 choices 2025-12-04T11:45:25.2562932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2562975Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2563031Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2563131Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2563643Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2563682Z graph_break [] 2025-12-04T11:45:25.2563744Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:25.2563817Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2563874Z Autotune Choices Stats: 2025-12-04T11:45:25.2564233Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_44", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2564292Z AUTOTUNE scaled_mm(33x32, 32x2048, 33x1, 1x2048, 2048) 2025-12-04T11:45:25.2564340Z 
strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2564463Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2564707Z triton_mm_44 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2564945Z triton_mm_37 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2565174Z triton_mm_41 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2565400Z triton_mm_43 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2565624Z triton_mm_38 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2565851Z triton_mm_39 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2566086Z triton_mm_40 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2566308Z 
triton_mm_47 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2566533Z triton_mm_34 0.0068 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2566760Z triton_mm_42 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2566890Z SingleProcess AUTOTUNE benchmarking takes 0.0967 seconds and 0.3546 seconds precompiling for 17 choices 2025-12-04T11:45:25.2567084Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6fc843ccce408c60.xml - 2025-12-04T11:45:25.2567144Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2567726Z FAILED [0.8611s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1170210816. 
2025-12-04T11:45:25.2567740Z 2025-12-04T11:45:25.2567813Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2568073Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2568075Z 2025-12-04T11:45:25.2568163Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2568230Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2568312Z ================== 1 failed, 187 deselected, 2 rerun in 4.34s ================== 2025-12-04T11:45:25.2568351Z Got exit code 1 2025-12-04T11:45:25.2568555Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2568693Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2568841Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed4b14bde4d139c3.xml 2025-12-04T11:45:25.2568899Z ============================= test session starts ============================== 2025-12-04T11:45:25.2569013Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2569053Z cachedir: .pytest_cache 2025-12-04T11:45:25.2569214Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2569262Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2569304Z configfile: pytest.ini 2025-12-04T11:45:25.2569469Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2569549Z collecting ... 
collected 188 items / 101 deselected / 87 selected 2025-12-04T11:45:25.2569603Z stepcurrent: skipping 101 already run items. 2025-12-04T11:45:25.2569647Z Running 87 items in this shard 2025-12-04T11:45:25.2569661Z 2025-12-04T11:45:25.2569876Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9436s] [ 1%] 2025-12-04T11:45:25.2570087Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5517s] [ 1%] 2025-12-04T11:45:25.2570273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6466s] [ 1%] 2025-12-04T11:45:25.2570275Z 2025-12-04T11:45:25.2570328Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2570471Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2570519Z Traceback (most recent call last): 2025-12-04T11:45:25.2570680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2570722Z method(*args, **kwargs) 2025-12-04T11:45:25.2570874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2570914Z method(*args, **kwargs) 2025-12-04T11:45:25.2571068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2571107Z with policy(): 2025-12-04T11:45:25.2571261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2571312Z raise RuntimeError(msg) 2025-12-04T11:45:25.2571698Z 
RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.2571701Z 2025-12-04T11:45:25.2571774Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2572030Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2572050Z 2025-12-04T11:45:25.2572138Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2572213Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2572256Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2572327Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2572813Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2572915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2572954Z graph_break [] 2025-12-04T11:45:25.2573016Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2573090Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2573611Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: 
TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2573676Z current_size = base.storage().size() 2025-12-04T11:45:25.2573717Z Autotune Choices Stats: 2025-12-04T11:45:25.2574090Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2574146Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2574198Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2574318Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2574557Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2574789Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2575015Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2575241Z triton_mm_0 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2575302Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:25.2575431Z SingleProcess AUTOTUNE benchmarking takes 0.0244 seconds and 0.1609 seconds precompiling for 5 choices 2025-12-04T11:45:25.2575574Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2575620Z Traceback (most recent call last): 2025-12-04T11:45:25.2575776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2575818Z method(*args, **kwargs) 2025-12-04T11:45:25.2575989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2576031Z method(*args, **kwargs) 2025-12-04T11:45:25.2576181Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2576220Z with policy(): 2025-12-04T11:45:25.2576386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2576429Z raise RuntimeError(msg) 2025-12-04T11:45:25.2576817Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.2576821Z 2025-12-04T11:45:25.2576894Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2577153Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2577156Z 2025-12-04T11:45:25.2577244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2577319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2577362Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2577429Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2577909Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2578010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2578046Z graph_break [] 2025-12-04T11:45:25.2578107Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2578182Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2578672Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2578720Z current_size = base.storage().size() 2025-12-04T11:45:25.2578761Z Autotune Choices Stats: 2025-12-04T11:45:25.2579129Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2579195Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2579249Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2579370Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2579606Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2579839Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2580075Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2580307Z triton_mm_0 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2580352Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:25.2580481Z SingleProcess 
AUTOTUNE benchmarking takes 0.0244 seconds and 0.1609 seconds precompiling for 5 choices 2025-12-04T11:45:25.2580556Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2580603Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2580659Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2580760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2581239Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2581290Z graph_break [] 2025-12-04T11:45:25.2581350Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2581423Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2581464Z Autotune Choices Stats: 2025-12-04T11:45:25.2581819Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2581874Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2581923Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2582043Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2582275Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2582503Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2582731Z triton_mm_7 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2582955Z triton_mm_4 0.0078 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2583007Z _scaled_mm 0.0246 ms 24.9% 2025-12-04T11:45:25.2583135Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1319 seconds precompiling for 5 choices 2025-12-04T11:45:25.2583187Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2583356Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2583402Z Traceback (most recent call last): 2025-12-04T11:45:25.2583575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2583615Z method(*args, **kwargs) 2025-12-04T11:45:25.2583772Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2583829Z method(*args, **kwargs) 2025-12-04T11:45:25.2583983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2584021Z with policy(): 
2025-12-04T11:45:25.2584175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2584215Z raise RuntimeError(msg) 2025-12-04T11:45:25.2584602Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2584606Z 2025-12-04T11:45:25.2584680Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2584938Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2584941Z 2025-12-04T11:45:25.2585042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2585115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2585158Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2585215Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2585702Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2585804Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2585842Z graph_break [] 2025-12-04T11:45:25.2585902Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 
2025-12-04T11:45:25.2585977Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2586461Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2586509Z current_size = base.storage().size() 2025-12-04T11:45:25.2586550Z Autotune Choices Stats: 2025-12-04T11:45:25.2586921Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2586995Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2587044Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2587164Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2587402Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2587642Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2587876Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2588102Z triton_mm_0 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2588142Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:25.2588268Z SingleProcess AUTOTUNE benchmarking takes 0.0244 seconds and 0.1609 seconds precompiling for 5 choices 2025-12-04T11:45:25.2588343Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2588387Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2588442Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2588543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2589038Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2589075Z graph_break [] 2025-12-04T11:45:25.2589138Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2589210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2589252Z Autotune Choices Stats: 2025-12-04T11:45:25.2589610Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 
0} 2025-12-04T11:45:25.2589664Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2589714Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2589833Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2590064Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2590293Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2590520Z triton_mm_7 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2590755Z triton_mm_4 0.0078 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2590796Z _scaled_mm 0.0246 ms 24.9% 2025-12-04T11:45:25.2590923Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1319 seconds precompiling for 5 choices 2025-12-04T11:45:25.2590997Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2591049Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2591108Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2591206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2591702Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2591739Z graph_break [] 2025-12-04T11:45:25.2591800Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2591873Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2591914Z Autotune Choices Stats: 2025-12-04T11:45:25.2592276Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2592331Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2592380Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2592509Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2592742Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2592966Z triton_mm_10 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2593196Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2593450Z triton_mm_8 0.0082 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2593492Z _scaled_mm 0.0224 ms 27.3% 2025-12-04T11:45:25.2593618Z SingleProcess AUTOTUNE benchmarking takes 0.0317 seconds and 0.2383 seconds precompiling for 5 choices 2025-12-04T11:45:25.2593812Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed4b14bde4d139c3.xml - 2025-12-04T11:45:25.2593873Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2594455Z FAILED [0.6466s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2594472Z 2025-12-04T11:45:25.2594546Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2594804Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2594819Z 2025-12-04T11:45:25.2594909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2594971Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2595040Z ================== 1 failed, 101 deselected, 2 rerun in 3.16s ================== 2025-12-04T11:45:25.2595078Z Got exit code 1 2025-12-04T11:45:25.2595131Z Retrying single test... 2025-12-04T11:45:25.2595274Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-305265071c504123.xml 2025-12-04T11:45:25.2595332Z ============================= test session starts ============================== 2025-12-04T11:45:25.2595446Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2595487Z cachedir: .pytest_cache 2025-12-04T11:45:25.2595647Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2595695Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2595736Z configfile: pytest.ini 2025-12-04T11:45:25.2595900Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2595975Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2596231Z stepcurrent: skipping 101 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2596287Z Running 1 items in this shard 2025-12-04T11:45:25.2596291Z 2025-12-04T11:45:25.2596502Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1007s] [100%] 2025-12-04T11:45:25.2596712Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8181s] [100%] 2025-12-04T11:45:25.2596897Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7350s] [100%] 2025-12-04T11:45:25.2596899Z 2025-12-04T11:45:25.2596951Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2597092Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2597140Z Traceback (most recent call last): 2025-12-04T11:45:25.2597299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2597342Z method(*args, **kwargs) 2025-12-04T11:45:25.2597496Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2597536Z method(*args, **kwargs) 2025-12-04T11:45:25.2597690Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2597727Z with policy(): 2025-12-04T11:45:25.2597881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2597935Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2598317Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.2598321Z 2025-12-04T11:45:25.2598394Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2598651Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2598664Z 2025-12-04T11:45:25.2598751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2598827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2598870Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2598939Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2599425Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2599524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2599562Z graph_break [] 2025-12-04T11:45:25.2599622Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2599695Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2600204Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2600252Z current_size = base.storage().size() 2025-12-04T11:45:25.2600294Z Autotune Choices Stats: 2025-12-04T11:45:25.2600663Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2600718Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2600770Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2600893Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2601131Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2601359Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2601583Z triton_mm_2 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2601806Z triton_mm_0 0.0078 ms 76.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2601860Z _scaled_mm 0.0235 ms 25.2% 2025-12-04T11:45:25.2601986Z SingleProcess AUTOTUNE benchmarking takes 0.0255 seconds and 0.1662 seconds precompiling for 5 choices 2025-12-04T11:45:25.2602129Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2602175Z Traceback (most recent call last): 2025-12-04T11:45:25.2602332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2602383Z method(*args, **kwargs) 2025-12-04T11:45:25.2602535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2602575Z method(*args, **kwargs) 2025-12-04T11:45:25.2602725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2602773Z with policy(): 2025-12-04T11:45:25.2602927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2602969Z raise RuntimeError(msg) 2025-12-04T11:45:25.2603385Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.2603389Z 2025-12-04T11:45:25.2603464Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2603717Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2603721Z 2025-12-04T11:45:25.2603811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2603884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2603930Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2604000Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2604485Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2604587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2604623Z graph_break [] 2025-12-04T11:45:25.2604684Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2604759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2605247Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2605294Z current_size = base.storage().size() 2025-12-04T11:45:25.2605334Z Autotune Choices Stats: 2025-12-04T11:45:25.2605704Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2605773Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2605823Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2605945Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2606179Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2606408Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2606644Z triton_mm_2 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2606880Z triton_mm_0 0.0078 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2606925Z _scaled_mm 0.0235 ms 25.2% 2025-12-04T11:45:25.2607050Z SingleProcess 
AUTOTUNE benchmarking takes 0.0255 seconds and 0.1662 seconds precompiling for 5 choices 2025-12-04T11:45:25.2607124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2607165Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2607222Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2607321Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2607818Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2607856Z graph_break [] 2025-12-04T11:45:25.2607916Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2607989Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2608031Z Autotune Choices Stats: 2025-12-04T11:45:25.2608396Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2608449Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2608501Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2608619Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2608851Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2609079Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2609305Z triton_mm_6 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2609540Z triton_mm_4 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2609583Z _scaled_mm 0.0239 ms 25.8% 2025-12-04T11:45:25.2609710Z SingleProcess AUTOTUNE benchmarking takes 0.0249 seconds and 0.1361 seconds precompiling for 5 choices 2025-12-04T11:45:25.2609764Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2609902Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2609960Z Traceback (most recent call last): 2025-12-04T11:45:25.2610116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2610156Z method(*args, **kwargs) 2025-12-04T11:45:25.2610321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2610362Z method(*args, **kwargs) 2025-12-04T11:45:25.2610515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2610551Z with policy(): 
2025-12-04T11:45:25.2610706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2610746Z raise RuntimeError(msg) 2025-12-04T11:45:25.2611133Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2611136Z 2025-12-04T11:45:25.2611209Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2611465Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2611467Z 2025-12-04T11:45:25.2611565Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2611639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2611681Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2611737Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2612218Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2612319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2612357Z graph_break [] 2025-12-04T11:45:25.2612416Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 
2025-12-04T11:45:25.2612490Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2612976Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2613024Z current_size = base.storage().size() 2025-12-04T11:45:25.2613066Z Autotune Choices Stats: 2025-12-04T11:45:25.2613468Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2613536Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2613585Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2613705Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2613939Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2614186Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2614432Z triton_mm_2 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2614655Z triton_mm_0 0.0078 ms 76.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2614698Z _scaled_mm 0.0235 ms 25.2% 2025-12-04T11:45:25.2614825Z SingleProcess AUTOTUNE benchmarking takes 0.0255 seconds and 0.1662 seconds precompiling for 5 choices 2025-12-04T11:45:25.2614900Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2614943Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2614998Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2615102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2615592Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2615633Z graph_break [] 2025-12-04T11:45:25.2615692Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2615768Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2615807Z Autotune Choices Stats: 2025-12-04T11:45:25.2616171Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 
0} 2025-12-04T11:45:25.2616226Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2616275Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2616394Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2616626Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2616852Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2617089Z triton_mm_6 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2617314Z triton_mm_4 0.0077 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2617354Z _scaled_mm 0.0239 ms 25.8% 2025-12-04T11:45:25.2617481Z SingleProcess AUTOTUNE benchmarking takes 0.0249 seconds and 0.1361 seconds precompiling for 5 choices 2025-12-04T11:45:25.2617565Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2617607Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2617663Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2617762Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2618254Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2618290Z graph_break [] 2025-12-04T11:45:25.2618349Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2618421Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2618463Z Autotune Choices Stats: 2025-12-04T11:45:25.2618818Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2618872Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2618919Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2619052Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2619283Z triton_mm_10 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2619512Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2619738Z triton_mm_11 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2619964Z triton_mm_8 0.0078 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2620006Z _scaled_mm 0.0222 ms 27.6% 2025-12-04T11:45:25.2620131Z SingleProcess AUTOTUNE benchmarking takes 0.0349 seconds and 0.2297 seconds precompiling for 5 choices 2025-12-04T11:45:25.2620318Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-305265071c504123.xml - 2025-12-04T11:45:25.2620379Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2620959Z FAILED [0.7350s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2620973Z 2025-12-04T11:45:25.2621045Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2621302Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2621315Z 2025-12-04T11:45:25.2621404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2621466Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2621534Z ================== 1 failed, 187 deselected, 2 rerun in 3.67s ==================
2025-12-04T11:45:25.2621582Z Got exit code 1
2025-12-04T11:45:25.2621623Z Retrying single test...
2025-12-04T11:45:25.2621767Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-425ee79d470c5b66.xml
2025-12-04T11:45:25.2621824Z ============================= test session starts ==============================
2025-12-04T11:45:25.2621934Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:25.2621975Z cachedir: .pytest_cache
2025-12-04T11:45:25.2622133Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:25.2622181Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:25.2622220Z configfile: pytest.ini
2025-12-04T11:45:25.2622382Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:25.2622459Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T11:45:25.2622721Z stepcurrent: skipping 101 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T11:45:25.2622764Z Running 1 items in this shard
2025-12-04T11:45:25.2622766Z
2025-12-04T11:45:25.2622976Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1426s] [100%]
2025-12-04T11:45:25.2623184Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8092s] [100%]
2025-12-04T11:45:25.2623398Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7468s] [100%]
2025-12-04T11:45:25.2623401Z
2025-12-04T11:45:25.2623454Z ==================================== RERUNS ====================================
2025-12-04T11:45:25.2623593Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _
2025-12-04T11:45:25.2623643Z Traceback (most recent call last):
2025-12-04T11:45:25.2623802Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.2623843Z method(*args, **kwargs)
2025-12-04T11:45:25.2623997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.2624040Z method(*args, **kwargs)
2025-12-04T11:45:25.2624190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:25.2624227Z with policy():
2025-12-04T11:45:25.2624381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:25.2624442Z raise RuntimeError(msg)
2025-12-04T11:45:25.2624825Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1023410176. 2025-12-04T11:45:25.2624827Z 2025-12-04T11:45:25.2624901Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2625159Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2625176Z 2025-12-04T11:45:25.2625263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2625336Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2625392Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2625449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2625929Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2626030Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2626068Z graph_break [] 2025-12-04T11:45:25.2626128Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2626202Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2626704Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2626753Z current_size = base.storage().size() 2025-12-04T11:45:25.2626793Z Autotune Choices Stats: 2025-12-04T11:45:25.2627158Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:25.2627214Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2627262Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2627387Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2627624Z triton_mm_3 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2627854Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2628081Z triton_mm_2 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2628302Z triton_mm_0 0.0078 ms 75.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2628355Z _scaled_mm 0.0230 ms 25.4% 2025-12-04T11:45:25.2628481Z SingleProcess AUTOTUNE benchmarking takes 0.0246 seconds and 0.1608 seconds precompiling for 5 choices 2025-12-04T11:45:25.2628621Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2628667Z Traceback (most recent call last): 2025-12-04T11:45:25.2628822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2628879Z method(*args, **kwargs) 2025-12-04T11:45:25.2629033Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2629072Z method(*args, **kwargs) 2025-12-04T11:45:25.2629233Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2629271Z with policy(): 2025-12-04T11:45:25.2629423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2629464Z raise RuntimeError(msg) 2025-12-04T11:45:25.2629857Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1023410176 and is now 1059061760. 
2025-12-04T11:45:25.2629860Z 2025-12-04T11:45:25.2629933Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2630186Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2630190Z 2025-12-04T11:45:25.2630277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2630351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2630405Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2630461Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2630945Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2631044Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2631082Z graph_break [] 2025-12-04T11:45:25.2631143Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2631218Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2631702Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2631750Z current_size = base.storage().size() 2025-12-04T11:45:25.2631791Z Autotune Choices Stats: 2025-12-04T11:45:25.2632156Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:25.2632221Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2632269Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2632392Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2632627Z triton_mm_3 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2632856Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2633099Z triton_mm_2 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2633344Z triton_mm_0 0.0078 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2633385Z _scaled_mm 0.0230 ms 25.4% 2025-12-04T11:45:25.2633513Z SingleProcess 
AUTOTUNE benchmarking takes 0.0246 seconds and 0.1608 seconds precompiling for 5 choices 2025-12-04T11:45:25.2633590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2633632Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2633689Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2633788Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2634282Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2634319Z graph_break [] 2025-12-04T11:45:25.2634379Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2634452Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2634493Z Autotune Choices Stats: 2025-12-04T11:45:25.2634852Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2634909Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2634958Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2635079Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2635312Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2635538Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2635764Z triton_mm_6 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2636002Z triton_mm_4 0.0078 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2636044Z _scaled_mm 0.0172 ms 34.3% 2025-12-04T11:45:25.2636170Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1359 seconds precompiling for 5 choices 2025-12-04T11:45:25.2636224Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2636363Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2636425Z Traceback (most recent call last): 2025-12-04T11:45:25.2636580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2636621Z method(*args, **kwargs) 2025-12-04T11:45:25.2636788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2636829Z method(*args, **kwargs) 2025-12-04T11:45:25.2636981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2637018Z with policy(): 
2025-12-04T11:45:25.2637170Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2637211Z raise RuntimeError(msg) 2025-12-04T11:45:25.2637599Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2637603Z 2025-12-04T11:45:25.2637675Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2637935Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2637949Z 2025-12-04T11:45:25.2638036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2638109Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2638151Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2638207Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2638691Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2638791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2638827Z graph_break [] 2025-12-04T11:45:25.2638887Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 
2025-12-04T11:45:25.2638961Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2639449Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2639497Z current_size = base.storage().size() 2025-12-04T11:45:25.2639537Z Autotune Choices Stats: 2025-12-04T11:45:25.2639901Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:25.2639967Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2640016Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2640136Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2640369Z triton_mm_3 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2640623Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2640852Z triton_mm_2 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2641076Z triton_mm_0 0.0078 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2641119Z _scaled_mm 0.0230 ms 25.4% 2025-12-04T11:45:25.2641249Z SingleProcess AUTOTUNE benchmarking takes 0.0246 seconds and 0.1608 seconds precompiling for 5 choices 2025-12-04T11:45:25.2641322Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2641364Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2641419Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2641521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2642008Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2642046Z graph_break [] 2025-12-04T11:45:25.2642105Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2642181Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2642221Z Autotune Choices Stats: 2025-12-04T11:45:25.2642583Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 
0} 2025-12-04T11:45:25.2642638Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2642687Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2642804Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2643037Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2643302Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2643552Z triton_mm_6 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2643775Z triton_mm_4 0.0078 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2643815Z _scaled_mm 0.0172 ms 34.3% 2025-12-04T11:45:25.2643943Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1359 seconds precompiling for 5 choices 2025-12-04T11:45:25.2644028Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2644071Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2644128Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2644241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2644720Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2644759Z graph_break [] 2025-12-04T11:45:25.2644819Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:25.2644894Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2644935Z Autotune Choices Stats: 2025-12-04T11:45:25.2645295Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005799999926239252, "best_triton_pos": 0} 2025-12-04T11:45:25.2645350Z AUTOTUNE scaled_mm(3x1024, 1024x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2645411Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2645532Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2645764Z triton_mm_9 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2645996Z triton_mm_11 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2646220Z triton_mm_10 0.0064 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2646263Z _scaled_mm 0.0072 ms 81.0% 2025-12-04T11:45:25.2646486Z triton_mm_8 0.0080 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2646615Z SingleProcess AUTOTUNE benchmarking takes 0.0322 seconds and 0.2381 seconds precompiling for 5 choices 2025-12-04T11:45:25.2646806Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-425ee79d470c5b66.xml - 2025-12-04T11:45:25.2646868Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2647450Z FAILED [0.7468s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1059061760 and is now 1094713344. 2025-12-04T11:45:25.2647463Z 2025-12-04T11:45:25.2647536Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2647792Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2647804Z 2025-12-04T11:45:25.2647891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2647955Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2648035Z ================== 1 failed, 187 deselected, 2 rerun in 3.72s ================== 2025-12-04T11:45:25.2648073Z Got exit code 1 2025-12-04T11:45:25.2648278Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2648406Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2648550Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2130fdee0e7efc0d.xml 2025-12-04T11:45:25.2648608Z ============================= test session starts ============================== 2025-12-04T11:45:25.2648719Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2648762Z cachedir: .pytest_cache 2025-12-04T11:45:25.2648921Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2648969Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2649010Z configfile: pytest.ini 2025-12-04T11:45:25.2649180Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2649258Z collecting ... collected 188 items / 102 deselected / 86 selected 2025-12-04T11:45:25.2649314Z stepcurrent: skipping 102 already run items. 
2025-12-04T11:45:25.2649358Z Running 86 items in this shard 2025-12-04T11:45:25.2649360Z 2025-12-04T11:45:25.2649578Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.5062s] [ 1%] 2025-12-04T11:45:25.2649790Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.0223s] [ 1%] 2025-12-04T11:45:25.2649978Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.9690s] [ 1%] 2025-12-04T11:45:25.2649982Z 2025-12-04T11:45:25.2650032Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2650173Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2650220Z Traceback (most recent call last): 2025-12-04T11:45:25.2650378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2650421Z method(*args, **kwargs) 2025-12-04T11:45:25.2650577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2650621Z method(*args, **kwargs) 2025-12-04T11:45:25.2650774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2650826Z with policy(): 2025-12-04T11:45:25.2650978Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2651022Z raise RuntimeError(msg) 2025-12-04T11:45:25.2651406Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:25.2651411Z 2025-12-04T11:45:25.2651496Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2651754Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2651758Z 2025-12-04T11:45:25.2651856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2651930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2651971Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2652030Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2652522Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2652622Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2652661Z graph_break [] 2025-12-04T11:45:25.2652726Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2652802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2653340Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2653390Z current_size = base.storage().size() 2025-12-04T11:45:25.2653429Z Autotune Choices Stats: 2025-12-04T11:45:25.2653805Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.2653873Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2653923Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2654046Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2654288Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2654517Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2654746Z triton_mm_7 0.0069 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2654986Z triton_mm_12 0.0070 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2655214Z triton_mm_6 0.0078 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2655443Z triton_mm_9 0.0082 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2655692Z triton_mm_10 0.0088 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2655921Z triton_mm_14 0.0088 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2656145Z triton_mm_5 0.0088 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2656371Z triton_mm_11 0.0104 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2656503Z SingleProcess AUTOTUNE benchmarking takes 0.0867 seconds and 0.3931 seconds precompiling for 20 choices 2025-12-04T11:45:25.2656645Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2656694Z Traceback (most recent call last): 2025-12-04T11:45:25.2656866Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2656907Z method(*args, **kwargs) 2025-12-04T11:45:25.2657059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2657102Z method(*args, **kwargs) 2025-12-04T11:45:25.2657253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2657296Z with policy(): 2025-12-04T11:45:25.2657449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2657490Z raise RuntimeError(msg) 2025-12-04T11:45:25.2657883Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:25.2657886Z 2025-12-04T11:45:25.2657959Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2658221Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2658226Z 2025-12-04T11:45:25.2658313Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2658386Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2658428Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2658495Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2658985Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2659086Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2659122Z graph_break [] 2025-12-04T11:45:25.2659198Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2659272Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2659769Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2659818Z current_size = base.storage().size() 2025-12-04T11:45:25.2659858Z Autotune Choices Stats: 2025-12-04T11:45:25.2660224Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.2660292Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2660342Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2660462Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2660702Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2660941Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2661171Z triton_mm_7 0.0069 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2661395Z triton_mm_12 0.0070 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2661625Z triton_mm_6 0.0078 ms 76.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2661854Z triton_mm_9 0.0082 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2662079Z triton_mm_10 0.0088 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2662305Z triton_mm_14 0.0088 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2662539Z triton_mm_5 0.0088 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2662768Z triton_mm_11 0.0104 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2662897Z SingleProcess AUTOTUNE benchmarking takes 0.0867 seconds and 0.3931 seconds precompiling for 20 choices 2025-12-04T11:45:25.2662983Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2663024Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2663083Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2663183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2663718Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2663758Z graph_break [] 2025-12-04T11:45:25.2663819Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2663893Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2663933Z Autotune Choices Stats: 2025-12-04T11:45:25.2664295Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2664358Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2664408Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2664540Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2664774Z triton_mm_35 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2665003Z triton_mm_36 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2665230Z triton_mm_26 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2665458Z triton_mm_31 0.0070 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2665686Z triton_mm_25 0.0078 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2665911Z triton_mm_33 0.0084 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2666143Z triton_mm_28 0.0084 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2666382Z triton_mm_29 0.0084 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2666605Z triton_mm_24 0.0090 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2666843Z triton_mm_30 0.0103 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2666972Z SingleProcess AUTOTUNE 
benchmarking takes 0.1215 seconds and 0.3012 seconds precompiling for 20 choices 2025-12-04T11:45:25.2667037Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2667179Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2667224Z Traceback (most recent call last): 2025-12-04T11:45:25.2667380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2667420Z method(*args, **kwargs) 2025-12-04T11:45:25.2667573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2667614Z method(*args, **kwargs) 2025-12-04T11:45:25.2667768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2667804Z with policy(): 2025-12-04T11:45:25.2667961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2668001Z raise RuntimeError(msg) 2025-12-04T11:45:25.2668400Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:25.2668403Z 2025-12-04T11:45:25.2668476Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2668736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2668739Z 2025-12-04T11:45:25.2668826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2668901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2668946Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2669002Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2669492Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2669592Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2669629Z graph_break [] 2025-12-04T11:45:25.2669691Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2669764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2670262Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2670310Z current_size = base.storage().size() 2025-12-04T11:45:25.2670349Z Autotune Choices Stats: 2025-12-04T11:45:25.2670721Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.2670796Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2670864Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2670986Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2671223Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2671453Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2671680Z triton_mm_7 0.0069 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2671908Z triton_mm_12 0.0070 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2672146Z triton_mm_6 0.0078 ms 76.9% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2672374Z triton_mm_9 0.0082 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2672597Z triton_mm_10 0.0088 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2672824Z triton_mm_14 0.0088 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2673051Z triton_mm_5 0.0088 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2673304Z triton_mm_11 0.0104 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2673434Z SingleProcess AUTOTUNE benchmarking takes 0.0867 seconds and 0.3931 seconds precompiling for 20 choices 2025-12-04T11:45:25.2673507Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2673550Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2673621Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2673721Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2674208Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2674258Z graph_break [] 2025-12-04T11:45:25.2674321Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2674394Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2674434Z Autotune Choices Stats: 2025-12-04T11:45:25.2674813Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2674877Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2674927Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2675046Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2675276Z triton_mm_35 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2675504Z triton_mm_36 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2675744Z triton_mm_26 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2675967Z triton_mm_31 0.0070 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2676196Z triton_mm_25 0.0078 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2676421Z triton_mm_33 0.0084 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2676650Z triton_mm_28 0.0084 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2676873Z triton_mm_29 0.0084 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2677096Z triton_mm_24 0.0090 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2677323Z triton_mm_30 0.0103 ms 59.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2677462Z SingleProcess AUTOTUNE 
benchmarking takes 0.1215 seconds and 0.3012 seconds precompiling for 20 choices 2025-12-04T11:45:25.2677537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2677578Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2677635Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2677733Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2678219Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2678278Z graph_break [] 2025-12-04T11:45:25.2678340Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2678413Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2678454Z Autotune Choices Stats: 2025-12-04T11:45:25.2678817Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2678880Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2678931Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2679051Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2679287Z triton_mm_54 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2679527Z triton_mm_55 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2679755Z triton_mm_45 0.0071 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2679979Z triton_mm_50 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2680210Z triton_mm_44 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2680438Z triton_mm_47 0.0084 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2680661Z triton_mm_52 0.0086 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2680885Z triton_mm_48 0.0088 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2681119Z triton_mm_43 0.0089 ms 68.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2681348Z triton_mm_49 0.0102 ms 59.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2681476Z SingleProcess AUTOTUNE benchmarking takes 0.1400 seconds and 0.2774 seconds precompiling for 20 choices 2025-12-04T11:45:25.2681681Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2130fdee0e7efc0d.xml - 2025-12-04T11:45:25.2681743Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2682338Z FAILED [0.9690s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:25.2682342Z 2025-12-04T11:45:25.2682416Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2682674Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2682677Z 2025-12-04T11:45:25.2682768Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2682830Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2682903Z ================== 1 failed, 102 deselected, 2 rerun in 4.52s ================== 2025-12-04T11:45:25.2682941Z Got exit code 1 2025-12-04T11:45:25.2682984Z Retrying single test... 2025-12-04T11:45:25.2683137Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-52717aaff87a7c73.xml 2025-12-04T11:45:25.2683195Z ============================= test session starts ============================== 2025-12-04T11:45:25.2683336Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2683379Z cachedir: .pytest_cache 2025-12-04T11:45:25.2683537Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2683586Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2683626Z configfile: pytest.ini 2025-12-04T11:45:25.2683789Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2683867Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2684124Z stepcurrent: skipping 102 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2684169Z Running 1 items in this shard 2025-12-04T11:45:25.2684172Z 2025-12-04T11:45:25.2684386Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.5011s] [100%] 2025-12-04T11:45:25.2684600Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.0468s] [100%] 2025-12-04T11:45:25.2684790Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.9693s] [100%] 2025-12-04T11:45:25.2684809Z 2025-12-04T11:45:25.2684862Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2685005Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2685054Z Traceback (most recent call last): 2025-12-04T11:45:25.2685211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2685254Z method(*args, **kwargs) 2025-12-04T11:45:25.2685407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2685470Z method(*args, **kwargs) 2025-12-04T11:45:25.2685623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2685662Z with policy(): 2025-12-04T11:45:25.2685830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2685874Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2686264Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:25.2686266Z 2025-12-04T11:45:25.2686339Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2686600Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2686602Z 2025-12-04T11:45:25.2686690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2686767Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2686809Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2686867Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2687365Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2687465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2687501Z graph_break [] 2025-12-04T11:45:25.2687564Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2687638Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2688132Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2688180Z current_size = base.storage().size() 2025-12-04T11:45:25.2688221Z Autotune Choices Stats: 2025-12-04T11:45:25.2688590Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2688654Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2688716Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2688836Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2689076Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2689306Z triton_mm_17 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2689546Z triton_mm_7 0.0070 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2689784Z triton_mm_12 0.0070 ms 
84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2690013Z triton_mm_6 0.0074 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2690240Z triton_mm_9 0.0084 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2690464Z triton_mm_10 0.0084 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2690689Z triton_mm_5 0.0087 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2690922Z triton_mm_14 0.0090 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2691150Z triton_mm_11 0.0102 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2691279Z SingleProcess AUTOTUNE benchmarking takes 0.0858 seconds and 0.3942 seconds precompiling for 20 choices 2025-12-04T11:45:25.2691422Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2691472Z Traceback (most recent call last): 2025-12-04T11:45:25.2691628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2691669Z method(*args, **kwargs) 2025-12-04T11:45:25.2691822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2691865Z method(*args, **kwargs) 2025-12-04T11:45:25.2692014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2692055Z with policy(): 2025-12-04T11:45:25.2692209Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2692249Z raise RuntimeError(msg) 2025-12-04T11:45:25.2692640Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:25.2692653Z 2025-12-04T11:45:25.2692730Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2692987Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2692990Z 2025-12-04T11:45:25.2693077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2693162Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2693204Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2693301Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2693805Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2693906Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2693942Z graph_break [] 2025-12-04T11:45:25.2694004Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2694079Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2694566Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2694614Z current_size = base.storage().size() 2025-12-04T11:45:25.2694657Z Autotune Choices Stats: 2025-12-04T11:45:25.2695044Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2695111Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2695162Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2695280Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2695515Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2695746Z triton_mm_17 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2695975Z triton_mm_7 0.0070 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2696200Z triton_mm_12 0.0070 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2696429Z triton_mm_6 0.0074 ms 80.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2696673Z triton_mm_9 0.0084 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2696896Z triton_mm_10 0.0084 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2697139Z triton_mm_5 0.0087 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2697374Z triton_mm_14 0.0090 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2697602Z triton_mm_11 0.0102 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2697730Z SingleProcess AUTOTUNE benchmarking takes 0.0858 seconds and 0.3942 seconds precompiling for 20 choices 2025-12-04T11:45:25.2697804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2697848Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2697908Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2698008Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2698522Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2698560Z graph_break [] 2025-12-04T11:45:25.2698622Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2698695Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2698736Z Autotune Choices Stats: 2025-12-04T11:45:25.2699104Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2699170Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2699221Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2699341Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2699578Z triton_mm_36 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2699805Z triton_mm_35 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2700032Z triton_mm_31 0.0070 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2700278Z triton_mm_26 0.0072 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2700505Z triton_mm_25 0.0077 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2700733Z triton_mm_28 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2700977Z triton_mm_24 0.0088 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2701202Z triton_mm_33 0.0088 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2701427Z triton_mm_29 0.0089 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2701654Z triton_mm_30 0.0102 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2701787Z SingleProcess AUTOTUNE 
benchmarking takes 0.1204 seconds and 0.3041 seconds precompiling for 20 choices 2025-12-04T11:45:25.2701840Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2701984Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2702031Z Traceback (most recent call last): 2025-12-04T11:45:25.2702201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2702241Z method(*args, **kwargs) 2025-12-04T11:45:25.2702394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2702434Z method(*args, **kwargs) 2025-12-04T11:45:25.2702586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2702622Z with policy(): 2025-12-04T11:45:25.2702778Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2702820Z raise RuntimeError(msg) 2025-12-04T11:45:25.2703214Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:25.2703216Z 2025-12-04T11:45:25.2703319Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2703578Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2703582Z 2025-12-04T11:45:25.2703670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2703742Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2703802Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2703858Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2704348Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2704446Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2704497Z graph_break [] 2025-12-04T11:45:25.2704562Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2704637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2705137Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2705188Z current_size = base.storage().size() 2025-12-04T11:45:25.2705228Z Autotune Choices Stats: 2025-12-04T11:45:25.2705596Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2705661Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2705711Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2705833Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2706083Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2706314Z triton_mm_17 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2706540Z triton_mm_7 0.0070 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2706771Z triton_mm_12 0.0070 ms 84.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2707005Z triton_mm_6 0.0074 ms 80.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2707235Z triton_mm_9 0.0084 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2707460Z triton_mm_10 0.0084 ms 70.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2707684Z triton_mm_5 0.0087 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2707923Z triton_mm_14 0.0090 ms 66.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2708149Z triton_mm_11 0.0102 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2708289Z SingleProcess AUTOTUNE benchmarking takes 0.0858 seconds and 0.3942 seconds precompiling for 20 choices 2025-12-04T11:45:25.2708363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2708407Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2708463Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2708582Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2709073Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2709110Z graph_break [] 2025-12-04T11:45:25.2709172Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2709248Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2709292Z Autotune Choices Stats: 2025-12-04T11:45:25.2709662Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2709737Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2709788Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2709909Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2710143Z triton_mm_36 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2710373Z triton_mm_35 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2710600Z triton_mm_31 0.0070 ms 86.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2710827Z triton_mm_26 0.0072 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2711059Z triton_mm_25 0.0077 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2711287Z triton_mm_28 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2711523Z triton_mm_24 0.0088 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2711748Z triton_mm_33 0.0088 ms 69.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2711973Z triton_mm_29 0.0089 ms 68.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2712209Z triton_mm_30 0.0102 ms 59.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2712351Z SingleProcess AUTOTUNE 
benchmarking takes 0.1204 seconds and 0.3041 seconds precompiling for 20 choices 2025-12-04T11:45:25.2712427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2712469Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2712526Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2712624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2713110Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2713148Z graph_break [] 2025-12-04T11:45:25.2713212Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2713316Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2713358Z Autotune Choices Stats: 2025-12-04T11:45:25.2713748Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:25.2713811Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2713862Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2713981Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2714214Z triton_mm_54 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2714441Z triton_mm_55 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2714668Z triton_mm_45 0.0071 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2714891Z triton_mm_50 0.0072 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2715124Z triton_mm_44 0.0079 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2715367Z triton_mm_47 0.0080 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2715590Z triton_mm_48 0.0087 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2715828Z triton_mm_52 0.0087 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2716063Z triton_mm_43 0.0090 ms 69.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2716292Z triton_mm_49 0.0106 ms 59.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2716420Z SingleProcess AUTOTUNE benchmarking takes 0.1379 seconds and 0.2769 seconds precompiling for 20 choices 2025-12-04T11:45:25.2716612Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-52717aaff87a7c73.xml - 2025-12-04T11:45:25.2716673Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2717270Z FAILED [0.9693s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:25.2717274Z 2025-12-04T11:45:25.2717347Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2717605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2717608Z 2025-12-04T11:45:25.2717696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2717758Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2717827Z ================== 1 failed, 187 deselected, 2 rerun in 4.54s ================== 2025-12-04T11:45:25.2717865Z Got exit code 1 2025-12-04T11:45:25.2717905Z Retrying single test... 2025-12-04T11:45:25.2718049Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-547129065373a3b5.xml 2025-12-04T11:45:25.2718112Z ============================= test session starts ============================== 2025-12-04T11:45:25.2718222Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2718264Z cachedir: .pytest_cache 2025-12-04T11:45:25.2718422Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2718470Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2718510Z configfile: pytest.ini 2025-12-04T11:45:25.2718671Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2718747Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2719013Z stepcurrent: skipping 102 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2719058Z Running 1 items in this shard 2025-12-04T11:45:25.2719060Z 2025-12-04T11:45:25.2719273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4946s] [100%] 2025-12-04T11:45:25.2719485Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9918s] [100%] 2025-12-04T11:45:25.2719688Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.9167s] [100%] 2025-12-04T11:45:25.2719690Z 2025-12-04T11:45:25.2719753Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2719895Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2719943Z Traceback (most recent call last): 2025-12-04T11:45:25.2720103Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2720144Z method(*args, **kwargs) 2025-12-04T11:45:25.2720298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2720341Z method(*args, **kwargs) 2025-12-04T11:45:25.2720492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2720530Z with policy(): 2025-12-04T11:45:25.2720682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2720725Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2721124Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1054867456. 2025-12-04T11:45:25.2721126Z 2025-12-04T11:45:25.2721199Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2721458Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2721461Z 2025-12-04T11:45:25.2721549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2721622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2721666Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2721725Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2722218Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2722318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2722357Z graph_break [] 2025-12-04T11:45:25.2722421Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2722494Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2722982Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2723041Z current_size = base.storage().size() 2025-12-04T11:45:25.2723080Z Autotune Choices Stats: 2025-12-04T11:45:25.2723481Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2723559Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2723609Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2723746Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2723987Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2724216Z triton_mm_17 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2724446Z triton_mm_7 0.0069 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2724670Z triton_mm_12 0.0072 ms 
81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2724912Z triton_mm_6 0.0079 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2725140Z triton_mm_9 0.0084 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2725365Z triton_mm_10 0.0086 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2725592Z triton_mm_5 0.0087 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2725817Z triton_mm_14 0.0089 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2726044Z triton_mm_11 0.0102 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2726176Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3900 seconds precompiling for 20 choices 2025-12-04T11:45:25.2726317Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2726364Z Traceback (most recent call last): 2025-12-04T11:45:25.2726533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2726574Z method(*args, **kwargs) 2025-12-04T11:45:25.2726727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2726767Z method(*args, **kwargs) 2025-12-04T11:45:25.2726920Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2726958Z with policy(): 2025-12-04T11:45:25.2727112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2727164Z raise RuntimeError(msg) 2025-12-04T11:45:25.2727567Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1054867456 and is now 1121976320. 
2025-12-04T11:45:25.2727570Z 2025-12-04T11:45:25.2727644Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2727902Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2727905Z 2025-12-04T11:45:25.2727992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2728066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2728109Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2728166Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2728660Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2728760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2728796Z graph_break [] 2025-12-04T11:45:25.2728858Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2728932Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2729417Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2729464Z current_size = base.storage().size() 2025-12-04T11:45:25.2729507Z Autotune Choices Stats: 2025-12-04T11:45:25.2729875Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2729937Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2729989Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2730110Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2730348Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2730589Z triton_mm_17 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2730818Z triton_mm_7 0.0069 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2731041Z triton_mm_12 0.0072 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2731295Z triton_mm_6 0.0079 ms 74.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2731526Z triton_mm_9 0.0084 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2731751Z triton_mm_10 0.0086 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2731975Z triton_mm_5 0.0087 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2732200Z triton_mm_14 0.0089 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2732440Z triton_mm_11 0.0102 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2732570Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3900 seconds precompiling for 20 choices 2025-12-04T11:45:25.2732646Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2732688Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2732749Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2732847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2733378Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2733420Z graph_break [] 2025-12-04T11:45:25.2733483Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2733560Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2733601Z Autotune Choices Stats: 2025-12-04T11:45:25.2733967Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:25.2734030Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2734097Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2734216Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2734455Z triton_mm_35 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2734683Z triton_mm_36 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2734925Z triton_mm_26 0.0070 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2735161Z triton_mm_31 0.0072 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2735394Z triton_mm_25 0.0080 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2735623Z triton_mm_28 0.0083 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2735846Z triton_mm_29 0.0084 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2736073Z triton_mm_33 0.0085 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2736310Z triton_mm_24 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2736536Z triton_mm_30 0.0103 ms 58.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2736665Z SingleProcess AUTOTUNE 
benchmarking takes 0.1168 seconds and 0.2949 seconds precompiling for 20 choices 2025-12-04T11:45:25.2736717Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2736860Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2736907Z Traceback (most recent call last): 2025-12-04T11:45:25.2737066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2737106Z method(*args, **kwargs) 2025-12-04T11:45:25.2737262Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2737302Z method(*args, **kwargs) 2025-12-04T11:45:25.2737454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2737492Z with policy(): 2025-12-04T11:45:25.2737645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2737686Z raise RuntimeError(msg) 2025-12-04T11:45:25.2738095Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 
2025-12-04T11:45:25.2738097Z 2025-12-04T11:45:25.2738170Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2738431Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2738444Z 2025-12-04T11:45:25.2738531Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2738604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2738649Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2738706Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2739210Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2739308Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2739345Z graph_break [] 2025-12-04T11:45:25.2739407Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2739480Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2739963Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2740022Z current_size = base.storage().size() 2025-12-04T11:45:25.2740062Z Autotune Choices Stats: 2025-12-04T11:45:25.2740430Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2740497Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2740547Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2740667Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2740905Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2741136Z triton_mm_17 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2741361Z triton_mm_7 0.0069 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2741588Z triton_mm_12 0.0072 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2741831Z triton_mm_6 0.0079 ms 74.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2742059Z triton_mm_9 0.0084 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2742286Z triton_mm_10 0.0086 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2742526Z triton_mm_5 0.0087 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2742758Z triton_mm_14 0.0089 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2742984Z triton_mm_11 0.0102 ms 57.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2743112Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3900 seconds precompiling for 20 choices 2025-12-04T11:45:25.2743187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2743228Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2743325Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2743426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2743931Z 
inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2743968Z graph_break [] 2025-12-04T11:45:25.2744030Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2744103Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2744145Z Autotune Choices Stats: 2025-12-04T11:45:25.2744505Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:25.2744568Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2744620Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2744740Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2744973Z triton_mm_35 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2745203Z triton_mm_36 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2745431Z triton_mm_26 0.0070 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2745671Z triton_mm_31 0.0072 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2745901Z triton_mm_25 0.0080 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2746141Z triton_mm_28 0.0083 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2746377Z triton_mm_29 0.0084 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2746601Z triton_mm_33 0.0085 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2746825Z triton_mm_24 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2747052Z triton_mm_30 0.0103 ms 58.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2747180Z SingleProcess AUTOTUNE 
benchmarking takes 0.1168 seconds and 0.2949 seconds precompiling for 20 choices 2025-12-04T11:45:25.2747257Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2747298Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2747365Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2747465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2747951Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2747988Z graph_break [] 2025-12-04T11:45:25.2748049Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:25.2748124Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2748165Z Autotune Choices Stats: 2025-12-04T11:45:25.2748526Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:25.2748589Z AUTOTUNE scaled_mm(3x1024, 1024x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2748638Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2748758Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2748999Z triton_mm_54 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2749244Z triton_mm_55 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2749471Z triton_mm_45 0.0069 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2749695Z triton_mm_50 0.0071 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2749935Z triton_mm_44 0.0078 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2750177Z triton_mm_47 0.0080 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2750401Z triton_mm_48 0.0086 ms 69.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2750626Z triton_mm_43 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2750852Z triton_mm_52 0.0088 ms 68.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2751088Z triton_mm_49 0.0100 ms 60.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2751217Z SingleProcess AUTOTUNE benchmarking takes 0.1226 seconds and 0.2790 seconds precompiling for 20 choices 2025-12-04T11:45:25.2751407Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-547129065373a3b5.xml - 2025-12-04T11:45:25.2751467Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2752051Z FAILED [0.9167s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1121976320 and is now 1189085184. 2025-12-04T11:45:25.2752055Z 2025-12-04T11:45:25.2752130Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2752388Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2752390Z 2025-12-04T11:45:25.2752478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2752541Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2752609Z ================== 1 failed, 187 deselected, 2 rerun in 4.42s ================== 2025-12-04T11:45:25.2752646Z Got exit code 1 2025-12-04T11:45:25.2752855Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2752995Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2753141Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7df57868c1bd63aa.xml 2025-12-04T11:45:25.2753199Z ============================= test session starts ============================== 2025-12-04T11:45:25.2753341Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2753397Z cachedir: .pytest_cache 2025-12-04T11:45:25.2753557Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2753603Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2753643Z configfile: pytest.ini 2025-12-04T11:45:25.2753820Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2753897Z collecting ... collected 188 items / 103 deselected / 85 selected 2025-12-04T11:45:25.2753953Z stepcurrent: skipping 103 already run items. 
2025-12-04T11:45:25.2753997Z Running 85 items in this shard 2025-12-04T11:45:25.2753999Z 2025-12-04T11:45:25.2754216Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5819s] [ 1%] 2025-12-04T11:45:25.2754422Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2613s] [ 1%] 2025-12-04T11:45:25.2754605Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2182s] [ 1%] 2025-12-04T11:45:25.2754608Z 2025-12-04T11:45:25.2754660Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2754798Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2754856Z Traceback (most recent call last): 2025-12-04T11:45:25.2755018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2755059Z method(*args, **kwargs) 2025-12-04T11:45:25.2755213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2755254Z method(*args, **kwargs) 2025-12-04T11:45:25.2757073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2757111Z with policy(): 2025-12-04T11:45:25.2757265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2757309Z raise RuntimeError(msg) 2025-12-04T11:45:25.2757693Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2757696Z 2025-12-04T11:45:25.2757772Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2758026Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2758029Z 2025-12-04T11:45:25.2758117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2758190Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2758263Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2758319Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2758387Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2758488Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2758526Z graph_break [] 2025-12-04T11:45:25.2758586Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2758724Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2758789Z Traceback (most recent call last): 2025-12-04T11:45:25.2758944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2758983Z method(*args, **kwargs) 2025-12-04T11:45:25.2759134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2759185Z method(*args, **kwargs) 2025-12-04T11:45:25.2759336Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2759374Z with policy(): 2025-12-04T11:45:25.2759528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2759568Z raise RuntimeError(msg) 2025-12-04T11:45:25.2759952Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2759955Z 2025-12-04T11:45:25.2760030Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2760283Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2760286Z 2025-12-04T11:45:25.2760385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2760459Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2760502Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2760557Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2760624Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2760723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2760759Z graph_break [] 2025-12-04T11:45:25.2760819Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2760892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2760935Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2760991Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2761087Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2761154Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2761190Z graph_break [] 2025-12-04T11:45:25.2761248Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2761300Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2761437Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2761483Z Traceback (most recent call last): 2025-12-04T11:45:25.2761640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2761681Z method(*args, **kwargs) 2025-12-04T11:45:25.2761850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2761890Z method(*args, **kwargs) 2025-12-04T11:45:25.2762045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2762081Z with policy(): 2025-12-04T11:45:25.2762234Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2762274Z raise RuntimeError(msg) 2025-12-04T11:45:25.2762652Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2762665Z 2025-12-04T11:45:25.2762740Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2763004Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2763008Z 2025-12-04T11:45:25.2763096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2763169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2763211Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2763295Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2763362Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2763459Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2763495Z graph_break [] 2025-12-04T11:45:25.2763553Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2763629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2763669Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2763724Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2763835Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2763901Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2763936Z graph_break [] 2025-12-04T11:45:25.2763993Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2764066Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2764108Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2764162Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2764256Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2764319Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2764357Z graph_break [] 2025-12-04T11:45:25.2764414Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2764608Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7df57868c1bd63aa.xml - 2025-12-04T11:45:25.2764667Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2765239Z FAILED [0.2182s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2765243Z 2025-12-04T11:45:25.2765314Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2765580Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2765582Z 2025-12-04T11:45:25.2765670Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2765732Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2765800Z ================== 1 failed, 103 deselected, 2 rerun in 2.08s ================== 2025-12-04T11:45:25.2765837Z Got exit code 1 2025-12-04T11:45:25.2765891Z Retrying single test... 
2025-12-04T11:45:25.2766037Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0c09abda29c11ff.xml 2025-12-04T11:45:25.2766094Z ============================= test session starts ============================== 2025-12-04T11:45:25.2766204Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2766258Z cachedir: .pytest_cache 2025-12-04T11:45:25.2766421Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2766469Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2766509Z configfile: pytest.ini 2025-12-04T11:45:25.2766672Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2766745Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2766997Z stepcurrent: skipping 103 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2767040Z Running 1 items in this shard 2025-12-04T11:45:25.2767042Z 2025-12-04T11:45:25.2767251Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5812s] [100%] 2025-12-04T11:45:25.2767467Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2559s] [100%] 2025-12-04T11:45:25.2767651Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2155s] [100%] 2025-12-04T11:45:25.2767653Z 2025-12-04T11:45:25.2767705Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2767843Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2767888Z Traceback (most recent call last): 2025-12-04T11:45:25.2768045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2768088Z method(*args, **kwargs) 2025-12-04T11:45:25.2768241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2768281Z method(*args, **kwargs) 2025-12-04T11:45:25.2768432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2768470Z with policy(): 2025-12-04T11:45:25.2768623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2768665Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2769047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2769060Z 2025-12-04T11:45:25.2769134Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2769387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2769389Z 2025-12-04T11:45:25.2769476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2769549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2769592Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2769659Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2769727Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2769825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2769862Z graph_break [] 2025-12-04T11:45:25.2769923Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2770074Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2770119Z Traceback (most recent call last): 2025-12-04T11:45:25.2770274Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2770313Z method(*args, **kwargs) 2025-12-04T11:45:25.2770465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.2770505Z method(*args, **kwargs) 2025-12-04T11:45:25.2770655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2770692Z with policy(): 2025-12-04T11:45:25.2770845Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2770887Z raise RuntimeError(msg) 2025-12-04T11:45:25.2771288Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2771290Z 2025-12-04T11:45:25.2771364Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2771616Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2771619Z 2025-12-04T11:45:25.2771705Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2771778Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2771820Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2771876Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2771943Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2772042Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2772079Z graph_break [] 2025-12-04T11:45:25.2772137Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2772212Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.2772252Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2772309Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2772404Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2772468Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2772503Z graph_break [] 2025-12-04T11:45:25.2772562Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2772626Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2772766Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2772812Z Traceback (most recent call last): 2025-12-04T11:45:25.2772965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2773005Z method(*args, **kwargs) 2025-12-04T11:45:25.2773156Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2773206Z method(*args, **kwargs) 2025-12-04T11:45:25.2773394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2773431Z with policy(): 2025-12-04T11:45:25.2773602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2773644Z raise RuntimeError(msg) 2025-12-04T11:45:25.2774024Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2774026Z 2025-12-04T11:45:25.2774100Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2774351Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2774354Z 2025-12-04T11:45:25.2774441Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2774515Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2774557Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2774612Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2774692Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2774789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2774825Z graph_break [] 2025-12-04T11:45:25.2774884Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2774959Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2775001Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2775056Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2775151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2775215Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2775252Z graph_break [] 2025-12-04T11:45:25.2775311Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2775383Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2775426Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2775481Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2775579Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2775643Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2775679Z graph_break [] 2025-12-04T11:45:25.2775735Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2775930Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0c09abda29c11ff.xml - 2025-12-04T11:45:25.2775990Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2776573Z FAILED [0.2155s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2776576Z 2025-12-04T11:45:25.2776648Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2776897Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2776912Z 2025-12-04T11:45:25.2776998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2777060Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2777141Z ================== 1 failed, 187 deselected, 2 rerun in 2.07s ================== 2025-12-04T11:45:25.2777177Z Got exit code 1 2025-12-04T11:45:25.2777218Z Retrying single test... 
2025-12-04T11:45:25.2777364Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26c706e517d4c05c.xml 2025-12-04T11:45:25.2777421Z ============================= test session starts ============================== 2025-12-04T11:45:25.2777532Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2777574Z cachedir: .pytest_cache 2025-12-04T11:45:25.2777732Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2777778Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2777817Z configfile: pytest.ini 2025-12-04T11:45:25.2777978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2778053Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2778317Z stepcurrent: skipping 103 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2778361Z Running 1 items in this shard 2025-12-04T11:45:25.2778363Z 2025-12-04T11:45:25.2778573Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6094s] [100%] 2025-12-04T11:45:25.2778780Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2731s] [100%] 2025-12-04T11:45:25.2778963Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2141s] [100%] 2025-12-04T11:45:25.2778966Z 2025-12-04T11:45:25.2779017Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2779154Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2779201Z Traceback (most recent call last): 2025-12-04T11:45:25.2779357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2779397Z method(*args, **kwargs) 2025-12-04T11:45:25.2779548Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2779589Z method(*args, **kwargs) 2025-12-04T11:45:25.2779739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2779789Z with policy(): 2025-12-04T11:45:25.2779943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2779986Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2780362Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2780364Z 2025-12-04T11:45:25.2780437Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2780703Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2780705Z 2025-12-04T11:45:25.2780790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2780877Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2780919Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2780976Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2781043Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2781141Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2781176Z graph_break [] 2025-12-04T11:45:25.2781234Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2781371Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2781417Z Traceback (most recent call last): 2025-12-04T11:45:25.2781569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2781612Z method(*args, **kwargs) 2025-12-04T11:45:25.2781764Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.2781804Z method(*args, **kwargs) 2025-12-04T11:45:25.2781964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2782001Z with policy(): 2025-12-04T11:45:25.2782153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2782193Z raise RuntimeError(msg) 2025-12-04T11:45:25.2782572Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2782576Z 2025-12-04T11:45:25.2782649Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2782903Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2782907Z 2025-12-04T11:45:25.2782992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2783067Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2783108Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2783164Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2783230Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2783355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2783391Z graph_break [] 2025-12-04T11:45:25.2783449Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2783539Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.2783582Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2783636Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2783734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2783798Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2783834Z graph_break [] 2025-12-04T11:45:25.2783891Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2783944Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2784094Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2784139Z Traceback (most recent call last): 2025-12-04T11:45:25.2784293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2784355Z method(*args, **kwargs) 2025-12-04T11:45:25.2784507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2784549Z method(*args, **kwargs) 2025-12-04T11:45:25.2784699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2784735Z with policy(): 2025-12-04T11:45:25.2784888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2784931Z raise RuntimeError(msg) 2025-12-04T11:45:25.2785312Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2785316Z 2025-12-04T11:45:25.2785389Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2785654Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2785657Z 2025-12-04T11:45:25.2785743Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2785815Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2785859Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2785914Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2785980Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2786077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2786114Z graph_break [] 2025-12-04T11:45:25.2786173Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2786246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2786287Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2786342Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2786437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2786502Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2786537Z graph_break [] 2025-12-04T11:45:25.2786595Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2786669Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2786709Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2786763Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2786860Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2786938Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2786974Z graph_break [] 2025-12-04T11:45:25.2787030Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:25.2787225Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-26c706e517d4c05c.xml - 2025-12-04T11:45:25.2787284Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2787849Z FAILED [0.2141s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2787865Z 2025-12-04T11:45:25.2787948Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2788202Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2788204Z 2025-12-04T11:45:25.2788290Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2788352Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2788420Z ================== 1 failed, 187 deselected, 2 rerun in 2.11s ================== 2025-12-04T11:45:25.2788457Z Got exit code 1 2025-12-04T11:45:25.2788657Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2788783Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2788930Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e120fc0bfaaf6263.xml 2025-12-04T11:45:25.2789001Z ============================= test session starts ============================== 2025-12-04T11:45:25.2789111Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2789152Z cachedir: .pytest_cache 2025-12-04T11:45:25.2789308Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2789355Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2789395Z configfile: pytest.ini 2025-12-04T11:45:25.2789555Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2789631Z collecting ... collected 188 items / 104 deselected / 84 selected 2025-12-04T11:45:25.2789688Z stepcurrent: skipping 104 already run items. 
2025-12-04T11:45:25.2789730Z Running 84 items in this shard 2025-12-04T11:45:25.2789732Z 2025-12-04T11:45:25.2789947Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6130s] [ 1%] 2025-12-04T11:45:25.2790161Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2877s] [ 1%] 2025-12-04T11:45:25.2790346Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2564s] [ 1%] 2025-12-04T11:45:25.2790349Z 2025-12-04T11:45:25.2790399Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2790537Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2790599Z Traceback (most recent call last): 2025-12-04T11:45:25.2790757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2790797Z method(*args, **kwargs) 2025-12-04T11:45:25.2790951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2790990Z method(*args, **kwargs) 2025-12-04T11:45:25.2791143Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2791190Z with policy(): 2025-12-04T11:45:25.2791346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2791386Z raise RuntimeError(msg) 2025-12-04T11:45:25.2791785Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2791791Z 2025-12-04T11:45:25.2791866Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2792119Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2792122Z 2025-12-04T11:45:25.2792210Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2792284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2792326Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2792384Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2792450Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2792549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2792587Z graph_break [] 2025-12-04T11:45:25.2792659Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2792798Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2792843Z Traceback (most recent call last): 2025-12-04T11:45:25.2792997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2793038Z method(*args, **kwargs) 2025-12-04T11:45:25.2793190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2793228Z method(*args, **kwargs) 2025-12-04T11:45:25.2793412Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2793452Z with policy(): 2025-12-04T11:45:25.2793606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2793650Z raise RuntimeError(msg) 2025-12-04T11:45:25.2794033Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2794036Z 2025-12-04T11:45:25.2794110Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2794362Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2794380Z 2025-12-04T11:45:25.2794468Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2794542Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2794584Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2794639Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2794705Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2794802Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2794839Z graph_break [] 2025-12-04T11:45:25.2794912Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2794985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2795025Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2795079Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2795189Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2795254Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2795289Z graph_break [] 2025-12-04T11:45:25.2795350Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2795402Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2795541Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2795587Z Traceback (most recent call last): 2025-12-04T11:45:25.2795742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2795783Z method(*args, **kwargs) 2025-12-04T11:45:25.2795936Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2795977Z method(*args, **kwargs) 2025-12-04T11:45:25.2796130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2796165Z with policy(): 2025-12-04T11:45:25.2796333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2796374Z raise RuntimeError(msg) 2025-12-04T11:45:25.2796754Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2796757Z 2025-12-04T11:45:25.2796830Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2797085Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2797088Z 2025-12-04T11:45:25.2797175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2797249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2797291Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2797345Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2797413Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2797511Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2797548Z graph_break [] 2025-12-04T11:45:25.2797605Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2797678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2797718Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2797788Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2797885Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2797949Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2797985Z graph_break [] 2025-12-04T11:45:25.2798043Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2798115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2798156Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2798209Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2798324Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2798387Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2798423Z graph_break [] 2025-12-04T11:45:25.2798480Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2798689Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e120fc0bfaaf6263.xml - 2025-12-04T11:45:25.2798748Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2799328Z FAILED [0.2564s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2799332Z 2025-12-04T11:45:25.2799403Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2799664Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2799668Z 2025-12-04T11:45:25.2799755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2799831Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2799903Z ================== 1 failed, 104 deselected, 2 rerun in 2.17s ================== 2025-12-04T11:45:25.2799940Z Got exit code 1 2025-12-04T11:45:25.2799988Z Retrying single test... 
2025-12-04T11:45:25.2800133Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d25d4ee5e1c77e12.xml 2025-12-04T11:45:25.2800190Z ============================= test session starts ============================== 2025-12-04T11:45:25.2800299Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2800340Z cachedir: .pytest_cache 2025-12-04T11:45:25.2800505Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2800552Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2800591Z configfile: pytest.ini 2025-12-04T11:45:25.2800752Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2800826Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2801077Z stepcurrent: skipping 104 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2801121Z Running 1 items in this shard 2025-12-04T11:45:25.2801123Z 2025-12-04T11:45:25.2801335Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7112s] [100%] 2025-12-04T11:45:25.2801554Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3629s] [100%] 2025-12-04T11:45:25.2801740Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3306s] [100%] 2025-12-04T11:45:25.2801742Z 2025-12-04T11:45:25.2801792Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2801932Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2801988Z Traceback (most recent call last): 2025-12-04T11:45:25.2802146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2802187Z method(*args, **kwargs) 2025-12-04T11:45:25.2802340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2802392Z method(*args, **kwargs) 2025-12-04T11:45:25.2802543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2802581Z with policy(): 2025-12-04T11:45:25.2802733Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2802775Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2803158Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2803162Z 2025-12-04T11:45:25.2803235Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2803529Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2803532Z 2025-12-04T11:45:25.2803632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2803707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2803749Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2803805Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2803871Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2803969Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2804006Z graph_break [] 2025-12-04T11:45:25.2804064Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2804203Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2804250Z Traceback (most recent call last): 2025-12-04T11:45:25.2804402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2804444Z method(*args, **kwargs) 2025-12-04T11:45:25.2804593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.2804633Z method(*args, **kwargs) 2025-12-04T11:45:25.2804781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2804820Z with policy(): 2025-12-04T11:45:25.2804972Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2805013Z raise RuntimeError(msg) 2025-12-04T11:45:25.2805396Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2805415Z 2025-12-04T11:45:25.2805490Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2805743Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2805745Z 2025-12-04T11:45:25.2805847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2805920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2805966Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2806020Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2806100Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2806198Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2806234Z graph_break [] 2025-12-04T11:45:25.2806295Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2806368Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.2806409Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2806464Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2806560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2806627Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2806662Z graph_break [] 2025-12-04T11:45:25.2806719Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2806771Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2806911Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2806957Z Traceback (most recent call last): 2025-12-04T11:45:25.2807125Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2807164Z method(*args, **kwargs) 2025-12-04T11:45:25.2807318Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2807358Z method(*args, **kwargs) 2025-12-04T11:45:25.2807506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2807544Z with policy(): 2025-12-04T11:45:25.2807695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2807737Z raise RuntimeError(msg) 2025-12-04T11:45:25.2808120Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2808122Z 2025-12-04T11:45:25.2808195Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2808450Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2808453Z 2025-12-04T11:45:25.2808539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2808612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2808654Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2808722Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2808788Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2808885Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2808921Z graph_break [] 2025-12-04T11:45:25.2808979Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2809052Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2809092Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2809147Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2809254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2809319Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2809354Z graph_break [] 2025-12-04T11:45:25.2809412Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2809497Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2809540Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2809594Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2809691Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2809754Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2809792Z graph_break [] 2025-12-04T11:45:25.2809848Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2810040Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d25d4ee5e1c77e12.xml - 2025-12-04T11:45:25.2810100Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2810687Z FAILED [0.3306s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2810690Z 2025-12-04T11:45:25.2810764Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2811016Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2811020Z 2025-12-04T11:45:25.2811106Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2811166Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2811234Z ================== 1 failed, 187 deselected, 2 rerun in 2.42s ================== 2025-12-04T11:45:25.2811272Z Got exit code 1 2025-12-04T11:45:25.2811314Z Retrying single test... 
2025-12-04T11:45:25.2811460Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0256144eaf477fa.xml 2025-12-04T11:45:25.2811519Z ============================= test session starts ============================== 2025-12-04T11:45:25.2811627Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2811669Z cachedir: .pytest_cache 2025-12-04T11:45:25.2811825Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2811871Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2811911Z configfile: pytest.ini 2025-12-04T11:45:25.2812072Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2812165Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2812418Z stepcurrent: skipping 104 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2812462Z Running 1 items in this shard 2025-12-04T11:45:25.2812466Z 2025-12-04T11:45:25.2812771Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6209s] [100%] 2025-12-04T11:45:25.2812980Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2733s] [100%] 2025-12-04T11:45:25.2813176Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2274s] [100%] 2025-12-04T11:45:25.2813179Z 2025-12-04T11:45:25.2813245Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2813438Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2813485Z Traceback (most recent call last): 2025-12-04T11:45:25.2813643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2813683Z method(*args, **kwargs) 2025-12-04T11:45:25.2813837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2813878Z method(*args, **kwargs) 2025-12-04T11:45:25.2814029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2814067Z with policy(): 2025-12-04T11:45:25.2814221Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2814266Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2814664Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:25.2814668Z 2025-12-04T11:45:25.2814741Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2814996Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2814999Z 2025-12-04T11:45:25.2815086Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2815159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2815204Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2815261Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2815327Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2815426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2815462Z graph_break [] 2025-12-04T11:45:25.2815521Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2815659Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2815706Z Traceback (most recent call last): 2025-12-04T11:45:25.2815859Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2815899Z method(*args, **kwargs) 2025-12-04T11:45:25.2816050Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.2816107Z method(*args, **kwargs) 2025-12-04T11:45:25.2816261Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2816298Z with policy(): 2025-12-04T11:45:25.2816450Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2816493Z raise RuntimeError(msg) 2025-12-04T11:45:25.2816877Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:25.2816891Z 2025-12-04T11:45:25.2816964Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2817230Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2817234Z 2025-12-04T11:45:25.2817321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2817396Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2817437Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2817495Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2817561Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2817660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2817695Z graph_break [] 2025-12-04T11:45:25.2817755Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2817827Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:25.2817871Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2817925Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2818031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2818098Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2818138Z graph_break [] 2025-12-04T11:45:25.2818195Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2818248Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2818385Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2818432Z Traceback (most recent call last): 2025-12-04T11:45:25.2818586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2818629Z method(*args, **kwargs) 2025-12-04T11:45:25.2818781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2818820Z method(*args, **kwargs) 2025-12-04T11:45:25.2818971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2819010Z with policy(): 2025-12-04T11:45:25.2819162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2819204Z raise RuntimeError(msg) 2025-12-04T11:45:25.2819592Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:25.2819611Z 2025-12-04T11:45:25.2819685Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2819938Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2819941Z 2025-12-04T11:45:25.2820027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2820101Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2820142Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2820198Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2820273Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2820372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2820407Z graph_break [] 2025-12-04T11:45:25.2820466Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2820549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2820592Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2820646Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2820743Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2820807Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2820844Z graph_break [] 2025-12-04T11:45:25.2820901Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2820973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2821016Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2821070Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2821166Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2821232Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2821270Z graph_break [] 2025-12-04T11:45:25.2821332Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:25.2821535Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0256144eaf477fa.xml - 2025-12-04T11:45:25.2821595Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2822166Z FAILED [0.2274s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:25.2822169Z 2025-12-04T11:45:25.2822245Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2822501Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2822503Z 2025-12-04T11:45:25.2822592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2822653Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2822721Z ================== 1 failed, 187 deselected, 2 rerun in 2.14s ================== 2025-12-04T11:45:25.2822759Z Got exit code 1 2025-12-04T11:45:25.2822962Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2823091Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2823291Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-597f46f2badf0815.xml 2025-12-04T11:45:25.2823352Z ============================= test session starts ============================== 2025-12-04T11:45:25.2823487Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2823528Z cachedir: .pytest_cache 2025-12-04T11:45:25.2823687Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2823732Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2823794Z configfile: pytest.ini 2025-12-04T11:45:25.2823955Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2824030Z collecting ... collected 188 items / 105 deselected / 83 selected 2025-12-04T11:45:25.2824085Z stepcurrent: skipping 105 already run items. 
2025-12-04T11:45:25.2824140Z Running 83 items in this shard 2025-12-04T11:45:25.2824143Z 2025-12-04T11:45:25.2824355Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8242s] [ 1%] 2025-12-04T11:45:25.2824561Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4303s] [ 1%] 2025-12-04T11:45:25.2824744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5192s] [ 1%] 2025-12-04T11:45:25.2824747Z 2025-12-04T11:45:25.2824797Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2824939Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2824984Z Traceback (most recent call last): 2025-12-04T11:45:25.2825145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2825185Z method(*args, **kwargs) 2025-12-04T11:45:25.2825359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2825400Z method(*args, **kwargs) 2025-12-04T11:45:25.2825553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2825589Z with policy(): 2025-12-04T11:45:25.2825742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2825792Z raise RuntimeError(msg) 2025-12-04T11:45:25.2826168Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:25.2826171Z 2025-12-04T11:45:25.2826245Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2826499Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2826501Z 2025-12-04T11:45:25.2826588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2826662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2826705Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2826760Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2827251Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2827363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2827400Z graph_break [] 2025-12-04T11:45:25.2827460Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2827533Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2828035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2828094Z current_size = base.storage().size() 2025-12-04T11:45:25.2828136Z Autotune Choices Stats: 2025-12-04T11:45:25.2828505Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.2828559Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2828606Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2828731Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2828969Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2829012Z _scaled_mm 0.0253 ms 24.3% 2025-12-04T11:45:25.2829139Z SingleProcess AUTOTUNE benchmarking takes 0.0132 seconds and 0.0777 seconds precompiling for 2 choices 2025-12-04T11:45:25.2829287Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2829333Z Traceback (most recent call last): 2025-12-04T11:45:25.2829488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2829529Z method(*args, **kwargs) 2025-12-04T11:45:25.2829682Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2829723Z method(*args, **kwargs) 2025-12-04T11:45:25.2829874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2829914Z with policy(): 2025-12-04T11:45:25.2830069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2830110Z raise RuntimeError(msg) 2025-12-04T11:45:25.2830492Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:25.2830495Z 2025-12-04T11:45:25.2830569Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2830821Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2830824Z 2025-12-04T11:45:25.2830922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2830995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2831038Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2831096Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2831584Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2831696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2831732Z graph_break [] 2025-12-04T11:45:25.2831792Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2831877Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2832365Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2832413Z current_size = base.storage().size() 2025-12-04T11:45:25.2832454Z Autotune Choices Stats: 2025-12-04T11:45:25.2832819Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.2832873Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2832920Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2833042Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2833327Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2833370Z _scaled_mm 0.0253 ms 24.3% 
2025-12-04T11:45:25.2833496Z SingleProcess AUTOTUNE benchmarking takes 0.0132 seconds and 0.0777 seconds precompiling for 2 choices 2025-12-04T11:45:25.2833571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2833613Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2833669Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2833767Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2834249Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2834286Z graph_break [] 2025-12-04T11:45:25.2834345Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2834420Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2834459Z Autotune Choices Stats: 2025-12-04T11:45:25.2834818Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2834886Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2834935Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2835055Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2835287Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2835342Z _scaled_mm 0.0243 ms 25.2% 2025-12-04T11:45:25.2835469Z SingleProcess AUTOTUNE benchmarking takes 0.0118 seconds and 0.0671 seconds precompiling for 2 choices 2025-12-04T11:45:25.2835522Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2835675Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2835720Z Traceback (most recent call last): 2025-12-04T11:45:25.2835879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2835919Z method(*args, **kwargs) 2025-12-04T11:45:25.2836072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2836111Z method(*args, **kwargs) 2025-12-04T11:45:25.2836264Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2836301Z with policy(): 2025-12-04T11:45:25.2836456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2836497Z raise RuntimeError(msg) 2025-12-04T11:45:25.2836901Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 
2025-12-04T11:45:25.2836904Z 2025-12-04T11:45:25.2836978Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2837233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2837237Z 2025-12-04T11:45:25.2837325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2837397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2837439Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2837496Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2837982Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2838080Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2838117Z graph_break [] 2025-12-04T11:45:25.2838176Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2838249Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2838732Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2838793Z current_size = base.storage().size() 2025-12-04T11:45:25.2838834Z Autotune Choices Stats: 2025-12-04T11:45:25.2839196Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:25.2839258Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2839304Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2839426Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2839678Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2839721Z _scaled_mm 0.0253 ms 24.3% 2025-12-04T11:45:25.2839846Z SingleProcess AUTOTUNE benchmarking takes 0.0132 seconds and 0.0777 seconds precompiling for 2 choices 2025-12-04T11:45:25.2839920Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2839961Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2840018Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2840116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2840594Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2840633Z graph_break [] 2025-12-04T11:45:25.2840702Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2840777Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2840817Z Autotune Choices Stats: 2025-12-04T11:45:25.2841172Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2841223Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2841270Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2841393Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2841623Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2841663Z _scaled_mm 0.0243 ms 25.2% 2025-12-04T11:45:25.2841790Z SingleProcess AUTOTUNE benchmarking takes 0.0118 seconds and 0.0671 seconds precompiling for 2 choices 2025-12-04T11:45:25.2841862Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2841905Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2841960Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2842059Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2842539Z inductor [('triton_bundler_save_kernel', 16), 
('async_compile_cache_miss', 3), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2842587Z graph_break [] 2025-12-04T11:45:25.2842645Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2842718Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2842758Z Autotune Choices Stats: 2025-12-04T11:45:25.2843125Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2843186Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2843233Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2843392Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2843619Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2843660Z _scaled_mm 0.0201 ms 30.5% 2025-12-04T11:45:25.2843785Z SingleProcess AUTOTUNE benchmarking takes 0.0139 seconds and 0.1736 seconds precompiling for 2 choices 2025-12-04T11:45:25.2843981Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-597f46f2badf0815.xml - 2025-12-04T11:45:25.2844040Z =========================== short test summary info 
============================ 2025-12-04T11:45:25.2844626Z FAILED [0.5192s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 2025-12-04T11:45:25.2844629Z 2025-12-04T11:45:25.2844703Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2844955Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2844958Z 2025-12-04T11:45:25.2845044Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2845109Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2845176Z ================== 1 failed, 105 deselected, 2 rerun in 2.79s ================== 2025-12-04T11:45:25.2845214Z Got exit code 1 2025-12-04T11:45:25.2845254Z Retrying single test... 
2025-12-04T11:45:25.2845399Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ae73e6afb2fcafff.xml 2025-12-04T11:45:25.2845456Z ============================= test session starts ============================== 2025-12-04T11:45:25.2845565Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2845607Z cachedir: .pytest_cache 2025-12-04T11:45:25.2845766Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2845813Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2845855Z configfile: pytest.ini 2025-12-04T11:45:25.2846032Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2846105Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2846357Z stepcurrent: skipping 105 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2846399Z Running 1 items in this shard 2025-12-04T11:45:25.2846401Z 2025-12-04T11:45:25.2846608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7703s] [100%] 2025-12-04T11:45:25.2846831Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4233s] [100%] 2025-12-04T11:45:25.2847026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.3138s] [100%] 2025-12-04T11:45:25.2847029Z 2025-12-04T11:45:25.2847080Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2847219Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2847266Z Traceback (most recent call last): 2025-12-04T11:45:25.2847423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2847463Z method(*args, **kwargs) 2025-12-04T11:45:25.2847618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2847659Z method(*args, **kwargs) 2025-12-04T11:45:25.2847811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2847849Z with policy(): 2025-12-04T11:45:25.2848005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2848046Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2848435Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:25.2848437Z 2025-12-04T11:45:25.2848513Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2848766Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2848769Z 2025-12-04T11:45:25.2848858Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2848932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2848973Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2849030Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2849514Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2849615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2849651Z graph_break [] 2025-12-04T11:45:25.2849710Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2849795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2850282Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: 
UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2850328Z current_size = base.storage().size() 2025-12-04T11:45:25.2850369Z Autotune Choices Stats: 2025-12-04T11:45:25.2850736Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2850802Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2850861Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2850983Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2851217Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2851258Z _scaled_mm 0.0225 ms 27.0% 2025-12-04T11:45:25.2851386Z SingleProcess AUTOTUNE benchmarking takes 0.0127 seconds and 0.0769 seconds precompiling for 2 choices 2025-12-04T11:45:25.2851524Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2851570Z Traceback (most recent call last): 2025-12-04T11:45:25.2851723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2851766Z method(*args, **kwargs) 2025-12-04T11:45:25.2851918Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2851970Z method(*args, **kwargs) 2025-12-04T11:45:25.2852124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2852160Z with policy(): 2025-12-04T11:45:25.2852315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2852357Z raise RuntimeError(msg) 2025-12-04T11:45:25.2852736Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:25.2852741Z 2025-12-04T11:45:25.2852814Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2853071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2853074Z 2025-12-04T11:45:25.2853160Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2853233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2853315Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2853372Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2853853Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2853971Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2854008Z graph_break [] 2025-12-04T11:45:25.2854067Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2854140Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2854627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2854693Z current_size = base.storage().size() 2025-12-04T11:45:25.2854735Z Autotune Choices Stats: 2025-12-04T11:45:25.2855110Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2855162Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2855209Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2855330Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2855563Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2855603Z _scaled_mm 0.0225 ms 27.0% 
2025-12-04T11:45:25.2855732Z SingleProcess AUTOTUNE benchmarking takes 0.0127 seconds and 0.0769 seconds precompiling for 2 choices 2025-12-04T11:45:25.2855805Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2855860Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2855917Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2856016Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2856495Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2856533Z graph_break [] 2025-12-04T11:45:25.2856592Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2856666Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2856706Z Autotune Choices Stats: 2025-12-04T11:45:25.2857062Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:25.2857112Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2857160Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2857280Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2857513Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2857565Z _scaled_mm 0.0084 ms 74.3% 2025-12-04T11:45:25.2857690Z SingleProcess AUTOTUNE benchmarking takes 0.0115 seconds and 0.0687 seconds precompiling for 2 choices 2025-12-04T11:45:25.2857744Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2857880Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2857930Z Traceback (most recent call last): 2025-12-04T11:45:25.2858085Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2858136Z method(*args, **kwargs) 2025-12-04T11:45:25.2858288Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2858330Z method(*args, **kwargs) 2025-12-04T11:45:25.2858489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2858527Z with policy(): 2025-12-04T11:45:25.2858680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2858723Z raise RuntimeError(msg) 2025-12-04T11:45:25.2859106Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1153433600. 
2025-12-04T11:45:25.2859110Z 2025-12-04T11:45:25.2859183Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2859435Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2859440Z 2025-12-04T11:45:25.2859526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2859612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2859653Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2859711Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2860195Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2860295Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2860333Z graph_break [] 2025-12-04T11:45:25.2860392Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2860465Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2860947Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2860994Z current_size = base.storage().size() 2025-12-04T11:45:25.2861035Z Autotune Choices Stats: 2025-12-04T11:45:25.2861398Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2861461Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2861510Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2861635Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2861868Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2861919Z _scaled_mm 0.0225 ms 27.0% 2025-12-04T11:45:25.2862045Z SingleProcess AUTOTUNE benchmarking takes 0.0127 seconds and 0.0769 seconds precompiling for 2 choices 2025-12-04T11:45:25.2862118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2862161Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2862229Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2862328Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2862807Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2862845Z graph_break [] 2025-12-04T11:45:25.2862902Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2862977Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2863016Z Autotune Choices Stats: 2025-12-04T11:45:25.2863427Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:25.2863478Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2863526Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2863644Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2863877Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2863919Z _scaled_mm 0.0084 ms 74.3% 2025-12-04T11:45:25.2864044Z SingleProcess AUTOTUNE benchmarking takes 0.0115 seconds and 0.0687 seconds precompiling for 2 choices 2025-12-04T11:45:25.2864120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2864162Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2864218Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2864317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2864733Z inductor [('triton_bundler_save_kernel', 8), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('extern_calls', 1)] 2025-12-04T11:45:25.2864770Z graph_break [] 2025-12-04T11:45:25.2864828Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2864904Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2864959Z Autotune Choices Stats: 2025-12-04T11:45:25.2865425Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "_scaled_mm", "best_time": 0.0061599998734891415, "best_triton_pos": 1, "best_triton_time": 0.0061599998734891415, "best_triton_kernel": "triton_mm_2", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1"} 2025-12-04T11:45:25.2865477Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2865523Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2865656Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2865698Z _scaled_mm 0.0062 ms 100.0% 2025-12-04T11:45:25.2865940Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2866067Z SingleProcess AUTOTUNE benchmarking takes 0.0118 seconds and 0.0643 seconds precompiling for 2 choices 2025-12-04T11:45:25.2866261Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ae73e6afb2fcafff.xml - 2025-12-04T11:45:25.2866324Z 
=========================== short test summary info ============================ 2025-12-04T11:45:25.2866892Z FAILED [0.3138s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1153433600. 2025-12-04T11:45:25.2866896Z 2025-12-04T11:45:25.2866974Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2867240Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2867242Z 2025-12-04T11:45:25.2867330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2867393Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2867462Z ================== 1 failed, 187 deselected, 2 rerun in 2.53s ================== 2025-12-04T11:45:25.2867499Z Got exit code 1 2025-12-04T11:45:25.2867540Z Retrying single test... 
2025-12-04T11:45:25.2867685Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d93563b4420ff45d.xml 2025-12-04T11:45:25.2867741Z ============================= test session starts ============================== 2025-12-04T11:45:25.2867852Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2867894Z cachedir: .pytest_cache 2025-12-04T11:45:25.2868054Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2868099Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2868140Z configfile: pytest.ini 2025-12-04T11:45:25.2868300Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2868374Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2868623Z stepcurrent: skipping 105 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2868665Z Running 1 items in this shard 2025-12-04T11:45:25.2868685Z 2025-12-04T11:45:25.2868896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7986s] [100%] 2025-12-04T11:45:25.2869104Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4240s] [100%] 2025-12-04T11:45:25.2869287Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6060s] [100%] 2025-12-04T11:45:25.2869290Z 2025-12-04T11:45:25.2869341Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2869490Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2869536Z Traceback (most recent call last): 2025-12-04T11:45:25.2869694Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2869748Z method(*args, **kwargs) 2025-12-04T11:45:25.2869902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2869943Z method(*args, **kwargs) 2025-12-04T11:45:25.2870095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2870134Z with policy(): 2025-12-04T11:45:25.2870287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2870331Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2870712Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1017118720. 2025-12-04T11:45:25.2870716Z 2025-12-04T11:45:25.2870789Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2871051Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2871053Z 2025-12-04T11:45:25.2871140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2871214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2871257Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2871313Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2871796Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2871897Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2871933Z graph_break [] 2025-12-04T11:45:25.2871993Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2872066Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2872551Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: 
UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2872599Z current_size = base.storage().size() 2025-12-04T11:45:25.2872652Z Autotune Choices Stats: 2025-12-04T11:45:25.2873017Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2873067Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2873115Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2873235Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2873514Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2873556Z _scaled_mm 0.0231 ms 26.7% 2025-12-04T11:45:25.2873698Z SingleProcess AUTOTUNE benchmarking takes 0.0126 seconds and 0.0753 seconds precompiling for 2 choices 2025-12-04T11:45:25.2873838Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2873884Z Traceback (most recent call last): 2025-12-04T11:45:25.2874039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2874079Z method(*args, **kwargs) 2025-12-04T11:45:25.2874231Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2874278Z method(*args, **kwargs) 2025-12-04T11:45:25.2874428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2874465Z with policy(): 2025-12-04T11:45:25.2874619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2874661Z raise RuntimeError(msg) 2025-12-04T11:45:25.2875057Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1017118720 and is now 1046478848. 2025-12-04T11:45:25.2875060Z 2025-12-04T11:45:25.2875133Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2875387Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2875389Z 2025-12-04T11:45:25.2875476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2875551Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2875593Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2875649Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2876132Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2876233Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2876269Z graph_break [] 2025-12-04T11:45:25.2876328Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2876401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2876899Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2876946Z current_size = base.storage().size() 2025-12-04T11:45:25.2876986Z Autotune Choices Stats: 2025-12-04T11:45:25.2877349Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2877413Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2877463Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2877594Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2877829Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2877869Z _scaled_mm 0.0231 ms 26.7% 
2025-12-04T11:45:25.2877997Z SingleProcess AUTOTUNE benchmarking takes 0.0126 seconds and 0.0753 seconds precompiling for 2 choices 2025-12-04T11:45:25.2878070Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2878113Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2878169Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2878269Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2878757Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2878796Z graph_break [] 2025-12-04T11:45:25.2878854Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2878930Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2878973Z Autotune Choices Stats: 2025-12-04T11:45:25.2879327Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2879379Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2879425Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2879545Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2879775Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2879816Z _scaled_mm 0.0211 ms 29.0% 2025-12-04T11:45:25.2879943Z SingleProcess AUTOTUNE benchmarking takes 0.0125 seconds and 0.0666 seconds precompiling for 2 choices 2025-12-04T11:45:25.2879996Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2880133Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2880191Z Traceback (most recent call last): 2025-12-04T11:45:25.2880345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2880387Z method(*args, **kwargs) 2025-12-04T11:45:25.2880540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2880582Z method(*args, **kwargs) 2025-12-04T11:45:25.2880732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2880785Z with policy(): 2025-12-04T11:45:25.2880938Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2880980Z raise RuntimeError(msg) 2025-12-04T11:45:25.2881374Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 
2025-12-04T11:45:25.2881377Z 2025-12-04T11:45:25.2881451Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2881708Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2881710Z 2025-12-04T11:45:25.2881796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2881871Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2881913Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2881970Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2882469Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2882572Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2882607Z graph_break [] 2025-12-04T11:45:25.2882667Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2882740Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2883226Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2883304Z current_size = base.storage().size() 2025-12-04T11:45:25.2883344Z Autotune Choices Stats: 2025-12-04T11:45:25.2883707Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2883757Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2883807Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2883927Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2884162Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2884217Z _scaled_mm 0.0231 ms 26.7% 2025-12-04T11:45:25.2884343Z SingleProcess AUTOTUNE benchmarking takes 0.0126 seconds and 0.0753 seconds precompiling for 2 choices 2025-12-04T11:45:25.2884416Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2884458Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2884514Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2884614Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2885122Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2885162Z graph_break [] 2025-12-04T11:45:25.2885223Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2885296Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2885336Z Autotune Choices Stats: 2025-12-04T11:45:25.2885693Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2885744Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2885791Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2885910Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2886154Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2886196Z _scaled_mm 0.0211 ms 29.0% 2025-12-04T11:45:25.2886323Z SingleProcess AUTOTUNE benchmarking takes 0.0125 seconds and 0.0666 seconds precompiling for 2 choices 2025-12-04T11:45:25.2886397Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2886439Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2886496Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2886595Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2887074Z inductor [('triton_bundler_save_kernel', 16), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2887113Z graph_break [] 2025-12-04T11:45:25.2887173Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:25.2887248Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2887289Z Autotune Choices Stats: 2025-12-04T11:45:25.2887645Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2887695Z AUTOTUNE scaled_mm(3x32, 32x16, 3x1, 1x16, 16) 2025-12-04T11:45:25.2887756Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2887876Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2888109Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2888150Z _scaled_mm 0.0235 ms 25.9% 2025-12-04T11:45:25.2888275Z SingleProcess AUTOTUNE benchmarking takes 0.0129 seconds and 0.0635 seconds precompiling for 2 choices 2025-12-04T11:45:25.2888474Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d93563b4420ff45d.xml - 2025-12-04T11:45:25.2888538Z =========================== short test summary info 
============================ 2025-12-04T11:45:25.2889119Z FAILED [0.6060s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1046478848 and is now 1075838976. 2025-12-04T11:45:25.2889125Z 2025-12-04T11:45:25.2889199Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2889458Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2889461Z 2025-12-04T11:45:25.2889547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2889611Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.2889680Z ================== 1 failed, 187 deselected, 2 rerun in 2.85s ================== 2025-12-04T11:45:25.2889724Z Got exit code 1 2025-12-04T11:45:25.2889939Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:25.2890067Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2890209Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a2d737f77601caa.xml 2025-12-04T11:45:25.2890265Z ============================= test session starts ============================== 2025-12-04T11:45:25.2890376Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2890417Z cachedir: .pytest_cache 2025-12-04T11:45:25.2890575Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2890623Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2890662Z configfile: pytest.ini 2025-12-04T11:45:25.2890825Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2890901Z collecting ... collected 188 items / 106 deselected / 82 selected 2025-12-04T11:45:25.2890957Z stepcurrent: skipping 106 already run items. 
2025-12-04T11:45:25.2891000Z Running 82 items in this shard 2025-12-04T11:45:25.2891002Z 2025-12-04T11:45:25.2891216Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0015s] [ 1%] 2025-12-04T11:45:25.2891426Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7657s] [ 1%] 2025-12-04T11:45:25.2891613Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6387s] [ 1%] 2025-12-04T11:45:25.2891627Z 2025-12-04T11:45:25.2891681Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2891820Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2891867Z Traceback (most recent call last): 2025-12-04T11:45:25.2892024Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2892067Z method(*args, **kwargs) 2025-12-04T11:45:25.2892232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2892274Z method(*args, **kwargs) 2025-12-04T11:45:25.2892425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2892474Z with policy(): 2025-12-04T11:45:25.2892627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2892671Z raise RuntimeError(msg) 2025-12-04T11:45:25.2893056Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:25.2893058Z 2025-12-04T11:45:25.2893134Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2893437Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2893442Z 2025-12-04T11:45:25.2893529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2893602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2893645Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2893716Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2894199Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2894301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2894336Z graph_break [] 2025-12-04T11:45:25.2894398Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2894476Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2894965Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2895012Z current_size = base.storage().size() 2025-12-04T11:45:25.2895054Z Autotune Choices Stats: 2025-12-04T11:45:25.2895423Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006198999937623739, "best_triton_pos": 0} 2025-12-04T11:45:25.2895499Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2895547Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2895671Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2895907Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2896135Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2896374Z triton_mm_1 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2896619Z triton_mm_2 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2896843Z triton_mm_3 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2897066Z triton_mm_5 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2897294Z triton_mm_0 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2897532Z triton_mm_6 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2897576Z _scaled_mm 0.0252 ms 24.6% 2025-12-04T11:45:25.2897707Z SingleProcess AUTOTUNE benchmarking takes 0.0398 seconds and 0.1968 seconds precompiling for 9 choices 2025-12-04T11:45:25.2897846Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2897895Z Traceback (most recent call last): 2025-12-04T11:45:25.2898052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2898093Z method(*args, **kwargs) 2025-12-04T11:45:25.2898245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2898289Z method(*args, **kwargs) 2025-12-04T11:45:25.2898439Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2898477Z with policy(): 2025-12-04T11:45:25.2898630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2898674Z raise RuntimeError(msg) 2025-12-04T11:45:25.2899060Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:25.2899065Z 2025-12-04T11:45:25.2899138Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2899405Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2899408Z 2025-12-04T11:45:25.2899496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2899570Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2899612Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2899669Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2900153Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2900273Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), 
('ok', 1)] 2025-12-04T11:45:25.2900310Z graph_break [] 2025-12-04T11:45:25.2900371Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2900445Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2900934Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2900981Z current_size = base.storage().size() 2025-12-04T11:45:25.2901024Z Autotune Choices Stats: 2025-12-04T11:45:25.2901390Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006198999937623739, "best_triton_pos": 0} 2025-12-04T11:45:25.2901459Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2901510Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2901628Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2901861Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2902089Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2902319Z triton_mm_1 
0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2902542Z triton_mm_2 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2902767Z triton_mm_3 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2902996Z triton_mm_5 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2903229Z triton_mm_0 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2903487Z triton_mm_6 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2903531Z _scaled_mm 0.0252 ms 24.6% 2025-12-04T11:45:25.2903658Z SingleProcess AUTOTUNE benchmarking takes 0.0398 seconds and 0.1968 seconds precompiling for 9 choices 2025-12-04T11:45:25.2903747Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2903790Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2903848Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2903962Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2904443Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2904481Z graph_break [] 2025-12-04T11:45:25.2904542Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2904616Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2904658Z Autotune Choices Stats: 2025-12-04T11:45:25.2905017Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.2905077Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2905140Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2905261Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2905491Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2905723Z triton_mm_8 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2905948Z 
triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2906175Z triton_mm_13 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2906401Z triton_mm_14 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2906630Z triton_mm_11 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2906855Z triton_mm_15 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2907094Z triton_mm_12 0.0067 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2907137Z _scaled_mm 0.0235 ms 25.6% 2025-12-04T11:45:25.2907264Z SingleProcess AUTOTUNE benchmarking takes 0.0389 seconds and 0.1251 seconds precompiling for 9 choices 2025-12-04T11:45:25.2907332Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2907470Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2907516Z Traceback (most recent call 
last): 2025-12-04T11:45:25.2907682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2907727Z method(*args, **kwargs) 2025-12-04T11:45:25.2907880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2907927Z method(*args, **kwargs) 2025-12-04T11:45:25.2908078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2908117Z with policy(): 2025-12-04T11:45:25.2908272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2908315Z raise RuntimeError(msg) 2025-12-04T11:45:25.2908703Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2908706Z 2025-12-04T11:45:25.2908780Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2909047Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2909049Z 2025-12-04T11:45:25.2909137Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2909211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2909254Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2909311Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2909793Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2909894Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2909930Z graph_break [] 2025-12-04T11:45:25.2909991Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2910063Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2910545Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2910605Z current_size = base.storage().size() 2025-12-04T11:45:25.2910647Z Autotune Choices Stats: 2025-12-04T11:45:25.2911012Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006198999937623739, "best_triton_pos": 0} 2025-12-04T11:45:25.2911070Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2911119Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2911255Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2911488Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2911725Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2911953Z triton_mm_1 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2912178Z triton_mm_2 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2912405Z triton_mm_3 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2912630Z triton_mm_5 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2912861Z triton_mm_0 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2913085Z triton_mm_6 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2913127Z _scaled_mm 0.0252 ms 24.6% 2025-12-04T11:45:25.2913288Z SingleProcess AUTOTUNE benchmarking takes 0.0398 seconds and 0.1968 seconds precompiling for 9 choices 2025-12-04T11:45:25.2913364Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2913408Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2913467Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2913570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2914051Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2914088Z graph_break [] 2025-12-04T11:45:25.2914149Z 
aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2914221Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2914279Z Autotune Choices Stats: 2025-12-04T11:45:25.2914636Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:25.2914696Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2914744Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2914865Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2915106Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2915346Z triton_mm_8 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2915574Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2915799Z triton_mm_13 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2916025Z triton_mm_14 0.0062 ms 96.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2916253Z triton_mm_11 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2916491Z triton_mm_15 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2916716Z triton_mm_12 0.0067 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2916759Z _scaled_mm 0.0235 ms 25.6% 2025-12-04T11:45:25.2916890Z SingleProcess AUTOTUNE benchmarking takes 0.0389 seconds and 0.1251 seconds precompiling for 9 choices 2025-12-04T11:45:25.2918903Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2918947Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2919005Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2919105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2919586Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2919626Z graph_break [] 
2025-12-04T11:45:25.2919686Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2919761Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2919801Z Autotune Choices Stats: 2025-12-04T11:45:25.2920185Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:25.2920243Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2920291Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2920410Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2920644Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2920893Z triton_mm_20 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2921124Z triton_mm_19 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2921350Z triton_mm_22 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2921575Z triton_mm_16 0.0063 ms 97.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2921801Z triton_mm_18 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2922035Z triton_mm_23 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2922077Z _scaled_mm 0.0066 ms 93.3% 2025-12-04T11:45:25.2922302Z triton_mm_21 0.0066 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2922433Z SingleProcess AUTOTUNE benchmarking takes 0.0531 seconds and 0.2183 seconds precompiling for 9 choices 2025-12-04T11:45:25.2922626Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2a2d737f77601caa.xml - 2025-12-04T11:45:25.2922689Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2923303Z FAILED [0.6387s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2923306Z 2025-12-04T11:45:25.2923380Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2923639Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2923641Z 2025-12-04T11:45:25.2923729Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2923810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2923878Z ================== 1 failed, 106 deselected, 2 rerun in 3.42s ================== 2025-12-04T11:45:25.2923916Z Got exit code 1 2025-12-04T11:45:25.2923956Z Retrying single test... 2025-12-04T11:45:25.2924101Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-633ee898b87c8bd0.xml 2025-12-04T11:45:25.2924158Z ============================= test session starts ============================== 2025-12-04T11:45:25.2924270Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2924324Z cachedir: .pytest_cache 2025-12-04T11:45:25.2924484Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2924530Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2924571Z configfile: pytest.ini 2025-12-04T11:45:25.2924749Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2924825Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2925077Z stepcurrent: skipping 106 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2925120Z Running 1 items in this shard 2025-12-04T11:45:25.2925122Z 2025-12-04T11:45:25.2925332Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1086s] [100%] 2025-12-04T11:45:25.2925540Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8341s] [100%] 2025-12-04T11:45:25.2925728Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7559s] [100%] 2025-12-04T11:45:25.2925730Z 2025-12-04T11:45:25.2925793Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2925934Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2925979Z Traceback (most recent call last): 2025-12-04T11:45:25.2926140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2926182Z method(*args, **kwargs) 2025-12-04T11:45:25.2926336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2926375Z method(*args, **kwargs) 2025-12-04T11:45:25.2926528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2926567Z with policy(): 2025-12-04T11:45:25.2926723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2926764Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2927146Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:25.2927150Z 2025-12-04T11:45:25.2927225Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2927480Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2927499Z 2025-12-04T11:45:25.2927588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2927662Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2927707Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2927763Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2928244Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2928355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2928391Z graph_break [] 2025-12-04T11:45:25.2928463Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2928537Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2929025Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2929071Z current_size = base.storage().size() 2025-12-04T11:45:25.2929113Z Autotune Choices Stats: 2025-12-04T11:45:25.2929481Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2929540Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2929588Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2929720Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2929952Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2930178Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2930404Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2930633Z triton_mm_3 0.0060 ms 98.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2930858Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2931080Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2931307Z triton_mm_4 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2931543Z triton_mm_0 0.0066 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2931586Z _scaled_mm 0.0237 ms 24.8% 2025-12-04T11:45:25.2931715Z SingleProcess AUTOTUNE benchmarking takes 0.0416 seconds and 0.1983 seconds precompiling for 9 choices 2025-12-04T11:45:25.2931856Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2931913Z Traceback (most recent call last): 2025-12-04T11:45:25.2932072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2932113Z method(*args, **kwargs) 2025-12-04T11:45:25.2932276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2932319Z 
method(*args, **kwargs) 2025-12-04T11:45:25.2932470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2932507Z with policy(): 2025-12-04T11:45:25.2932661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2932703Z raise RuntimeError(msg) 2025-12-04T11:45:25.2933089Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:25.2933092Z 2025-12-04T11:45:25.2933166Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2933451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2933467Z 2025-12-04T11:45:25.2933556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2933629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2933672Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2933727Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2934208Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2934309Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2934345Z graph_break [] 2025-12-04T11:45:25.2934407Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2934481Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2934970Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2935018Z current_size = base.storage().size() 2025-12-04T11:45:25.2935059Z Autotune Choices Stats: 2025-12-04T11:45:25.2935424Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2935503Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2935550Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2935671Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2935903Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2936158Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:25.2936387Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2936613Z triton_mm_3 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2936835Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2937059Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2937293Z triton_mm_4 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2937519Z triton_mm_0 0.0066 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2937559Z _scaled_mm 0.0237 ms 24.8% 2025-12-04T11:45:25.2937688Z SingleProcess AUTOTUNE benchmarking takes 0.0416 seconds and 0.1983 seconds precompiling for 9 choices 2025-12-04T11:45:25.2937763Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2937805Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2937860Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:25.2937963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2938440Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2938478Z graph_break [] 2025-12-04T11:45:25.2938536Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2938612Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2938652Z Autotune Choices Stats: 2025-12-04T11:45:25.2939016Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2939084Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2939133Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2939253Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2939485Z triton_mm_15 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2939726Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2939962Z triton_mm_12 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2940188Z triton_mm_14 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2940414Z triton_mm_8 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2940640Z triton_mm_9 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2940866Z triton_mm_13 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2941112Z triton_mm_10 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2941154Z _scaled_mm 0.0252 ms 24.3% 2025-12-04T11:45:25.2941281Z SingleProcess AUTOTUNE benchmarking takes 0.0432 seconds and 0.1789 seconds precompiling for 9 choices 2025-12-04T11:45:25.2941338Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2941477Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2941524Z Traceback (most recent call last): 2025-12-04T11:45:25.2941684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2941725Z method(*args, **kwargs) 2025-12-04T11:45:25.2941878Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2941918Z method(*args, **kwargs) 2025-12-04T11:45:25.2942069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2942106Z with policy(): 2025-12-04T11:45:25.2942258Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2942302Z raise RuntimeError(msg) 2025-12-04T11:45:25.2942688Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2942701Z 2025-12-04T11:45:25.2942775Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2943033Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2943035Z 2025-12-04T11:45:25.2943122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2943206Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2943276Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2943334Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2943830Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2943931Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2943967Z graph_break [] 2025-12-04T11:45:25.2944027Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2944101Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2944584Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2944633Z current_size = base.storage().size() 2025-12-04T11:45:25.2944674Z Autotune Choices Stats: 2025-12-04T11:45:25.2945048Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:25.2945107Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2945157Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2945276Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2945509Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2945733Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2945958Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2946183Z triton_mm_3 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2946407Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2946644Z triton_mm_7 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2946867Z triton_mm_4 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2947090Z triton_mm_0 0.0066 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2947144Z _scaled_mm 0.0237 ms 24.8% 2025-12-04T11:45:25.2947273Z SingleProcess AUTOTUNE benchmarking takes 0.0416 seconds and 0.1983 seconds precompiling for 9 choices 2025-12-04T11:45:25.2947357Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2947400Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2947455Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2947558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2948039Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2948078Z graph_break [] 2025-12-04T11:45:25.2948137Z 
aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2948210Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2948252Z Autotune Choices Stats: 2025-12-04T11:45:25.2948620Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2948678Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2948725Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2948844Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2949078Z triton_mm_15 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2949310Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2949540Z triton_mm_12 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2949766Z triton_mm_14 0.0065 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2949992Z triton_mm_8 0.0066 ms 93.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2950226Z triton_mm_9 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2950451Z triton_mm_13 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2950672Z triton_mm_10 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2950724Z _scaled_mm 0.0252 ms 24.3% 2025-12-04T11:45:25.2950851Z SingleProcess AUTOTUNE benchmarking takes 0.0432 seconds and 0.1789 seconds precompiling for 9 choices 2025-12-04T11:45:25.2950925Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2950983Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2951039Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2951140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2951620Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2951658Z graph_break [] 
2025-12-04T11:45:25.2951717Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2951791Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2951832Z Autotune Choices Stats: 2025-12-04T11:45:25.2952206Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:25.2952263Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2952311Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2952429Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2952663Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2952891Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2953116Z triton_mm_22 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2953370Z triton_mm_23 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2953595Z triton_mm_21 0.0062 ms 97.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2953823Z triton_mm_20 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2954062Z triton_mm_18 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2954288Z triton_mm_19 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2954344Z _scaled_mm 0.0240 ms 25.4% 2025-12-04T11:45:25.2954471Z SingleProcess AUTOTUNE benchmarking takes 0.0559 seconds and 0.2200 seconds precompiling for 9 choices 2025-12-04T11:45:25.2954661Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-633ee898b87c8bd0.xml - 2025-12-04T11:45:25.2954747Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2955320Z FAILED [0.7559s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2955324Z 2025-12-04T11:45:25.2955397Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2955655Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2955658Z 2025-12-04T11:45:25.2955748Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2955811Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2955890Z ================== 1 failed, 187 deselected, 2 rerun in 3.72s ================== 2025-12-04T11:45:25.2955928Z Got exit code 1 2025-12-04T11:45:25.2955968Z Retrying single test... 2025-12-04T11:45:25.2956113Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ea647e249ec60b4.xml 2025-12-04T11:45:25.2956169Z ============================= test session starts ============================== 2025-12-04T11:45:25.2956283Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2956323Z cachedir: .pytest_cache 2025-12-04T11:45:25.2956483Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2956529Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2956571Z configfile: pytest.ini 2025-12-04T11:45:25.2956731Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2956806Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.2957056Z stepcurrent: skipping 106 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2957099Z Running 1 items in this shard 2025-12-04T11:45:25.2957102Z 2025-12-04T11:45:25.2957314Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0987s] [100%] 2025-12-04T11:45:25.2957523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8623s] [100%] 2025-12-04T11:45:25.2957719Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7886s] [100%] 2025-12-04T11:45:25.2957722Z 2025-12-04T11:45:25.2957772Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.2957912Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2957957Z Traceback (most recent call last): 2025-12-04T11:45:25.2958115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2958171Z method(*args, **kwargs) 2025-12-04T11:45:25.2958325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2958364Z method(*args, **kwargs) 2025-12-04T11:45:25.2958528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2958565Z with policy(): 2025-12-04T11:45:25.2958722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2958762Z raise RuntimeError(msg) 
2025-12-04T11:45:25.2959145Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1031798784. 2025-12-04T11:45:25.2959148Z 2025-12-04T11:45:25.2959222Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2959481Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2959484Z 2025-12-04T11:45:25.2959572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2959654Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2959697Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2959753Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2960235Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2960335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2960373Z graph_break [] 2025-12-04T11:45:25.2960434Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2960507Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2960994Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2961043Z current_size = base.storage().size() 2025-12-04T11:45:25.2961083Z Autotune Choices Stats: 2025-12-04T11:45:25.2961451Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:25.2961522Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2961570Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2961691Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2961921Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2962161Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2962399Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2962624Z triton_mm_7 0.0060 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2962852Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2963077Z triton_mm_6 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2963326Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2963563Z triton_mm_3 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2963605Z _scaled_mm 0.0254 ms 23.5% 2025-12-04T11:45:25.2963731Z SingleProcess AUTOTUNE benchmarking takes 0.0412 seconds and 0.1890 seconds precompiling for 9 choices 2025-12-04T11:45:25.2963871Z _ TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2963917Z Traceback (most recent call last): 2025-12-04T11:45:25.2964074Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2964116Z method(*args, **kwargs) 2025-12-04T11:45:25.2964270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2964310Z 
method(*args, **kwargs) 2025-12-04T11:45:25.2964463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2964499Z with policy(): 2025-12-04T11:45:25.2964654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2964694Z raise RuntimeError(msg) 2025-12-04T11:45:25.2965082Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1031798784 and is now 1075838976. 2025-12-04T11:45:25.2965099Z 2025-12-04T11:45:25.2965174Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2965432Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2965434Z 2025-12-04T11:45:25.2965521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2965594Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2965637Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2965693Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2966202Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2966302Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2966340Z graph_break [] 2025-12-04T11:45:25.2966401Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2966476Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2966964Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2967012Z current_size = base.storage().size() 2025-12-04T11:45:25.2967053Z Autotune Choices Stats: 2025-12-04T11:45:25.2967425Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:25.2967484Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2967532Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2967652Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2967885Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2968118Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:25.2968345Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2968570Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2968798Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2969019Z triton_mm_6 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2969254Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2969479Z triton_mm_3 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2969540Z _scaled_mm 0.0254 ms 23.5% 2025-12-04T11:45:25.2969668Z SingleProcess AUTOTUNE benchmarking takes 0.0412 seconds and 0.1890 seconds precompiling for 9 choices 2025-12-04T11:45:25.2969744Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2969786Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2969852Z stats [('calls_captured', 1), ('unique_graphs', 1)] 
2025-12-04T11:45:25.2969951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2970434Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2970471Z graph_break [] 2025-12-04T11:45:25.2970532Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2970605Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2970644Z Autotune Choices Stats: 2025-12-04T11:45:25.2971004Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_14", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2971073Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2971122Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2971240Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2971474Z triton_mm_14 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2971700Z triton_mm_10 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2971928Z triton_mm_13 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2972151Z triton_mm_15 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2972374Z triton_mm_9 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2972602Z triton_mm_11 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2972838Z triton_mm_8 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2973069Z triton_mm_12 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2973109Z _scaled_mm 0.0253 ms 23.4% 2025-12-04T11:45:25.2973277Z SingleProcess AUTOTUNE benchmarking takes 0.0480 seconds and 0.1965 seconds precompiling for 9 choices 2025-12-04T11:45:25.2973333Z =================================== FAILURES =================================== 2025-12-04T11:45:25.2973471Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.2973518Z Traceback (most recent call last): 2025-12-04T11:45:25.2973692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2973733Z method(*args, **kwargs) 2025-12-04T11:45:25.2973886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.2973927Z method(*args, **kwargs) 2025-12-04T11:45:25.2974078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.2974118Z with policy(): 2025-12-04T11:45:25.2974269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.2974310Z raise RuntimeError(msg) 2025-12-04T11:45:25.2974699Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2974702Z 2025-12-04T11:45:25.2974788Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2975044Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2975047Z 2025-12-04T11:45:25.2975135Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2975209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2975251Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2975307Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2975787Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2975887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2975923Z graph_break [] 2025-12-04T11:45:25.2975982Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2976060Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2976546Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.2976606Z current_size = base.storage().size() 2025-12-04T11:45:25.2976647Z Autotune Choices Stats: 2025-12-04T11:45:25.2977007Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:25.2977065Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2977126Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2977246Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2977486Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2977717Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2977941Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2978165Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2978388Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2978620Z triton_mm_6 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2978842Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2979068Z triton_mm_3 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2979108Z _scaled_mm 0.0254 ms 23.5% 2025-12-04T11:45:25.2979237Z SingleProcess AUTOTUNE benchmarking takes 0.0412 seconds and 0.1890 seconds precompiling for 9 choices 2025-12-04T11:45:25.2979310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2979353Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2979410Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2979511Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2979987Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2980025Z graph_break [] 2025-12-04T11:45:25.2980084Z 
aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2980173Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2980213Z Autotune Choices Stats: 2025-12-04T11:45:25.2980568Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_14", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:25.2980625Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2980674Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2980806Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2981039Z triton_mm_14 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2981282Z triton_mm_10 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2981506Z triton_mm_13 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2981731Z triton_mm_15 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2981956Z triton_mm_9 0.0062 ms 96.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2982199Z triton_mm_11 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2982425Z triton_mm_8 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2982648Z triton_mm_12 0.0064 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2982690Z _scaled_mm 0.0253 ms 23.4% 2025-12-04T11:45:25.2982816Z SingleProcess AUTOTUNE benchmarking takes 0.0480 seconds and 0.1965 seconds precompiling for 9 choices 2025-12-04T11:45:25.2982893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.2982934Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.2982990Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.2983089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.2983599Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.2983636Z graph_break [] 
2025-12-04T11:45:25.2983696Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:25.2983769Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.2983833Z Autotune Choices Stats: 2025-12-04T11:45:25.2984192Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_23", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:25.2984249Z AUTOTUNE scaled_mm(3x32, 32x2048, 3x1, 1x2048, 2048) 2025-12-04T11:45:25.2984296Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.2984415Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.2984662Z triton_mm_23 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:25.2984900Z triton_mm_18 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2985128Z triton_mm_19 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2985350Z triton_mm_21 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2985577Z triton_mm_16 0.0063 ms 97.5% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2985803Z triton_mm_20 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.2986039Z triton_mm_22 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.2986265Z triton_mm_17 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:25.2986306Z _scaled_mm 0.0232 ms 26.4% 2025-12-04T11:45:25.2986433Z SingleProcess AUTOTUNE benchmarking takes 0.0561 seconds and 0.2245 seconds precompiling for 9 choices 2025-12-04T11:45:25.2986626Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5ea647e249ec60b4.xml - 2025-12-04T11:45:25.2986688Z =========================== short test summary info ============================ 2025-12-04T11:45:25.2987273Z FAILED [0.7886s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1119879168. 
2025-12-04T11:45:25.2987277Z 2025-12-04T11:45:25.2987349Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.2987605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2987620Z 2025-12-04T11:45:25.2987706Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.2987770Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.2987838Z ================== 1 failed, 187 deselected, 2 rerun in 3.77s ================== 2025-12-04T11:45:25.2987877Z Got exit code 1 2025-12-04T11:45:25.2988081Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:25.2988210Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.2988365Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-217f6d4f3c35cddb.xml 2025-12-04T11:45:25.2988423Z ============================= test session starts ============================== 2025-12-04T11:45:25.2988546Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.2988588Z cachedir: .pytest_cache 2025-12-04T11:45:25.2988747Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.2988793Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.2988832Z configfile: pytest.ini 2025-12-04T11:45:25.2988996Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.2989073Z collecting ... 
collected 188 items / 107 deselected / 81 selected 2025-12-04T11:45:25.2989129Z stepcurrent: skipping 107 already run items. 2025-12-04T11:45:25.2989172Z Running 81 items in this shard 2025-12-04T11:45:25.2989174Z 2025-12-04T11:45:25.2990126Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/pu/cpu64jj755szakmtfwan4p6lov5qwr65wfh2yepzoev4sjsympr4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.2990278Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.2990499Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.2990657Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.2990805Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.2991096Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.2991229Z E1204 11:24:50.586000 932968 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.2991489Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.2991628Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.2991899Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.2992055Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.2992328Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.2992474Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.2992757Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.2992958Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.2993299Z E1204 11:24:50.586000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2994027Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/37/c37l5xtfj3iejjzqrk74mc6tliccgvyh7zrpktzfy7w6fivklgw4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.2994195Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.2994408Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.2994563Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.2994707Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.2994994Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.2995126Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.2995381Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.2995518Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.2995773Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.2995928Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.2996216Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.2996349Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.2996623Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.2996828Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.2997156Z E1204 11:24:50.600000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.2997882Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/xz/cxzsgupkkdchhe2o7g6tw4rosfcr2wjbwuy5aljgkzxb3z5zljvj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.2998028Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.2998241Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.2998395Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.2998555Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.2998837Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.2998968Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.2999223Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.2999360Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.2999613Z E1204 11:24:50.603000 932968 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.2999768Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3000035Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3000168Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3000455Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3000647Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3000958Z E1204 11:24:50.603000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3001708Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/gx/cgxv6fwedwobotfiurfhod6psvmjxbxyeuwx7iregy3tt3f6eb3g.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3001853Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3002065Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3002218Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3002362Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3002645Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3002789Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3003043Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3003178Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3003456Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3003613Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3003881Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3004014Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3004375Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3004568Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3004900Z E1204 11:24:50.609000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3005622Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/sb/csbdvdci747xlsc7qpkk2n6z3jxmwermhzfwcsechytolof4kutv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3005781Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3006010Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3006164Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3006307Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3006590Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3006720Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3006977Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3007125Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3007376Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3007529Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3007801Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3007934Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3008209Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3008401Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3008711Z E1204 11:24:50.612000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3009435Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpwui_kksc/ya/cyaco5a2vq6ry54kx4loatbvtbxa5zxx7hnqlbqaadinvz23hzi2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3009590Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3009814Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3009967Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3010122Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3010406Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3010535Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3010788Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3010923Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3011176Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3011340Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3011610Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3011743Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3012017Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3012210Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3012522Z E1204 11:24:50.615000 932968 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3012573Z ('RERUN', {'yellow': True}) [3.2083s] [ 1%] 2025-12-04T11:45:25.3012911Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:24:52.434000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3013206Z E1204 11:24:52.434000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3013378Z E1204 11:24:52.434000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3013523Z E1204 11:24:52.437000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3013816Z E1204 11:24:52.437000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3013958Z E1204 11:24:52.437000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3014099Z E1204 11:24:52.439000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3014408Z E1204 11:24:52.439000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3014537Z E1204 11:24:52.439000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3014679Z E1204 11:24:52.495000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3014969Z E1204 11:24:52.495000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3015096Z E1204 11:24:52.495000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.3015237Z E1204 11:24:52.497000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3015541Z E1204 11:24:52.497000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3015666Z E1204 11:24:52.497000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3015808Z E1204 11:24:52.499000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3016098Z E1204 11:24:52.499000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3016223Z E1204 11:24:52.499000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3016273Z ('RERUN', {'yellow': True}) [1.5477s] [ 1%] 2025-12-04T11:45:25.3016610Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:24:53.795000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3016901Z E1204 11:24:53.795000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3017026Z E1204 11:24:53.795000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.3017168Z E1204 11:24:53.797000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3017471Z E1204 11:24:53.797000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3017597Z E1204 11:24:53.797000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3017739Z E1204 11:24:53.799000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3018029Z E1204 11:24:53.799000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3018163Z E1204 11:24:53.799000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3018316Z E1204 11:24:53.839000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3018612Z E1204 11:24:53.839000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3018736Z E1204 11:24:53.839000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.3018877Z E1204 11:24:53.841000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3019172Z E1204 11:24:53.841000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3019296Z E1204 11:24:53.841000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3019438Z E1204 11:24:53.843000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3019740Z E1204 11:24:53.843000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.3019863Z E1204 11:24:53.843000 932968 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.3019903Z FAILED [1.3596s] [ 1%] 2025-12-04T11:45:25.3019907Z 2025-12-04T11:45:25.3019960Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.3020121Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3020167Z Traceback (most recent call last): 2025-12-04T11:45:25.3020328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3020371Z method(*args, **kwargs) 2025-12-04T11:45:25.3020526Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3020566Z method(*args, **kwargs) 2025-12-04T11:45:25.3020718Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3020755Z with policy(): 2025-12-04T11:45:25.3020910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3020952Z raise RuntimeError(msg) 2025-12-04T11:45:25.3021372Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.3021384Z 2025-12-04T11:45:25.3021464Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3021740Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3021742Z 2025-12-04T11:45:25.3021830Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3021916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3021961Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3022018Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3022590Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3022692Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3022729Z graph_break [] 2025-12-04T11:45:25.3022795Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3022869Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3023394Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3023442Z current_size = base.storage().size() 2025-12-04T11:45:25.3023497Z Autotune Choices Stats: 2025-12-04T11:45:25.3023873Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00791999977082014, "best_triton_pos": 0} 2025-12-04T11:45:25.3023938Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3023986Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3024089Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3024329Z triton_mm_34 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3024374Z _scaled_mm 0.0096 ms 82.9% 2025-12-04T11:45:25.3024608Z triton_mm_33 0.0096 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3024839Z triton_mm_29 0.0108 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3025069Z triton_mm_21 0.0109 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3025311Z triton_mm_16 0.0109 
ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3025541Z triton_mm_22 0.0110 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3025767Z triton_mm_30 0.0112 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3026012Z triton_mm_23 0.0116 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3026253Z triton_mm_15 0.0116 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3026386Z SingleProcess AUTOTUNE benchmarking takes 0.1432 seconds and 0.9894 seconds precompiling for 33 choices 2025-12-04T11:45:25.3026548Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3026593Z Traceback (most recent call last): 2025-12-04T11:45:25.3026750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3026791Z method(*args, **kwargs) 2025-12-04T11:45:25.3026943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3026984Z method(*args, **kwargs) 
2025-12-04T11:45:25.3027136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3027172Z with policy(): 2025-12-04T11:45:25.3027335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3027376Z raise RuntimeError(msg) 2025-12-04T11:45:25.3027786Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.3027790Z 2025-12-04T11:45:25.3027863Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3028138Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3028140Z 2025-12-04T11:45:25.3028229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3028304Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3028347Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3028406Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3028958Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), 
('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3029071Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3029111Z graph_break [] 2025-12-04T11:45:25.3029175Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3029250Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3029735Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3029801Z current_size = base.storage().size() 2025-12-04T11:45:25.3029843Z Autotune Choices Stats: 2025-12-04T11:45:25.3030230Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00791999977082014, "best_triton_pos": 0} 2025-12-04T11:45:25.3030296Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3030343Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3030443Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3030679Z triton_mm_34 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3030724Z _scaled_mm 0.0096 ms 82.9% 2025-12-04T11:45:25.3030957Z triton_mm_33 0.0096 ms 82.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3031199Z triton_mm_29 0.0108 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3031426Z triton_mm_21 0.0109 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3031657Z triton_mm_16 0.0109 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3031887Z triton_mm_22 0.0110 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3032116Z triton_mm_30 0.0112 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3032346Z triton_mm_23 0.0116 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3032574Z triton_mm_15 0.0116 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3032706Z 
SingleProcess AUTOTUNE benchmarking takes 0.1432 seconds and 0.9894 seconds precompiling for 33 choices 2025-12-04T11:45:25.3032791Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3032834Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3032890Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3032992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3033510Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3033561Z graph_break [] 2025-12-04T11:45:25.3033624Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3033697Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3033739Z Autotune Choices Stats: 2025-12-04T11:45:25.3034119Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008599000051617622, "best_triton_pos": 0} 2025-12-04T11:45:25.3034180Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3034227Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3034327Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3034561Z triton_mm_72 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3034793Z triton_mm_71 0.0092 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3034835Z _scaled_mm 0.0093 ms 92.3% 2025-12-04T11:45:25.3035079Z triton_mm_54 0.0108 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3035307Z triton_mm_67 0.0108 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3035534Z triton_mm_60 0.0109 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3035763Z triton_mm_59 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3035990Z triton_mm_68 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3036220Z triton_mm_53 0.0117 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.3036451Z triton_mm_61 0.0117 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3036596Z SingleProcess AUTOTUNE benchmarking takes 0.2415 seconds and 0.7354 seconds precompiling for 39 choices 2025-12-04T11:45:25.3036648Z =================================== FAILURES =================================== 2025-12-04T11:45:25.3036808Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3036856Z Traceback (most recent call last): 2025-12-04T11:45:25.3037011Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3037064Z method(*args, **kwargs) 2025-12-04T11:45:25.3037215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3037255Z method(*args, **kwargs) 2025-12-04T11:45:25.3037406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3037454Z with policy(): 2025-12-04T11:45:25.3037607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3037648Z raise RuntimeError(msg) 2025-12-04T11:45:25.3038056Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.3038059Z 2025-12-04T11:45:25.3038133Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3038413Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3038419Z 2025-12-04T11:45:25.3038507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3038582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3038633Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3038692Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3039239Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3039341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3039378Z graph_break [] 2025-12-04T11:45:25.3039445Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3039519Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3040004Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3040052Z current_size = base.storage().size() 2025-12-04T11:45:25.3040094Z Autotune Choices Stats: 2025-12-04T11:45:25.3040470Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00791999977082014, "best_triton_pos": 0} 2025-12-04T11:45:25.3040548Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3040596Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3040694Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3040937Z triton_mm_34 0.0079 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3040993Z _scaled_mm 0.0096 ms 82.9% 2025-12-04T11:45:25.3041227Z triton_mm_33 0.0096 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3041468Z triton_mm_29 0.0108 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3041697Z triton_mm_21 0.0109 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3041927Z triton_mm_16 0.0109 
ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3042160Z triton_mm_22 0.0110 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3042386Z triton_mm_30 0.0112 ms 71.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3042625Z triton_mm_23 0.0116 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3042855Z triton_mm_15 0.0116 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3042984Z SingleProcess AUTOTUNE benchmarking takes 0.1432 seconds and 0.9894 seconds precompiling for 33 choices 2025-12-04T11:45:25.3043058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3043101Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3043159Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3043291Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3043784Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3043823Z graph_break [] 2025-12-04T11:45:25.3043885Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3043958Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3043997Z Autotune Choices Stats: 2025-12-04T11:45:25.3044366Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008599000051617622, "best_triton_pos": 0} 2025-12-04T11:45:25.3044445Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3044492Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3044588Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3044823Z triton_mm_72 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3045080Z triton_mm_71 0.0092 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3045123Z _scaled_mm 0.0093 ms 92.3% 2025-12-04T11:45:25.3045351Z triton_mm_54 0.0108 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.3045578Z triton_mm_67 0.0108 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3045805Z triton_mm_60 0.0109 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3046031Z triton_mm_59 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3046271Z triton_mm_68 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3046499Z triton_mm_53 0.0117 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3046727Z triton_mm_61 0.0117 ms 73.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3046856Z SingleProcess AUTOTUNE benchmarking takes 0.2415 seconds and 0.7354 seconds precompiling for 39 choices 2025-12-04T11:45:25.3046932Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3046974Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3047031Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3047130Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3047616Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3047654Z graph_break [] 2025-12-04T11:45:25.3047716Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3047802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3047842Z Autotune Choices Stats: 2025-12-04T11:45:25.3048210Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00860000029206276, "best_triton_pos": 0} 2025-12-04T11:45:25.3048268Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3048315Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3048423Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3048669Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3048915Z triton_mm_109 0.0092 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.3048956Z _scaled_mm 0.0094 ms 91.9% 2025-12-04T11:45:25.3049183Z triton_mm_92 0.0102 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3049411Z triton_mm_105 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3049642Z triton_mm_98 0.0111 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3049881Z triton_mm_106 0.0111 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3050112Z triton_mm_97 0.0112 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3050340Z triton_mm_99 0.0116 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3050570Z triton_mm_91 0.0116 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3050700Z SingleProcess AUTOTUNE benchmarking takes 0.2426 seconds and 0.5921 seconds precompiling for 39 choices 
2025-12-04T11:45:25.3050892Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-217f6d4f3c35cddb.xml - 2025-12-04T11:45:25.3050953Z =========================== short test summary info ============================ 2025-12-04T11:45:25.3051576Z FAILED [1.3596s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.3051590Z 2025-12-04T11:45:25.3051664Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3051943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3051945Z 2025-12-04T11:45:25.3052032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3052095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.3052173Z ================== 1 failed, 107 deselected, 2 rerun in 6.13s ================== 2025-12-04T11:45:25.3052210Z Got exit code 1 2025-12-04T11:45:25.3052249Z Retrying single test... 
2025-12-04T11:45:25.3052395Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2d80fc955dd00804.xml 2025-12-04T11:45:25.3052462Z ============================= test session starts ============================== 2025-12-04T11:45:25.3052574Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.3052614Z cachedir: .pytest_cache 2025-12-04T11:45:25.3052774Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.3052819Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.3052861Z configfile: pytest.ini 2025-12-04T11:45:25.3053023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.3053099Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.3053411Z stepcurrent: skipping 107 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3053457Z Running 1 items in this shard 2025-12-04T11:45:25.3053459Z 2025-12-04T11:45:25.3053827Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:03.401038470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3053830Z 2025-12-04T11:45:25.3054143Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3054442Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3054575Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3055063Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3055318Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3055551Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3055777Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3055977Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3056272Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3056521Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3056827Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3057061Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3057352Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3057589Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3057881Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3058131Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3058421Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3058653Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3058944Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3059180Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3059471Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3059668Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3059903Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3060203Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3060401Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3060632Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3060933Z E1204 11:25:03.465000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3061174Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3061466Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3061687Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3061891Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3062093Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3062306Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3062484Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3062665Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3063194Z E1204 11:25:03.465000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/xz/cxzsgupkkdchhe2o7g6tw4rosfcr2wjbwuy5aljgkzxb3z5zljvj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3063380Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3063598Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3063756Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3063901Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3064190Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3064323Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3064596Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3064734Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3064987Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3065155Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3065434Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3065572Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3065849Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3066041Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3066355Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3066650Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3066795Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3067273Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3067530Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3067759Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3067964Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3068166Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3068459Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3068695Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3068999Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3069232Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3069533Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3069774Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3070065Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3070295Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3070585Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3070818Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3071119Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3071350Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3071643Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3071841Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3072072Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3072363Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3072559Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3072789Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3073084Z E1204 11:25:03.465000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3073365Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3073662Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3073896Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3074120Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3074322Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3074532Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3074698Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3074877Z E1204 11:25:03.465000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3074978Z E1204 11:25:03.465000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3075136Z [W1204 11:25:03.793979367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3075140Z 2025-12-04T11:45:25.3075462Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3075756Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3075888Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3076364Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3076615Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3076842Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3077046Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3077246Z E1204 11:25:03.527000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3077560Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3077793Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3078085Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3078336Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3078629Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3078858Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3079148Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3079382Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3079686Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3079917Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3080207Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3080441Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3080735Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3080931Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3081161Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3081454Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3081651Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3081892Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3082182Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3082413Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3082722Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3082944Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3083149Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3083377Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3083591Z E1204 11:25:03.527000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3083758Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3083936Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3084477Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/pu/cpu64jj755szakmtfwan4p6lov5qwr65wfh2yepzoev4sjsympr4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3084624Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3084838Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3084995Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3085141Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3085430Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3085564Z E1204 11:25:03.527000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3085819Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3085975Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3086231Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3086386Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3086653Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3086800Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3087089Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3087284Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3087602Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3087898Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3088028Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3088514Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3088766Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3088994Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3089201Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3089406Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3089696Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3089930Z E1204 
11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3090221Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3090466Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3090757Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3090986Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3091308Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3091546Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3091837Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3092066Z E1204 11:25:03.527000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3092358Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3092593Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3092893Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3093089Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3093353Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3093648Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3093848Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3094079Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3094372Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3094603Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3094907Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3095127Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3095332Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3095549Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3095773Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3095941Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3096117Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3096218Z E1204 11:25:03.527000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3096376Z [W1204 11:25:03.804352587 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3096378Z 2025-12-04T11:45:25.3096529Z [W1204 11:25:03.805357402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3096533Z 2025-12-04T11:45:25.3096685Z [W1204 11:25:03.806606254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3096687Z 2025-12-04T11:45:25.3096851Z [W1204 11:25:03.807377893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3096854Z 2025-12-04T11:45:25.3097167Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3097461Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3097593Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3098069Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3098322Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3098548Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3098763Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3098964Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3099258Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3099502Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3099802Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3100038Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3100330Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3100562Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3100854Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3101101Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3101393Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3101623Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3101914Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3102147Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3102438Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3102635Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3102866Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3103172Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3103393Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3103626Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3103935Z E1204 11:25:03.538000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3104180Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3104475Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3104693Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3104899Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3105100Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3105314Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3105495Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3105674Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3106204Z E1204 11:25:03.538000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/37/c37l5xtfj3iejjzqrk74mc6tliccgvyh7zrpktzfy7w6fivklgw4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3106353Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3106571Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3106724Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3106869Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3107157Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3107289Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3107565Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3107702Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3107958Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3108122Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3108402Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3108537Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3108813Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3109006Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3109319Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3109613Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3109753Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3110230Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3110484Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3110712Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3110918Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3111120Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3111413Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3111646Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3111950Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3112183Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3112484Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3112724Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3113019Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3113281Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3113572Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3113804Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3114109Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3114338Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3114629Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3114825Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3115061Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3115355Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3115549Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3115781Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3116072Z E1204 11:25:03.538000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3116321Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3116610Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3116845Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3117065Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3117268Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3117480Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3117645Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3117822Z E1204 11:25:03.538000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3117922Z E1204 11:25:03.538000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3118232Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3118534Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3118665Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3119147Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3119400Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3119624Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3119828Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3120029Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3120321Z E1204 11:25:03.541000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3120564Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3120855Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3121100Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3121404Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3121636Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3121927Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3122158Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3122449Z E1204 11:25:03.541000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3122696Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3122986Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3123220Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3123542Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3123737Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3123972Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3124263Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3124459Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3124689Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3124995Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3125227Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3125531Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3125762Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3125969Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3126169Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3126378Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3126545Z E1204 11:25:03.541000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3126722Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3127270Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/ya/cyaco5a2vq6ry54kx4loatbvtbxa5zxx7hnqlbqaadinvz23hzi2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3127417Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3127632Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3127786Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3127932Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3128218Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3128348Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3128608Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3128746Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3129014Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3129173Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3129440Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3129574Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3129858Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3130062Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3130376Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3130667Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3130801Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3131296Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3131549Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3131775Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3131980Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3132181Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3132472Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3132705Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3133001Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3133234Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3133568Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3133800Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3134105Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3134346Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3134639Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3134868Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3135161Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3135396Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3135699Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3135895Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3136125Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3136417Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3136613Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3136844Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3137137Z E1204 11:25:03.541000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3137368Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3137660Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3137895Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3138100Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3138300Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3138520Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3138703Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3138883Z E1204 11:25:03.541000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3138986Z E1204 11:25:03.541000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3139292Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3139587Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3139717Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3140201Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3140453Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3140679Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3140887Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3141089Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3141381Z E1204 11:25:03.542000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3141613Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3141906Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3142150Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3142441Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3142683Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3142986Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3143220Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3143551Z E1204 11:25:03.542000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3143780Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3144071Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3144315Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3144605Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3144799Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3145034Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3145326Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3145521Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3145754Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3146045Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3146277Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3146581Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3146802Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3147014Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3147228Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3147453Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3147618Z E1204 11:25:03.542000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3147795Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3148322Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/sb/csbdvdci747xlsc7qpkk2n6z3jxmwermhzfwcsechytolof4kutv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3148472Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3148698Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3148854Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3149004Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3149289Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3149423Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3149680Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3149817Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3150071Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3150226Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3150495Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3150639Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3150918Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3151110Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3151433Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3151736Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3151866Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3152345Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3152597Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3152837Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3153042Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3153242Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3153570Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3153806Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3154098Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3154328Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3154620Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3154852Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3155159Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3155390Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3155700Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3155946Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3156238Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3156473Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3156763Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3156960Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3157193Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3157495Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3157692Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3157922Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3158213Z E1204 11:25:03.542000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3158446Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3158737Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3158961Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3159166Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3159378Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3159587Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3159753Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3159941Z E1204 11:25:03.542000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3160043Z E1204 11:25:03.542000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3160365Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3160660Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3160796Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3161273Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3161535Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3161759Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3161963Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3162163Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3162454Z E1204 11:25:03.543000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3162688Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3162980Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3163213Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3163536Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3163779Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3164068Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3164310Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3164618Z E1204 11:25:03.543000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3164854Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3165146Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3165380Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3165672Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3165881Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3166112Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3166402Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3166597Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3166834Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3167128Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3167360Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3167652Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3167870Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3168087Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3168286Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3168498Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3168675Z E1204 11:25:03.543000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3168865Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3169397Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmppr53hm3v/gx/cgxv6fwedwobotfiurfhod6psvmjxbxyeuwx7iregy3tt3f6eb3g.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3169543Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3169758Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3169912Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3170060Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3170355Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3170486Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3170749Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3170888Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3171144Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3171299Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3171567Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3171702Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3171976Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3172185Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3172500Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3172797Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3172936Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3173462Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3173715Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3175533Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3175743Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3175944Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3176261Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3176497Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3176796Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3177030Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3177321Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3177551Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3177843Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3178075Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3178384Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3178616Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3178905Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3179158Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3179450Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3179645Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3179876Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3180166Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3180365Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3180608Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3180897Z E1204 11:25:03.543000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3181130Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3181422Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3181643Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3181847Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3182048Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3182259Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3182437Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3182617Z E1204 11:25:03.543000 938887 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3182718Z E1204 11:25:03.543000 938887 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3182771Z ('RERUN', {'yellow': True}) [3.5297s] [100%] 2025-12-04T11:45:25.3183124Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:05.582384944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3183138Z 2025-12-04T11:45:25.3183310Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3183617Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3183915Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3184044Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3184526Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3184792Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3185017Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous 
namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3185224Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3185424Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3185718Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3185951Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3186246Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3186480Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3186782Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3187016Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3187306Z 
E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3187549Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3187849Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3188069Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3188276Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3188473Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3188679Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3188880Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3189128Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3189419Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3189614Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3189846Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3190137Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3190357Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3190552Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3190769Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3190984Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3191179Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3191372Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3191589Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3191803Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3192009Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3192205Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3192437Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3192728Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3192960Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3193294Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3193513Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3193718Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3193913Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3194123Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3194324Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3194555Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3194844Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3195077Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3195381Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3195611Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3195901Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3196147Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3196449Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3196680Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3196969Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3197202Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3197492Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3197735Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3198026Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3198258Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3198549Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3198782Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3199071Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3199302Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3199593Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3199836Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3200129Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3200363Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3200563Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3200770Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3201061Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3201292Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3201583Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3201815Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3202121Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3202353Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3202645Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3202876Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3203168Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3203418Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3203709Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3203906Z E1204 
11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3204118Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3204314Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3204519Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3204730Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3204977Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3205270Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3205464Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3205658Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3205855Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 
_PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3206051Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3206300Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3206590Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3206822Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3207113Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3207309Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3207515Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3207716Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3207949Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3208247Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3208484Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3208686Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3208884Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3209095Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3209397Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3209631Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3209923Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3210164Z E1204 
11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3210459Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3210699Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3210992Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3211224Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3211517Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3211716Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3211913Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3212137Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3212339Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3212548Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3212747Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3213039Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3213317Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3213625Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3213860Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3214154Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3214387Z E1204 11:25:05.322000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3214678Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3214924Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3215215Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3215436Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3215639Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3215838Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3216035Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3216246Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3216446Z E1204 
11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3216738Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3216971Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3217174Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3217369Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3217584Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3217889Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3218126Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3218418Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3218651Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3218946Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3219189Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3219483Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3219714Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3220009Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3220245Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3220540Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3220773Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3221065Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3221309Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3221600Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3221834Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3222155Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3222399Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3222692Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3222888Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3223087Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3223360Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3223667Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3223898Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3224191Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3224424Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3224717Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3224949Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3225239Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3225471Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3225777Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3227044Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3227285Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3227594Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3227827Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3228122Z E1204 11:25:05.322000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3228351Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3228552Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3228751Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3228953Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3229258Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3229473Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3229676Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3229874Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3230075Z E1204 11:25:05.322000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3230370Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3230590Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3230792Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3230999Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3231192Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3231386Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3231583Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3231814Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3232018Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3232215Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3232436Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3232641Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3232836Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3233057Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3233307Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3233502Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3233721Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3233927Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3234125Z E1204 11:25:05.322000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3234321Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3234534Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3234735Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3234933Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3235147Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3235440Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3235675Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3235889Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3236086Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3236277Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3236474Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3236685Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3236888Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3237088Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3237289Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3237594Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3237807Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3238008Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3238206Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3238406Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3238700Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3238914Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3239115Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3239329Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3239528Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3239834Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3240038Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3240239Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3240428Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3240625Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3240840Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3241046Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3241243Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3241432Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3241624Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3241794Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3241921Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3242025Z E1204 11:25:05.322000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3242151Z E1204 11:25:05.322000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3242308Z [W1204 11:25:05.591327884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3242312Z 2025-12-04T11:45:25.3242459Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3242754Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3243050Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3243180Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3243704Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3243975Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3244214Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:25.3244424Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3244630Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3244922Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3245156Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3245448Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3245695Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3245987Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3246217Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3246511Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3246743Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3247033Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3247253Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3247457Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3247653Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3247877Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3248086Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3248316Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3248626Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3248821Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3249053Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3249344Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3249562Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3249756Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3249985Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3250191Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3250387Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3250582Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3250802Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3251007Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3251202Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3251394Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3251626Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3251915Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3252159Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3252466Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3252693Z E1204 
11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3252898Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3253095Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3253337Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3253537Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3253772Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3254064Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3254317Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3254613Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3254843Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3255133Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3255365Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3255658Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3255888Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3256177Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3256426Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3256729Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3256959Z E1204 
11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3257261Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3257493Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3257785Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3258014Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3258308Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3258543Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3258845Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3259076Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3259367Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3259586Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3259787Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3259983Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3260273Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3260506Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3260810Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3261051Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3261342Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3261582Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3261871Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3262101Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3262395Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3262628Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3262918Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3263125Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3263354Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3263548Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3263754Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3263952Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3264185Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3264481Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3264678Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3264872Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3265083Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3265276Z E1204 
11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3265522Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3265825Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3266054Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3266350Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3266545Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3266752Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3266953Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3267187Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3267495Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3267715Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3267916Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3268113Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3268318Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3268611Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3268845Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3269139Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3269388Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3269694Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3269925Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3270235Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3270469Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3270763Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3270962Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3271159Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3271380Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:25.3271594Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3271793Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3271992Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3272288Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3272523Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3272815Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3273048Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3273373Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3273604Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3273914Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3274161Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3274471Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3274691Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3274893Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3275090Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3275281Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3275490Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3275690Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3275999Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3276220Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3276423Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3276621Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3276821Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3277112Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3277347Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3277640Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3277872Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3278175Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3278418Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3278721Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3278953Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3279248Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3279480Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3279772Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3280004Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3280310Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3280544Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3280836Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3281068Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3281361Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3281594Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3281886Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3282084Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3282293Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3282528Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3282833Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3283075Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3283398Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3283632Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3283924Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3284156Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3284451Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3284697Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3284990Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3285186Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3285419Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3285711Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3285943Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3286235Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3286452Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3286654Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3286867Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3287088Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3287380Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3287606Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3287807Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3288005Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3288205Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3288502Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3288722Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3288933Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3289132Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3289324Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3289470Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3289666Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3289885Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3290092Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3290291Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3290514Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3290719Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3290922Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3291154Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3291358Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3291563Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3291782Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3291989Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3292190Z E1204 11:25:05.324000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3292384Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3292598Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3292797Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3293006Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3293206Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3293535Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3293747Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3293955Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3294154Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3294345Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3294540Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3294752Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3294966Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3295163Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3295376Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3295668Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3295895Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3296099Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3296296Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3296497Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3296789Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3297001Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3297214Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3297413Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3297613Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3297910Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3298106Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3298308Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722
2025-12-04T11:45:25.3298499Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743
2025-12-04T11:45:25.3298693Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643
2025-12-04T11:45:25.3298906Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433
2025-12-04T11:45:25.3299123Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78
2025-12-04T11:45:25.3299319Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360
2025-12-04T11:45:25.3299525Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767
2025-12-04T11:45:25.3299708Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58
2025-12-04T11:45:25.3299892Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360
2025-12-04T11:45:25.3300017Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0
2025-12-04T11:45:25.3300122Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] .
2025-12-04T11:45:25.3300247Z E1204 11:25:05.324000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice.
2025-12-04T11:45:25.3300404Z [W1204 11:25:05.593470983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:25.3300407Z
2025-12-04T11:45:25.3300550Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:25.3300843Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:25.3301140Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first):
2025-12-04T11:45:25.3301280Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback:
2025-12-04T11:45:25.3301766Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:45:25.3302017Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0
2025-12-04T11:45:25.3302244Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0
2025-12-04T11:45:25.3302450Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3302650Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3302942Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3303175Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3303537Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3303785Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3304079Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3304324Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3304617Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3304848Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3305137Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3305357Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3305565Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3310842Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3311059Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3311256Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3311487Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3311779Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3311976Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3312206Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3312497Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3312735Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3312930Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3313159Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3313399Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3313609Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3313802Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3314020Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3314225Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3314420Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3314622Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3314853Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3315158Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3315388Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3315681Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3315902Z E1204 
11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3316105Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3316301Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3316511Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3316710Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3316951Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3317244Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3317485Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3317786Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3318017Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3318312Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3318547Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3318838Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3319068Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3319370Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3319600Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3319890Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3320120Z E1204 
11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3320410Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3320643Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3320934Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3321166Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3321467Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3321707Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3321996Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3322235Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3322526Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3322747Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3322948Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3323142Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3323468Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3323725Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3324016Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3324247Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3324537Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3324772Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3325061Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3325291Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3325581Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3325825Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3326129Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3326323Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3326531Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3326726Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3326935Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3327133Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3327364Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3327655Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3327849Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3328055Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3328250Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3328448Z E1204 
11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3328681Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3328973Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3329203Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3329492Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3329689Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3329893Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3330106Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3330349Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3330644Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3330874Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3331075Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3331274Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3331473Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3331766Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3331998Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3332300Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3332535Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3332828Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3333062Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3333390Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3333626Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3333919Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3334118Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3334330Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3334552Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:25.3334770Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3334979Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3335178Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3335470Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3335705Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3336003Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3336235Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3336532Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3336778Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3337071Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3337302Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3337596Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3337819Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3338020Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3338219Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3338411Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3338633Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3338834Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3339140Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3339380Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3339580Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3339779Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3339978Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3340271Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3340506Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3340799Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3341040Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3341334Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3341566Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3341856Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3342090Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3342383Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3342621Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3342913Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3343155Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3343492Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3343735Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3344026Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3344260Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3344558Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3344793Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3345085Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3345297Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3345493Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3345725Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3346018Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3346250Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3346547Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3346794Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3347089Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3347321Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3347626Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3347867Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3348167Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3348364Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3348599Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3348894Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3349125Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3349418Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3349632Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3349846Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3350045Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3350245Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3350539Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3350756Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3350958Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3351156Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3351356Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3351646Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3351877Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3352090Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3352287Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3352489Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3352638Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3352834Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3353056Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3353288Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3353484Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3353704Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3353923Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3354118Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3354338Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3354546Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3354741Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3354965Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3355171Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3355368Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3355563Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3355789Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3355990Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3356209Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3356412Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3356715Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3356930Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3357130Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3357327Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3357518Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3357712Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3357935Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3358138Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3358338Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3358538Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3358833Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3359046Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3359248Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3359445Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3359644Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3359949Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3360163Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3360373Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3360578Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3360778Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3361071Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3361265Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3361466Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3361655Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3361850Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3362075Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3362281Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3362477Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3362668Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3362848Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3363018Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3363145Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3363274Z E1204 11:25:05.326000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3363402Z E1204 11:25:05.326000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3363558Z [W1204 11:25:05.633794608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3363560Z 2025-12-04T11:45:25.3363705Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3364014Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3364323Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3364453Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3364942Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3365196Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3365422Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:25.3365628Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3365829Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3366124Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3366371Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3366662Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3366895Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3367186Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3367421Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3367719Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3367951Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3368243Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3368472Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3368686Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3368882Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3369100Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3369299Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3369531Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3369829Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3370023Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3370254Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3370558Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3370778Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3370973Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3371193Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3371398Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3371597Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3371794Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3372011Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3372216Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3372412Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3372622Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3372866Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3373157Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3373442Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3373737Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3373958Z E1204 
11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3374162Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3374357Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3374563Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3374778Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3375011Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3375302Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3375537Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3375830Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3376061Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3376351Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3376581Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3376871Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3377116Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3377417Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3377659Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3377950Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3378182Z E1204 
11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3378472Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3378703Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3378991Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3379233Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3379525Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3379755Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3380047Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3380278Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3380569Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3380786Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3380987Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3381193Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3381485Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3381728Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3382028Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3382260Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3382552Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3382784Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3383073Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3383331Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3383640Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3383873Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3384163Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3384360Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3384555Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3384751Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3384957Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3385159Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3385388Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3385697Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3385904Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3386100Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3386307Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3386500Z E1204 
11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3386732Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3387022Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3387254Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3387548Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3387746Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3387970Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3388171Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3388404Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3388698Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3388920Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3389121Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3389321Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3389525Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3389817Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3390060Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3390363Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3390606Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3390899Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3391133Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3391426Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3391661Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3391954Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3392161Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3392359Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3392579Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:25.3392781Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3392979Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3393179Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3393536Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3393882Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3394176Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3394425Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3394732Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3394965Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3395270Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3395505Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3395802Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3396023Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3396224Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3396423Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3396629Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3396844Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3397043Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3397335Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3397555Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3397760Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3397960Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3398159Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3398451Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3398698Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3398998Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3399230Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3399532Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3399769Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3400063Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3400294Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3400588Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3400819Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3401124Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3401356Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3401647Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3401883Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3402176Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3402409Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3402700Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3402933Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3403238Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3403485Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3403687Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3403940Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3404233Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3404466Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3404760Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3404994Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3405287Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3405534Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3405827Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3406060Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3406355Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3406552Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3406785Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3407078Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3407314Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3407620Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3407846Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3408049Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3408257Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3408460Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3408754Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3408968Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3409169Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3409367Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3409567Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3409875Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3410096Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3410297Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3410495Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3410687Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3410836Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3411030Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3411251Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3411456Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3411664Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3411895Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3412099Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3412303Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3412521Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3412731Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3412925Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3413144Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3413475Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3413675Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3413887Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3414101Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3414302Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3414499Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3414700Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3414995Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3415207Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3415408Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3415610Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3415813Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3416008Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3416234Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3416437Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3416648Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3416848Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3417141Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3417356Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3417560Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3417757Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3417956Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3418259Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3418473Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3418674Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3418871Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3419072Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3419367Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3419561Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3419762Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3419961Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3420156Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3420383Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3420587Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3420793Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3420981Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3421163Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3421333Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3421459Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3421563Z E1204 11:25:05.367000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3421688Z E1204 11:25:05.367000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3421843Z [W1204 11:25:05.635911697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3421847Z 2025-12-04T11:45:25.3422000Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3422294Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3422590Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3422721Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3423205Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3423488Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3423713Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:25.3423917Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3424130Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3424434Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3424668Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3424974Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3425207Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3425499Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3425730Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3426023Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3426254Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3426556Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3426777Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3426983Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3427183Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3427390Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3427588Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3427818Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3428111Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3428306Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3428546Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3428847Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3429066Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3429270Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3429487Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3429691Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3429888Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3430082Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3430300Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3430503Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3430709Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3430905Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3431138Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3431430Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3431662Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3431954Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3432172Z E1204 
11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3432377Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3432580Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3432788Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3432997Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3433228Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3433564Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3433797Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3434088Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3434316Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3434608Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3434843Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3435150Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3435382Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3435672Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3435903Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3436193Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3436424Z E1204 
11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3436713Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3436963Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3437253Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3437495Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3437796Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3438024Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3438317Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3438549Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3438839Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3439060Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3439270Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3439467Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3439756Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3439988Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3440280Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3440510Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3440801Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3441034Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3441325Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3441566Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3441869Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3442110Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3442399Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3442597Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3442791Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3442987Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3443193Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3443421Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3443668Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3443959Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3444157Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3444352Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3444548Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3444743Z E1204 
11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3444977Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3445270Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3445501Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3445806Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3446013Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3446221Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3446432Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3446669Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3446966Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3447187Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3447389Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3447586Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3447786Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3448089Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3448321Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3448614Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3448848Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3449141Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3449372Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3449665Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3449908Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3450200Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3450410Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3450616Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3450840Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:25.3451043Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3451242Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3451442Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3451737Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3451969Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3452277Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3452509Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3452803Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3453037Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3453377Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3453609Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3453900Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3454122Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3454337Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3454535Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3454742Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3454966Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3455164Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3455460Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3455680Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3455883Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3456082Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3456282Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3456589Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3456824Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3457117Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3457349Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3457645Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3457876Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3458168Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3458401Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3458703Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3458947Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3459238Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3459481Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3459773Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3460009Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3460305Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3460539Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3460837Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3461081Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3461375Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3461573Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3461768Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3462004Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3462295Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3462528Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3462823Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3463069Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3463404Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3463635Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3463938Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3464170Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3464462Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3464657Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3464894Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3465189Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3465434Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3465727Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3465941Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3466145Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3466344Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3466549Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3466843Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3467058Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3467274Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3467472Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3467683Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3467974Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3468210Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3468412Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3468610Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3468806Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3468954Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3469150Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3469370Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3469587Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3469782Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3470001Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3470210Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3470406Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3470630Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3470838Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3471033Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3471252Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3471470Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3471668Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3471871Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3472097Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3472297Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3472495Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3472697Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3472992Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3473206Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3473441Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3473654Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3473845Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3474043Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3474255Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3474458Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3474659Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3474859Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3475151Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3475363Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3475578Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3475776Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3475988Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3476281Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3476509Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3476712Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3476909Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3477110Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3477401Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3477597Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3477809Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3477999Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3478193Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3478411Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3478618Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3478814Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3479002Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3479181Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3479353Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3479480Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3479596Z E1204 11:25:05.369000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3479720Z E1204 11:25:05.369000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3479877Z [W1204 11:25:05.638052866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3479880Z 2025-12-04T11:45:25.3480033Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3480328Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3480634Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3480763Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3481242Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3481496Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3481724Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 
2025-12-04T11:45:25.3481939Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3482141Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3482436Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3482670Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3482962Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3483195Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3483496Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3483731Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3484041Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3484291Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3484582Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3484815Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3485022Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3485217Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3485424Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3485620Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3485854Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3486147Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3486356Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3486592Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3486882Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3487101Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3487296Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3487514Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3487716Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3487914Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3488108Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3490097Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3490324Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3490520Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3490726Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3490960Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3491256Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3491488Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3491784Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3492002Z E1204 
11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3492218Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3492415Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3492620Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3492820Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3493050Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3493380Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3493615Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3493908Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3494139Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3494444Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3494689Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3494980Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3495223Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3495516Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3495748Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3496039Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3496270Z E1204 
11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3496575Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3496805Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3497096Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3497328Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3497620Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3497853Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3498144Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3498376Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3498665Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3498896Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3499111Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3499306Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3499607Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3499841Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3500132Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3500364Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3500653Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3500886Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3501187Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3501419Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3501710Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3501944Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3502237Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3502433Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3502630Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3502824Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3503052Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3503274Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3503522Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3503831Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3504027Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3504224Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3504419Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3504614Z E1204 
11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3504844Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3505135Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3505380Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3505673Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3505870Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3506076Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3506282Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3506516Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.3506809Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3507031Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3507244Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3507444Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3507658Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3507951Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3508194Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3508488Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3508723Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3509015Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3509248Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3509550Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3509785Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3510076Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3510273Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3510471Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3510691Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 
2025-12-04T11:45:25.3510893Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3511091Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3511293Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3511596Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3511830Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3512142Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3512383Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3512675Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3512908Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3513200Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3513465Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3513760Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3513998Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3514200Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3514400Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3514591Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3514801Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3515001Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3515296Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3515516Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3515718Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3515930Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3516130Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3516436Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3516681Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3516975Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3517209Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3517503Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3517739Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3518030Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3518279Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3518572Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3518804Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3519096Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3519331Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3519624Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3519857Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3520150Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3520393Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3520694Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3520926Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3521225Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3521423Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3521621Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3521859Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3522154Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3522384Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3522688Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3522920Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3523213Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 
2025-12-04T11:45:25.3523480Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3523777Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3524012Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3524304Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3524504Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3524749Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3525053Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3525286Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3525590Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3525806Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3526014Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3526212Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3526413Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3526707Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3526933Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3527134Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3527333Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3527534Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3527830Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3528050Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3528254Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3528451Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3528645Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3528794Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3529002Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3529233Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3529439Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from 
/usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3529647Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3529865Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3530074Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3530268Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3530487Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3530694Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3530887Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3531120Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3531325Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3531525Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3531720Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3531934Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3532136Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3532333Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3532533Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3532825Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3533055Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3533295Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3533511Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call 
from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3533721Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3533915Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3534131Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3534331Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3534529Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3534728Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3535022Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3535246Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3535450Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3535648Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3535847Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3536140Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3536352Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3536553Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3536749Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3536949Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3537256Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3537453Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 
2025-12-04T11:45:25.3537666Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3537868Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3538066Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3538280Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3538486Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3538682Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3538872Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3539054Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3539226Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3539363Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3539467Z E1204 11:25:05.371000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3539593Z E1204 11:25:05.371000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3539645Z ('RERUN', {'yellow': True}) [1.6369s] [100%] 2025-12-04T11:45:25.3540001Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:06.039974553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3540005Z 2025-12-04T11:45:25.3540149Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3540443Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3540740Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3540871Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3541356Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3541632Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3541858Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3542075Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3542273Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3542567Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3542801Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3543096Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3543436Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3543746Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3543978Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3544268Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3544501Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3544794Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3545014Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3545225Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3545422Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3545631Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3545842Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3546084Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3546373Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3546583Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3546814Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3547110Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3547331Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3547526Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3547743Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3547957Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3548154Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3548347Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3548566Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3548768Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3548967Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3549163Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3549393Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3549685Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3549925Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3550216Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3550450Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3550665Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3550862Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3551069Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3551268Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3551498Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3551790Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3552019Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3552319Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3552550Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3552844Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3553075Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3553400Z 
E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3553634Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3553926Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3554156Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3554461Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3554703Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3554997Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3555243Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3555534Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3555766Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3556056Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3556288Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3556576Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3556819Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3557112Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3557331Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3557532Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3557728Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3558018Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3558249Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3558540Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3558783Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3559086Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3559317Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3559618Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3559849Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3560139Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3560369Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3560660Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3560855Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3561061Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3561257Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3561463Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3561663Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3561894Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3562183Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3562378Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3562574Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3562767Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3562972Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3563213Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3563537Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3563780Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3564074Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3564271Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3564477Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3564678Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3564913Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3565221Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3565441Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3565643Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3565842Z 
E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3566041Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3566335Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3566568Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3566863Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3567096Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3567410Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3567657Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3567948Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3568193Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3568486Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3568685Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3568882Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3569105Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3569307Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3569517Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3569718Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3570009Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3570242Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3570534Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3570767Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3571063Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3571295Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3571587Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3571831Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3572136Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3572366Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3572568Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3572769Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3572964Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3573173Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3573393Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3573685Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3573920Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3574122Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3574320Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3574521Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3574818Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3575050Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3575343Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3575576Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3575867Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3576114Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3576415Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3576661Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3576955Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3577190Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3577483Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3577715Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3578007Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3578249Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3578542Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3578773Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3579069Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3579302Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3579593Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3579791Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3579987Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3580230Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3580522Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3580764Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3581069Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3581300Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3581593Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3581826Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3582119Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3582349Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3582655Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3582852Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3583086Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3583417Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3583649Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3583941Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3584156Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3584356Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3584554Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3584768Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3585076Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3585288Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3585503Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3585703Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3585902Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3586197Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3586416Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3586618Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3586829Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3587023Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3587172Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3587370Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3587589Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3587795Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3587991Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3588210Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3588416Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3588610Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3588841Z E1204 
11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3589055Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3589250Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3589478Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3589682Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3589879Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3590073Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3590285Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3590486Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3590686Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3590898Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3591191Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3591403Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3591606Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3591804Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3591998Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3592195Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3592408Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3592611Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3592819Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3593017Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3593365Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3593577Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3593790Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3593988Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3594187Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3594485Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3594698Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3594899Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3595110Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3595310Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3595602Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3595798Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3595999Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3596189Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3596388Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3596600Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3596804Z E1204 11:25:06.773000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3597000Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3597206Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3597395Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3597565Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3597708Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3597811Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3597938Z E1204 11:25:06.773000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3598096Z [W1204 11:25:06.042259470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3598100Z 2025-12-04T11:45:25.3598244Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3598537Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3598835Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3598966Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3599454Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3599707Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 2025-12-04T11:45:25.3599933Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3600142Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3600341Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3600635Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3600869Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3601159Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3601402Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3601703Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3601946Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3602242Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3602473Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3602764Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3602982Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3603188Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3603418Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3603641Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3603840Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3604072Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3604364Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3604559Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3604790Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3605080Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3605299Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3605507Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3605726Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3605945Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3606144Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3606353Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3606571Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3606775Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3606969Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3607164Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3607394Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3607684Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3607927Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3608217Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3608436Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3608640Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3608835Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3609042Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3609238Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3609470Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3609771Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3610005Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3610307Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3610546Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3610835Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3611066Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3611356Z 
E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3611586Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3611878Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3612121Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3612412Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3612643Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3612932Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3613164Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3613484Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3613715Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3614009Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3614256Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3614564Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3614794Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3615096Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3615314Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3615514Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3615712Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3616007Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3616239Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3616542Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3616773Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3617061Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3617293Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3617585Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3617814Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3618107Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3618338Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3618639Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3618843Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3619038Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3619243Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3619448Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3619649Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3619883Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3620179Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3620373Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3620568Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3620774Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3620967Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3621199Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3621490Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3621720Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3622014Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3622208Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3622416Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3622616Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3622862Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3623164Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3623414Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3623628Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3623831Z 
E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3624033Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3624325Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3624558Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3624848Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3625095Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3625387Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3625620Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3625914Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3626147Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3626440Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3626636Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3626833Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3627064Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3627266Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3627474Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3627676Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3627983Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3628216Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3628514Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3628744Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3629038Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3629270Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3629575Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3629808Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3630103Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3630324Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3630526Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3630724Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3630916Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3631126Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3631346Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3631647Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3631871Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3632083Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3632279Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3632482Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3632774Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3633006Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3633334Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3633567Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3633877Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3634111Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3634403Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3634635Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3634929Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3635161Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3635456Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3635703Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3635999Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3636244Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3636548Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3636782Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3637074Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3637308Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3637602Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3637798Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3638007Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3638240Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3638531Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3638764Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3639058Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3639290Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3639582Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3639815Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3640109Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3640355Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3640655Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3640860Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3641092Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3641385Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3641619Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3641911Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3642128Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3642330Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3642536Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3642737Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3643030Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3643242Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3643477Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3643677Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3643877Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3644171Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3644406Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3644607Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3644819Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3645009Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3645173Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3645367Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3645591Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3645798Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3645994Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3646215Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3646420Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3646634Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3646855Z E1204 
11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3647061Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3647257Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3647480Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3647689Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3647890Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3648086Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3648299Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3648514Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3648711Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3648925Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3649220Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3649442Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3649645Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3649845Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3650038Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3650234Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3650446Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3650645Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3650854Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3651053Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3651346Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3651560Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3651760Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3651958Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3652156Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3652449Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3652670Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3652872Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3653079Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3653316Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3653627Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3653822Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3654024Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3654215Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3654412Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3654625Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3654829Z E1204 11:25:06.775000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3655039Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3655230Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3655412Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3655581Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3655708Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3655811Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3655936Z E1204 11:25:06.775000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3656093Z [W1204 11:25:06.044368319 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3656096Z 2025-12-04T11:45:25.3656238Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3656531Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3656838Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3656968Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3657458Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3657723Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3657951Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3658158Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3658358Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3658649Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3658883Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3659192Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3659424Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3659714Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3659945Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3660235Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3660466Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3660761Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3660979Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3661199Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3661404Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3661610Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3661827Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3662057Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3662351Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3662547Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3662778Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3663072Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3663459Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3663655Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3663873Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3664078Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3664272Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3664466Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3664684Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3664887Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3665085Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3665280Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3665526Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3665830Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3666061Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3666365Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3666585Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3666789Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3666983Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3667192Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3667391Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3667634Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3667926Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3668232Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3668526Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3668757Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3669050Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3669283Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3669574Z 
E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3669817Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3670115Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3670346Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3670647Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3670878Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3671172Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3671403Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3671695Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3671925Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3672225Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3672456Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3672746Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3672977Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3673290Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3673510Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3673709Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3673907Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3674211Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3674445Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3674750Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3674994Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3675293Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3675525Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3675817Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3676048Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3676338Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3676582Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3676873Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3677068Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3677267Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3677464Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3677671Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3677869Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3678099Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3678388Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3678595Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3678803Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3678998Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3679204Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3679438Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3679729Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3679959Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3680250Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3680445Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3680652Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3680861Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3681098Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3681393Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3681612Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3681815Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3682014Z 
E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3682213Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3682505Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3682749Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3683055Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3683318Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3683624Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3683857Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3684150Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3684383Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3684674Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3684871Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3685083Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3685305Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3685506Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3685704Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3685902Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3686198Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3686430Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3686721Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3686957Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3687264Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3687510Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3687802Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3688042Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3688336Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3688556Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3688759Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3688959Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3689153Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3689374Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3689573Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3689866Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3690087Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3690289Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3690486Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3690686Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3690983Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3691218Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3691519Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3691762Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3692056Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3692300Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3692595Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3692828Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3693121Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3693395Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3693691Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3693941Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3694234Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3694468Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3694761Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3694994Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3695290Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3695521Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3695812Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3696032Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3696241Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3696473Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3696777Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3697013Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3697306Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3697539Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3697831Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3698062Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3698365Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3698598Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3698890Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3699088Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3699322Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3699615Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3699848Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3700139Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3700364Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3700575Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3700773Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3700986Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3701282Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3701494Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3701771Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3701969Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3702169Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3702473Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3702694Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3702897Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3703099Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3703325Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3703472Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3703669Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3703889Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3704095Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3704289Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3704525Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3704745Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3704943Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3705180Z E1204 
11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3705385Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3705581Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3705801Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3706006Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3706204Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3706397Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3706625Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3706827Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3707026Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3707226Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3707518Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3707732Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3707935Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3708132Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3708323Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3708530Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3708746Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3708962Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3709159Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3709370Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3709664Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3709876Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3710077Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3710275Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3710475Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3710787Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3711001Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3711203Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3711401Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3711600Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3711894Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3712091Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3712293Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3712487Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3712695Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3712909Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3713125Z E1204 11:25:06.777000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3713354Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3713557Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3713738Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3713910Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3714036Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3714139Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3714264Z E1204 11:25:06.777000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3714426Z [W1204 11:25:06.084372349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3714428Z 2025-12-04T11:45:25.3714571Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3714878Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3715176Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3715304Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3715787Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3716040Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3716266Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3716474Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3716671Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3716977Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3717225Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3717518Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3717758Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3718048Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3718289Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3718581Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3718812Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3719104Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3719335Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3719542Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3719738Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3719944Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3720145Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3720381Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3720672Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3720869Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3721099Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3721403Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3721632Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3721825Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3722055Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3722260Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3722458Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3722652Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3722870Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3723076Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3723306Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3723513Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3723745Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3724038Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3724269Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3724561Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3724779Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3724982Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3725177Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3725397Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3725600Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3725844Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3726141Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3726398Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3726688Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3726920Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3727209Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3727442Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3727744Z 
E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3727975Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3728267Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3728498Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3728792Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3729025Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3729316Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3729547Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3729850Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3730081Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3730385Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3730626Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3730914Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3731147Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3731438Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3731657Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3731858Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3732053Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3732361Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3732591Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3732883Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3733114Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3733435Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3733669Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3733958Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3734206Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3734505Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3734748Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3735050Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3735245Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3735449Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3735644Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3735857Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3736056Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3736286Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3736591Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3736787Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3736984Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3737178Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3737378Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3737610Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3737904Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3738136Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3738425Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3738630Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3738849Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3739056Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3739306Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3739600Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3739821Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3740024Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3740227Z 
E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3740429Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3740739Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3740971Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3741263Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3741496Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3741788Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3742022Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3742320Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3742555Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3742865Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3743065Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3743307Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3743541Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3743746Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3743948Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3744152Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3744451Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3744684Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3744976Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3745223Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3745522Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3745755Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3746051Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3746284Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3746584Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3746808Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3747011Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3747225Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3747426Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3747652Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3747864Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3748161Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3748383Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3748590Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3748788Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3748995Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3749298Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3749542Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3749837Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3750074Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3750364Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3750598Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3750890Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3751126Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3751418Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3751672Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3751977Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3752215Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3752521Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3752758Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3753056Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3753314Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3753612Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3753847Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3754153Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3754351Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3754551Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3754789Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3755092Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3755325Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3755619Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3755858Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3756167Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3756411Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3756704Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3756951Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3757256Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3757452Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3757683Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3757978Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3758215Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3758522Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3758737Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3758937Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3759145Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3759350Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3759642Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3759857Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3760060Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3760263Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3760485Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3760792Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3761011Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3761227Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3761432Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3761624Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3761772Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3761967Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3762197Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3762403Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3762614Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3762835Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3763042Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3763237Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3763491Z E1204 
11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3763700Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3763894Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3764114Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3764320Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3764533Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3764733Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3764963Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3765181Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3765377Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3765587Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3765886Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3766103Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3766309Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3766508Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3766712Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3766914Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3767127Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3767328Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3767529Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3767732Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3768025Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3768239Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3768443Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3768651Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3768856Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3769163Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3769391Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3769591Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3769790Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3769992Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3770293Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3770490Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3770697Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3770899Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3771100Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3771314Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3771522Z E1204 11:25:06.817000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3771722Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3771912Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3772100Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3772270Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3772399Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3772500Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3772629Z E1204 11:25:06.817000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3772797Z [W1204 11:25:06.086490868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3772799Z 2025-12-04T11:45:25.3772947Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3773296Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3773606Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3773735Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3774226Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3774478Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3774708Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3774915Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3775135Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3775427Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3775663Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3775956Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3776196Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3776487Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3776722Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3777018Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3777266Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3777570Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3777795Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3778011Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3778208Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3778416Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3778615Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3778847Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3779147Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3779342Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3779588Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3779882Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3780101Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3780296Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3780520Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3780725Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3780921Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3781115Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3781333Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3781554Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3781760Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3781955Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3782198Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3782489Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3782721Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3783012Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3783236Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3783475Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3783682Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3783890Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3784090Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3784322Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3784616Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3784849Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3785139Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3785370Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3785660Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3785904Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3786214Z 
E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3786454Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3786744Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3786976Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3787268Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3787498Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3787790Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3788029Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3788321Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3788552Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3788843Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3789073Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3789370Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3789600Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3789896Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3790125Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3790327Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3790534Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3790823Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3791075Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3791368Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3791603Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3791894Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3792124Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3792426Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3792658Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3792949Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3793181Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3793495Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3793691Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3793885Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3794081Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3794291Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3794506Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3794738Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3795044Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3795252Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3795447Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3795644Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3795837Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3796069Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3796359Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3796591Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3796895Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3797093Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3797305Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3797504Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3797738Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3799551Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3799778Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3799982Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3800204Z 
E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3800407Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3800711Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3800946Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3801253Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3801487Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3801780Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3802013Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3802306Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3802551Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3802851Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3803048Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3803246Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3803505Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3803709Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3803909Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3804108Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3804402Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3804650Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3804959Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3805193Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3805504Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3805739Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3806032Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3806267Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3806561Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3806784Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3806997Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3807197Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3807391Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3807601Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3807806Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3808099Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3808319Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3808522Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3808721Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3808936Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3809250Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3809483Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3809785Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3810018Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3810318Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3810549Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3810845Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3811076Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3811381Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3811615Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3811908Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3812142Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3812435Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3812667Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3812958Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3813192Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3813527Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3813774Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3814066Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3814281Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3814480Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3814712Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3815004Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3815237Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3815529Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3815774Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3816067Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3816300Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3816593Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3816828Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3817121Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3817318Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3817551Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3817857Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3818100Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3818395Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3818622Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3818826Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3819025Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3819227Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3819517Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3819732Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3819934Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3820142Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3820342Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3820635Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3820863Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3821065Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3821263Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3821454Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3821606Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3821801Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3822035Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3822253Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3822448Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3822679Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3822883Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3823082Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3823320Z E1204 
11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3823525Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3823722Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3823941Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3824163Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3824360Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3824554Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3824767Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3824967Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3825166Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3825368Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3825662Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3825875Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3826095Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3826292Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3826499Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3826695Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3826921Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3827123Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3827320Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3827520Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3827811Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3828025Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3828236Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3828436Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3828636Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3828930Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3829146Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3829351Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3829550Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3829749Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3830042Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3830249Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3830451Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3830649Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3830842Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3831073Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3831278Z E1204 11:25:06.819000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3831477Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3831667Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3831847Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3832023Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3832150Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3832254Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3832392Z E1204 11:25:06.819000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.3832550Z [W1204 11:25:06.088565508 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3832553Z 2025-12-04T11:45:25.3832697Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.3832991Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3833324Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3833458Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3833941Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3834194Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3834442Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3834660Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3834859Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3835165Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3835400Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3835696Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3835930Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3836221Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3836452Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3836759Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3836990Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3837280Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3837502Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3837707Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3837906Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3838112Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3838312Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3838542Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3838846Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3839052Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3839282Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3839586Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3839804Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3840000Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3840219Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3840426Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3840622Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3840816Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3841058Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3841268Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3841466Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.3841659Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3841890Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3842182Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3842414Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3842708Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3842936Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3843141Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3843389Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3843598Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3843811Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3844041Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3844334Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3844563Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3844856Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3845088Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3845394Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3845625Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3845916Z 
E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3846147Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3846438Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3846668Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3846958Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3847188Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3847491Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3847732Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3848021Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3848267Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3848562Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3848797Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3849087Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3849322Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3849622Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3849841Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3850040Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3850237Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.3850527Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3850760Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3851054Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3851288Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3851579Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3851820Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3852120Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3852351Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3852649Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3852883Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3853173Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3853399Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3853595Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3853790Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3854011Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3854210Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3854440Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3854732Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3854929Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3855123Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3855318Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3855512Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3855743Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3856049Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3856292Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3856585Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3856798Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3857010Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3857211Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3857446Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3857738Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3857962Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3858164Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3858375Z 
E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3858577Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3858877Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3859108Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3859403Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3859636Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3859930Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3860162Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3860469Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3860712Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3861002Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3861211Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3861410Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3861631Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3861837Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3862035Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3862239Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3862546Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3862780Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3863072Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3863338Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3863632Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3863864Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3864161Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3864392Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3864700Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3864921Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3865135Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3865348Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3865540Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3865750Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.3865949Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3866241Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3866461Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3866662Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3866876Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3867080Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3867374Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3867606Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3867902Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3868134Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3868427Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3868659Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3868964Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3869197Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3869499Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3869742Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3870032Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3870265Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3870556Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3870788Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3871084Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3871330Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3871622Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3871853Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3872146Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3872345Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3872540Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3872772Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3873064Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3873341Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3873652Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3873884Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3874187Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3874419Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3874711Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3874943Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3875235Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3875430Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3875679Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3875973Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3876205Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3876498Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3876712Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3876915Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3877111Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3877313Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3877607Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3877836Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3878052Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3878248Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3878463Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3878755Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3878977Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3879179Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3879378Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3879570Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3879717Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.3879924Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3880146Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3880353Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3880550Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3880770Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3880978Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3881173Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3881395Z E1204 
11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3881599Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3881806Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3882026Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.3882249Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.3882460Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.3882657Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3882870Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3883074Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3883303Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3883503Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3883796Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3884022Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3884223Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3884422Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3884613Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.3884809Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3885023Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3885228Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.3885427Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3885628Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3885921Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3886145Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.3886359Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3886556Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3886773Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3887069Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3887284Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.3887487Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.3887685Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.3887884Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3888186Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3888383Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.3888583Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.3888773Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.3888967Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.3889180Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.3889386Z E1204 11:25:06.822000 938887 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.3889584Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.3889774Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.3889956Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.3890142Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.3890268Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.3890381Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.3890506Z E1204 11:25:06.822000 938887 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.3890558Z FAILED [1.4962s] [100%] 2025-12-04T11:45:25.3890560Z 2025-12-04T11:45:25.3890617Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.3890779Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3890828Z Traceback (most recent call last): 2025-12-04T11:45:25.3890991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3891037Z method(*args, **kwargs) 2025-12-04T11:45:25.3891190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3891231Z method(*args, **kwargs) 2025-12-04T11:45:25.3891382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3891421Z with policy(): 2025-12-04T11:45:25.3891575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3891617Z raise RuntimeError(msg) 2025-12-04T11:45:25.3892034Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.3892050Z 2025-12-04T11:45:25.3892128Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3892406Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3892409Z 2025-12-04T11:45:25.3892500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3892579Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3892624Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3892682Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3893244Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3893366Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3893404Z graph_break [] 2025-12-04T11:45:25.3893471Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3893546Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3894036Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3894100Z current_size = base.storage().size() 2025-12-04T11:45:25.3894142Z Autotune Choices Stats: 2025-12-04T11:45:25.3894533Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-12-04T11:45:25.3894609Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3894657Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3894758Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3895008Z triton_mm_33 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3895245Z triton_mm_34 0.0086 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3895474Z triton_mm_29 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3895702Z triton_mm_16 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3895929Z triton_mm_22 0.0110 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3896168Z triton_mm_30 0.0112 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3896398Z triton_mm_21 0.0113 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3896629Z triton_mm_23 0.0118 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3896857Z triton_mm_15 0.0118 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3897087Z triton_mm_31 0.0122 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3897217Z SingleProcess AUTOTUNE benchmarking takes 0.1636 seconds and 1.1622 seconds precompiling for 33 choices 2025-12-04T11:45:25.3897382Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3897428Z Traceback (most recent call last): 2025-12-04T11:45:25.3897586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3897639Z method(*args, 
**kwargs) 2025-12-04T11:45:25.3897792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3897832Z method(*args, **kwargs) 2025-12-04T11:45:25.3897984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3898031Z with policy(): 2025-12-04T11:45:25.3898186Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3898238Z raise RuntimeError(msg) 2025-12-04T11:45:25.3898652Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.3898655Z 2025-12-04T11:45:25.3898730Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3899010Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3899012Z 2025-12-04T11:45:25.3899101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3899175Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3899219Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3899276Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3899826Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 
6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3899937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3899975Z graph_break [] 2025-12-04T11:45:25.3900040Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3900113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3900602Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3900650Z current_size = base.storage().size() 2025-12-04T11:45:25.3900691Z Autotune Choices Stats: 2025-12-04T11:45:25.3901068Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-12-04T11:45:25.3901131Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3901179Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3901279Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3901526Z triton_mm_33 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3901773Z triton_mm_34 0.0086 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3902011Z triton_mm_29 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3902238Z triton_mm_16 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3902478Z triton_mm_22 0.0110 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3902707Z triton_mm_30 0.0112 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3902934Z triton_mm_21 0.0113 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3903164Z triton_mm_23 0.0118 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3903431Z triton_mm_15 0.0118 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3903684Z triton_mm_31 0.0122 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3903813Z SingleProcess AUTOTUNE benchmarking takes 0.1636 seconds and 1.1622 seconds precompiling for 33 choices 2025-12-04T11:45:25.3903889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3903931Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3903989Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3904089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3904574Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3904611Z graph_break [] 2025-12-04T11:45:25.3904676Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3904750Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3904792Z Autotune Choices Stats: 2025-12-04T11:45:25.3905157Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00851999968290329, "best_triton_pos": 0} 
2025-12-04T11:45:25.3905218Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3905279Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3905377Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3905616Z triton_mm_72 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3905858Z triton_mm_71 0.0091 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3906100Z triton_mm_54 0.0103 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3906327Z triton_mm_67 0.0103 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3906556Z triton_mm_60 0.0104 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3906783Z triton_mm_59 0.0110 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3907009Z triton_mm_68 0.0110 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3907238Z triton_mm_53 0.0116 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3907488Z triton_mm_61 0.0116 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3907720Z triton_mm_69 0.0122 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3907851Z SingleProcess AUTOTUNE benchmarking takes 0.2370 seconds and 0.7325 seconds precompiling for 39 choices 2025-12-04T11:45:25.3907905Z =================================== FAILURES =================================== 2025-12-04T11:45:25.3908065Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.3908113Z Traceback (most recent call last): 2025-12-04T11:45:25.3908270Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3908311Z method(*args, **kwargs) 2025-12-04T11:45:25.3908464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.3908504Z method(*args, **kwargs) 2025-12-04T11:45:25.3908655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.3908692Z with policy(): 2025-12-04T11:45:25.3908846Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.3908886Z raise RuntimeError(msg) 2025-12-04T11:45:25.3909310Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.3909313Z 2025-12-04T11:45:25.3909386Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3909680Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3909691Z 2025-12-04T11:45:25.3909781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3909855Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3909897Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3909955Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3910507Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3910605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3910642Z graph_break [] 2025-12-04T11:45:25.3910706Z aten_mm_info 
[('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3910780Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3911275Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.3911323Z current_size = base.storage().size() 2025-12-04T11:45:25.3911363Z Autotune Choices Stats: 2025-12-04T11:45:25.3911734Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00848000030964613, "best_triton_pos": 0} 2025-12-04T11:45:25.3911795Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3911843Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3911941Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3912182Z triton_mm_33 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3912415Z triton_mm_34 0.0086 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3912643Z triton_mm_29 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3912875Z triton_mm_16 0.0107 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3913120Z triton_mm_22 0.0110 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3913396Z triton_mm_30 0.0112 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3913622Z triton_mm_21 0.0113 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3913867Z triton_mm_23 0.0118 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3914097Z triton_mm_15 0.0118 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3914326Z triton_mm_31 0.0122 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3914457Z SingleProcess AUTOTUNE benchmarking takes 0.1636 
seconds and 1.1622 seconds precompiling for 33 choices 2025-12-04T11:45:25.3914530Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3914573Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3914629Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3914730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3915233Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3915272Z graph_break [] 2025-12-04T11:45:25.3915335Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3915409Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3915450Z Autotune Choices Stats: 2025-12-04T11:45:25.3915817Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00851999968290329, "best_triton_pos": 0} 2025-12-04T11:45:25.3915877Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3915925Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3916022Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3916257Z triton_mm_72 0.0085 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3916490Z triton_mm_71 0.0091 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3916716Z triton_mm_54 0.0103 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3916958Z triton_mm_67 0.0103 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3917198Z triton_mm_60 0.0104 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3917442Z triton_mm_59 0.0110 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3917668Z triton_mm_68 0.0110 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3917899Z triton_mm_53 0.0116 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3918128Z triton_mm_61 0.0116 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3918356Z triton_mm_69 0.0122 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3918487Z SingleProcess AUTOTUNE benchmarking takes 0.2370 seconds and 0.7325 seconds precompiling for 39 choices 2025-12-04T11:45:25.3918561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.3918602Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.3918672Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.3918771Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.3919259Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.3919296Z graph_break [] 2025-12-04T11:45:25.3919359Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.3919432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.3919473Z Autotune Choices Stats: 2025-12-04T11:45:25.3919845Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", 
"best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.3919905Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.3919952Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.3920051Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.3920289Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3920531Z triton_mm_109 0.0091 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3920771Z triton_mm_105 0.0104 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3920998Z triton_mm_92 0.0108 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3921238Z triton_mm_97 0.0110 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3921467Z triton_mm_98 0.0110 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3921697Z triton_mm_106 0.0112 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.3921929Z triton_mm_99 0.0117 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3922162Z triton_mm_91 0.0118 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3922407Z triton_mm_107 0.0122 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.3922536Z SingleProcess AUTOTUNE benchmarking takes 0.2438 seconds and 0.5889 seconds precompiling for 39 choices 2025-12-04T11:45:25.3922730Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2d80fc955dd00804.xml - 2025-12-04T11:45:25.3922791Z =========================== short test summary info ============================ 2025-12-04T11:45:25.3923465Z FAILED [1.4962s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.3923470Z 2025-12-04T11:45:25.3923542Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.3923820Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3923823Z 2025-12-04T11:45:25.3923910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.3923972Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.3924042Z ================== 1 failed, 187 deselected, 2 rerun in 6.68s ================== 2025-12-04T11:45:25.3924094Z Got exit code 1 2025-12-04T11:45:25.3924141Z Retrying single test... 2025-12-04T11:45:25.3924284Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-494da847e5d3520c.xml 2025-12-04T11:45:25.3924342Z ============================= test session starts ============================== 2025-12-04T11:45:25.3924455Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.3924508Z cachedir: .pytest_cache 2025-12-04T11:45:25.3924668Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.3924729Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.3924769Z configfile: pytest.ini 2025-12-04T11:45:25.3924934Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.3925009Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.3925292Z stepcurrent: skipping 107 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.3925337Z Running 1 items in this shard 2025-12-04T11:45:25.3925339Z 2025-12-04T11:45:25.3925692Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:16.995534303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3925696Z 2025-12-04T11:45:25.3925856Z [W1204 11:25:17.324849927 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3925858Z 2025-12-04T11:45:25.3926175Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3926488Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3926623Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3927107Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3927363Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3927591Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3927800Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3928001Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3928295Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3928546Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3928849Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3929082Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3929391Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3929623Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3929913Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3930147Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3930441Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3930673Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3930980Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3931210Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3931503Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3931699Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3931930Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3932223Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3932418Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3932648Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3932950Z E1204 11:25:17.048000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3933203Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3933527Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3933761Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3933967Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3934167Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3934379Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3934546Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3934725Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3935270Z E1204 11:25:17.048000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/xz/cxzsgupkkdchhe2o7g6tw4rosfcr2wjbwuy5aljgkzxb3z5zljvj.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3935421Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3935640Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3935796Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3935944Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3936234Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3936369Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3936631Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3936772Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3937024Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3937196Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3937480Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3937616Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3937903Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3938095Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3938413Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3938710Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3938840Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3939320Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3939583Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3939809Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3940014Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3940214Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3940506Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3940739Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3941032Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3941264Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3941564Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3941804Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3942095Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3942336Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3942628Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3942859Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3943148Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3943404Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3943697Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3943908Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3944140Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3944430Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3944625Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3944857Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3945148Z E1204 11:25:17.048000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3945378Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3945670Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3945904Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3946124Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3946325Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3946843Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3947277Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3947682Z E1204 11:25:17.048000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3947999Z E1204 11:25:17.048000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3948449Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3949112Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3949575Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3950251Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3951013Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3951538Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3952012Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3952453Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3952993Z E1204 11:25:17.073000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3953590Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3954167Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3954749Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3955443Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3956014Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3956586Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3957148Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3957707Z E1204 11:25:17.073000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3958265Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3958821Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3959381Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3959954Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3960477Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3960940Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3961496Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3962019Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3962481Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3963040Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3963638Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3964213Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3964773Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3965236Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3965694Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3966140Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3966552Z E1204 11:25:17.073000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3966931Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3967668Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/pu/cpu64jj755szakmtfwan4p6lov5qwr65wfh2yepzoev4sjsympr4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.3968368Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.3968783Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.3969188Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.3969525Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.3969994Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.3970451Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.3970876Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.3971307Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.3971736Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.3972184Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.3972642Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.3973090Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.3973570Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.3974085Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.3974637Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.3975280Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3975738Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3976383Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3977145Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3977659Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3978137Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3978584Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3979116Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3979675Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3980237Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3980797Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3981356Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3981915Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3982485Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3983051Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3983637Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3984210Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3984770Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3985328Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3985883Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3986407Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3986870Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3987447Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3987968Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.3988428Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3988985Z E1204 11:25:17.073000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3989545Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3990101Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3990646Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.3991108Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.3991566Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.3992014Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.3992445Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.3992825Z E1204 11:25:17.073000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.3993153Z E1204 11:25:17.073000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.3993465Z [W1204 11:25:17.345016864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.3993658Z 2025-12-04T11:45:25.3993967Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.3994605Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.3995065Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.3995708Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.3996479Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.3996990Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.3997452Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.3997895Z E1204 11:25:17.080000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.3998432Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.3998997Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.3999562Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4000123Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4000695Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4001267Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4001826Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4002395Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4002953Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4003540Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4004101Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4004658Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4005219Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4005763Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4006226Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4006785Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4007305Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4007767Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4008323Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4008879Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4009440Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4010002Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4010467Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4010925Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4011383Z E1204 11:25:17.080000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4011793Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.4012172Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4012911Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/37/c37l5xtfj3iejjzqrk74mc6tliccgvyh7zrpktzfy7w6fivklgw4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4013649Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4014048Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4014454Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4014807Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4015277Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4015732Z E1204 11:25:17.080000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4016159Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4016587Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4017014Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4017457Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4017913Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4018351Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4018795Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4019314Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4019871Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4020536Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4020988Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4021636Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4022393Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4022907Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4023406Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4023864Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4024397Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4024964Z E1204 
11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4025525Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4026084Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4026639Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4027194Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4027751Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4028320Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4028888Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4029445Z E1204 11:25:17.080000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4030013Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4030573Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4031131Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4031651Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4032113Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4032669Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4033204Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4033705Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4034356Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4034917Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4035476Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4036027Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4036489Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4036929Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4037374Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4037806Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.4038197Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4038513Z E1204 11:25:17.080000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.4038809Z [W1204 11:25:17.350033061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4039014Z 2025-12-04T11:45:25.4039322Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4039960Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4040417Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4041059Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4041814Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4042340Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4042805Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4043245Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4043792Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4044352Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4044917Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4045474Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4046033Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4046607Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4047178Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4047738Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4048315Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4048876Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4049432Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4049991Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4050547Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4051071Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4051545Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4052108Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4052628Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4053091Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4053685Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4054243Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4054799Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4055345Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4055819Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4056260Z E1204 11:25:17.083000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4056725Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4057139Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.4057530Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4058272Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/sb/csbdvdci747xlsc7qpkk2n6z3jxmwermhzfwcsechytolof4kutv.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4058973Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4059370Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4059778Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4060114Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4060597Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4061049Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4061474Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4061905Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4062334Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4062780Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4063240Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4063710Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4064155Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4064655Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4065208Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4065862Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4066318Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4066980Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4067737Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4068249Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4068794Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4069237Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4069785Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4070347Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4070905Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4071466Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4072029Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4072585Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4073141Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4073737Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4074307Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4074865Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4075434Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4076003Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4076561Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4077085Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4077552Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4078110Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4078632Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4079107Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4079664Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4080218Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4080779Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4081325Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4081787Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4082229Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4082673Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4083085Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.4083510Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4083825Z E1204 11:25:17.083000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.4084137Z [W1204 11:25:17.370965828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4084333Z 2025-12-04T11:45:25.4084642Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4085291Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4085748Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4086396Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4087153Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4087664Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4088144Z E1204 11:25:17.104000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4088588Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4089119Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4089680Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4090240Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4090799Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4091359Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4091919Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4092475Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4093048Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4093646Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4094216Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4094770Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4095328Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4098287Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4098816Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4099282Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4099866Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4100391Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4100852Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4101414Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4101971Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4102528Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4103074Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4103576Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4104021Z E1204 11:25:17.104000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4104481Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4104895Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.4105288Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4106052Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/ya/cyaco5a2vq6ry54kx4loatbvtbxa5zxx7hnqlbqaadinvz23hzi2.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4106761Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4107162Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4107566Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4107902Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4108370Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4108826Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4109264Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4109697Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4110127Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4110572Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4111031Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4111472Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4111917Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4112419Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4112960Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4113657Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4114115Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4114779Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4115553Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4116066Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4116536Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4116976Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4117505Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4118071Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4118647Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4119205Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4119768Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4120325Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4120883Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4121442Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4122004Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4122562Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4123129Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4123737Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4124307Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4124827Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4125292Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4125849Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4126367Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4126827Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4127389Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4127967Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4128526Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4129073Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4129534Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4129977Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4130428Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4130840Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.4131219Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4131534Z E1204 11:25:17.104000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.4131845Z [W1204 11:25:17.374612625 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4132036Z 2025-12-04T11:45:25.4132362Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4132996Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4133487Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4134134Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4134889Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4135398Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4135862Z E1204 11:25:17.108000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4136300Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4136850Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4137411Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4137969Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4138527Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4139087Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4139647Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4140203Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4140764Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4141333Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4141904Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4142461Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4143031Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4143621Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4144143Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4144605Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4145163Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4145685Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4146166Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4146723Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4147281Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4147846Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4148400Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4148860Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4149301Z E1204 11:25:17.108000 944811 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4149747Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4150174Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.4150552Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4151309Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp9ek09r68/gx/cgxv6fwedwobotfiurfhod6psvmjxbxyeuwx7iregy3tt3f6eb3g.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4152025Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4152423Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4152829Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4153163Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4153656Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4154110Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4154532Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4154960Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4155402Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4155844Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4156303Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4156739Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4157187Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4157693Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4158239Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4158878Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4159349Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4160005Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4160757Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4161286Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4161749Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4162190Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4162722Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4163310Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4163868Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4164441Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4164997Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4165555Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4166108Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4166668Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4167227Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4167785Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4168344Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4168915Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4169486Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4170006Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4170480Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4171039Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4171561Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4172022Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4172579Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4173137Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4173737Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4174285Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4174745Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4175187Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.4175634Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.4176050Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.4176430Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.4176747Z E1204 11:25:17.108000 944811 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.4176939Z ('RERUN', {'yellow': True}) [3.5087s] [100%] 2025-12-04T11:45:25.4177376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:18.156655028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4177778Z 2025-12-04T11:45:25.4177926Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4178414Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4179032Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4179499Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4180141Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4180897Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4181405Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4181868Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4182306Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4182847Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4183438Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4184000Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4184560Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4185119Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4185676Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4186237Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4186808Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4187367Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4187926Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4188401Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4188837Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4189276Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4189716Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4190188Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4190747Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4191272Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4191750Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4192311Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4192856Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4193338Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4193795Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4194263Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4194703Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4195140Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4195598Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4196080Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4196528Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4196987Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4197475Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4198032Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4198590Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4199147Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4199692Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4200159Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4200607Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4201066Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4201519Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4201993Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4202563Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4203129Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4203712Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4204279Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4204849Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4205436Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4206021Z 
E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4206590Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4207171Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4207727Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4208282Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4208849Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4209419Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4209659Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4209973Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4210214Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4210512Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4210747Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4211050Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4211283Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4211580Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4211799Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4212016Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4212229Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4212527Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4212779Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4213070Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4213344Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4213638Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4213876Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4214175Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4214425Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4214728Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4214962Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4215251Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4215449Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4215644Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4215846Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4216054Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4216253Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4216499Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4216801Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4216995Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4217205Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4217401Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4217594Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4217825Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4218115Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4218346Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4218646Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4218842Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4219049Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4219249Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4219483Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4219778Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4220000Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4220203Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4220402Z 
E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4220629Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4220933Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4221167Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4221469Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4221709Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4222001Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4222234Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4222533Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4222767Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4223073Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4223298Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4223499Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4223720Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4223922Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4224121Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4224321Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4224614Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4224846Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4225159Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4225405Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4225702Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4225948Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4226240Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4226473Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4226763Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4226985Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4227189Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4227398Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4227592Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4227803Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4228004Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4228298Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4228520Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4228720Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4228919Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4229119Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4229421Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4229665Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4229956Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4230201Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4230496Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4230729Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4231021Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4231253Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4231557Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4231791Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4232084Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4232316Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4232611Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4232846Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4233136Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4233402Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4233709Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4233944Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4234249Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4234459Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4234656Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4234890Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4235186Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4235419Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4235710Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4235961Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4236253Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4236485Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4236776Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4237009Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4237309Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4237509Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4237742Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4238032Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4238277Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4238577Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4238803Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4239004Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4239205Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4239408Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4239701Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4239916Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4240117Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4240329Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4240531Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4240822Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4241043Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4241245Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4241445Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4241637Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4241787Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4241984Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4242222Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4242429Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4242637Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4242858Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4243077Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4243302Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4243522Z E1204 
11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4243728Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4243922Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4244141Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4244347Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4244558Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4244755Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4244968Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4245170Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4245367Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4245568Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4245861Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4246073Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4246273Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4246486Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4246687Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4246882Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4247111Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4247312Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4247510Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4247713Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4248005Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4248218Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4248418Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4248628Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4248828Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4249120Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4249335Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4249540Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4249738Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4249938Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4250231Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4250425Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4250637Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4250839Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4251035Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4251265Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4251471Z E1204 11:25:18.896000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4251670Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4251861Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4252041Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4252212Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4252338Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4252444Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4252569Z E1204 11:25:18.896000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4252737Z [W1204 11:25:18.165211844 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4252740Z 2025-12-04T11:45:25.4252885Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4253178Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4253498Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4253630Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4254117Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4254370Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4254613Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4254818Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4255030Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4255322Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4255571Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4255864Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4256097Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4256392Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4256623Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4256914Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4257162Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4257452Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4257673Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4257878Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4258075Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4258282Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4258483Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4258715Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4259017Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4259213Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4259454Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4259756Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4259974Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4260170Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4260388Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4260592Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4260791Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4260986Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4261215Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4261419Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4261614Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4261809Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4262042Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4262335Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4262567Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4262858Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4263079Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4263321Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4263531Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4263739Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4263952Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4264181Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4264475Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4264707Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4264997Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4265228Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4265537Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4265769Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4266058Z 
E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4266290Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4266580Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4266812Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4267100Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4267330Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4267638Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4267887Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4268179Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4268421Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4268714Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4268946Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4269236Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4269468Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4269755Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4269985Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4270188Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4270384Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4270675Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4270906Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4271197Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4271426Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4271717Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4271956Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4272259Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4272491Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4272793Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4273024Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4273334Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4273530Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4273726Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4273921Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4274130Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4274343Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4274576Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4274867Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4275063Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4275258Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4275453Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4275648Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4275878Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4276168Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4276410Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4276713Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4276921Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4277130Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4277334Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4277571Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4277863Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4278083Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4278286Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4278493Z 
E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4278697Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4278990Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4279224Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4279519Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4279751Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4280046Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4280277Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4280580Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4280824Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4281116Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4281325Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4281521Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4281742Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4281945Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4282144Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4282348Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4282641Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4282882Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4283177Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4283440Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4283732Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4283966Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4284262Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4284495Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4284790Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4285031Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4285246Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4285444Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4285649Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4285860Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4286060Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4286353Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4286572Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4286774Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4286973Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4287186Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4287480Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4287713Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4288005Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4288238Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4288531Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4288762Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4289055Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4289302Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4289604Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4289836Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4290138Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4290372Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4290664Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4290895Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4291187Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4291431Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4291725Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4291957Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4292250Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4292448Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4292644Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4292877Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4293169Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4293419Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4293726Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4293975Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4294270Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4294512Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4294806Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4295039Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4295331Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4295528Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4295760Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4296066Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4296299Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4296595Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4296809Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4297011Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4297210Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4297410Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4297703Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4297925Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4298128Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4298336Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4298546Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4298840Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4299062Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4299265Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4299462Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4299653Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4299801Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4300017Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4300239Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4300445Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4300642Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4300862Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4301072Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4301267Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4301486Z E1204 
11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4301691Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4301885Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4302118Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4302333Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4302530Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4302733Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4302945Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4303148Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4303664Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4303865Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4304159Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4304373Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4304588Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4304787Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4304978Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4305174Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4305388Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4305590Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4305789Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4305989Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4306282Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4306508Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4306723Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4306921Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4307135Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4307427Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4307642Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4307845Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4308048Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4308249Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4308541Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4308748Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4308948Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4309137Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4309336Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4309549Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4309754Z E1204 11:25:18.898000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4309950Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4310140Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4310321Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4310508Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4310634Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4310739Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4310876Z E1204 11:25:18.898000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4311034Z [W1204 11:25:18.167347453 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4311047Z 2025-12-04T11:45:25.4311193Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4311485Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4311782Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4311914Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4312399Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4312654Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4312890Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4313096Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4313328Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4313623Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4313860Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4314160Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4314395Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4314687Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4314932Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4315234Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4315465Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4315774Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4315994Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4316203Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4316400Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4316607Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4316804Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4317050Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4317341Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4317535Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4317766Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4318056Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4318277Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4318472Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4318693Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4318897Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4319102Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4319299Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4319527Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4319742Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4319937Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4320132Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4320364Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4320654Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4320887Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4321179Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4321409Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4321613Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4321808Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4322018Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4322216Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4322446Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4322738Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4322971Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4323296Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4323542Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4323844Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4324087Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4324378Z 
E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4324610Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4324900Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4325131Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4325420Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4325668Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4325960Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4326190Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4326481Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4326712Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4327002Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4327232Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4327522Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4327764Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4328070Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4328288Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4328502Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4328701Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4328993Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4329223Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4329512Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4329743Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4330044Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4330280Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4330571Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4330802Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4331096Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4331326Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4331616Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4331812Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4332017Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4332212Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4332439Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4332640Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4332882Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4333173Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4333394Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4333594Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4333790Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4333984Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4334214Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4334523Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4334755Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4335047Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4335245Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4335452Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4335654Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4335886Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4336178Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4336413Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4336628Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4336827Z 
E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4337041Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4337335Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4337569Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4337861Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4338094Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4338388Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4338634Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4338927Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4339160Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4339456Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4339653Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4339852Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4340071Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4340273Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4340470Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4340684Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4340987Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4341221Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4341532Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4341767Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4342062Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4342293Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4342586Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4342820Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4343123Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4343369Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4343570Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4343769Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4343962Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4344177Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4344379Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4344670Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4344890Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4345106Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4345318Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4345517Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4345824Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4346057Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4346352Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4346587Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4346893Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4347125Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4347430Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4347663Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4347957Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4348187Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4348481Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4348715Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4349009Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4349241Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4349557Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4349801Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4350093Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4350339Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4350632Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4350830Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4351028Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4351266Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4351561Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4351803Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4352097Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4352330Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4352623Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4352857Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4353149Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4353402Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4353695Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4353906Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4354150Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4354442Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4354687Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4354980Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4355195Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4355395Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4355596Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4355797Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4356106Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4356322Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4356521Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4356721Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4356921Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4357216Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4357435Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4357637Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4357835Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4358036Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4358187Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4358393Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4358613Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4358830Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4359026Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4359247Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4359455Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4359650Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4359869Z E1204 
11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4360076Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4360283Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4360506Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4360711Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4360909Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4361105Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4361318Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4361520Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4361718Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4361919Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4362222Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4362446Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4362649Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4362861Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4363052Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4363278Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4363495Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4363695Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4363893Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4364092Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4364407Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4364620Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4364823Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4365024Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4365224Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4365517Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4365730Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4365931Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4366132Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4366344Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4366649Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4366843Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4367063Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4367253Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4367451Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4367664Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4367870Z E1204 11:25:18.900000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4368067Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4368255Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4368447Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4368623Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4368750Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4368852Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4368979Z E1204 11:25:18.900000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4369133Z [W1204 11:25:18.208005024 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4369137Z 2025-12-04T11:45:25.4369283Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4369578Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4369872Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4370003Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4370482Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4370756Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4370980Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4371197Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4371400Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4371691Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4371926Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4372218Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4372453Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4372755Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4372990Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4373315Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4373545Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4373839Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4374059Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4374265Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4374463Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4374685Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4374885Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4375128Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4375419Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4375627Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4375860Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4376153Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4376372Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4376568Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4376784Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4377005Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4377202Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4377396Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4377614Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4377819Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4378015Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4378208Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4378439Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4378730Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4378970Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4379276Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4379497Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4379719Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4379912Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4380121Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4380319Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4380550Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4380841Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4381072Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4381377Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4381612Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4381906Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4382137Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4382428Z 
E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4382659Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4382951Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4383182Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4383515Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4383763Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4384068Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4384298Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4384591Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4384823Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4385113Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4385345Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4385650Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4385880Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4386170Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4386393Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4386596Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4386792Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4387083Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4387316Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4387606Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4387852Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4388151Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4388391Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4388681Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4388912Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4389202Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4389430Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4389719Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4389919Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4390124Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4390320Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4390528Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4390726Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4390959Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4391253Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4391450Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4391644Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4391839Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4392043Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4392286Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4392579Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4392823Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4393116Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4393337Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4393545Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4393747Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4393980Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4394290Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4394631Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4394836Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4395039Z 
E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4395242Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4395536Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4395769Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4396062Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4396294Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4396604Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4396856Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4397166Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4397399Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4397692Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4397891Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4398087Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4398308Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4398509Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4398719Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4398918Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4399211Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4399443Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4399740Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4399973Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4400265Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4400497Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4400799Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4401042Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4401334Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4401567Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4401772Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4401972Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4402165Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4402375Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4402576Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4402869Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4403099Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4403332Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4403530Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4403728Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4404021Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4404257Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4404549Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4404783Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4405091Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4405335Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4405628Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4405872Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4406165Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4406398Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4406694Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4406928Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4407220Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4407468Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4407761Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4407994Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4408286Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4408519Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4408813Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4409012Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4409210Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4409455Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4409762Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4409996Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4410297Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4410531Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4410825Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4411057Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4411350Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4411586Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4411890Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4412087Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4412320Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4412611Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4412846Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4413138Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4413469Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4413674Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4413895Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4414098Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4414403Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4414632Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4414833Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4415032Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4415232Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4415524Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4415747Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4415947Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4416164Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4416358Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4416507Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4416704Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4416922Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4417130Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4417326Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4417547Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4417753Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4419761Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4420009Z E1204 
11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4420229Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4420427Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4420657Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4420863Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4421059Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4421255Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4421468Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4421671Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4421870Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4422081Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4422378Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4422593Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4422794Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4422992Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4423184Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4423404Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4423618Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4423818Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4424033Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4424234Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4424540Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4424767Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4424968Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4425168Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4425371Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4425663Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4425877Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4426078Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4426291Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4426491Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4426783Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4426981Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4427183Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4427374Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4427570Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4427783Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4427986Z E1204 11:25:18.941000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4428195Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4428384Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4428577Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4428747Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4428885Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4428989Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4429116Z E1204 11:25:18.941000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4429275Z [W1204 11:25:18.210123323 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4429278Z 2025-12-04T11:45:25.4429423Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4429718Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4430018Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4430149Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4430648Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4430902Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4431127Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4431334Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4431537Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4431829Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4432064Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4432366Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4432608Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4432900Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4433143Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4433464Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4433698Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4433992Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4434213Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4434417Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4434630Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4434837Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4435036Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4435266Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4435555Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4435755Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4435986Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4436280Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4436498Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4436707Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4436936Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4437139Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4437347Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4437540Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4437760Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4437965Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4438159Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4438354Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4438587Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4438889Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4439121Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4439411Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4439629Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4439834Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4440030Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4440242Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4440441Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4440673Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4440981Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4441222Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4441515Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4441758Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4442050Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4442281Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4442571Z 
E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4442802Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4443093Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4443355Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4443646Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4443876Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4444175Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4444405Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4444698Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4444932Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4445224Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4445469Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4445771Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4446020Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4446314Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4446534Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4446734Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4446930Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4447222Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4447453Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4447755Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4447986Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4448277Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4448509Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4448804Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4449038Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4449327Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4449557Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4449858Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4450064Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4450260Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4450469Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4450677Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4450878Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4451114Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4451403Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4451598Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4451792Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4452000Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4452195Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4452426Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4452716Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4452948Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4453241Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4453479Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4453687Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4453903Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4454136Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4454443Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4454675Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4454876Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4455075Z 
E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4455275Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4455569Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4455802Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4456095Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4456340Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4456632Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4456864Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4457156Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4457389Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4457681Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4457881Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4458077Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4458311Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4458521Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4458723Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4458933Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4459225Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4459460Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4459752Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4459984Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4460278Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4460523Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4460815Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4461046Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4461338Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4461559Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4461762Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4461961Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4462152Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4462363Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4462576Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4462885Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4463106Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4463349Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4463550Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4463751Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4464044Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4464276Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4464569Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4464818Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4465113Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4465346Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4465637Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4465871Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4466162Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4466393Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4466684Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4466932Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4467238Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4467470Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4467779Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4468011Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4468303Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4468533Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4468825Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4469024Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4469232Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4469465Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4469758Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4469990Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4470284Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4470515Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4470808Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4471040Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4471343Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4471585Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4471877Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4472087Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4472320Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4472612Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4472844Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4473136Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4473380Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4473598Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4473797Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4473996Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4474295Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4474511Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4474715Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4474913Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4475114Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4475407Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4475641Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4475858Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4476055Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4476258Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4476405Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4476603Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4476823Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4477032Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4477228Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4477446Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4477653Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4477856Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4478077Z E1204 
11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4478286Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4478481Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4478706Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4478912Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4479108Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4479303Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4479515Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4479743Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4479953Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4480152Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4480460Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4480672Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4480876Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4481081Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4481271Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4481467Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4481679Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4481892Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4482091Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4482290Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4482584Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4482797Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4482999Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4483196Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4483426Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4483718Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4483947Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4484161Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4484358Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4484570Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4484861Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4485058Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4485262Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4485452Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4485648Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4485865Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4486087Z E1204 11:25:18.943000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4486283Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4486472Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4486651Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4486822Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4486950Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4487055Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4487182Z E1204 11:25:18.943000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4487339Z [W1204 11:25:18.212223772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4487343Z 2025-12-04T11:45:25.4487488Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4487781Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.4488089Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4488230Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4488712Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4488978Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4489203Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4489409Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4489609Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4489902Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4490147Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4490443Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4490676Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4490966Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4491197Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4491488Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4491719Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4492012Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4492243Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4492449Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4492659Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4492867Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4493076Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4493339Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4493630Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4493824Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4494056Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4494347Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4494589Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4494784Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4495002Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4495206Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4495403Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4495598Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4495818Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4496023Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4496220Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.4496430Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4496663Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4496969Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4497214Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4497504Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4497725Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4497929Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4498124Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4498332Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4498531Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4498773Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4499068Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4499299Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4499589Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4499820Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4500111Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4500340Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4500633Z 
E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4500874Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4501174Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4501407Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4501709Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4501939Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4502230Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4502460Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4502749Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4502980Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4503307Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4503538Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4503832Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4504063Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4504354Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4504573Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4504775Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4504969Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4505274Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4505517Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4505806Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4506051Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4506342Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4506576Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4506865Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4507097Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4507388Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4507629Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4507920Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4508116Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4508314Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4508512Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4508719Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4508919Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4509149Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4509440Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4509646Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4509849Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4510043Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4510252Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4510485Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4510780Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4511015Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4511305Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4511500Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4511719Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4511922Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4512155Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4512447Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4512668Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4512870Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4513073Z 
E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4513321Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4513615Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4513860Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4514166Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4514399Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4514703Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4514936Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4515228Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4515463Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4515755Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4515952Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4516161Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4516382Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4516585Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4516783Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4516984Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4517278Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4517510Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4517806Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4518037Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4518342Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4518583Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4518888Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4519120Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4519412Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4519632Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4519834Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4520035Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4520227Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4520450Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4520651Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4520942Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4521163Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4521366Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4521565Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4521764Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4522056Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4522291Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4522594Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4522837Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4523138Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4523394Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4523687Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4523921Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4524212Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4524443Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4524755Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4524988Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4525281Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4525514Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4525806Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4527023Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4527316Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4527548Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4527863Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4528077Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4528284Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4528519Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4528810Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4529044Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4529340Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4529570Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4529862Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4530106Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4530402Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4530636Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4530927Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4531125Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4531359Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4531692Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4531926Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4532230Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4532444Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4532657Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4532856Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4533058Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4533381Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4533595Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4533800Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4533998Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4534197Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4534507Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4534727Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4534929Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4535127Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4535321Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4535469Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4535689Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4535909Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4536114Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4536323Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4536544Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4536765Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4536959Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4537178Z E1204 
11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4537385Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4537581Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4537802Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4538006Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4538203Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4538397Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4538624Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4538827Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4539025Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4539226Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4539519Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4539735Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4539949Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4540147Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4540337Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4540544Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4540758Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4540969Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.4541167Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4541367Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4541661Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4541876Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4542078Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4542275Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4542475Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4542782Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4542995Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.4543196Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4543405Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4543606Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4543899Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4544122Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4544325Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4544513Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4544720Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4544932Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4545151Z E1204 11:25:18.945000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4545349Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4545538Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4545719Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4545888Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4546017Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4546119Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4546247Z E1204 11:25:18.945000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4546299Z ('RERUN', {'yellow': True}) [1.6230s] [100%] 2025-12-04T11:45:25.4546654Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:25:20.600328581 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.4546658Z 2025-12-04T11:45:25.4546816Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4547112Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4547408Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4547537Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4548019Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4548284Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4548511Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4548728Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.4548931Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4549235Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4549468Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4549759Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4549993Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4550284Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4550515Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4550808Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4551043Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4551346Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4551567Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4551771Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4551967Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4552175Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4552386Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4552616Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4552904Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4553110Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4553376Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4553680Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4553898Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4554093Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4554314Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4554519Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4554714Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4554907Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4555124Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4555344Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4555541Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4555740Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4555970Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4556262Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4556494Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4556798Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4557017Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4557223Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4557433Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4557650Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4557852Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4558084Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4558376Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4558612Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4558905Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4559135Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4559425Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4559673Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4559966Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4560198Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4560488Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4560721Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4561012Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4561253Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4561541Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4561779Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4562081Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4562312Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4562603Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4562836Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4563126Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4563392Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4563683Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4563902Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4564117Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4564314Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4564604Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4564834Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4565131Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4565362Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4565667Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4565896Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4566201Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4566446Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4566737Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4566967Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4567261Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4567459Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4567657Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4567850Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4568058Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4568256Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4568497Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4568788Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4568983Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4569177Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4569372Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4569568Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4569810Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4570099Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4570341Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4570631Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4570839Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4571046Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4571246Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4571479Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4571775Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4571998Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4572199Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4572399Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4572609Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4572905Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4573138Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4573454Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4573688Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4573983Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4574236Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4574528Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4574773Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4575083Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4575282Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4575478Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4575698Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4575900Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4576099Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4576298Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4576592Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4576825Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4577134Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4577369Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4577660Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4577892Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4578184Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4578427Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4578719Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4578939Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4579154Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4579365Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4579556Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4579766Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4579966Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4580261Z E1204 11:25:20.333000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4580480Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4580681Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4580879Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4581078Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4581384Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4581617Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4581910Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4582144Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4582438Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4582684Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4582974Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4583206Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4583549Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4583800Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4584093Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4584325Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4584620Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4584853Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4585148Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4585379Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4585687Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4585923Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4586217Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4586416Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4586613Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4586846Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4587157Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4587389Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4587689Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4587938Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4588242Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4588479Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4588780Z E1204 
11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4589015Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4589308Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4589507Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4589740Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4590049Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4590283Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4590580Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4590800Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4591005Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4591205Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4591432Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4591723Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4591937Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4592257Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4592475Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4592687Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4592983Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4593205Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4593464Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4593665Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4593856Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4594005Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4594203Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4594447Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4594654Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4594851Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4595070Z E1204 
11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4595275Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4595472Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4595694Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4595914Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4596110Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4596333Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4596552Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4596765Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4596958Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4597171Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4597372Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4597573Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4597778Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4598069Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4598281Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4598482Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4598694Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4598887Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4599083Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4599296Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4599497Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4599696Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4599918Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4600213Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4600424Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4600637Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4600846Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4601048Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4601340Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4601552Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4601755Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4601954Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4602154Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4602447Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4602643Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4602857Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4603048Z E1204 11:25:20.333000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4603243Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4603493Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4603698Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4603895Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4604106Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4604285Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4604456Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4604584Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4604701Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4604828Z E1204 11:25:20.333000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4604998Z [W1204 11:25:20.602606748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4605000Z 2025-12-04T11:45:25.4605145Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4605437Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4605734Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4605864Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4606345Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4606599Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4606825Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4607046Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.4607246Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4607537Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4607772Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4608066Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4608317Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4608607Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4608840Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4609145Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4609390Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4609679Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4609897Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4610104Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4610301Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4610509Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4610709Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4610940Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4611243Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4611441Z E1204 
11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4611675Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4611965Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4612185Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4612379Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4612610Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4612814Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4613009Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4613215Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4613477Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4613702Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4613900Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4614095Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4614326Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4614619Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4614850Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4615139Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4615358Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4615577Z 
E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4615775Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4615981Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4616184Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4616415Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4616707Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4616949Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4617238Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4617482Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.4617771Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4618010Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4618301Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4618532Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4618825Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4619056Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4619346Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4619576Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4619876Z E1204 
11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4620108Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4620399Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4620629Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4620923Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4621165Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4621455Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4621684Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4621986Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4622215Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4622417Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4622612Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4622902Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4623136Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4623462Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4623692Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4623982Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4624237Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4624529Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4624759Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4625049Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4625280Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4625574Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4625784Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4625979Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4626174Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4626395Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4626608Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4626838Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4627129Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4627328Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4627523Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4627720Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4627919Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4628150Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4628450Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4628682Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4628972Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4629167Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4629373Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4629574Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4629820Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4630114Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4630335Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4630547Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4630747Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4630959Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4631251Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4631483Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4631775Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4632009Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4632300Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4632535Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4632840Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4633073Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4633394Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4633590Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4633787Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4634007Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4634225Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4634423Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4634623Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4634934Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4635181Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4635474Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4635705Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4635998Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4636230Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4636525Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4636757Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4637051Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4637286Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4637489Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4637688Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4637880Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4638090Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4638292Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4638596Z E1204 11:25:20.336000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4638815Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4639015Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4639225Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4639448Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4639742Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4639975Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4640267Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4640500Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4640795Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4641030Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4641324Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4641566Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4641863Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4642095Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4642387Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4642620Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4642923Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4643156Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4643481Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4643727Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4644033Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4644269Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4644564Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4644764Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4644962Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4645193Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4645484Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4645717Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4646024Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4646257Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4646552Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4646786Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4647080Z E1204 
11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4647330Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4647621Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4647819Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4648062Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4648365Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4648598Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4648891Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4649111Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4649313Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4649514Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4649715Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4650007Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4650230Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4650433Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4650636Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4650835Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4651130Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4651353Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4651569Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4651770Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4651962Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4652121Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4652317Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4652548Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4652753Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4652949Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4653171Z E1204 
11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4653407Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4653607Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4653826Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4654031Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4654227Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4654461Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4654669Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4654866Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4655063Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4655277Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4655479Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4655693Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4655897Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4656191Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4656428Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4656643Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4656842Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4657034Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4657229Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4657443Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4657644Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4657844Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4658046Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4658340Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4658566Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4658769Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4658967Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4659167Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4659459Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4659673Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4659888Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4660086Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4660286Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4660593Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4660799Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4661003Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4661192Z E1204 11:25:20.336000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4661389Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4661604Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4661808Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4662007Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4662195Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4662374Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4662546Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4662683Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4662787Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4662916Z E1204 11:25:20.336000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4663072Z [W1204 11:25:20.604722397 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4663074Z 2025-12-04T11:45:25.4663218Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4663551Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4663850Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4663998Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4664477Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4664753Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4664997Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4665204Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.4665404Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4665696Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4665931Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4666224Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4666458Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4666751Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4666997Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4667292Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4667524Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4667814Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4668038Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4668257Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4668455Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4668661Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4668873Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4669106Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4669410Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4669609Z E1204 
11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4669840Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4670133Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4670354Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4670550Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4670767Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4670972Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4671180Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4671379Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4671598Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4671807Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4672007Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4672200Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4672449Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4672741Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4672974Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4673341Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4673579Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4673786Z 
E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4673981Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4674186Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4674387Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4674625Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4674916Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4675146Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4675454Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4675686Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.4675979Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4676209Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4676503Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4676736Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4677043Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4677275Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4677565Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4677814Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4678121Z E1204 
11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4678351Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4678643Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4678875Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4679168Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4679400Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4679693Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4679939Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4680231Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4680450Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4680649Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4680846Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4681136Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4681381Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4681679Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4681913Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4682218Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4682460Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4682754Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4682986Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4683326Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4683560Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4683851Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4684049Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4684247Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4684459Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4684668Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4684871Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4685104Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4685395Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4685607Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4685804Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4685999Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4686194Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4686444Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4686749Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4686982Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4687273Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4687468Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4687677Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4687880Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4688114Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4688409Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4688650Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4688855Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4689055Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4689260Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4689554Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4689790Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4690097Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4690329Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4690623Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4690872Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4691181Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4691413Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4691708Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4691908Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4692105Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4692327Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4692530Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4692731Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4692943Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4693241Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4693517Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4693811Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4694047Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4694357Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4694590Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4694883Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4695130Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4695438Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4695660Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4695865Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4696066Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4696264Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4696475Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4696676Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4696971Z E1204 11:25:20.338000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4697190Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4697408Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4697609Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4697812Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4698107Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4698345Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4698982Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4699214Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4699509Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4699755Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4700062Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4700296Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4700591Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4700827Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4701120Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4701353Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4701644Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4701879Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4702183Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4702415Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4702708Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4702941Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4703241Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4703508Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4703707Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4703942Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4704252Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4704507Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4704800Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4705035Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4705329Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4705567Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4705865Z E1204 
11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4706096Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4706392Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4706603Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4706840Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4707135Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4707366Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4707661Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4707890Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4708095Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4708295Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4708510Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4708818Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4709032Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4709235Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4709435Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4709639Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4709937Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4710164Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4710370Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4710570Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4710774Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4710924Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4711121Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4711340Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4711547Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4711745Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4711970Z E1204 
11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4712190Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4712385Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4712609Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4712829Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4713039Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4713294Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4713500Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4713697Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4713894Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4714113Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4714315Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4714514Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4714716Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4715030Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4715246Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4715449Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4715648Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4715841Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4716039Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4716268Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4716472Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4716669Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4716886Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4717196Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4717413Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4717616Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4717814Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4718022Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4718317Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4718535Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4718737Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4718940Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4719153Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4719454Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4719655Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4719859Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4720054Z E1204 11:25:20.338000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4720251Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4720480Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4720688Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4720884Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4721096Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4721278Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4721462Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4721590Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4721696Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4721821Z E1204 11:25:20.338000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4721981Z [W1204 11:25:20.644428212 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4721983Z 2025-12-04T11:45:25.4722129Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4722427Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4722724Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4722856Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4723392Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4723649Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4723876Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4724088Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.4724289Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4724584Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4724835Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4725131Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4725378Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4725687Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4725922Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4726212Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4726448Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4726740Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4729568Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4729784Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4729985Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4730224Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4730426Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4730665Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4730963Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4731161Z E1204 
11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4731396Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4731703Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4731924Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4732119Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4732352Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4732558Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4732766Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4732962Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4733184Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4733455Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4733655Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4733854Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4734087Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4734383Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4734639Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4734936Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4735156Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4735364Z 
E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4735563Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4735771Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4735986Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4736219Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4736513Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4736759Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4737070Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4737304Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.4737596Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4737831Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4738123Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4738358Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4738652Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4738884Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4739189Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4739420Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4739713Z E1204 
11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4739946Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4740239Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4740494Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4740787Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4741020Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4741328Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4741573Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4741864Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4742084Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4742287Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4742485Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4742777Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4743009Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4743345Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4743597Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4743890Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4744124Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4744417Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4744650Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4744959Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4745192Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4745487Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4745700Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4745913Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4746108Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4746317Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4746518Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4746750Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4747045Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4747243Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4747438Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4747633Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4747841Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4748075Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4748368Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4748601Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4748897Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4749106Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4749314Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4749518Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4749764Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4750073Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4750297Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4750501Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4750701Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4750906Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4751204Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4751437Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4751731Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4751965Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4752273Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4752511Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4752806Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4753043Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4753364Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4753584Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4753783Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4754004Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4754222Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4754437Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4754644Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4754941Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4755176Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4755472Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4755709Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4756002Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4756236Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4756554Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4756790Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4757086Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4757310Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4757517Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4757718Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4757928Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4758140Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4758340Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4758650Z E1204 11:25:20.377000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4758885Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4759087Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4759287Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4759488Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4759790Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4760024Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4760318Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4760553Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4760860Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4761097Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4761391Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4761624Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4761922Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4762169Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4762466Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4762700Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4763011Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4763283Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4763579Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4763811Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4764108Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4764346Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4764642Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4764843Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4765039Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4765287Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4765583Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4765818Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4766113Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4766348Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4766661Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4766896Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4767193Z E1204 
11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4767442Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4767746Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4767946Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4768179Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4768475Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4768708Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4769004Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4769222Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4769425Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4769638Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4769843Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4770139Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4770351Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4770554Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4770755Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4770969Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4771265Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4771486Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4771702Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4771902Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4772113Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4772263Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4772459Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4772682Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4772889Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4773087Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4773331Z E1204 
11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4773541Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4773738Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4773981Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4774192Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4774387Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4774612Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4774818Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4775019Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4775232Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4775450Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4775652Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4775871Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4776092Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4776390Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4776607Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4776809Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4777009Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4777202Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4777400Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4777616Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4777818Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4778033Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4778236Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4778535Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4778749Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4778952Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4779153Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4779367Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4779664Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4779878Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4780094Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4780304Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4780507Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4780804Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4781002Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4781206Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4781397Z E1204 11:25:20.377000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4781593Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4781806Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4782012Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4782221Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4782413Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4782596Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4782768Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4782897Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4783003Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4783136Z E1204 11:25:20.377000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4783317Z [W1204 11:25:20.646495771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4783340Z 2025-12-04T11:45:25.4783488Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4783785Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4784085Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4784232Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4784732Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4784988Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4785216Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4785424Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.4785626Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4785919Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4786155Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4786462Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4786698Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4786989Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4787220Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4787518Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4787752Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4788056Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4788276Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4788498Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4788696Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4788916Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4789115Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4789347Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4789643Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4789839Z E1204 
11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4790074Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4790366Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4790588Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4790794Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4791017Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4791225Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4791420Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4791618Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4791838Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4792056Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4792252Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4792447Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4792691Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4792986Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4793232Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4793557Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4793778Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4793985Z 
E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4794183Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4794391Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4794592Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4794826Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4795132Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4795367Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4795659Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4795891Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.4796185Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4796431Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4796724Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4796958Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4797264Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4797510Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4797803Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4798036Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4798328Z E1204 
11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4798562Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4798854Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4799086Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4799380Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4799625Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4799919Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4800150Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4800442Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4800663Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4800877Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4801073Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4801365Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4801614Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4801918Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4802151Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4802442Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4802675Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4802969Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4803200Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4803523Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4803755Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4804068Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4804266Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4804461Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4804657Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4804867Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4805067Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4805320Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4805613Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4805823Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4806019Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4806230Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4806426Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4806659Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4806952Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4807184Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4807480Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4807679Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4807891Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4808106Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4808346Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4808643Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4808869Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4809073Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4809277Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4809499Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4809794Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4810029Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4810340Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4810589Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4810883Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4811119Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4811414Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4811648Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4811947Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4812145Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4812351Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4812583Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4812791Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4812993Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4813194Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4813513Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4813751Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4814066Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4814299Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4814608Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4814844Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4815153Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4815388Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4815682Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4815907Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4816113Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4816314Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4816509Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4816719Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4816935Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4817230Z E1204 11:25:20.379000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4817455Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4817658Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4817859Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4818062Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4818368Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4818604Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4818899Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4819147Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4819451Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4819686Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4819981Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4820217Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4820519Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4820756Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4821051Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4821306Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4821603Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4821842Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4822135Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4822371Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4822676Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4822913Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4823208Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4823451Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4823653Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4823902Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4824200Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4824435Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4824734Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4824973Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4825269Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4825506Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4825816Z E1204 
11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4826056Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4826349Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4826549Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4826791Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4827086Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4827336Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4827631Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4827870Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4828076Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4828287Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4828492Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4828787Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4829005Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4829208Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4829409Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4829611Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4829906Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4830143Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4830348Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4830547Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4830739Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4830890Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4831088Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4831324Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4831533Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4831728Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4831950Z E1204 
11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4832173Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4832385Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4832607Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4832816Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4833014Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4833239Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4833475Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4833673Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4833869Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4834085Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4834303Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4834506Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4834710Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4835009Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4835223Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4835427Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4835640Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4835833Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4836029Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4836258Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4836462Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4836684Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4836887Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4837183Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4837401Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4837604Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4837804Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4838007Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4838302Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4838528Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4838731Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4838936Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4839138Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4839437Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4839633Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4839850Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4840044Z E1204 11:25:20.379000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4840240Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4840468Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4840675Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4840888Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4841079Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4841263Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4841438Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4841568Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4841675Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4841803Z E1204 11:25:20.379000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4841963Z [W1204 11:25:20.648606931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.4841965Z 2025-12-04T11:45:25.4842111Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4842411Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4842722Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.4842856Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.4843362Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.4843622Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.4843854Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.4844077Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.4844282Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4844576Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4844825Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4845136Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4845369Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4845662Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4845897Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4846193Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4846429Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4846722Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4846960Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4847167Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4847367Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4847576Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4847777Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4848011Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4848319Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4848517Z E1204 
11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4848749Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4849053Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4849285Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4849482Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4849700Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4849905Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4850104Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4850300Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4850521Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4850727Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4850926Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4851133Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4851367Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4851661Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4851894Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4852188Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4852407Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4852626Z 
E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4852821Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4853031Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4853279Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4853526Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4853820Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4854053Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4854348Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4854580Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.4854874Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4855106Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4855399Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4855648Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4855942Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4856174Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4856465Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4856700Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4857006Z E1204 
11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4857238Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4857530Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4857777Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4858087Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4858318Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4858610Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4858843Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4859136Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4859358Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4859560Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4859758Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.4860063Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4860301Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4860596Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4860828Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4861124Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4861368Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4861662Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4861895Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4862199Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4862446Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4862741Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4862938Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4863136Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.4863362Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4863572Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4863772Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4864005Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4864297Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4864513Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4864710Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4864908Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4865105Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4865340Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4865635Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4865881Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4866174Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4866384Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4866594Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4866810Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4867047Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4867343Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4867567Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4867772Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4867972Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4868173Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4868467Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4868715Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4869012Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4869247Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4869542Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4869780Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4870095Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4870334Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4870628Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4870841Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4871050Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4871276Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4871478Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4871679Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4871882Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4872181Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4872419Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4872714Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4872951Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4873287Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4873528Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4873825Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4874061Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4874360Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4874600Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4874806Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4875006Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4875217Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4875443Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.4875645Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4875942Z E1204 11:25:20.382000 
944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4876165Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4876374Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4876575Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4876781Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4877076Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4877314Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4877623Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4877858Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4878154Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4878388Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4878687Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4878936Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4879232Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4879484Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4879780Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4880033Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4880326Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4880562Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4880862Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4881097Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4881394Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4881629Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4881937Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4882140Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4882340Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4882575Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4882870Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4883108Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4883446Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4883681Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4883994Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4884231Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4884541Z E1204 
11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4884777Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4885073Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4885270Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4885509Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4885804Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4886038Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.4886356Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4886575Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4886782Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4886983Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4887189Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4887487Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4887717Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4887920Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4888120Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4888337Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4888649Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4888879Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.4889085Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4889284Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4889483Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.4889634Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.4889838Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4890060Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4890273Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4890471Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4890706Z E1204 
11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4890920Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4891118Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4891342Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4891550Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.4891750Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4891983Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.4892192Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.4892393Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.4892602Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4892830Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4893038Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4893240Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4893476Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4893776Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4893993Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4894200Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4894401Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4894595Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.4894813Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.4895029Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4895235Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4895434Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4895642Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4895943Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4896172Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4896376Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4896576Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4896794Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4897101Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4897319Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.4897524Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.4897723Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.4897928Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.4898226Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.4898427Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.4898628Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.4898821Z E1204 11:25:20.382000 944811 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.4899030Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.4899251Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.4899458Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.4899656Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.4899850Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.4900033Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.4900219Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.4900346Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.4900454Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.4900582Z E1204 11:25:20.382000 944811 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4900626Z FAILED [1.4623s] [100%] 2025-12-04T11:45:25.4900640Z 2025-12-04T11:45:25.4900699Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.4900865Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.4900917Z Traceback (most recent call last): 2025-12-04T11:45:25.4901109Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4901157Z method(*args, **kwargs) 2025-12-04T11:45:25.4901312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4901357Z method(*args, **kwargs) 2025-12-04T11:45:25.4901510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.4901554Z with policy(): 2025-12-04T11:45:25.4901708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.4901756Z raise RuntimeError(msg) 2025-12-04T11:45:25.4902173Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.4902177Z 2025-12-04T11:45:25.4902261Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4902544Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.4902546Z 2025-12-04T11:45:25.4902639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4902720Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4902768Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4902828Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4903443Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4903549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4903587Z graph_break [] 2025-12-04T11:45:25.4903654Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4903730Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4904227Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4904293Z current_size = base.storage().size() 2025-12-04T11:45:25.4904338Z Autotune Choices Stats: 2025-12-04T11:45:25.4904722Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00839999970048666, "best_triton_pos": 0} 2025-12-04T11:45:25.4904804Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4904854Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4904961Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4905219Z triton_mm_34 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4905458Z triton_mm_33 0.0091 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4905693Z triton_mm_29 0.0106 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4905921Z triton_mm_16 0.0107 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4906151Z triton_mm_21 0.0109 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4906380Z triton_mm_22 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4906609Z triton_mm_30 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4906843Z triton_mm_23 0.0114 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4907091Z triton_mm_15 0.0116 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4907321Z triton_mm_31 0.0124 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4907457Z SingleProcess AUTOTUNE benchmarking takes 0.1604 seconds and 1.1204 seconds precompiling for 33 choices 2025-12-04T11:45:25.4907619Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.4907667Z Traceback (most recent call last): 2025-12-04T11:45:25.4907828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4907870Z method(*args, 
**kwargs) 2025-12-04T11:45:25.4908025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4908078Z method(*args, **kwargs) 2025-12-04T11:45:25.4908231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.4908269Z with policy(): 2025-12-04T11:45:25.4908424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.4908465Z raise RuntimeError(msg) 2025-12-04T11:45:25.4908883Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.4908897Z 2025-12-04T11:45:25.4908973Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4909265Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.4909267Z 2025-12-04T11:45:25.4909359Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4909433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4909477Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4909534Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4910094Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 
6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4910194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4910233Z graph_break [] 2025-12-04T11:45:25.4910298Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4910372Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4910864Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4910926Z current_size = base.storage().size() 2025-12-04T11:45:25.4910970Z Autotune Choices Stats: 2025-12-04T11:45:25.4911347Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00839999970048666, "best_triton_pos": 0} 2025-12-04T11:45:25.4911411Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4911460Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4911561Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4911802Z triton_mm_34 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4912037Z triton_mm_33 0.0091 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4912278Z triton_mm_29 0.0106 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4912510Z triton_mm_16 0.0107 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4912752Z triton_mm_21 0.0109 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4912989Z triton_mm_22 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4913219Z triton_mm_30 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4913483Z triton_mm_23 0.0114 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4913718Z triton_mm_15 0.0116 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4913950Z triton_mm_31 0.0124 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4914083Z SingleProcess AUTOTUNE benchmarking takes 0.1604 seconds and 1.1204 seconds precompiling for 33 choices 2025-12-04T11:45:25.4914158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4914200Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4914259Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4914358Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4914873Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4914913Z graph_break [] 2025-12-04T11:45:25.4914980Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4915053Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4915095Z Autotune Choices Stats: 2025-12-04T11:45:25.4915468Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008200000040233135, "best_triton_pos": 0} 
2025-12-04T11:45:25.4915531Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4915581Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4915681Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4915936Z triton_mm_72 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4916168Z triton_mm_71 0.0091 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4916397Z triton_mm_67 0.0107 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4916645Z triton_mm_59 0.0107 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4916893Z triton_mm_54 0.0108 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4917121Z triton_mm_60 0.0110 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4917349Z triton_mm_68 0.0111 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4917587Z triton_mm_61 0.0112 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4917822Z triton_mm_53 0.0120 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4918053Z triton_mm_69 0.0121 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4918185Z SingleProcess AUTOTUNE benchmarking takes 0.2361 seconds and 0.7303 seconds precompiling for 39 choices 2025-12-04T11:45:25.4918241Z =================================== FAILURES =================================== 2025-12-04T11:45:25.4918412Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.4918463Z Traceback (most recent call last): 2025-12-04T11:45:25.4918624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4918667Z method(*args, **kwargs) 2025-12-04T11:45:25.4918822Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4918865Z method(*args, **kwargs) 2025-12-04T11:45:25.4919017Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.4919057Z with policy(): 2025-12-04T11:45:25.4919213Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.4919255Z raise RuntimeError(msg) 2025-12-04T11:45:25.4919674Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.4919687Z 2025-12-04T11:45:25.4919763Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4920047Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.4920049Z 2025-12-04T11:45:25.4920139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4920226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4920269Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4920330Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4920898Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4921002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4921044Z graph_break [] 2025-12-04T11:45:25.4921108Z aten_mm_info 
[('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4921184Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4921677Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4921730Z current_size = base.storage().size() 2025-12-04T11:45:25.4921774Z Autotune Choices Stats: 2025-12-04T11:45:25.4922154Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00839999970048666, "best_triton_pos": 0} 2025-12-04T11:45:25.4922217Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4922268Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4922377Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4922621Z triton_mm_34 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4922856Z triton_mm_33 0.0091 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4923087Z triton_mm_29 0.0106 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4923353Z triton_mm_16 0.0107 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4923582Z triton_mm_21 0.0109 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4923827Z triton_mm_22 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4924055Z triton_mm_30 0.0110 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4924304Z triton_mm_23 0.0114 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4924550Z triton_mm_15 0.0116 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4924780Z triton_mm_31 0.0124 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4924915Z SingleProcess AUTOTUNE benchmarking takes 0.1604 
seconds and 1.1204 seconds precompiling for 33 choices 2025-12-04T11:45:25.4924990Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4925039Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4925098Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4925202Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4925698Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4925741Z graph_break [] 2025-12-04T11:45:25.4925805Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4925881Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4925923Z Autotune Choices Stats: 2025-12-04T11:45:25.4926314Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008200000040233135, "best_triton_pos": 0} 2025-12-04T11:45:25.4926381Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4926429Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4926531Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4926766Z triton_mm_72 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4927001Z triton_mm_71 0.0091 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4927231Z triton_mm_67 0.0107 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4927472Z triton_mm_59 0.0107 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4927702Z triton_mm_54 0.0108 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4927933Z triton_mm_60 0.0110 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4928176Z triton_mm_68 0.0111 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4928420Z triton_mm_61 0.0112 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4928654Z triton_mm_53 0.0120 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4928885Z triton_mm_69 0.0121 ms 67.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4929023Z SingleProcess AUTOTUNE benchmarking takes 0.2361 seconds and 0.7303 seconds precompiling for 39 choices 2025-12-04T11:45:25.4929099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4929144Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4929200Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4929303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4929790Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4929829Z graph_break [] 2025-12-04T11:45:25.4929897Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4929982Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4930028Z Autotune Choices Stats: 2025-12-04T11:45:25.4930401Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", 
"best_time": 0.00827999971807003, "best_triton_pos": 0} 2025-12-04T11:45:25.4930465Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4930512Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4930617Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4930858Z triton_mm_110 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4931095Z triton_mm_109 0.0092 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4931157Z _scaled_mm 0.0093 ms 88.8% 2025-12-04T11:45:25.4931389Z triton_mm_105 0.0104 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4931619Z triton_mm_92 0.0106 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4931860Z triton_mm_106 0.0111 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4932104Z triton_mm_98 0.0111 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4932337Z triton_mm_97 0.0114 ms 72.9% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4932570Z triton_mm_99 0.0116 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4932804Z triton_mm_91 0.0118 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4932939Z SingleProcess AUTOTUNE benchmarking takes 0.2469 seconds and 0.5830 seconds precompiling for 39 choices 2025-12-04T11:45:25.4933136Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-494da847e5d3520c.xml - 2025-12-04T11:45:25.4933198Z =========================== short test summary info ============================ 2025-12-04T11:45:25.4933889Z FAILED [1.4623s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.4933893Z 2025-12-04T11:45:25.4933970Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4934251Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.4934253Z 2025-12-04T11:45:25.4934344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4934408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.4934484Z ================== 1 failed, 187 deselected, 2 rerun in 6.61s ================== 2025-12-04T11:45:25.4934524Z Got exit code 1 2025-12-04T11:45:25.4934752Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.4934883Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.4935046Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03523e315972c336.xml 2025-12-04T11:45:25.4935105Z ============================= test session starts ============================== 2025-12-04T11:45:25.4935220Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.4935262Z cachedir: .pytest_cache 2025-12-04T11:45:25.4935423Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.4935484Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.4935530Z configfile: pytest.ini 2025-12-04T11:45:25.4935699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:45:25.4935780Z collecting ... collected 188 items / 108 deselected / 80 selected 2025-12-04T11:45:25.4935837Z stepcurrent: skipping 108 already run items. 2025-12-04T11:45:25.4935898Z Running 80 items in this shard 2025-12-04T11:45:25.4935900Z 2025-12-04T11:45:25.4936851Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/us/cusvykiwim4awxmesmmtokb7uqkrvkg6j4sttbmvico7lgyjh6ag.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4937009Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4937237Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4937397Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4937548Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4937840Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4937990Z E1204 
11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4938256Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4938400Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4938660Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4938818Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4939096Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4939246Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4939527Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4939724Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4940046Z E1204 11:25:30.591000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4940812Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/gs/cgskehlfe747cdsaomsadiaalz26dwrddeyodwm5zmjfk435sr57.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4940961Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4941180Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4941336Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4941487Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4941779Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4941912Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4942172Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4942324Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4942582Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4942739Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4943014Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4943148Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4943464Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4943680Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4943995Z E1204 11:25:30.612000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4944731Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/3v/c3vtzftvwqwx2urxq2vpokyujcvg36j3wdinrekyjgxuofdo2ac4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4944910Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4945128Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4945284Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4945429Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4945720Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4945852Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4946110Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4946247Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4946504Z E1204 11:25:30.613000 950735 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4946662Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4946954Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4947095Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4947372Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4947570Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4947890Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4948635Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/ui/cui7dqlnurfmvrkr5ycejewmf6ibzpqmws5ikb5yw2p2q6b7xfj5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4948799Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4949013Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4949181Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4949326Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4949614Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4949745Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4950008Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4950148Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4950404Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4950560Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4950829Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4950966Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4951256Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4951452Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4951766Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4952505Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/tb/ctbjlwornjfpqshgimqikeesgo4n4qwmnnewdptmvcwhfir47lnl.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4952665Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4952879Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4953035Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4953192Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4953524Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4953657Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4953912Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4954051Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4954306Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4954464Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4954735Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4954872Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4955148Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4955346Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4955679Z E1204 11:25:30.613000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4956405Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpgelyg75m/ol/collzhpgreu4voltxf626zwwt4jfw35jwr4zflvckeinb46jgauo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.4956559Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.4956774Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.4956947Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.4957098Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.4957385Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.4957533Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.4957791Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.4957943Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.4958198Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.4958358Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.4958632Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.4958767Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.4959046Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.4959239Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.4959559Z E1204 11:25:30.614000 950735 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.4959612Z ('RERUN', {'yellow': True}) [3.4226s] [ 1%] 2025-12-04T11:45:25.4959968Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:25:32.427000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4960269Z E1204 11:25:32.427000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4960401Z E1204 11:25:32.427000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4960546Z E1204 11:25:32.429000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4960843Z E1204 11:25:32.429000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4960973Z E1204 11:25:32.429000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4961129Z E1204 11:25:32.431000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4961426Z E1204 11:25:32.431000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4961555Z E1204 11:25:32.431000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4961714Z E1204 11:25:32.484000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4962010Z E1204 11:25:32.484000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4962155Z E1204 11:25:32.484000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4962301Z E1204 11:25:32.486000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4962595Z E1204 11:25:32.486000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4962725Z E1204 11:25:32.486000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4962866Z E1204 11:25:32.488000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4963164Z E1204 11:25:32.488000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4963315Z E1204 11:25:32.488000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4963367Z ('RERUN', {'yellow': True}) [1.5254s] [ 1%] 2025-12-04T11:45:25.4963704Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:25:33.767000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4964022Z E1204 11:25:33.767000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4964152Z E1204 11:25:33.767000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4964295Z E1204 11:25:33.769000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4964591Z E1204 11:25:33.769000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4964717Z E1204 11:25:33.769000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4964865Z E1204 11:25:33.771000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4965162Z E1204 11:25:33.771000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4965305Z E1204 11:25:33.771000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4965447Z E1204 11:25:33.809000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4965743Z E1204 11:25:33.809000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4965886Z E1204 11:25:33.809000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4966031Z E1204 11:25:33.811000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4966342Z E1204 11:25:33.811000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4966470Z E1204 11:25:33.811000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.4966617Z E1204 11:25:33.813000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.4966912Z E1204 11:25:33.813000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.4967045Z E1204 11:25:33.813000 950735 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.4967086Z FAILED [1.3692s] [ 1%]
2025-12-04T11:45:25.4967088Z
2025-12-04T11:45:25.4967147Z ==================================== RERUNS ====================================
2025-12-04T11:45:25.4967308Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _
2025-12-04T11:45:25.4967357Z Traceback (most recent call last):
2025-12-04T11:45:25.4967515Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.4967561Z     method(*args, **kwargs)
2025-12-04T11:45:25.4967715Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.4967760Z     method(*args, **kwargs)
2025-12-04T11:45:25.4967913Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:25.4967953Z     with policy():
2025-12-04T11:45:25.4968121Z   File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:25.4968165Z     raise RuntimeError(msg)
2025-12-04T11:45:25.4968582Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664.
2025-12-04T11:45:25.4968585Z 2025-12-04T11:45:25.4968662Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4968943Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.4968946Z 2025-12-04T11:45:25.4969038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4969131Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4969178Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4969239Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4969801Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4969919Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4969961Z graph_break [] 2025-12-04T11:45:25.4970028Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4970105Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4970608Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4970660Z current_size = base.storage().size() 2025-12-04T11:45:25.4970701Z Autotune Choices Stats: 2025-12-04T11:45:25.4971082Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00827999971807003, "best_triton_pos": 0} 2025-12-04T11:45:25.4971146Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4971203Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4971305Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4971553Z triton_mm_34 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4971595Z _scaled_mm 0.0092 ms 90.4% 2025-12-04T11:45:25.4971832Z triton_mm_33 0.0094 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4972076Z triton_mm_29 0.0107 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4972304Z triton_mm_21 0.0110 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4972533Z triton_mm_22 0.0110 ms 
75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4972759Z triton_mm_30 0.0110 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4972989Z triton_mm_16 0.0111 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4973229Z triton_mm_23 0.0117 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4973489Z triton_mm_15 0.0122 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4973626Z SingleProcess AUTOTUNE benchmarking takes 0.1557 seconds and 1.2048 seconds precompiling for 33 choices 2025-12-04T11:45:25.4973803Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.4973853Z Traceback (most recent call last): 2025-12-04T11:45:25.4974013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4974060Z method(*args, **kwargs) 2025-12-04T11:45:25.4974230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4974276Z method(*args, **kwargs) 
2025-12-04T11:45:25.4974430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.4974472Z with policy(): 2025-12-04T11:45:25.4974628Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.4974674Z raise RuntimeError(msg) 2025-12-04T11:45:25.4975088Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.4975091Z 2025-12-04T11:45:25.4975168Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4975446Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.4975451Z 2025-12-04T11:45:25.4975540Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4975617Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4975661Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4975722Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4976289Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), 
('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4976393Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4976430Z graph_break [] 2025-12-04T11:45:25.4976498Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4976573Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4977066Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4977129Z current_size = base.storage().size() 2025-12-04T11:45:25.4977175Z Autotune Choices Stats: 2025-12-04T11:45:25.4977550Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00827999971807003, "best_triton_pos": 0} 2025-12-04T11:45:25.4977613Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4977684Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4977784Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4978025Z triton_mm_34 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4978069Z _scaled_mm 0.0092 ms 90.4% 2025-12-04T11:45:25.4978311Z triton_mm_33 0.0094 ms 87.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4978539Z triton_mm_29 0.0107 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4978769Z triton_mm_21 0.0110 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4978996Z triton_mm_22 0.0110 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4979233Z triton_mm_30 0.0110 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4979463Z triton_mm_16 0.0111 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4979692Z triton_mm_23 0.0117 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4979933Z triton_mm_15 0.0122 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4980066Z 
SingleProcess AUTOTUNE benchmarking takes 0.1557 seconds and 1.2048 seconds precompiling for 33 choices 2025-12-04T11:45:25.4980145Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4980190Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4980248Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4980349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4980841Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4980897Z graph_break [] 2025-12-04T11:45:25.4980963Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4981040Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4981085Z Autotune Choices Stats: 2025-12-04T11:45:25.4981461Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_71", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.009080000221729279, "best_triton_pos": 0} 2025-12-04T11:45:25.4981537Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4981589Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4981689Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4981942Z triton_mm_71 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4982172Z triton_mm_72 0.0092 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4982405Z triton_mm_67 0.0106 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4982639Z triton_mm_59 0.0107 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4982868Z triton_mm_68 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4983097Z triton_mm_54 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4983350Z triton_mm_60 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4983597Z triton_mm_61 0.0113 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4983829Z triton_mm_53 0.0116 ms 78.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4984060Z triton_mm_69 0.0122 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4984194Z SingleProcess AUTOTUNE benchmarking takes 0.2284 seconds and 0.7450 seconds precompiling for 39 choices 2025-12-04T11:45:25.4984250Z =================================== FAILURES =================================== 2025-12-04T11:45:25.4984412Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.4984460Z Traceback (most recent call last): 2025-12-04T11:45:25.4984623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4984683Z method(*args, **kwargs) 2025-12-04T11:45:25.4984839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.4984881Z method(*args, **kwargs) 2025-12-04T11:45:25.4985038Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.4985076Z with policy(): 2025-12-04T11:45:25.4985231Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.4985288Z raise RuntimeError(msg) 2025-12-04T11:45:25.4985718Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.4985722Z 2025-12-04T11:45:25.4985797Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4986077Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.4986080Z 2025-12-04T11:45:25.4986171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.4986246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4986292Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4986351Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4986907Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4987007Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4987052Z graph_break [] 2025-12-04T11:45:25.4987116Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4987195Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4987700Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.4987753Z current_size = base.storage().size() 2025-12-04T11:45:25.4987795Z Autotune Choices Stats: 2025-12-04T11:45:25.4988170Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00827999971807003, "best_triton_pos": 0} 2025-12-04T11:45:25.4988236Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4988286Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4988389Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4988627Z triton_mm_34 0.0083 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4988684Z _scaled_mm 0.0092 ms 90.4% 2025-12-04T11:45:25.4988915Z triton_mm_33 0.0094 ms 87.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4989145Z triton_mm_29 0.0107 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4989382Z triton_mm_21 0.0110 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4989625Z triton_mm_22 0.0110 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4989855Z triton_mm_30 0.0110 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4990082Z triton_mm_16 0.0111 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4990315Z triton_mm_23 0.0117 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4990545Z triton_mm_15 0.0122 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4990679Z SingleProcess AUTOTUNE benchmarking takes 0.1557 seconds and 1.2048 seconds precompiling for 33 choices 2025-12-04T11:45:25.4990754Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4990801Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4990858Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4990960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4991461Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4991504Z graph_break [] 2025-12-04T11:45:25.4991571Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4991644Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4991689Z Autotune Choices Stats: 2025-12-04T11:45:25.4992060Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_71", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.009080000221729279, "best_triton_pos": 0} 2025-12-04T11:45:25.4992125Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4992175Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4992279Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4992536Z triton_mm_71 0.0091 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4992771Z triton_mm_72 0.0092 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4992998Z triton_mm_67 0.0106 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4993238Z triton_mm_59 0.0107 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4993509Z triton_mm_68 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4993734Z triton_mm_54 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4993962Z triton_mm_60 0.0108 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4994194Z triton_mm_61 0.0113 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4994426Z triton_mm_53 0.0116 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4994656Z triton_mm_69 0.0122 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4994786Z SingleProcess AUTOTUNE benchmarking takes 0.2284 seconds and 0.7450 seconds precompiling for 39 choices 
2025-12-04T11:45:25.4994864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.4994909Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.4994986Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.4995090Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.4995583Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.4995621Z graph_break [] 2025-12-04T11:45:25.4995688Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.4995764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.4995810Z Autotune Choices Stats: 2025-12-04T11:45:25.4996181Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.4996260Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.4996308Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.4996410Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.4996647Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:25.4996704Z _scaled_mm 0.0092 ms 93.9% 2025-12-04T11:45:25.4996940Z triton_mm_109 0.0092 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4997180Z triton_mm_105 0.0104 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4997414Z triton_mm_92 0.0107 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4997641Z triton_mm_106 0.0108 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4997871Z triton_mm_98 0.0109 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4998098Z triton_mm_97 0.0110 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.4998329Z triton_mm_99 0.0115 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4998561Z triton_mm_91 0.0116 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.4998691Z SingleProcess AUTOTUNE benchmarking takes 0.2303 seconds and 0.5984 seconds precompiling for 39 choices 2025-12-04T11:45:25.4998903Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03523e315972c336.xml - 2025-12-04T11:45:25.4998967Z =========================== short test summary info ============================ 2025-12-04T11:45:25.4999602Z FAILED [1.3692s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.4999606Z 2025-12-04T11:45:25.4999682Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.4999958Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.4999972Z 2025-12-04T11:45:25.5000069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.5000134Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.5000205Z ================== 1 failed, 108 deselected, 2 rerun in 6.34s ================== 2025-12-04T11:45:25.5000245Z Got exit code 1 2025-12-04T11:45:25.5000288Z Retrying single test... 
2025-12-04T11:45:25.5000436Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed4d63a860bf7891.xml 2025-12-04T11:45:25.5000498Z ============================= test session starts ============================== 2025-12-04T11:45:25.5000625Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.5000671Z cachedir: .pytest_cache 2025-12-04T11:45:25.5000833Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.5000882Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.5000936Z configfile: pytest.ini 2025-12-04T11:45:25.5001101Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.5001177Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.5001453Z stepcurrent: skipping 108 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5001498Z Running 1 items in this shard 2025-12-04T11:45:25.5001500Z 2025-12-04T11:45:25.5001859Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:42.262243465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5001862Z 2025-12-04T11:45:25.5002025Z [W1204 11:25:43.604441447 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5002027Z 2025-12-04T11:45:25.5002343Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5002645Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5002792Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5003333Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5003593Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5003824Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5004036Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5004252Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5004550Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5004788Z E1204 
11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5005102Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5005352Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5005646Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5005881Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5006175Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5006411Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5006706Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5006936Z E1204 11:25:43.334000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5010371Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5010651Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5010952Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5011151Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5011387Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5011681Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5011892Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5012124Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5012415Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5012660Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5012963Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5013188Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5013431Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5013633Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5013849Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5014021Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5014202Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5014734Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/us/cusvykiwim4awxmesmmtokb7uqkrvkg6j4sttbmvico7lgyjh6ag.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5014901Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5015122Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5015279Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5015425Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5015716Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5015855Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5016114Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5016268Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5016524Z 
E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5016682Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5016968Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5017102Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5017391Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5017585Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5017902Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5018200Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5018334Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5018819Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5019074Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5019315Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5019525Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5019728Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5020020Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5020257Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5020573Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5020805Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5021097Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5021344Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5021647Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5021881Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5022171Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5022403Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5022696Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5022930Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5023219Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5023435Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5023688Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5023980Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5024179Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5024409Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5024702Z E1204 11:25:43.334000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5024948Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5025242Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5025463Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5025691Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5025893Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5026142Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5026312Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5026490Z E1204 11:25:43.334000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5026596Z E1204 11:25:43.334000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5026904Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5027200Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5027332Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5027811Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5028075Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5028299Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5028505Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5028704Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5028996Z E1204 11:25:43.358000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5029243Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5029534Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5029767Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5030071Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5030316Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5030609Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5030840Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5031133Z E1204 11:25:43.358000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5031366Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5031659Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5031893Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5032189Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5032398Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5032632Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5033052Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5033276Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5033510Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5033825Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5034055Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5034345Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5034583Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5034804Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5035004Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5035217Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5035385Z E1204 11:25:43.358000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5035564Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5036091Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/ol/collzhpgreu4voltxf626zwwt4jfw35jwr4zflvckeinb46jgauo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5036239Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5036454Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5036610Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5036773Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5037065Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5037198Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5037455Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5037593Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5037847Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5038016Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5038285Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5038419Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5038695Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5038902Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5039228Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5039524Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5039653Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5040134Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5040389Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5040613Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5040823Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5041035Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5041331Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5041566Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5041857Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5042091Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5042399Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5042630Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5042921Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5043170Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5043505Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5043739Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5044033Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5044265Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5044560Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5044758Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5044991Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5045280Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5045493Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5045727Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5046017Z E1204 11:25:43.358000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5046254Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5046547Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5046782Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5046987Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5047189Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5047398Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5047583Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5047775Z E1204 11:25:43.358000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5047878Z E1204 11:25:43.358000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5048037Z [W1204 11:25:43.642877010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5048040Z 2025-12-04T11:45:25.5048192Z [W1204 11:25:43.647371705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5048196Z 2025-12-04T11:45:25.5048506Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5048804Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5048934Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5049408Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5049670Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5049897Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5050103Z E1204 11:25:43.380000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5050301Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5050594Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5050830Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5051137Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5051368Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5051672Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5051904Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5052203Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5052437Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5052727Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5052962Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5053286Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5053522Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5053812Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5054022Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5054257Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5054547Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5054745Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5054977Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5055269Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5055514Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5055805Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5056040Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5056246Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5056463Z E1204 11:25:43.380000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5056673Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5056842Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5057021Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5057547Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/tb/ctbjlwornjfpqshgimqikeesgo4n4qwmnnewdptmvcwhfir47lnl.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5057697Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5057911Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5058069Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5058217Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5058525Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5058659Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5058914Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5059053Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5059307Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5059462Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5059743Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5059878Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5060153Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5060358Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5060685Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5060979Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5061111Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5061587Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5061843Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5062069Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5062273Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5062475Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5062780Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5063018Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5063335Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5063567Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5063860Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5064105Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5064396Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5064642Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5064935Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5065183Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5065476Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5065707Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5065999Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5066201Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5066431Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5066721Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5066916Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5067162Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5067457Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5067688Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5067979Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5068198Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5068418Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5068620Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5068828Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5069008Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.5069185Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5069301Z E1204 11:25:43.380000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5069612Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5069910Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5070041Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5070517Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5070770Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5070994Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5071212Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5071412Z E1204 11:25:43.382000 
956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5071704Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5071939Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5072230Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5072464Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5072766Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5072998Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5073305Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5073555Z E1204 11:25:43.382000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5073864Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5074094Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5074387Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5074621Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5074914Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5075109Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5075340Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5075646Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5075842Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5076075Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5076363Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5076597Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5076890Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5077122Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5077328Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5077527Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5077749Z E1204 11:25:43.382000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5077913Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5078106Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5078630Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/ui/cui7dqlnurfmvrkr5ycejewmf6ibzpqmws5ikb5yw2p2q6b7xfj5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5078778Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5078997Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5079154Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5079301Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5079586Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5079718Z E1204 11:25:43.382000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5079984Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5080125Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5080381Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5080536Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5080805Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5080941Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5081242Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5081435Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5081748Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5082054Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5082184Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5082673Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5082925Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5083156Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5083400Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5083601Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5083893Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5084126Z E1204 
11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5084436Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5084669Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5084965Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5085198Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5085493Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5085737Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5086029Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5086261Z E1204 11:25:43.382000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5086564Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5086809Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5087100Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5087296Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5087529Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5087821Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5088016Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5088248Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5088542Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5088786Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5089077Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5089297Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5089503Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5089706Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5089928Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5090099Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5090277Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5090379Z E1204 11:25:43.382000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5090556Z [W1204 11:25:43.655416958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5090558Z 2025-12-04T11:45:25.5090712Z [W1204 11:25:43.656265716 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5090715Z 2025-12-04T11:45:25.5091038Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5091333Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5091467Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5091943Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5092196Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5092423Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from 
static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5092629Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5092845Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5093140Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5093407Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5093699Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5093933Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5094246Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5094477Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5094770Z E1204 11:25:43.389000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5095016Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5095325Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5095559Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5095848Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5096085Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5096375Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5096573Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5096804Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5097095Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5097306Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5097541Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5097839Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5098072Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5098364Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5098599Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5098803Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 
2025-12-04T11:45:25.5099003Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5099228Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5099398Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5099588Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5100117Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/gs/cgskehlfe747cdsaomsadiaalz26dwrddeyodwm5zmjfk435sr57.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5100266Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5100485Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5100643Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5100788Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5101074Z E1204 11:25:43.389000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5101204Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5101476Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5101613Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5101870Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5102026Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5102297Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5102432Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5102709Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5103008Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5103358Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5103669Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5103800Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5104292Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5104547Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5104776Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5104987Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5105190Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5105480Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5105714Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5106018Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5106256Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5106546Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5106779Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5107076Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5107334Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5107625Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5107856Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5108165Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5108409Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5108703Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5108902Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5109133Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5109428Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5109627Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5109862Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5110152Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5110397Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5110690Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5110909Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5111117Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5111318Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5111533Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5111711Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.5111890Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5111992Z E1204 11:25:43.389000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5112299Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5112603Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5112747Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5113222Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5113505Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5113732Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5113938Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5114139Z E1204 11:25:43.390000 
956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5114430Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5114680Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5114974Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5115204Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5115496Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5115730Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5116034Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5116267Z E1204 11:25:43.390000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5116560Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5116806Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5117110Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5117341Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5117632Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5117828Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5118062Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5118351Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5118547Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5118780Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5119083Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5119317Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5119608Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5119827Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5120033Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5120235Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5120458Z E1204 11:25:43.390000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5120621Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5120799Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5121347Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpqmwvvfc9/3v/c3vtzftvwqwx2urxq2vpokyujcvg36j3wdinrekyjgxuofdo2ac4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5121495Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5121710Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5121866Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5122012Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5122298Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5122431Z E1204 11:25:43.390000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5122686Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5122824Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5123078Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5123277Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5123549Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5123684Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5123958Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5124151Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5124560Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5124870Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5125000Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5125474Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5125758Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5125986Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5126191Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5126394Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5126686Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5126923Z E1204 
11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5127214Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5127446Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5127751Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5127983Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5128277Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5128508Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5128799Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5129043Z E1204 11:25:43.390000 956654 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5129332Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5129563Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5129868Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5130076Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5130306Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5130600Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5130797Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5131028Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5131321Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5131551Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5131842Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5132072Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5132279Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5132479Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5132687Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5132855Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5133033Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5133149Z E1204 11:25:43.390000 956654 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5133203Z ('RERUN', {'yellow': True}) [3.4127s] [100%] 2025-12-04T11:45:25.5133594Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:45.350785914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5133596Z 2025-12-04T11:45:25.5133743Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5134056Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5134368Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5134499Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5134979Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5135235Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5135462Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5135668Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5135866Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5136159Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5136405Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5136699Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5136930Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5137225Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5137459Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5137765Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5137997Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5138289Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5138523Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5138749Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5138944Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5139154Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5139356Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5139588Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5139882Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5140080Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5140312Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5140619Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5140842Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5141037Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5141257Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5141463Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5141665Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5141872Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5142092Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5142302Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5142498Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5142706Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5142951Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5143244Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5143514Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5143809Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5144029Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5144236Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5144434Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5144643Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5144863Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5145097Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5145389Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5145622Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5145914Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5146149Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5146454Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5146686Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5146993Z 
E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5147226Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5147533Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5147765Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5148057Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5148288Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5148581Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5148813Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5149103Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5149348Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5149644Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5149877Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5150167Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5150399Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5150702Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5150920Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5151124Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5151333Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5151631Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5151874Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5152169Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5152401Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5152692Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5152926Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5153216Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5153484Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5153793Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5154031Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5154329Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5154526Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5154726Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5154923Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5155145Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5155345Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5155575Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5155892Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5156104Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5156301Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5156499Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5156694Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5156927Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5157218Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5157450Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5157738Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5157936Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5158155Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5158360Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5158595Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5158890Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5159113Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5159331Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5159533Z 
E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5159734Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5160043Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5160278Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5160585Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5160821Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5161115Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5161349Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5161643Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5161878Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5162171Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5162384Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5162586Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5162808Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5163011Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5163209Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5163452Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5163760Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5163997Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5164290Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5164535Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5164845Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5165078Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5165370Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5165603Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5165900Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5166122Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5166323Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5166523Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5166729Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5166943Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5167144Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5167440Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5167663Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5167863Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5168079Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5168280Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5168574Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5168818Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5169125Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5169360Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5169653Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5169892Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5170184Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5170419Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5170714Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5170946Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5171252Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5171486Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5171778Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5172010Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5172305Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5172560Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5172854Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5173088Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5173426Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5173642Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5173843Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5174078Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5174375Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5174608Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5174904Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5175136Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5175433Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5175682Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5175979Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5176216Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5176507Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5176706Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5176953Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5177246Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5177479Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5177786Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5178012Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5178214Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5178415Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5178615Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5178910Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5179125Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5179326Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5179524Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5179726Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5180032Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5180252Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5180453Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5180654Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5180849Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5181013Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5181208Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5181429Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5181633Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5181843Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5182073Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5182282Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5182477Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5182695Z E1204 
11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5182905Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5183101Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5183356Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5183561Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5183758Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5183971Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5184187Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5184389Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5184585Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5184789Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5185086Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5185319Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5185521Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5185721Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5185927Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5186124Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5186351Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5186551Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5186751Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5186951Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5187246Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5187462Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5187665Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5187862Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5188082Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5188380Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5188593Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5188795Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5188992Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5189194Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5189499Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5189695Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5189898Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5190099Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5190295Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5190522Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5190727Z E1204 11:25:45.090000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5190925Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5191114Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5191295Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5191467Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5191595Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5191698Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5191827Z E1204 11:25:45.090000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5191985Z [W1204 11:25:45.359216622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5191988Z 2025-12-04T11:45:25.5192144Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5192440Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5192736Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5192866Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5193385Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5193656Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5193882Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5194088Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5194306Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5194612Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5194848Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5195139Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5195372Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5195666Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5195898Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5196190Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5196421Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5196727Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5196947Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5197152Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5197347Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5197559Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5197760Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5198002Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5198292Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5198498Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5198733Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5199037Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5199258Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5199453Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5199671Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5199877Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5200074Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5200269Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5200487Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5200693Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5200900Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5201098Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5201331Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5201622Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5201858Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5202160Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5202380Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5202584Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5202790Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5203001Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5203216Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5203529Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5203826Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5204062Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5204356Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5204587Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5204879Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5205127Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5205420Z 
E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5205651Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5205944Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5206176Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5206468Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5206713Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5207002Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5207246Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5207536Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5207782Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5208074Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5208306Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5208600Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5208831Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5209123Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5209343Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5209554Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5209754Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5210044Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5210276Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5210568Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5210803Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5211106Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5211335Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5211624Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5211865Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5212168Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5212398Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5212690Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5212890Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5213088Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5213307Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5213513Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5213713Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5213959Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5214253Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5214450Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5214643Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5214839Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5215032Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5215279Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5215570Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5215801Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5216106Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5216315Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5216523Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5216724Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5216959Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5217253Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5217475Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5217680Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5217882Z 
E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5218094Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5218387Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5218622Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5218915Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5219150Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5219444Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5219687Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5219981Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5220230Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5220534Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5220731Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5220928Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5221149Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5221353Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5221555Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5221755Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5222048Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5222283Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5222591Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5222827Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5223119Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5223390Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5223685Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5223932Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5224223Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5224443Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5224662Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5224878Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5225072Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5225282Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5225482Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5225776Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5225998Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5226201Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5226398Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5226598Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5226907Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5227146Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5227438Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5227671Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5227966Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5228211Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5228504Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5228736Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5229043Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5229286Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5229584Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5229817Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5230110Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5230346Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5230638Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5230872Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5231165Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5231408Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5231707Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5231906Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5232106Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5232340Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5232646Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5232883Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5233175Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5233455Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5233763Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5233997Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5234291Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5234531Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5234826Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5235022Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5235257Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5235550Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5235797Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5236092Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5236309Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5236515Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5236717Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5236942Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5237237Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5237452Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5237667Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5237869Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5238087Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5238380Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5238602Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5238806Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5239009Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5239203Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5239352Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5239550Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5239770Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5239988Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5240185Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5240407Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5240613Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5240812Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5241035Z E1204 
11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5241256Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5241450Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5241671Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5241892Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5242089Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5242300Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5242512Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5242714Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5242914Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5243117Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5243444Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5243657Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5243861Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5244073Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5244267Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5244462Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5244676Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5244878Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5245078Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5245295Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5245590Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5245806Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5246021Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5246225Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5246441Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5246735Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5246949Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5247151Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5247351Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5247552Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5247845Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5248043Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5248255Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5248447Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5248643Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5248855Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5249058Z E1204 11:25:45.092000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5249259Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5249461Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5249643Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5249814Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5249940Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5250057Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5250187Z E1204 11:25:45.092000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5250347Z [W1204 11:25:45.361396901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5250362Z 2025-12-04T11:45:25.5250508Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5250801Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5251096Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5251229Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5251707Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5251960Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5252187Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5252409Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5252613Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5252903Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5253137Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5253470Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5253716Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5254006Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5254237Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5254542Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5254789Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5255088Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5255309Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5255515Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5255715Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5255924Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5256123Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5256354Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5256658Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5256856Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5257088Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5257382Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5257599Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5257796Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5258028Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5258232Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5258428Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5258634Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5258853Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5259069Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5259268Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5259464Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5259698Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5259991Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5260224Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5260514Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5260733Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5260951Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5261147Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5261354Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5261554Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5261785Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5262081Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5262325Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5262617Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5262847Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5263151Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5263430Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5263721Z 
E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5263955Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5264249Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5264486Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5264776Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5265008Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5265315Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5265547Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5265839Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5266070Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5266363Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5266598Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5266905Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5267137Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5267442Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5267661Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5267878Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5268077Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5268370Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5268603Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5268897Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5269129Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5269418Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5269650Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5269955Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5270189Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5270479Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5270712Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5271004Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5271213Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5271408Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5271604Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5271824Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5272032Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5272266Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5272558Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5272754Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5272952Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5273150Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5273373Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5273605Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5273897Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5274141Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5274434Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5274630Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5274838Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5275041Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5275275Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5275583Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5275804Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5276023Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5276223Z 
E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5276441Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5276737Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5276968Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5277262Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5277495Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5277787Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5278020Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5278333Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5278570Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5278863Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5279061Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5279260Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5279482Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5279696Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5279896Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5280098Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5280401Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5280651Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5280944Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5281181Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5281473Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5281707Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5282002Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5282233Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5282526Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5282758Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5282962Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5283161Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5283384Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5283596Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5283798Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5284109Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5284328Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5284529Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5284747Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5284951Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5285261Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5285493Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5285787Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5286021Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5286317Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5286550Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5286843Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5287093Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5287387Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5287621Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5287916Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5288154Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5288458Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5288690Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5288985Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5289231Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5289535Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5289768Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5290064Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5290264Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5290461Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5290697Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5290990Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5291223Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5291527Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5291762Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5292056Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5292287Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5292588Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5292838Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5293134Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5293345Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5293592Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5293968Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5294202Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5294497Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5294712Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5294918Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5295121Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5295322Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5295614Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5295843Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5296047Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5296247Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5296448Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5296741Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5296963Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5297182Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5297381Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5297571Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5297730Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5297928Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5298158Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5298365Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5298561Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5298780Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5298988Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5299185Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5299407Z E1204 
11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5299614Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5299811Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5300044Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5300253Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5300453Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5300646Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5300862Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5301064Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5301287Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5301489Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5301785Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5302011Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5302215Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5302431Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5302622Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5302818Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5303031Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5303231Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5303466Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5303668Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5303964Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5304194Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5304399Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5304596Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5304797Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5305089Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5305304Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5305521Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5305718Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5305917Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5306226Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5306424Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5306640Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5306832Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5307029Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5307244Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5307451Z E1204 11:25:45.094000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5307651Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5307841Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5308021Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5308194Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5308330Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5308434Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5308561Z E1204 11:25:45.094000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5308718Z [W1204 11:25:45.401427110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5308720Z 2025-12-04T11:45:25.5308866Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5309158Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5309454Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5309596Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5310075Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5310342Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5310579Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5310789Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5310989Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5311281Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5311515Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5311809Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5312043Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5312334Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5312580Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5312871Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5313104Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5313427Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5313652Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5313860Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5314071Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5314278Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5314475Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5314722Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5315026Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5315223Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5315457Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5315752Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5315976Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5316172Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5316391Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5316595Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5316812Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5317008Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5317226Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5317431Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5317625Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5317826Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5318057Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5318359Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5318590Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5318892Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5319113Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5319328Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5319522Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5319729Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5319928Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5320162Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5320455Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5320687Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5320977Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5321219Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5321511Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5321743Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5322034Z 
E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5322268Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5324511Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5324749Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5325043Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5325306Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5325612Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5325842Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5326132Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5326365Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5326654Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5326888Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5327176Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5327407Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5327713Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5327932Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5328132Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5328326Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5328618Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5328862Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5329154Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5329386Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5329688Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5329937Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5330226Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5330457Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5330748Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5330980Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5331270Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5331468Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5331667Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5331874Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5332084Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5332284Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5332515Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5332806Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5333002Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5333209Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5333431Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5333625Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5333870Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5334182Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5334414Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5334704Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5334901Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5335108Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5335309Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5335542Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5335834Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5336068Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5336271Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5336471Z 
E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5336671Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5336964Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5337197Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5337504Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5337737Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5338029Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5338274Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5338580Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5338813Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5339105Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5339305Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5339504Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5339724Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5339927Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5340124Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5340339Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5340632Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5340868Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5341161Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5341393Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5341687Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5341930Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5342222Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5342466Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5342768Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5342991Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5343193Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5343427Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5343621Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5343832Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5344032Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5344323Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5344544Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5344760Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5344962Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5345161Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5345456Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5345689Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5345983Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5346232Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5346523Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5346779Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5347086Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5347319Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5347610Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5347843Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5348137Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5348369Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5348661Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5348892Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5349196Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5349431Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5349722Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5349954Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5350249Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5350459Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5350656Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5350887Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5351195Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5351439Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5351732Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5351964Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5352258Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5352495Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5352790Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5353022Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5353348Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5353559Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5353793Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5354086Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5354318Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5354611Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5354841Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5355043Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5355242Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5355442Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5355749Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5355979Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5356179Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5356378Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5356578Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5356872Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5357092Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5357294Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5357492Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5357697Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5357848Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5358046Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5358266Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5358471Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5358671Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5358890Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5359111Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5359308Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5359529Z E1204 
11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5359748Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5359955Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5360177Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5360381Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5360578Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5360775Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5360987Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5361189Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5361386Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5361586Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5361892Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5362106Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5362307Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5362505Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5362698Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5362893Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5363120Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5363346Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5363543Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5363756Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5364052Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5364283Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5364485Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5364683Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5364883Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5365176Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5365390Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5365590Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5365787Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5366000Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5366296Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5366493Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5366693Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5366882Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5367078Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5367312Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5367517Z E1204 11:25:45.134000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5367715Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5367915Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5368096Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5368276Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5368405Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5368508Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5368636Z E1204 11:25:45.134000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5368791Z [W1204 11:25:45.403611959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5368795Z 2025-12-04T11:45:25.5368941Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5369239Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5369536Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5369666Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5370158Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5370415Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5370640Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5370847Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5371046Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5371339Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5371588Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5371879Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5372124Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5372414Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5372657Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5372946Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5373177Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5373504Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5373724Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5373929Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5374123Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5374330Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5374542Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5374773Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5375063Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5375257Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5375491Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5375798Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5376016Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5376211Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5376441Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5376646Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5376854Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5377048Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5377264Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5377468Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5377666Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5377863Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5378092Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5378382Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5378627Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5378917Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5379135Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5379339Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5379533Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5379739Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5379949Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5380181Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5380470Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5380714Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5381018Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5381249Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5381538Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5381769Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5382059Z 
E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5382290Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5382581Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5382813Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5383119Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5383381Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5383670Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5383900Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5384191Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5384438Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5384730Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5384961Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5385265Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5385510Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5385800Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5386019Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5386220Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5386418Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5386709Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5386942Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5387233Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5387478Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5387769Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5387997Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5388287Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5388519Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5388822Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5389050Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5389340Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5389552Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5389759Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5389956Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5390161Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5390359Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5390590Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5390881Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5391077Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5391270Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5391464Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5391670Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5391904Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5392193Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5392423Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5392715Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5392922Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5393128Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5393345Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5393578Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5393887Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5394123Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5394326Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5394523Z 
E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5394724Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5395018Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5395253Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5395543Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5395775Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5396083Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5396316Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5396610Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5396840Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5397135Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5397346Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5397542Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5397762Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5397976Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5398177Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5398393Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5398687Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5398921Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5399214Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5399448Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5399738Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5399970Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5400273Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5400506Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5400796Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5401017Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5401220Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5401418Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5401623Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5401833Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5402031Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5402337Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5402569Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5402773Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5402970Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5403170Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5403498Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5403735Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5404026Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5404258Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5404563Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5404797Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5405094Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5405325Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5405619Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5405856Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5406166Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5406397Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5406702Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5406935Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5407240Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5407474Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5407765Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5407998Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5408293Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5408491Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5408687Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5408929Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5409223Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5409455Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5409745Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5409978Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5410271Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5410516Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5410809Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5411052Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5411356Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5411553Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5411785Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5412076Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5412311Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5412606Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5412819Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5413022Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5413220Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5413559Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5413852Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5414067Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5414269Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5414468Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5414683Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5414973Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5415194Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5415418Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5415617Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5415822Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5415972Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5416167Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5416386Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5416592Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5416788Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5417008Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5417212Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5417408Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5417642Z E1204 
11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5417850Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5418048Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5418267Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5418474Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5418670Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5418876Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5419088Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5419291Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5419502Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5419703Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5420010Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5420222Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5420423Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5420623Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5420816Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5421015Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5421226Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5421426Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5421634Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5421838Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5422134Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5422346Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5422546Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5422747Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5422961Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5423287Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5423499Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5423714Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5423912Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5424125Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5424421Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5424617Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5424819Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5425009Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5425205Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5425419Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5425623Z E1204 11:25:45.137000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5425821Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5426025Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5426208Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5426378Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5426505Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5426611Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5426736Z E1204 11:25:45.137000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5426894Z [W1204 11:25:45.405704549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5426910Z 2025-12-04T11:45:25.5427054Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5427347Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5427642Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5427785Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5428282Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5428535Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5428761Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5428969Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5429171Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5429462Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5429696Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5429999Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5430231Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5430523Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5430753Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5431043Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5431277Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5431588Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5431807Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5432011Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5432220Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5432441Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5432639Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5432872Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5433163Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5433385Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5433619Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5433912Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5434129Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5434341Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5434561Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5434764Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5434959Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5435153Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5435372Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5435591Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5435787Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.5435981Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5436213Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5436522Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5436766Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5437056Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5437273Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5437481Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5437676Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5437884Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5438083Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5438313Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5438617Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5438848Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5439137Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5439367Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5439658Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5439901Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5440191Z 
E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5440422Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5440724Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5440966Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5441256Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5441485Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5441776Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5442006Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5442298Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5442527Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5442821Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5443066Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5443386Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5443616Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5443904Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5444123Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5444338Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5444533Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5444822Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5445065Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5445372Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5445603Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5445892Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5446124Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5446415Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5446647Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5446935Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5447166Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5447477Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5447678Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5447872Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5448066Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5448273Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5448470Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5448715Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5449004Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5449198Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5449403Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5449609Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5449806Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5450037Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5450326Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5450558Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5450848Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5451043Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5451249Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5451452Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5451695Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5451989Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5452211Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5452413Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5452612Z 
E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5452825Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5453117Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5453375Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5453685Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5453930Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5454224Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5454457Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5454751Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5454984Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5455278Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5455477Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5455672Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5455906Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5456110Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5456307Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5456507Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5456800Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5457036Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5457341Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5457574Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5457866Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5458110Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5458414Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5458645Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5458938Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5459161Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5459365Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5459562Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5459753Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5459963Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5460175Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5460469Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5460688Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5460889Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5461088Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5461287Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5461594Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5461825Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5462116Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5462361Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5462672Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5462905Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5463195Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5463450Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5463745Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5463979Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5464272Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5464524Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5464820Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5465053Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5465346Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5465577Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5465870Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5466117Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5466411Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5466624Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5466821Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5467069Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5467360Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5467594Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5467887Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5468119Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5468415Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5468648Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5468953Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5469187Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5469481Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5469677Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5469909Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5470203Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5470446Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5470739Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5470964Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5471169Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5471381Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5471581Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5471873Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5472086Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5472287Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5472486Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5472686Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5472981Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5473215Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5473444Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5473643Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5473834Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5473980Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5474178Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5474400Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5474624Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5474820Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5475039Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5475258Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5475466Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5475691Z E1204 
11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5475895Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5476089Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5476310Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5476514Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5476711Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5476905Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5477116Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5477331Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5477532Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5477733Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5478027Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5478239Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5478441Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5478656Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5478846Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5479042Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5479273Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5479473Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.5479688Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5479889Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5480188Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5480400Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5480602Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5480802Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5481001Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5481295Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5481520Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.5481724Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5481920Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5482119Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5482414Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5482612Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5482827Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5483018Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5483215Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5483470Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5483675Z E1204 11:25:45.139000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5483886Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5484076Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5484256Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5484430Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5484557Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5484659Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5484788Z E1204 11:25:45.139000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.5484841Z ('RERUN', {'yellow': True}) [1.5744s] [100%] 2025-12-04T11:45:25.5485196Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:46.702184385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.5485198Z 2025-12-04T11:45:25.5485343Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5485649Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5485945Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5486076Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5486554Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5486809Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5487052Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5487257Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5487457Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5487764Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5488013Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5488308Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5488543Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5488838Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5489069Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5489362Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5489593Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5489883Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5490116Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5490323Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5490519Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5490725Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5490924Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5491154Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5491460Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5491656Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5491886Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5492190Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5492421Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5492616Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5492833Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5493040Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5493235Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5493461Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5493680Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5493884Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5494082Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5494288Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5494523Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5494817Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5495047Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5495339Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5495578Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5495783Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5495976Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5496201Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5496403Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5496652Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5496947Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5497177Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5497469Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5497700Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5497988Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5498220Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5498526Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5498761Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5499050Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5499280Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5499570Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5499801Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5500102Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5500331Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5500637Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5500881Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5501173Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5501403Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5501692Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5501925Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5502215Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5502434Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5502632Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5502839Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5503133Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5503397Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5503688Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5503920Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5504212Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5504462Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5504753Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5504999Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5505302Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5505535Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5505826Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5506023Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5506218Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5506415Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5506622Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5506820Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5507050Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5507354Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5507553Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5507747Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5507944Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5508139Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5508371Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5508672Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5508903Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5509192Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5509399Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5509620Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5509821Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5510054Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5510351Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5510571Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5510773Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5510971Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5511172Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5511484Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5511717Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5512009Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5512240Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5512537Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5512769Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5513072Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5513338Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5513644Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5513842Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5514055Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5514275Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5514476Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5514676Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5514879Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5515172Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5515404Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5515695Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5515943Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5516235Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5516468Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5516760Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5516993Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5517306Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5517526Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5517728Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5517936Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5518129Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5518351Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5518551Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5518844Z E1204 11:25:46.435000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5519065Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5519268Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5519466Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5519667Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5519957Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5520203Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5520499Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5520731Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5521022Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5521255Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5521562Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5521796Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5522092Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5522340Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5522641Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5522874Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5523167Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5523435Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5523729Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5523961Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5524255Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5524488Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5524796Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5524996Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5525190Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5525424Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5525716Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5525966Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5526258Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5526491Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5526804Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5527059Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5527352Z E1204 
11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5527584Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5527879Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5528078Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5528310Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5528603Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5528835Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5529143Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5529357Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5529559Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5529758Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5529961Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5530257Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5530483Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5530684Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5530897Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5531100Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5531404Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5531626Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5531828Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5532027Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5532219Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5532367Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5532564Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5532783Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5532991Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5533197Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5533451Z E1204 
11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5533662Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5533856Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5534076Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5534283Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5534492Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5534714Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5534920Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5535132Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5535326Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5535557Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5535757Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5535958Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5536158Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5536454Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5536668Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5536867Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5537064Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5537270Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5537467Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5537679Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5537881Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5538080Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5538284Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5538594Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5538804Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5539008Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5539217Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5539417Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5539721Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5539933Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5540134Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5540333Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5540539Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5540832Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5541028Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5541228Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5541430Z E1204 11:25:46.435000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5541628Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5541840Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5542045Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5542240Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5542430Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5542611Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5542793Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5542917Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5543020Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5543145Z E1204 11:25:46.435000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5543361Z [W1204 11:25:46.704487382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5543363Z 2025-12-04T11:45:25.5543514Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5543821Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5544118Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5544247Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5544730Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5544989Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5545212Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5545420Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.5545633Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5545925Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5546158Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5546452Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5546686Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5546991Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5547223Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5547515Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5547764Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5548064Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5548284Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5548491Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5548686Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5548894Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5549093Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5549325Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5549617Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5549816Z E1204 
11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5550058Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5550348Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5550568Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5550761Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5550981Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5551197Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5551392Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5551587Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5551805Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5552022Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5552230Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5552427Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5552658Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5552951Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5553184Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5553503Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5553723Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5553926Z 
E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5554147Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5554359Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5554558Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5554787Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5555082Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5555316Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5555620Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5555850Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.5556139Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5556388Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5556696Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5556927Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5557217Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5557447Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5557744Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5557973Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5558264Z E1204 
11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5558508Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5558798Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5559033Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5559322Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5559554Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5559845Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5560096Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5560386Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5560616Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5560818Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5561027Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5561319Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5561551Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5561842Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5562075Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5562363Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5562592Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5562894Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5563127Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5563453Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5563683Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5563977Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5564173Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5564396Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5564591Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5564800Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5565018Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5565262Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5565554Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5565749Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5565946Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5566148Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5566346Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5566576Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5566867Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5567100Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5567404Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5567601Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5567807Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5568011Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5568249Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5568545Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5568777Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5568978Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5569190Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5569389Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5569696Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5569928Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5570221Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5570455Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5570750Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5570983Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5571274Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5571521Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5571813Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5572010Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5572208Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5572428Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5572629Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5572841Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5573044Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5573367Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5573617Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5573925Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5574159Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5574452Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5574686Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5574981Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5575214Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5575508Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5575730Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5575957Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5576159Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5576351Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5576561Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5576765Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5577058Z E1204 11:25:46.437000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5577295Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5577496Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5577695Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5577906Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5578216Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5578449Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5578740Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5578973Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5579265Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5579500Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5579792Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5580032Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5580335Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5580569Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5580860Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5581092Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5581388Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5581635Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5581928Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5582160Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5582469Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5582714Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5583005Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5583203Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5583437Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5583669Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5583964Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5585461Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5585758Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5586014Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5586312Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5586545Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5586855Z E1204 
11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5587090Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5587402Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5587600Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5587835Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5588153Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5588387Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5588683Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5588898Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5589102Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5589303Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5589505Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5589801Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5590061Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5590277Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5590476Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5590681Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5590977Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5591197Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5591402Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5591612Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5591804Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5591953Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5592153Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5592392Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5592599Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5592796Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5593021Z E1204 
11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5593234Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5593466Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5593691Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5593896Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5594116Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5594337Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5594559Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5594758Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5594953Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5595168Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5595372Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5595574Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5595792Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5596084Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5596298Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5596514Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5596715Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5596908Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5597105Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5597317Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5597519Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5597721Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5597922Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5598218Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5598443Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5598659Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5598858Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5599063Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5599356Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5599568Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5599773Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5599983Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5600185Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5600479Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5600692Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5600893Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5601085Z E1204 11:25:46.437000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5601281Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5601494Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5601700Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5601898Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5602092Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5602274Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5602458Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5602587Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5602689Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5602827Z E1204 11:25:46.437000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5602985Z [W1204 11:25:46.706603511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5602988Z 2025-12-04T11:45:25.5603135Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5603459Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5603755Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5603885Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5604386Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5604644Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5604885Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5605093Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.5605292Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5605585Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5605820Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5606115Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5606349Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5606639Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5606888Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5607192Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5607427Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5607718Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5607938Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5608147Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5608346Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5608573Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5608771Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5609003Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5609310Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5609506Z E1204 
11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5609736Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5610027Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5610247Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5610442Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5610664Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5610868Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5611076Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5611271Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5611502Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5611709Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5611903Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5612099Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5612331Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5612624Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5612866Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5613157Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5613427Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5613632Z 
E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5613832Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5614039Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5614238Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5614471Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5614764Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5614996Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5615304Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5615538Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.5615841Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5616073Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5616367Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5616598Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5616890Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5617137Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5617430Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5617664Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5617968Z E1204 
11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5618199Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5618491Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5618723Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5619014Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5619245Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5619535Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5622940Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5623291Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5623510Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5623711Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5623905Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5624201Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5624433Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5624752Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5624984Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5625276Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5625526Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5625818Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5626049Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5626342Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5626575Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5626867Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5627064Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5627281Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5627476Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5627707Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5627910Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5628140Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5628432Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5628627Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5628826Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5629032Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5629227Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5629461Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5629761Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5629993Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5630282Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5630479Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5630686Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5630891Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5631129Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5631424Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5631664Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5631874Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5632074Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5632275Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5632569Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5632802Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5633095Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5633377Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5633674Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5633913Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5634222Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5634457Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5634749Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5634947Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5635144Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5635365Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5635569Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5635768Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5635992Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5638020Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5638261Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5638556Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5638790Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5639086Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5639320Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5639638Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5639874Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5640178Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5640399Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5640604Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5640804Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5641000Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5641212Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5641412Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5641705Z E1204 11:25:46.440000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5641940Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5642143Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5642354Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5642556Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5642849Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5643082Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5643402Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5643636Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5643945Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5644180Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5644487Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5644722Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5645016Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5645249Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5645543Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5645776Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5646069Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5646303Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5646615Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5646862Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5647157Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5647392Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5647686Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5647886Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5648094Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5648326Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5648624Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5648864Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5649160Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5649395Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5649690Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5649925Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5650218Z E1204 
11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5650452Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5650760Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5650960Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5651203Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5651496Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5651732Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5652027Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5652244Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5652457Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5652657Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5652858Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5653151Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5653420Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5653622Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5653820Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5654021Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5654319Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5654542Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5654746Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5654944Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5655163Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5655314Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5655520Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5655746Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5655951Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5656148Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5656370Z E1204 
11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5656578Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5656788Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5657010Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5657215Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5657423Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5657645Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5657853Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5658049Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5658245Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5658460Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5658664Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5658863Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5659063Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5659367Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5659592Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5659797Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5659994Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5660186Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5660383Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5660598Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5660808Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5661009Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5661212Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5661504Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5661735Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5661936Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5662136Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5662336Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5662632Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5662846Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5663049Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5663291Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5663520Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5663830Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5664025Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5664227Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5664420Z E1204 11:25:46.440000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5664617Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5664833Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5665049Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5665248Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5665438Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5665622Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5665805Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5665934Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5666040Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5666169Z E1204 11:25:46.440000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5666327Z [W1204 11:25:46.744184417 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5666329Z 2025-12-04T11:45:25.5666474Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5666771Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5667070Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5667204Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5667697Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5667965Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5668311Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5668518Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.5668718Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5669012Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5669247Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5669550Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5669785Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5670091Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5670325Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5670620Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5670854Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5671148Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5671371Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5671577Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5671773Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5671990Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5672191Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5672434Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5672729Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5672925Z E1204 
11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5673156Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5673495Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5673731Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5673926Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5674145Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5674364Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5674559Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5674754Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5674974Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5675180Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5675377Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5675572Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5675804Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5676095Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5676341Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5676648Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5676869Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5677076Z 
E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5677273Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5677481Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5677681Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5677929Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5678222Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5678454Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5678762Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5678992Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.5679285Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5679515Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5679810Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5680043Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5680334Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5680575Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5680869Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5681109Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5681401Z E1204 
11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5681633Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5681924Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5682159Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5682465Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5682697Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5682992Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5683236Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5683562Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5683783Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5683984Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5684182Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5684477Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5684712Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5685019Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5685252Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5685556Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5685788Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5686080Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5686311Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5686604Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5686852Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5687143Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5687341Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5687548Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5687749Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5687956Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5688156Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5688389Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5688680Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5688879Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5689075Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5689284Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5689479Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5689722Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5690016Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5690247Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5690540Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5690738Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5690947Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5691158Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5691396Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5691691Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5691924Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5692127Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5692328Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5692529Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5692824Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5693059Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5693417Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5693662Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5693958Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5694209Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5694504Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5694737Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5695031Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5695231Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5695443Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5695666Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5695867Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5696082Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5696286Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5696583Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5696819Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5697115Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5697350Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5697644Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5697879Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5698182Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5698429Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5698725Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5698945Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5699148Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5699347Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5699542Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5699763Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5699965Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5700261Z E1204 11:25:46.477000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5700494Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5700698Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5700898Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5701099Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5701469Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5701705Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5701999Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5702231Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5702540Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5702782Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5703076Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5703361Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5703659Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5703893Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5704184Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5704434Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5704725Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5704979Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5705273Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5705508Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5705805Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5706039Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5706332Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5706529Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5706726Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5706975Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5707282Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5707518Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5707811Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5708048Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5708344Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5708578Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5708885Z E1204 
11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5709119Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5709422Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5709619Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5709853Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5710146Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5710378Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5710676Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5710893Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5711096Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5711316Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5711523Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5711827Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5712041Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5712243Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5712443Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5712647Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5712940Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5713174Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5713415Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5713628Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5713821Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5713969Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5714167Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5714388Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5714594Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5714792Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5715016Z E1204 
11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5715227Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5715433Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5715654Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5715876Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5716073Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5716295Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5716503Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5716701Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5716899Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5717128Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5717330Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5717531Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5717743Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5718039Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5718252Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5718456Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5718658Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5718849Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5719048Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5719260Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5719464Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5719675Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5719879Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5720183Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5720400Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5720604Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5720802Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5721003Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5721307Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5721521Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5721723Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5721933Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5722135Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5722428Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5722627Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5722829Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5723020Z E1204 11:25:46.477000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5723216Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5723461Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5723668Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5723877Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5724071Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5724267Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5724442Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5724570Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5724673Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5724800Z E1204 11:25:46.477000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5724960Z [W1204 11:25:46.746289766 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5724963Z 2025-12-04T11:45:25.5725110Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5725418Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5725715Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5725845Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5726347Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5726606Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5726834Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5727044Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.5727244Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5727537Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5727770Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5728071Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5728313Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5728607Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5728841Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5729133Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5729366Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5729657Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5729888Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5730092Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5730291Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5730509Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5730709Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5730941Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5731236Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5731435Z E1204 
11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5731666Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5731963Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5732184Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5732391Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5732622Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5732827Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5733024Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5733219Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5733472Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5733683Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5733881Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5734093Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5734324Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5734617Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5734861Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5735153Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5735374Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5735580Z 
E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5735778Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5735987Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5736186Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5736416Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5736722Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5736967Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5737262Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5737494Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.5737784Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5738019Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5738321Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5738552Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5738842Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5739084Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5739374Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5739605Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5739896Z E1204 
11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5740126Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5740421Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5740655Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5740954Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5741186Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5741485Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5741718Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5742008Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5742227Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5742429Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5742639Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5742936Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5743168Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5743508Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5743738Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5744029Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5744262Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5744555Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5744787Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5745077Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5745324Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5745615Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5745824Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5746021Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5746217Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5746424Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5746623Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5746857Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5747160Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5747355Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5747553Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5747761Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5747956Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5748186Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5748478Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5748708Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5749000Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5749197Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5749403Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5749615Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5749851Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5750154Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5750375Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5750576Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5750776Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5750978Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5751283Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5751516Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5751809Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5752056Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5752352Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5752587Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5752879Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5753113Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5753432Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5753632Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5753829Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5754064Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5754279Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5754479Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5754682Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5754973Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5755207Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5755499Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5755746Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5756038Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5756270Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5756577Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5756811Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5757106Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5757330Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5757533Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5757734Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5757926Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5758137Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5758348Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5758660Z E1204 11:25:46.479000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5758882Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5759084Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5759285Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5759485Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5759779Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5760022Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5760314Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5760549Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5760851Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5761086Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5761377Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5761615Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5761909Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5762143Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5762438Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5762682Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5762987Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5763219Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5763544Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5763777Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5764077Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5764311Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5764616Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5764816Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5765026Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5765262Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5765557Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5765791Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5766086Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5766322Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5766618Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5766850Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5767158Z E1204 
11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5767406Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5767701Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5767899Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5768131Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5768426Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5768661Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5768967Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5769186Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5769399Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5769598Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5769800Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5770095Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5770310Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5770511Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5770711Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5770915Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5771212Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5771444Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5771660Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5771858Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5772053Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5772202Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5772399Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5772621Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5772828Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5773036Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5773279Z E1204 
11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5773488Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5773699Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5773921Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5774128Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5774321Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5774543Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5774750Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5774948Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5775145Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5775362Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5775581Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5775799Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5776001Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5776295Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5776510Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5776711Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5776911Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5777115Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5777314Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5777530Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5777743Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5777946Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5778148Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5778443Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5778655Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5778857Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5779056Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5779255Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5779549Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5779770Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5779984Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5780183Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5780384Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5780678Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5780874Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5781078Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5781280Z E1204 11:25:46.479000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5781477Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5781690Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5781907Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5782104Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5782299Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5782483Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5782654Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5782783Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5782885Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5783013Z E1204 11:25:46.479000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5783171Z [W1204 11:25:46.748370846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5783174Z 2025-12-04T11:45:25.5783382Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.5783674Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5783989Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5784136Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5784622Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5784879Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5785105Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5785313Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.5785525Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5785818Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5786055Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5786361Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5786598Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5786891Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5787125Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5787418Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5787651Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5787941Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5788169Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5788377Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5788582Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5788793Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5788990Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5789226Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5789522Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5789717Z E1204 
11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5789960Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5790251Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5790489Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5790686Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5790906Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5791112Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5791307Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5791504Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5791726Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5791934Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5792130Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5792338Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5792571Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5792871Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5793104Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5793439Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5793661Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5793865Z 
E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5794077Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5794286Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5794485Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5794729Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5795020Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5795253Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5795543Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5795775Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.5796068Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5796300Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5796592Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5796838Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5797142Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5797373Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5797666Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5797899Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5798192Z E1204 
11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5798432Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5798724Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5798956Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5799257Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5799487Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5799779Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5800008Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5800299Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5800516Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5800719Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5800918Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.5801224Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5801470Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5801760Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5801991Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5802281Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5802516Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5802805Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5803044Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5803365Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5803609Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5803901Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5804097Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5804292Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5804488Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5804696Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5804894Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5805125Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5805427Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5805623Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5805832Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5806030Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5806223Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5806457Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5806748Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5806981Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5807289Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5807485Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5807695Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5807909Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5808151Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5808445Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5808668Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5808869Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5809070Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5809273Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5809565Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5809811Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5810113Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5810350Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5810644Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5810879Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5811173Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5811417Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5811712Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5811910Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5812117Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5812338Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5812542Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5812743Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5812943Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5813239Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5813492Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5813787Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5814035Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5814328Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5814575Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5814869Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5815104Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5815397Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5815620Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5815834Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5816032Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5816226Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5816449Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.5816652Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5816945Z E1204 11:25:46.481000 
956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5817166Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5817371Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5817573Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5817772Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5818063Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5818307Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5818600Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5818843Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5819136Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5819371Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5819664Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5819900Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5820204Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5820436Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5820730Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5820976Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5821268Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5821501Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5821792Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5822028Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5822322Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5822555Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5822859Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5823074Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5823295Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5823528Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5823822Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5824053Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5824346Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5824595Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5824885Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5825119Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5825425Z E1204 
11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5825660Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5825954Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5826151Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5826388Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5826682Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5826916Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5827221Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5827440Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5827655Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5827854Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5828056Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5828347Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5828563Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5828776Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5828974Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5829175Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5829472Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5829706Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5829907Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5830107Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5830299Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.5830447Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.5830642Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5830862Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5831069Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5831274Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5831499Z E1204 
11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5831714Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5831912Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5832133Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5832339Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5832533Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5832755Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.5832971Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.5833168Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.5833385Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5833613Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5833816Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5834014Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5834213Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5834508Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5834721Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5834922Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5835120Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5835311Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.5835520Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5835751Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5835953Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5836152Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5836352Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5836646Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5836860Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5837073Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5837272Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5837472Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5837776Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5837992Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.5838195Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.5838398Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.5838597Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5838892Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5839087Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.5839289Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.5839480Z E1204 11:25:46.481000 956654 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.5839695Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.5839925Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.5840128Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.5840326Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.5840515Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.5840697Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.5840867Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.5840995Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.5841109Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.5841234Z E1204 11:25:46.481000 956654 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.5841276Z FAILED [1.3543s] [100%] 2025-12-04T11:45:25.5841278Z 2025-12-04T11:45:25.5841336Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.5841498Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.5841557Z Traceback (most recent call last): 2025-12-04T11:45:25.5841723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5841765Z method(*args, **kwargs) 2025-12-04T11:45:25.5841921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5841961Z method(*args, **kwargs) 2025-12-04T11:45:25.5842113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.5842150Z with policy(): 2025-12-04T11:45:25.5842306Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.5842346Z raise RuntimeError(msg) 2025-12-04T11:45:25.5842765Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.5842769Z 2025-12-04T11:45:25.5842848Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.5843129Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5843131Z 2025-12-04T11:45:25.5843222Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.5843329Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5843376Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5843449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5844018Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5844121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5844160Z graph_break [] 2025-12-04T11:45:25.5844225Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5844302Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5844788Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.5844839Z current_size = base.storage().size() 2025-12-04T11:45:25.5844882Z Autotune Choices Stats: 2025-12-04T11:45:25.5845263Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:25.5845341Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5845390Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5845491Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5845731Z triton_mm_34 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5845978Z triton_mm_33 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5846205Z triton_mm_16 0.0106 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5846431Z triton_mm_29 0.0106 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5846658Z triton_mm_22 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, 
BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5846883Z triton_mm_30 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5847111Z triton_mm_21 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5847348Z triton_mm_23 0.0119 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5847577Z triton_mm_15 0.0121 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5847813Z triton_mm_31 0.0122 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5847948Z SingleProcess AUTOTUNE benchmarking takes 0.1548 seconds and 1.1513 seconds precompiling for 33 choices 2025-12-04T11:45:25.5848107Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.5848155Z Traceback (most recent call last): 2025-12-04T11:45:25.5848313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5848356Z method(*args, **kwargs) 
2025-12-04T11:45:25.5848509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5848551Z method(*args, **kwargs) 2025-12-04T11:45:25.5848704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.5848752Z with policy(): 2025-12-04T11:45:25.5848906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.5848947Z raise RuntimeError(msg) 2025-12-04T11:45:25.5849365Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.5849378Z 2025-12-04T11:45:25.5849457Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.5849734Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5849737Z 2025-12-04T11:45:25.5849825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.5849902Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5849945Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5850005Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5850556Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), 
('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5850662Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5850700Z graph_break [] 2025-12-04T11:45:25.5850766Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5850844Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5851340Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.5851391Z current_size = base.storage().size() 2025-12-04T11:45:25.5851433Z Autotune Choices Stats: 2025-12-04T11:45:25.5851818Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:25.5851882Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5851934Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5852034Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5852272Z triton_mm_34 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5852505Z triton_mm_33 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5852730Z triton_mm_16 0.0106 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5852966Z triton_mm_29 0.0106 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5853191Z triton_mm_22 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5853468Z triton_mm_30 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5853694Z triton_mm_21 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5853925Z triton_mm_23 0.0119 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5854154Z triton_mm_15 0.0121 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5854381Z triton_mm_31 0.0122 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5854514Z SingleProcess AUTOTUNE benchmarking takes 0.1548 seconds and 1.1513 seconds precompiling for 33 choices 2025-12-04T11:45:25.5854590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5854635Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5854692Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5854795Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5855299Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5855339Z graph_break [] 2025-12-04T11:45:25.5855415Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5855493Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5855536Z Autotune Choices Stats: 2025-12-04T11:45:25.5855903Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00887999963015318, "best_triton_pos": 0} 
2025-12-04T11:45:25.5855967Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5856015Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5856116Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5856351Z triton_mm_72 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5856597Z triton_mm_71 0.0092 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5856820Z triton_mm_67 0.0106 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5857049Z triton_mm_54 0.0109 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5857288Z triton_mm_60 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5857512Z triton_mm_68 0.0112 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5857738Z triton_mm_59 0.0113 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5857964Z triton_mm_61 0.0115 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5858194Z triton_mm_53 0.0121 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5858419Z triton_mm_69 0.0123 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5858552Z SingleProcess AUTOTUNE benchmarking takes 0.2372 seconds and 0.7565 seconds precompiling for 39 choices 2025-12-04T11:45:25.5858607Z =================================== FAILURES =================================== 2025-12-04T11:45:25.5858775Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.5858825Z Traceback (most recent call last): 2025-12-04T11:45:25.5858984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5859028Z method(*args, **kwargs) 2025-12-04T11:45:25.5859194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.5859237Z method(*args, **kwargs) 2025-12-04T11:45:25.5859388Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.5859427Z with policy(): 2025-12-04T11:45:25.5859579Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.5859624Z raise RuntimeError(msg) 2025-12-04T11:45:25.5860036Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.5860039Z 2025-12-04T11:45:25.5860117Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.5860403Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5860405Z 2025-12-04T11:45:25.5860495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.5860569Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5860615Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5860673Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5861236Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5861338Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5861374Z graph_break [] 2025-12-04T11:45:25.5861439Z aten_mm_info 
[('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5861512Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5862004Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.5862054Z current_size = base.storage().size() 2025-12-04T11:45:25.5862097Z Autotune Choices Stats: 2025-12-04T11:45:25.5862469Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:25.5862532Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5862580Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5862689Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5862928Z triton_mm_34 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5863168Z triton_mm_33 0.0092 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5863430Z triton_mm_16 0.0106 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5863653Z triton_mm_29 0.0106 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5863879Z triton_mm_22 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5864103Z triton_mm_30 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5864341Z triton_mm_21 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5864569Z triton_mm_23 0.0119 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5864809Z triton_mm_15 0.0121 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5865037Z triton_mm_31 0.0122 ms 70.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5865168Z SingleProcess AUTOTUNE benchmarking takes 0.1548 seconds and 1.1513 
seconds precompiling for 33 choices 2025-12-04T11:45:25.5865242Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5865283Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5865340Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5865440Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5865933Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5865972Z graph_break [] 2025-12-04T11:45:25.5866036Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5866111Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5866150Z Autotune Choices Stats: 2025-12-04T11:45:25.5866540Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00887999963015318, "best_triton_pos": 0} 2025-12-04T11:45:25.5866601Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5866650Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5866762Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5866998Z triton_mm_72 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5867223Z triton_mm_71 0.0092 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5867448Z triton_mm_67 0.0106 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5867674Z triton_mm_54 0.0109 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5867916Z triton_mm_60 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5868141Z triton_mm_68 0.0112 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5868367Z triton_mm_59 0.0113 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5868605Z triton_mm_61 0.0115 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5868832Z triton_mm_53 0.0121 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5869059Z triton_mm_69 0.0123 ms 72.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5869193Z SingleProcess AUTOTUNE benchmarking takes 0.2372 seconds and 0.7565 seconds precompiling for 39 choices 2025-12-04T11:45:25.5869266Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.5869310Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.5869366Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.5869467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.5869952Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.5869995Z graph_break [] 2025-12-04T11:45:25.5870060Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.5870148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.5870190Z Autotune Choices Stats: 2025-12-04T11:45:25.5870567Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00860000029206276, 
"best_triton_pos": 0} 2025-12-04T11:45:25.5870629Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.5870680Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.5870777Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.5871012Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5871243Z triton_mm_109 0.0091 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5871468Z triton_mm_105 0.0104 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5871705Z triton_mm_106 0.0107 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5871931Z triton_mm_92 0.0109 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5872171Z triton_mm_97 0.0110 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5872397Z triton_mm_98 0.0110 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.5872625Z triton_mm_99 0.0114 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5872854Z triton_mm_91 0.0115 ms 74.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5873084Z triton_mm_107 0.0122 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.5873215Z SingleProcess AUTOTUNE benchmarking takes 0.2333 seconds and 0.5815 seconds precompiling for 39 choices 2025-12-04T11:45:25.5873451Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ed4d63a860bf7891.xml - 2025-12-04T11:45:25.5873515Z =========================== short test summary info ============================ 2025-12-04T11:45:25.5874158Z FAILED [1.3543s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.5874164Z 2025-12-04T11:45:25.5874251Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.5874529Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5874533Z 2025-12-04T11:45:25.5874621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.5874687Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.5874756Z ================== 1 failed, 187 deselected, 2 rerun in 6.36s ================== 2025-12-04T11:45:25.5874796Z Got exit code 1 2025-12-04T11:45:25.5874838Z Retrying single test... 2025-12-04T11:45:25.5874989Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8b5bdf466bdf190.xml 2025-12-04T11:45:25.5875049Z ============================= test session starts ============================== 2025-12-04T11:45:25.5875165Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.5875218Z cachedir: .pytest_cache 2025-12-04T11:45:25.5875379Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.5875427Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.5875471Z configfile: pytest.ini 2025-12-04T11:45:25.5875636Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.5875714Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.5875987Z stepcurrent: skipping 108 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.5876048Z Running 1 items in this shard 2025-12-04T11:45:25.5876050Z 2025-12-04T11:45:25.5876406Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:56.351007993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5876409Z 2025-12-04T11:45:25.5876564Z [W1204 11:25:56.701686464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5876566Z 2025-12-04T11:45:25.5876720Z [W1204 11:25:56.710820922 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5876723Z 2025-12-04T11:45:25.5877041Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5877344Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5877482Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5877978Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5878244Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5878472Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5878682Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5878884Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5879183Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5879423Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5879725Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5879962Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5880254Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5880501Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5880794Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5881025Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5881319Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5881554Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5881847Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5882077Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5882377Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5882592Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5882823Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5883115Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5883343Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5883577Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5883872Z E1204 11:25:56.410000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5884119Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5884409Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5884630Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5884849Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5885051Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5885263Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5885432Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5885614Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5886143Z E1204 11:25:56.410000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/ol/collzhpgreu4voltxf626zwwt4jfw35jwr4zflvckeinb46jgauo.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5886294Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5886513Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5886682Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5886831Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5887132Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5887269Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5887529Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5887671Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5887928Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5888084Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5888366Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5888500Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5888777Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5888981Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5889300Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.5889595Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5889725Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5890207Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5890460Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5890689Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5890907Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5891109Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5891414Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5891648Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5891940Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5892171Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5892466Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5892709Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5893000Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5893237Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5893576Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5893810Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5894101Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5894335Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5894629Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5894826Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5895057Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5895361Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5895560Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5895804Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5896096Z E1204 11:25:56.410000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5896331Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5896621Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5896845Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5897064Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5897267Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5897479Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5897661Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5897842Z E1204 11:25:56.410000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5897947Z E1204 11:25:56.410000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5898105Z [W1204 11:25:56.714821524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5898108Z 2025-12-04T11:45:25.5898261Z [W1204 11:25:56.723254232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5898263Z 2025-12-04T11:45:25.5898575Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5898871Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5899005Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5899500Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5899753Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5899990Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5900198Z E1204 11:25:56.447000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5900402Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5900695Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5900932Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5901237Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5901470Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5901763Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5902007Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5902299Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5902534Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5902825Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5903059Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5903379Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5903612Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5903916Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5904117Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5904363Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5904654Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5904851Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5905084Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5905377Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5905620Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5905912Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5906133Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5906351Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5906553Z E1204 11:25:56.447000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5906763Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5906932Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5907111Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5907645Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/us/cusvykiwim4awxmesmmtokb7uqkrvkg6j4sttbmvico7lgyjh6ag.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5907793Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5908009Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5908165Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5908320Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5908618Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5908751Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5909010Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5909148Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5909403Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5909565Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5909837Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5909990Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5910266Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5910461Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5910784Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5911077Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5911207Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5911685Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5911943Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5912169Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5912376Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5912589Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5912892Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5913129Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5913443Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5913679Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5913972Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5914222Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5914515Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5914747Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5915057Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5915289Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5915582Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5915815Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5916107Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5916306Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5916540Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5916836Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5917046Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5917291Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5917583Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5917815Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5918108Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5918330Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5918535Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5918746Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5918962Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5919130Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.5919317Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5919423Z E1204 11:25:56.447000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5919578Z [W1204 11:25:56.726779981 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.5919580Z 2025-12-04T11:45:25.5919889Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5920180Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5920313Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5920788Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5921042Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5921281Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5921497Z E1204 11:25:56.460000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5921698Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5921990Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5922229Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5922529Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5922761Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5923063Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5923320Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5923705Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5923938Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5924233Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5924468Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5924759Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5924994Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5925285Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5925481Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5925731Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5926040Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5926242Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5926472Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5926764Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5926995Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5927290Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5927525Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5927730Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5927931Z E1204 11:25:56.460000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5928152Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5928322Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5928501Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5929026Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/3v/c3vtzftvwqwx2urxq2vpokyujcvg36j3wdinrekyjgxuofdo2ac4.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5929176Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5929392Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5929548Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5929693Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5929994Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5930126Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5930396Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5930535Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5930794Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5930951Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5931218Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5932962Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5933303Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5933501Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5933817Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5934129Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5934264Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5934750Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5935004Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5935231Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5935438Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5935636Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5935945Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5936198Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5936490Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5936724Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5937021Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5937254Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5937544Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5937790Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5938081Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5938323Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5938616Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5938848Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5939140Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5939339Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5939574Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5939866Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5940062Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.5940302Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5940604Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5940835Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5941126Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5941346Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5941552Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5941753Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5941974Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5942140Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.5942319Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5942431Z E1204 11:25:56.460000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5942742Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5943037Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5943167Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5943705Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5943961Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5944189Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5944394Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5944604Z E1204 11:25:56.468000 
962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5944909Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5945140Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5945432Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5945663Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5945955Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5946185Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5946489Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5946722Z E1204 11:25:56.468000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5947023Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5947255Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5947545Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5947775Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5948070Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5948268Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5948500Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5948794Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5949002Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5949251Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5949540Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5949773Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5950062Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5950282Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5950488Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5950702Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5950913Z E1204 11:25:56.468000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5951080Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5951269Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5951798Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/gs/cgskehlfe747cdsaomsadiaalz26dwrddeyodwm5zmjfk435sr57.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5951946Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5952161Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5952317Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5952464Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5952752Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5952887Z E1204 11:25:56.468000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5953154Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5953321Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5953592Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5953751Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5954018Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5954155Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5954433Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5954627Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5954957Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5955248Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5955379Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5955872Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5956126Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5956352Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5956559Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5956760Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5957052Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5957287Z E1204 
11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5957590Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5957833Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5958127Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5958358Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5958649Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5958879Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5959171Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5959411Z E1204 11:25:56.468000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5959701Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5959942Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5960232Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5960430Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5960661Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5960954Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5961154Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5961385Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5961675Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5961916Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5962218Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5962436Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5962645Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5962847Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5963056Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5963225Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5963440Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5963555Z E1204 11:25:56.468000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5963862Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5964158Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5964302Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5964778Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5965034Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5965258Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5965464Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5965664Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5965956Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5966210Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5966515Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5966750Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5967040Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5967271Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5967565Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5967797Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5968100Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5968331Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5968634Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5968865Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5969157Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5969354Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5969584Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5969881Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5970077Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5970307Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5970606Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5970838Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5971137Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5971357Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5971567Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5971766Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5971979Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5972156Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5972335Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5972866Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/ui/cui7dqlnurfmvrkr5ycejewmf6ibzpqmws5ikb5yw2p2q6b7xfj5.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5973031Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5973246Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5973430Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5973576Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5973862Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5973995Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5974251Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5974391Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5974646Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5974814Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5975084Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5975229Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5975505Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5975697Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5976011Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5976305Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5976448Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5976925Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5977188Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5977416Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5977625Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5977823Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5978116Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5978351Z E1204 
11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5978647Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5978879Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5979182Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5979413Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5979713Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5979945Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5980237Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5980469Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5980758Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5981003Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5981298Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5981516Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5981749Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5982039Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5982236Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5982467Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5982763Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5982996Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5983321Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5983555Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5983763Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5983977Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5984187Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5984354Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5984534Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5984636Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.5984947Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5985255Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5985385Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5985858Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5986129Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5986355Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5986558Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5986758Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5987049Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5987282Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5987572Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5987815Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5988115Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5988347Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5988641Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5988872Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5989162Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5989393Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5989692Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5989925Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5990224Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5990419Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5990651Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5990944Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5991142Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.5991375Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5991665Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5991896Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5992195Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5992415Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.5992631Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.5992832Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.5993042Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.5993209Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.5993404Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.5993933Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpva1j9gdv/tb/ctbjlwornjfpqshgimqikeesgo4n4qwmnnewdptmvcwhfir47lnl.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.5994095Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.5994310Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.5994482Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.5994627Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.5994913Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.5995043Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.5995303Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.5995440Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.5995697Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.5995852Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.5996121Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.5996256Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.5996545Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.5996751Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.5997065Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.5997358Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.5997488Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.5997967Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.5998234Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.5998459Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.5998664Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.5998875Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.5999168Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5999401Z E1204 
11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.5999692Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.5999927Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6000217Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6000448Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6000747Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6000980Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6001279Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6001510Z E1204 11:25:56.472000 962578 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6001801Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6002031Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6002325Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6002533Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6002762Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6003051Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6003286Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6003520Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6003810Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6004042Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6004334Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6004553Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6004762Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6004962Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6005191Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6005370Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6005547Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6005649Z E1204 11:25:56.472000 962578 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6005704Z ('RERUN', {'yellow': True}) [3.8193s] [100%] 2025-12-04T11:45:25.6006059Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:58.504283320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6006063Z 2025-12-04T11:45:25.6006209Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6006503Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6006809Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6006939Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6007417Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6007683Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6007909Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6008112Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6008312Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6008603Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6008836Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6009129Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6009374Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6009676Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6009907Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6010198Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6010428Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6010718Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6010936Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6011151Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6011348Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6011556Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6011767Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6011997Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6012288Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6012485Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6012717Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6013011Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6013230Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6013454Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6013687Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6013913Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6014111Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6014306Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6014523Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6014728Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6014925Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6015118Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6015364Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6015655Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6015900Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6016195Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6016414Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6016620Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6016814Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6017023Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6017221Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6017453Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6017755Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6017987Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6018289Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6018523Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6018815Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6019047Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6019340Z 
E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6019582Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6019871Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6020101Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6020403Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6020634Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6020929Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6021159Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6021451Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6021681Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6021972Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6022213Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6022505Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6022746Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6023037Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6023279Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6023483Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6023679Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6023984Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6024215Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6024506Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6024751Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6025040Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6025270Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6025563Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6025797Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6026086Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6026317Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6026618Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6026815Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6027023Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6027220Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6027426Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6027625Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6027858Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6028150Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6028357Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6028551Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6028750Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6028953Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6029186Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6029477Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6029709Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6030003Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6030200Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6030407Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6030607Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6030857Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6031160Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6031380Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6031583Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6031781Z 
E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6031985Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6032284Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6032529Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6032819Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6033052Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6033384Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6033617Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6033910Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6034143Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6034438Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6034636Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6034835Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6035057Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6035279Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6035498Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6035699Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6035993Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6036227Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6036519Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6036753Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6037081Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6037320Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6037633Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6037865Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6038157Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6038377Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6038580Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6038779Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6038974Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6039184Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6039384Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6039694Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6039928Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6040132Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6040328Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6040529Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6040820Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6041054Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6041358Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6041591Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6041895Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6042128Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6042422Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6042655Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6042947Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6043181Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6043495Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6043727Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6044037Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6044289Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6044582Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6044814Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6045107Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6045340Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6045647Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6045847Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6046043Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6046297Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6046592Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6046829Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6047122Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6047354Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6047648Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6047880Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6048171Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6048413Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6048718Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6048916Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6049151Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6049446Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6049679Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6049970Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6050194Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6050398Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6050605Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6050805Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6051097Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6051311Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6051514Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6051712Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6051913Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6052205Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6052427Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6052639Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6052852Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6053044Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6053194Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6053418Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6053640Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6053848Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6054045Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6054281Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6054487Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6054684Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6054919Z E1204 
11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6055124Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6055320Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6055540Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6055747Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6055947Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6056141Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6056356Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6056555Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6056772Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6056989Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6057281Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6057494Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6057696Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6057897Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6058090Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6058297Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6058508Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6058711Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6058919Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6059120Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6059413Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6059624Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6059827Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6060024Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6060226Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6060524Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6060737Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6060949Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6061156Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6061358Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6061648Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6061844Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6062046Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6062236Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6062448Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6062664Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6062871Z E1204 11:25:58.243000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78
2025-12-04T11:45:25.6063077Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360
2025-12-04T11:45:25.6063302Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767
2025-12-04T11:45:25.6063484Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58
2025-12-04T11:45:25.6063654Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360
2025-12-04T11:45:25.6063780Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0
2025-12-04T11:45:25.6063885Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] .
2025-12-04T11:45:25.6064012Z E1204 11:25:58.243000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice.
2025-12-04T11:45:25.6064169Z [W1204 11:25:58.513025213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:45:25.6064171Z
2025-12-04T11:45:25.6064316Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning:
2025-12-04T11:45:25.6064609Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.
2025-12-04T11:45:25.6064921Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first):
2025-12-04T11:45:25.6065052Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback:
2025-12-04T11:45:25.6065544Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:45:25.6065796Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0
2025-12-04T11:45:25.6066023Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0
2025-12-04T11:45:25.6066231Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548
2025-12-04T11:45:25.6066429Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240
2025-12-04T11:45:25.6066733Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714
2025-12-04T11:45:25.6066966Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6067262Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6067508Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6067799Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6068031Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6068321Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6068554Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6068843Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6069062Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6069277Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6069474Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6069691Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6069890Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6070121Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6070412Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6070608Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6070848Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6071139Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6071358Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6071562Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6071783Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6071988Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6072183Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6072378Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6072596Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6072800Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6072995Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6073189Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6073463Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6073768Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6074000Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6074296Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6074519Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6074726Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6074924Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6075144Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6075342Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6075573Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6075877Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6076108Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6076401Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6076633Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6076924Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6077157Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6077447Z 
E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6077678Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6077980Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6078226Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6078519Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6078751Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6079044Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6079277Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6079578Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6079809Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6080099Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6080346Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6080637Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6080870Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6081164Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6081385Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6081588Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6081784Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6082075Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6082315Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6082617Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6082849Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6083138Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6083404Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6083700Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6083947Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6084237Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6084471Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6084776Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6084973Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6085170Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6085364Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6085572Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6085775Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6086007Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6086297Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6086505Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6086702Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6086908Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6087104Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6087335Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6087628Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6087862Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6088155Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6088361Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6088569Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6088773Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6089015Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6089309Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6089530Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6089732Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6089931Z 
E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6090134Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6090434Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6090667Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6090971Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6091218Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6091510Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6091744Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6092037Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6092274Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6092575Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6092776Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6092975Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6093206Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6093441Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6093640Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6093844Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6094138Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6094374Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6094674Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6094906Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6095224Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6095473Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6095766Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6096000Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6096294Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6096518Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6096720Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6096936Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6097129Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6097347Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6097561Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6097857Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6098079Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6098281Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6098481Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6098682Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6098975Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6099210Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6099514Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6099758Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6100053Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6100289Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6100582Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6100819Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6101111Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6101354Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6101649Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6101892Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6102189Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6102423Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6102718Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6102951Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6103245Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6103507Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6103798Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6104011Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6104222Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6104456Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6104754Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6104988Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6105284Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6105517Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6105823Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6106055Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6106363Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6106597Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6106889Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6107088Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6107321Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6107616Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6107852Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6108145Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6108371Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6108593Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6108793Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6108995Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6109287Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6109505Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6109707Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6110000Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6110203Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6110498Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6110738Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6110940Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6111139Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6111334Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6111484Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6111682Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6111904Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6112111Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6112305Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6112538Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6112747Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6112954Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6113176Z E1204 
11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6113398Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6113594Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6113814Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6114021Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6114235Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6114429Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6114643Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6114857Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6115055Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6115258Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6115551Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6115765Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6115967Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6116167Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6116361Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6116555Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6116780Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6116993Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6117191Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6117390Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6117686Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6117898Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6118099Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6118308Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6118508Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6118803Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6119024Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6119226Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6119423Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6119623Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6119916Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6120112Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6120313Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6120504Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6120702Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6120925Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6121143Z E1204 11:25:58.246000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6121339Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6121530Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6121710Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6121882Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6122009Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6122113Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6122240Z E1204 11:25:58.246000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6122408Z [W1204 11:25:58.515307440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6122410Z 2025-12-04T11:45:25.6122555Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6122849Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6123158Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6123301Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6123779Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6124033Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6124259Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6124468Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6124667Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6124973Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6125210Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6125515Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6125751Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6126042Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6126274Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6126565Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6126812Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6127103Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6127322Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6127543Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6127740Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6127947Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6128146Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6128377Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6128669Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6128864Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6129096Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6129395Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6129626Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6129823Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6130044Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6130250Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6130445Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6130642Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6130870Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6131075Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6131270Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6131465Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6131709Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6132002Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6132239Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6132531Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6132753Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6132957Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6133152Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6133387Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6133598Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6133844Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6134136Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6134368Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6134662Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6134897Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6135188Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6135439Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6135729Z 
E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6135991Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6136284Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6136516Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6136808Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6137041Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6137333Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6137563Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6137851Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6138095Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6138399Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6138629Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6138920Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6139152Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6139448Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6139681Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6139884Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6140082Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6140384Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6140620Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6140912Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6141146Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6141435Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6141670Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6141963Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6142193Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6142504Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6142754Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6143046Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6143242Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6143463Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6143661Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6143869Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6144089Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6144318Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6144611Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6144826Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6145019Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6145216Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6145408Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6145642Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6145934Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6146168Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6146458Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6146672Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6146882Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6147100Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6147335Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6147629Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6147852Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6148055Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6148265Z 
E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6148465Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6148763Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6149006Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6149299Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6149533Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6149827Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6150063Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6150357Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6150589Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6150893Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6151092Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6151300Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6151522Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6151724Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6151923Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6152125Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6152422Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6152666Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6152959Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6153192Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6153535Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6153769Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6154059Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6154294Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6154587Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6154808Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6155009Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6155225Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6155419Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6155647Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6155852Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6156144Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6156367Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6156572Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6156771Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6156989Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6157281Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6157518Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6157825Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6158063Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6158357Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6158592Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6158887Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6159118Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6159410Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6159657Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6159963Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6160200Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6160494Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6160729Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6161021Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6161256Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6161558Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6161794Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6162098Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6162295Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6162494Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6162727Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6163020Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6163286Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6163577Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6163811Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6164113Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6164361Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6164653Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6164887Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6165182Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6165381Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6165617Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6165923Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6166155Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6166448Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6166679Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6166883Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6167081Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6167284Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6167578Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6167794Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6167996Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6168196Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6168409Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6168711Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6168934Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6169135Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6169335Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6169528Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6169683Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6169881Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6170114Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6170323Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6170519Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6170753Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6170959Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6171155Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6171377Z E1204 
11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6171583Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6171781Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6172002Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6172208Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6172406Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6172612Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6172836Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6173040Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6173238Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6173467Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6173759Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6173972Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6174189Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6174387Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6174580Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6174792Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6175007Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6175210Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6175410Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6175614Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6175908Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6176123Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6176326Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6176526Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6176739Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6177046Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6177263Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6177464Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6177667Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6177866Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6178165Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6178372Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6178574Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6178766Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6178973Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6179187Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6179393Z E1204 11:25:58.248000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6179592Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6179782Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6179964Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6180135Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6180262Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6180367Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6180492Z E1204 11:25:58.248000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6180650Z [W1204 11:25:58.555492678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6180663Z 2025-12-04T11:45:25.6180809Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6181118Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6181415Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6181547Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6182028Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6182280Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6182516Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6182721Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6182920Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6183225Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6183496Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6183786Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6184020Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6184314Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6184545Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6184836Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6185081Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6185373Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6185610Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6185819Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6186016Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6186224Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6186427Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6186661Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6186975Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6187172Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6187402Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6187709Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6187930Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6188129Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6188348Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6188555Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6188752Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6188947Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6189166Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6189381Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6189579Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6189784Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6190021Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6190315Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6190547Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6190843Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6191077Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6191284Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6191479Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6191700Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6191902Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6192132Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6192425Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6192659Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6192951Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6193181Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6193499Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6193743Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6194050Z 
E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6194282Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6194574Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6194807Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6195102Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6195334Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6195638Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6195870Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6196176Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6196408Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6196702Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6196934Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6197224Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6197459Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6197749Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6197970Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6198181Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6198389Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6198681Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6198912Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6199204Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6199436Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6199733Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6199974Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6200267Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6200512Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6200803Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6201037Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6201328Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6201525Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6201721Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6201919Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6202129Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6202335Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6202567Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6202869Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6203070Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6203278Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6203476Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6203671Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6203902Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6204210Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6204442Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6204733Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6204940Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6205149Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6205352Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6205586Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6205878Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6206100Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6206302Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6206501Z 
E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6206719Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6207035Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6207267Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6207562Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6207796Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6208089Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6208321Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6208627Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6208862Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6209167Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6209368Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6209566Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6209787Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6209988Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6210187Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6210391Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6210684Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6210926Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6211222Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6211472Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6211765Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6211997Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6212291Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6212523Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6212826Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6213045Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6213272Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6213488Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6213682Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6213899Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6214100Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6214395Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6214616Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6214820Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6215018Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6215218Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6215522Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6215768Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6216065Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6216301Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6216597Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6216833Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6217137Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6217372Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6217665Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6217910Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6218203Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6218437Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6218732Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6218967Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6219260Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6219491Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6219797Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6220031Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6220333Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6220533Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6220729Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6220965Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6221258Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6221502Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6221794Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6222028Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6222334Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6222568Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6222863Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6223099Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6223428Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6223628Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6223860Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6224175Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6224407Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6224715Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6224932Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6225134Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6225334Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6225537Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6225835Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6226060Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6226266Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6226478Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6226681Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6226975Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6227196Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6227402Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6227600Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6227793Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6227941Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6228137Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6228367Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6228576Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6228783Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6229003Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6229210Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6229406Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6229628Z E1204 
11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6229835Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6230043Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6230263Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6230469Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6230678Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6230872Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6231088Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6231291Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6231490Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6231691Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6231989Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6232208Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6232410Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6232620Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6232824Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6233019Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6233232Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6233468Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6233667Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6233870Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6234175Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6234387Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6234591Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6234804Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6235006Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6235300Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6235515Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6235720Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6235919Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6236122Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6236416Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6236611Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6236826Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6237032Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6237229Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6237442Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6237647Z E1204 11:25:58.288000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6237844Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6238036Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6238217Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6238401Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6238526Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6238630Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6238755Z E1204 11:25:58.288000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6238927Z [W1204 11:25:58.557684576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6238930Z 2025-12-04T11:45:25.6239079Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6239372Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6239667Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6239797Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6240277Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6240532Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 2025-12-04T11:45:25.6240768Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6240975Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6241183Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6241479Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6241713Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6242008Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6242244Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6242546Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6242777Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6243068Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6244786Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6245080Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6245308Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6245517Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6245713Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6245921Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6246119Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6246350Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6246663Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6246863Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6247109Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6247400Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6247622Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6247818Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6248038Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6248257Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6248452Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6248646Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6248864Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6249086Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6249280Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6249474Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6249707Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6250001Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6250232Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6250525Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6250744Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6250959Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6251166Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6251372Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6251571Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6251801Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6252093Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6252326Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6252625Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6252859Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6253149Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6253430Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6253720Z 
E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6253952Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6254241Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6254474Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6254768Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6254996Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6255301Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6255549Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6255839Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6256073Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6256365Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6256596Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6256888Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6257145Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6257440Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6257671Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6257872Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6258069Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6258359Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6258589Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6258880Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6259114Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6259406Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6259647Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6259948Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6260181Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6260471Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6260701Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6260992Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6261189Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6261396Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6261592Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6261800Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6262009Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6262242Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6262534Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6262729Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6262924Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6263119Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6263337Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6263567Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6263875Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6264109Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6264413Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6264611Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6264817Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6265022Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6265256Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6265549Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6265782Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6265984Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6266199Z 
E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6266401Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6266695Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6266928Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6267221Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6267456Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6267748Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6267980Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6268282Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6268527Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6268819Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6269018Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6269217Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6269438Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6269640Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6269848Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6270049Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6270342Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6270591Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6270891Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6271124Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6271417Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6271649Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6271941Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6272174Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6272476Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6272699Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6272919Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6273121Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6273345Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6273559Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6273760Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6274054Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6274286Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6274488Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6274690Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6274904Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6275199Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6275433Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6275730Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6275964Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6276256Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6276490Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6276801Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6277036Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6277343Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6277575Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6277872Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6278109Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6278402Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6278647Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6278939Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6279173Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6279480Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6279714Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6280005Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6280209Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6280408Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6280642Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6280934Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6281178Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6281471Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6281718Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6282013Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6282245Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6282540Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6282778Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6283085Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6283314Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6283548Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6283853Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6284089Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6284387Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6284602Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6284805Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6285005Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6285207Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6285501Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6285732Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6285949Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6286147Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6286347Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6286639Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6286858Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6287064Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6287277Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6287470Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6287617Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6287813Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6288049Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6288257Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6288454Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6288673Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6288884Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6289083Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6289304Z E1204 
11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6289515Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6289727Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6289948Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6290166Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6290365Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6290559Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6290773Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6290975Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6291176Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6291389Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6291680Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6291897Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6292108Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6292308Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6292500Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6292695Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6292908Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6293109Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6293333Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6293534Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6293827Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6294055Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6294272Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6294474Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6294675Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6294969Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6295181Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6295382Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6295595Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6295798Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6296091Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6296303Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6296507Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6296697Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6296893Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6297106Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6297314Z E1204 11:25:58.291000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6297512Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6297700Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6297881Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6298064Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6298192Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6298296Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6298432Z E1204 11:25:58.291000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6298590Z [W1204 11:25:58.559801306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6298592Z 2025-12-04T11:45:25.6298737Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6299031Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6299329Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6299462Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6299952Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6300206Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6300442Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6300648Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6300852Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6301145Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6301380Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6301671Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6301905Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6302194Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6302438Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6302746Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6302980Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6303301Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6303523Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6303814Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6304010Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6304232Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6304431Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6304661Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6304975Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6305169Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6305400Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6305692Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6305912Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6306108Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6306325Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6306529Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6306739Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6306934Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6307164Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6307368Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6307563Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6307756Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6307993Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6308289Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6308530Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6308821Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6309050Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6309258Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6309457Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6309664Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6309861Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6310094Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6310385Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6310617Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6310919Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6311149Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6311449Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6311680Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6311973Z 
E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6312204Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6312497Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6312743Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6313034Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6313302Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6313613Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6313846Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6314136Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6314371Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6314662Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6314896Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6315192Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6315443Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6315749Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6315969Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6316169Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6316364Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6316655Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6316887Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6317192Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6317430Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6317724Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6317965Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6318256Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6318487Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6318777Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6319008Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6319298Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6319494Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6319700Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6319900Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6320118Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6320318Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6320547Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6320838Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6321033Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6321228Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6321437Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6321629Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6321862Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6322167Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6322402Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6322691Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6322887Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6323094Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6323323Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6323559Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6323854Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6324092Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6324311Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6324513Z 
E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6324716Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6325012Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6325247Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6325539Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6325784Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6326076Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6326309Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6326618Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6326854Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6327148Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6327347Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6327546Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6327766Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6327970Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6328168Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6328376Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6328681Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6328916Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6329213Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6329447Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6329740Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6329972Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6330273Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6330506Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6330812Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6331033Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6331236Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6331436Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6331629Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6331839Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6332038Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6332330Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6332565Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6332769Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6332978Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6333180Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6333506Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6333743Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6334036Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6334268Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6334578Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6334811Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6335114Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6335347Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6335639Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6335872Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6336169Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6336403Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6336696Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6336928Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6337232Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6337483Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6337776Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6338006Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6338299Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6338499Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6338709Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6338940Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6339235Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6339479Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6339770Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6340005Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6340297Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6340530Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6340823Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6341058Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6341359Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6341556Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6341800Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6342094Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6342328Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6342621Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6342837Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6343049Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6343274Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6343475Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6343768Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6343997Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6344199Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6344397Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6344599Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6344891Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6345114Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6345316Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6345514Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6345724Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6345873Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6346084Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6346304Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6346510Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6346712Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6346932Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6347139Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6347346Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6347565Z E1204 
11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6347773Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6347979Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6348199Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6348404Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6348600Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6348796Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6349010Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6349211Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6349409Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6349608Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6349913Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6350139Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6350340Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6350537Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6350729Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6350922Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6351137Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6351347Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6351544Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6351745Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6352037Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6352262Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6352463Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6352662Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6352863Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6353156Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6353404Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6353605Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6353803Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6354023Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6354328Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6354526Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6354728Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6354917Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6355112Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6355327Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6355543Z E1204 11:25:58.293000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6355740Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6355927Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6356108Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6356291Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6356416Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6356520Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6356646Z E1204 11:25:58.293000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6356699Z ('RERUN', {'yellow': True}) [1.5556s] [100%] 2025-12-04T11:45:25.6357055Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:25:59.852622575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.6357058Z 2025-12-04T11:45:25.6357204Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6357497Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6357794Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6357925Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6358424Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6358679Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6358903Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6359111Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6359311Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6359604Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6359849Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6360140Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6360383Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6360673Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6360905Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6361195Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6361428Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6361721Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6361941Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6362147Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6362353Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6362561Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6362769Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6363000Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6363319Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6363517Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6363752Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6364054Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6364274Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6364470Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6364701Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6364907Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6365102Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6365296Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6365514Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6365719Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6365915Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6366112Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6366342Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6366646Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6366891Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6367182Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6367401Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6367605Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6367801Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6368009Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6368221Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6368452Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6368744Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6368990Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6369280Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6369511Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6369801Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6370032Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6370322Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6370555Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6370855Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6371088Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6371390Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6371621Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6371911Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6372144Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6372436Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6372679Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6372971Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6373201Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6373531Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6373762Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6374052Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6374270Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6374473Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6374670Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6374960Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6375191Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6375499Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6375743Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6376035Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6376266Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6376558Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6376789Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6377092Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6377322Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6377617Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6377826Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6378020Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6378216Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6378421Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6378619Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6378852Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6379145Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6379338Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6379546Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6379745Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6379956Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6380188Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6380476Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6380707Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6380998Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6381206Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6381413Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6381614Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6381851Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6382159Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6382380Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6382580Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6382779Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6382980Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6383298Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6383531Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6383836Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6384071Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6384384Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6384621Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6384915Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6385146Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6385438Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6385659Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6385855Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6386075Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6386290Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6386489Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6386690Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6386988Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6387221Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6387515Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6387746Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6388039Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6388282Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6388585Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6388818Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6389113Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6389337Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6389541Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6389740Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6389945Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6390155Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6390355Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6390658Z E1204 11:25:59.586000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6390879Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6391081Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6391279Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6391481Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6391777Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6392012Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6392303Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6392546Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6392852Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6393084Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6393412Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6393647Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6394026Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6394259Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6394566Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6394800Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6395105Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6395339Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6395632Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6395865Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6396157Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6396393Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6396687Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6396883Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6397096Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6397341Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6397639Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6397872Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6398167Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6398402Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6398697Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6398942Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6399241Z E1204 
11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6399484Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6399777Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6399973Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6400207Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6400499Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6400734Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6401026Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6401240Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6401456Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6401656Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6401874Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6402167Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6402380Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6402581Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6402781Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6402991Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6403313Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6403535Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6403751Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6403950Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6404143Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6404290Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6404486Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6404707Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6404916Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6405111Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6405330Z E1204 
11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6405551Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6405747Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6405981Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6406191Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6406385Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6406605Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6406811Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6407008Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6407216Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6407428Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6407630Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6407837Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6408040Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6408338Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6408551Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6408753Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6408953Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6409145Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6409340Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6409554Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6409766Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6409978Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6410181Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6410476Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6410690Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6410893Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6411097Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6411307Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6411600Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6411814Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6412025Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6412223Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6412424Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6412718Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6412915Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6413118Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6413460Z E1204 11:25:59.586000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6413656Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6413871Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6414089Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6414302Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6414492Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6414675Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6414847Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6414974Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6415078Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6415203Z E1204 11:25:59.586000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6415364Z [W1204 11:25:59.854885672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6415379Z 2025-12-04T11:45:25.6415525Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6415820Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6416116Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6416259Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6416740Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6416992Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6417220Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6417427Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.6417628Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6417919Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6418179Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6418483Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6418714Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6419007Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6419239Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6419531Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6419764Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6420067Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6420287Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6420491Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6420701Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6420908Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6421107Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6421338Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6421630Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6421828Z E1204 
11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6422061Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6422354Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6422584Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6422793Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6423011Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6423219Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6423466Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6423662Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6423883Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6424101Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6424299Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6424494Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6424730Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6425037Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6425269Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6425561Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6425781Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6425990Z 
E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6426184Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6426393Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6426594Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6426840Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6427147Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6427380Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6427671Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6427902Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.6428194Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6428438Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6428728Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6428963Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6429272Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6429503Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6429792Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6430025Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6430317Z E1204 
11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6430547Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6430840Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6431079Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6431372Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6431615Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6431907Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6432138Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6432429Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6432651Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6432861Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6433055Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6433392Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6433643Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6433937Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6434169Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6434465Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6434697Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6434991Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6435226Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6435529Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6435760Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6436065Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6436265Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6436460Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6436656Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6436865Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6437064Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6437310Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6437600Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6437797Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6438005Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6438201Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6438398Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6438630Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6438921Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6439153Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6439445Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6439640Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6439864Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6440068Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6440312Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6440607Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6440829Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6441032Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6441231Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6441441Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6441734Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6441968Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6442272Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6442504Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6442798Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6443032Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6443401Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6443636Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6443929Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6444127Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6444337Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6444571Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6444774Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6444973Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6445173Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6445468Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6445703Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6446009Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6446243Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6446535Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6446782Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6447076Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6447308Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6447601Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6447827Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6448031Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6448229Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6448422Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6448645Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6448855Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6449151Z E1204 11:25:59.588000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6449369Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6449571Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6449769Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6449971Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6450283Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6450515Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6450809Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6451051Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6451345Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6451577Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6451871Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6452105Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6452400Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6452634Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6452937Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6453182Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6453510Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6453743Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6454036Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6454269Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6454563Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6454812Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6455106Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6455317Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6455513Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6455746Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6456038Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6456272Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6456565Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6456798Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6457092Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6457340Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6457647Z E1204 
11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6457880Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6458174Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6458373Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6458607Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6458901Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6459144Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6459437Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6459662Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6459866Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6460066Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6460266Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6460561Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6460775Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6460979Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6461178Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6461378Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6461681Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6461916Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6462122Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6462321Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6462513Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6462662Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6462862Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6463082Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6463348Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6463545Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6463764Z E1204 
11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6463986Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6464182Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6464406Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6464610Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6464809Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6465031Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6465236Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6465435Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6465631Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6465858Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6466078Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6466281Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6466486Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6466784Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6467000Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6467203Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6467416Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6467608Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6467807Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6468030Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6468235Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6468434Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6468635Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6468932Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6469144Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6469347Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6469545Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6469749Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6470053Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6470281Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6470485Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6470683Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6470883Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6471179Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6471378Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6471589Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6471780Z E1204 11:25:59.588000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6471979Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6472204Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6472411Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6472608Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6472800Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6472980Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6473152Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6473320Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6473425Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6473552Z E1204 11:25:59.588000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6473710Z [W1204 11:25:59.856994271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6473713Z 2025-12-04T11:45:25.6473858Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6474164Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6474474Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6474605Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6475087Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6475344Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6475571Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6475794Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.6475992Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6476290Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6476539Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6476833Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6477066Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6477357Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6477592Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6477885Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6478120Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6478421Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6478642Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6478859Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6479055Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6479264Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6479463Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6479696Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6479986Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6480199Z E1204 
11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6480433Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6480724Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6480954Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6481150Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6481369Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6481575Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6481772Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6481969Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6482187Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6482393Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6482606Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6482801Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6483043Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6483382Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6483612Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6483905Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6484126Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6484343Z 
E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6484538Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6484748Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6484962Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6485193Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6485486Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6485717Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6486008Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6486240Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.6486530Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6486762Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6487066Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6487313Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6487605Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6487834Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6488125Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6488356Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6488647Z E1204 
11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6488888Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6489179Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6489423Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6489716Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6489949Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6490239Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6490473Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6490763Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6490984Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6491184Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6491388Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6491690Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6491923Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6492215Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6492448Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6492742Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6492973Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6493313Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6493546Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6493850Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6494083Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6494383Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6494578Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6494776Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6494973Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6495184Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6495383Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6495628Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6495920Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6496129Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6496328Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6496523Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6496720Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6496953Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6497247Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6497491Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6497781Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6497995Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6498202Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6498404Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6498636Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6498931Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6499152Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6499355Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6499555Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6499754Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6500058Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6500303Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6500598Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6500831Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6501125Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6501360Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6501651Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6501894Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6502188Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6502398Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6502595Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6502816Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6503018Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6503217Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6503451Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6503747Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6503983Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6504292Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6504526Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6504832Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6505066Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6505361Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6505597Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6505893Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6506128Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6506331Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6506533Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6506735Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6506945Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6507145Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6507438Z E1204 11:25:59.590000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6507659Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6507864Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6508068Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6508268Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6508573Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6508806Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6509112Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6509348Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6509643Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6509877Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6510171Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6510416Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6510714Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6510947Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6511252Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6511485Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6511777Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6512009Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6512304Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6512537Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6512830Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6513076Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6513411Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6513612Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6513808Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6514040Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6514334Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6514570Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6514882Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6515114Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6515410Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6515665Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6515957Z E1204 
11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6516192Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6516486Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6516684Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6516918Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6517212Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6517459Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6517765Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6517980Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6518182Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6518381Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6518582Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6518875Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6519101Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6519304Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6519504Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6519716Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6520014Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6520235Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6520439Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6520640Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6520833Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6520983Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6521183Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6521403Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6521618Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6521818Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6522052Z E1204 
11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6522263Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6522462Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6522682Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6522889Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6523084Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6523336Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6523542Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6523743Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6523955Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6524170Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6524374Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6524577Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6524779Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6525076Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6525290Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6525491Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6525691Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6525897Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6526108Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6526322Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6526524Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6526722Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6526928Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6527226Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6527452Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6527652Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6527850Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6528060Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6528354Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6528566Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6528769Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6528969Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6529175Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6529469Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6529665Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6529867Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6530065Z E1204 11:25:59.590000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6530282Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6530494Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6530701Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6530897Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6531086Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6531271Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6531441Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6531581Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6531684Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6531809Z E1204 11:25:59.590000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6531964Z [W1204 11:25:59.896349022 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6531980Z 2025-12-04T11:45:25.6532125Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6532418Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6532717Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6532849Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6533370Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6533629Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6533855Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6534077Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.6534278Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6534581Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6534818Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6535108Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6535344Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6535636Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6535883Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6536174Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6536406Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6536711Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6536930Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6537137Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6537333Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6537542Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6537744Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6537975Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6538267Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6538471Z E1204 
11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6538704Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6539005Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6539229Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6539424Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6539643Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6539851Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6540056Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6540249Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6540470Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6540677Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6540884Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6541078Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6541309Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6541600Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6541833Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6542124Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6542344Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6542549Z 
E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6542759Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6542979Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6543179Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6543432Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6543722Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6543954Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6544245Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6544491Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.6544786Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6545016Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6545323Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6545554Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6545848Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6546080Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6546372Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6546605Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6546897Z E1204 
11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6547149Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6547454Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6547689Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6547981Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6548211Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6548503Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6548733Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6549037Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6549258Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6549469Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6549664Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6549959Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6550190Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6550479Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6550713Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6551005Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6551234Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6551535Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6551774Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6552064Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6552296Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6552587Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6554346Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6554544Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6554759Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6554967Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6555169Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6555414Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6555710Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6555907Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6556101Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6556297Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6556491Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6556722Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6557013Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6557256Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6557547Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6557755Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6557963Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6558163Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6558399Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6558693Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6558925Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6559127Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6559328Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6559541Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6559834Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6560068Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6560360Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6560594Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6560891Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6561124Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6561416Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6561656Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6561958Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6562156Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6562355Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6562577Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6562780Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6562980Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6563195Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6563518Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6563750Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6564068Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6564301Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6564594Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6564827Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6565120Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6565353Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6565647Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6565881Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6566085Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6566296Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6566490Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6566698Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6566899Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6567191Z E1204 11:25:59.629000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6567412Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6567626Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6567825Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6568027Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6568328Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6568562Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6568853Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6569085Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6569379Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6569610Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6569902Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6570146Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6570439Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6570680Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6570973Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6571205Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6571497Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6571731Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6572033Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6572266Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6572562Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6572804Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6573097Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6573321Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6573518Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6573750Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6574043Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6574277Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6574585Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6574821Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6575126Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6575361Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6575654Z E1204 
11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6575886Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6576183Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6576398Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6576632Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6576925Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6577174Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6577470Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6577687Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6577890Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6578089Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6578289Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6578581Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6578794Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6579006Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6579208Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6579418Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6579716Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6579938Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6580139Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6580338Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6580544Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6580692Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6580889Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6581110Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6581330Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6581524Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6581746Z E1204 
11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6581951Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6582147Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6582368Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6582573Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6582767Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6582998Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6583204Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6583442Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6583638Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6583849Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6584055Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6584257Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6584461Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6584769Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6584981Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6585183Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6585393Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6585585Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6585781Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6585993Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6586198Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6586398Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6586600Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6586894Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6587106Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6587321Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6587531Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6587733Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6588027Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6588241Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6588441Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6588642Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6588851Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6589145Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6589341Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6589554Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6589743Z E1204 11:25:59.629000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6589938Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6590151Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6590358Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6590555Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6590745Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6590929Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6591099Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6591226Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6591340Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6591468Z E1204 11:25:59.629000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6591634Z [W1204 11:25:59.898424712 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6591638Z 2025-12-04T11:45:25.6591782Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6592076Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6592372Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6592504Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6592987Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6593290Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6593517Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6593737Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.6593937Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6594231Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6594467Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6594758Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6594990Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6595289Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6595519Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6595825Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6596074Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6596367Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6596589Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6596798Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6596999Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6597205Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6597417Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6597648Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6597940Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6598148Z E1204 
11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6598378Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6598668Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6598888Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6599084Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6599306Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6599511Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6599705Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6599909Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6600128Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6600343Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6600540Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6600733Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6600968Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6601262Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6601508Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6601800Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6602017Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6602233Z 
E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6602427Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6602634Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6602833Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6603063Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6603393Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6603623Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6603918Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6604161Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.6604452Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6604694Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6604985Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6605217Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6605505Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6605738Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6606040Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6606273Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6606565Z E1204 
11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6606808Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6607099Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6607328Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6607619Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6607851Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6608140Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6608375Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6608674Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6608906Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6609106Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6609303Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6609594Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6609827Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6610117Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6610357Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6610646Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6610877Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6611180Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6611410Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6611700Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6611931Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6612223Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6612420Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6612616Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6612828Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6613036Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6613246Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6613505Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6613796Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6613993Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6614188Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6614386Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6614597Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6614828Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6615118Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6615363Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6615657Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6615851Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6616059Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6616261Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6616494Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6616787Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6617008Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6617221Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6617434Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6617636Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6617932Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6618166Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6618458Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6618691Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6618994Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6619226Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6619519Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6619764Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6620056Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6620255Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6620451Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6620674Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6620875Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6621075Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6621275Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6621577Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6621821Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6622113Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6622347Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6622643Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6622878Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6623179Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6623436Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6623730Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6623966Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6624167Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6624367Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6624559Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6624773Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6624978Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6625271Z E1204 11:25:59.631000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6625491Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6625704Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6625902Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6626115Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6626410Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6626641Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6626933Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6627167Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6627460Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6627706Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6627998Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6628246Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6628537Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6628770Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6629062Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6629294Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6629590Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6629822Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6630124Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6630356Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6630658Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6630890Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6631182Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6631380Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6631578Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6631823Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6632118Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6632352Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6632654Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6632887Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6633180Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6633442Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6633736Z E1204 
11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6633968Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6634264Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6634474Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6634707Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6635014Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6635247Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6635539Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6635752Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6635955Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6636154Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6636368Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6636665Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6636891Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6637094Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6637292Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6637494Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6637789Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6638011Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6638213Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6638410Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6638603Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6638761Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6638960Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6639191Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6639397Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6639593Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6639813Z E1204 
11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6640018Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6640213Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6640444Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6640647Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6640843Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6641077Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6641284Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6641483Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6641677Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6641890Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6642092Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6642291Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6642491Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6642784Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6643007Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6643217Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6643449Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6643640Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6643836Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6644047Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6644250Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6644468Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6644667Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6644960Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6645186Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6645387Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6645586Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6645789Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6646080Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6646296Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6646499Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6646698Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6646898Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6647203Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6647412Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6647614Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6647805Z E1204 11:25:59.631000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6648004Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6648218Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6648424Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6648631Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6648819Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6648998Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6649172Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6649308Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6649414Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6649541Z E1204 11:25:59.631000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6649696Z [W1204 11:25:59.900498122 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6649699Z 2025-12-04T11:45:25.6649844Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6650141Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6650439Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6650570Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6651048Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6651312Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6651553Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6651758Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.6651956Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6652247Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6652484Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6652775Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6653019Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6653340Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6653585Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6653876Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6654108Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6654399Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6654621Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6654827Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6655024Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6655231Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6655441Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6655674Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6655975Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6656171Z E1204 
11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6656402Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6656691Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6656911Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6657118Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6657338Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6657542Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6657748Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6657943Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6658162Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6658366Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6658561Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6658756Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6658988Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6659285Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6659517Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6659818Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6660062Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6660267Z 
E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6660461Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6660668Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6660866Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6661098Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6661401Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6661635Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6661928Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6662170Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.6662460Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6662690Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6662981Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6663212Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6663535Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6663768Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6664076Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6664307Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6664607Z E1204 
11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6664839Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6665129Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6665362Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6665650Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6665895Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6666187Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6666421Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6666723Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6666941Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6667143Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6667338Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6667631Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6667863Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6668230Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6668480Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6668772Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6669014Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6669304Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6669536Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6669826Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6670060Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6670360Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6670555Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6670750Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6670958Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6671167Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6671369Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6671599Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6671891Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6672089Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6672285Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6672479Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6672673Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6672914Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6673215Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6673484Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6673775Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6673971Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6674180Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6674381Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6674629Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6674923Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6675159Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6675360Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6675560Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6675762Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6676057Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6676290Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6676586Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6676818Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6677127Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6677360Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6677662Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6677895Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6678191Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6678390Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6678587Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6678817Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6679019Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6679217Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6679429Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6679721Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6679954Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6680249Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6680486Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6680780Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6681012Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6681304Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6681545Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6681848Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6682070Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6682270Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6682469Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6682663Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6682875Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6683087Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6683420Z E1204 11:25:59.633000 
962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6683644Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6683860Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6684059Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6684259Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6684551Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6684787Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6685083Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6685319Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6685612Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6685858Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6686162Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6686397Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6686689Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6686922Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6687216Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6687461Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6687754Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6687988Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6688289Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6688521Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6688813Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6689047Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6689338Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6689537Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6689735Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6689969Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6690272Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6690516Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6690810Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6691041Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6691333Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6691569Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6691871Z E1204 
11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6692103Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6692398Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6692612Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6692844Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6693137Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6693400Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6693691Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6693906Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6694108Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6694309Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6694533Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6694842Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6695057Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6695260Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6695458Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6695658Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6695952Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6696186Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6696388Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6696589Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6696795Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6696947Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6697146Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6697365Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6697569Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6697767Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6697989Z E1204 
11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6698196Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6698395Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6698625Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6698833Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6699041Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6699266Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6699470Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6699667Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6699863Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6700075Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6700291Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6700488Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6700689Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6700992Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6701205Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6701411Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6701610Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6701805Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.6702001Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6702217Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6702417Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6702615Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6702825Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6703131Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6703380Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6703583Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6703782Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6703982Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6704280Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6704509Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6704710Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6704910Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6705123Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6705417Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6705614Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6705814Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6706007Z E1204 11:25:59.633000 962578 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6706203Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6706416Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6706622Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6706819Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6707020Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6707202Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6707385Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6707513Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6707618Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6707746Z E1204 11:25:59.633000 962578 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6707786Z FAILED [1.3546s] [100%]
2025-12-04T11:45:25.6707788Z
2025-12-04T11:45:25.6707848Z ==================================== RERUNS ====================================
2025-12-04T11:45:25.6708011Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _
2025-12-04T11:45:25.6708061Z Traceback (most recent call last):
2025-12-04T11:45:25.6708229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.6708289Z method(*args, **kwargs)
2025-12-04T11:45:25.6708442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:25.6708484Z method(*args, **kwargs)
2025-12-04T11:45:25.6708636Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:25.6708674Z with policy():
2025-12-04T11:45:25.6708830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:25.6708882Z raise RuntimeError(msg)
2025-12-04T11:45:25.6709296Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664.
2025-12-04T11:45:25.6709300Z 2025-12-04T11:45:25.6709377Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6709657Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.6709659Z 2025-12-04T11:45:25.6709749Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6709832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6709879Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6709940Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6710509Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6710612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6710652Z graph_break [] 2025-12-04T11:45:25.6710717Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6710802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6711304Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6711357Z current_size = base.storage().size() 2025-12-04T11:45:25.6711399Z Autotune Choices Stats: 2025-12-04T11:45:25.6711777Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008239000104367733, "best_triton_pos": 0} 2025-12-04T11:45:25.6711840Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6711892Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6711993Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6712235Z triton_mm_34 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6712478Z triton_mm_33 0.0089 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6712521Z _scaled_mm 0.0092 ms 89.6% 2025-12-04T11:45:25.6712748Z triton_mm_29 0.0106 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6712974Z triton_mm_22 0.0106 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6713209Z triton_mm_30 0.0107 ms 
77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6713459Z triton_mm_16 0.0109 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6713682Z triton_mm_21 0.0112 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6713910Z triton_mm_23 0.0114 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6714137Z triton_mm_15 0.0116 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6714272Z SingleProcess AUTOTUNE benchmarking takes 0.1596 seconds and 1.4581 seconds precompiling for 33 choices 2025-12-04T11:45:25.6714433Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.6714480Z Traceback (most recent call last): 2025-12-04T11:45:25.6714639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6714695Z method(*args, **kwargs) 2025-12-04T11:45:25.6714851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6714893Z method(*args, **kwargs) 
2025-12-04T11:45:25.6715057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.6715097Z with policy(): 2025-12-04T11:45:25.6715250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.6715293Z raise RuntimeError(msg) 2025-12-04T11:45:25.6715704Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.6715707Z 2025-12-04T11:45:25.6715784Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6716059Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.6716062Z 2025-12-04T11:45:25.6716165Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6716240Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6716284Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6716341Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6716893Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), 
('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6717009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6717047Z graph_break [] 2025-12-04T11:45:25.6717118Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6717191Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6717675Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6717724Z current_size = base.storage().size() 2025-12-04T11:45:25.6717767Z Autotune Choices Stats: 2025-12-04T11:45:25.6718137Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008239000104367733, "best_triton_pos": 0} 2025-12-04T11:45:25.6718200Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6718250Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6718350Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6718601Z triton_mm_34 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6718831Z triton_mm_33 0.0089 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6718876Z _scaled_mm 0.0092 ms 89.6% 2025-12-04T11:45:25.6719112Z triton_mm_29 0.0106 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6719342Z triton_mm_22 0.0106 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6719566Z triton_mm_30 0.0107 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6719792Z triton_mm_16 0.0109 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6720017Z triton_mm_21 0.0112 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6720254Z triton_mm_23 0.0114 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6720482Z triton_mm_15 0.0116 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6720625Z SingleProcess 
AUTOTUNE benchmarking takes 0.1596 seconds and 1.4581 seconds precompiling for 33 choices 2025-12-04T11:45:25.6720699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6720742Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6720799Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6720902Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6721393Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6721431Z graph_break [] 2025-12-04T11:45:25.6721497Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6721571Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6721611Z Autotune Choices Stats: 2025-12-04T11:45:25.6721975Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008440000005066395, "best_triton_pos": 0} 2025-12-04T11:45:25.6722036Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6722086Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6722184Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6722427Z triton_mm_72 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6722471Z _scaled_mm 0.0092 ms 91.3% 2025-12-04T11:45:25.6722714Z triton_mm_71 0.0095 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6722938Z triton_mm_67 0.0106 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6723162Z triton_mm_60 0.0109 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6723411Z triton_mm_54 0.0109 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6723636Z triton_mm_68 0.0111 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6723873Z triton_mm_59 0.0112 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6724099Z triton_mm_61 0.0116 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6724325Z 
triton_mm_53 0.0120 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6724471Z SingleProcess AUTOTUNE benchmarking takes 0.2354 seconds and 0.7447 seconds precompiling for 39 choices 2025-12-04T11:45:25.6724525Z =================================== FAILURES =================================== 2025-12-04T11:45:25.6724683Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.6724729Z Traceback (most recent call last): 2025-12-04T11:45:25.6724886Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6724926Z method(*args, **kwargs) 2025-12-04T11:45:25.6725080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6725122Z method(*args, **kwargs) 2025-12-04T11:45:25.6725275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.6725312Z with policy(): 2025-12-04T11:45:25.6725467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.6725508Z raise RuntimeError(msg) 2025-12-04T11:45:25.6725915Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.6725917Z 2025-12-04T11:45:25.6725992Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6726282Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.6726286Z 2025-12-04T11:45:25.6726388Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6726463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6726508Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6726563Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6727115Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6727214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6727251Z graph_break [] 2025-12-04T11:45:25.6727316Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6727392Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6727889Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6727937Z current_size = base.storage().size() 2025-12-04T11:45:25.6727978Z Autotune Choices Stats: 2025-12-04T11:45:25.6728348Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008239000104367733, "best_triton_pos": 0} 2025-12-04T11:45:25.6728422Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6728470Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6728569Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6728804Z triton_mm_34 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6729033Z triton_mm_33 0.0089 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6729076Z _scaled_mm 0.0092 ms 89.6% 2025-12-04T11:45:25.6729301Z triton_mm_29 0.0106 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6729526Z triton_mm_22 0.0106 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6729756Z triton_mm_30 0.0107 ms 
77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6729989Z triton_mm_16 0.0109 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6730224Z triton_mm_21 0.0112 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6730452Z triton_mm_23 0.0114 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6730677Z triton_mm_15 0.0116 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6730806Z SingleProcess AUTOTUNE benchmarking takes 0.1596 seconds and 1.4581 seconds precompiling for 33 choices 2025-12-04T11:45:25.6730880Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6730923Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6730978Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6731079Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6731577Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6731615Z graph_break [] 2025-12-04T11:45:25.6731678Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6731753Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6731803Z Autotune Choices Stats: 2025-12-04T11:45:25.6732165Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008440000005066395, "best_triton_pos": 0} 2025-12-04T11:45:25.6732226Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6732273Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6732371Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6732602Z triton_mm_72 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6732645Z _scaled_mm 0.0092 ms 91.3% 2025-12-04T11:45:25.6732871Z triton_mm_71 0.0095 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6733095Z triton_mm_67 0.0106 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6733347Z 
triton_mm_60 0.0109 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6733587Z triton_mm_54 0.0109 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6733826Z triton_mm_68 0.0111 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6734049Z triton_mm_59 0.0112 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6734278Z triton_mm_61 0.0116 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6734504Z triton_mm_53 0.0120 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6734636Z SingleProcess AUTOTUNE benchmarking takes 0.2354 seconds and 0.7447 seconds precompiling for 39 choices 2025-12-04T11:45:25.6734711Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6734754Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6734826Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6734926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 
1), ('ok', 1)] 2025-12-04T11:45:25.6735406Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6735455Z graph_break [] 2025-12-04T11:45:25.6735521Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6735593Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6735635Z Autotune Choices Stats: 2025-12-04T11:45:25.6736003Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00824000034481287, "best_triton_pos": 0} 2025-12-04T11:45:25.6736065Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512) 2025-12-04T11:45:25.6736112Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1] 2025-12-04T11:45:25.6736210Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:25.6736445Z triton_mm_110 0.0082 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6736675Z triton_mm_109 0.0091 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6736716Z _scaled_mm 0.0093 ms 88.8% 
2025-12-04T11:45:25.6736941Z triton_mm_105 0.0101 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6737181Z triton_mm_98 0.0109 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6737406Z triton_mm_92 0.0109 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6737641Z triton_mm_97 0.0112 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6737868Z triton_mm_106 0.0112 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6738096Z triton_mm_91 0.0116 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6738322Z triton_mm_99 0.0118 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6738453Z SingleProcess AUTOTUNE benchmarking takes 0.2415 seconds and 0.5732 seconds precompiling for 39 choices 2025-12-04T11:45:25.6738659Z - generated xml file: 
/var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8b5bdf466bdf190.xml - 2025-12-04T11:45:25.6738720Z =========================== short test summary info ============================ 2025-12-04T11:45:25.6739348Z FAILED [1.3546s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.6740910Z 2025-12-04T11:45:25.6740985Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6741262Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.6741266Z 2025-12-04T11:45:25.6741353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6741418Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.6742125Z ================== 1 failed, 187 deselected, 2 rerun in 6.75s ================== 2025-12-04T11:45:25.6742166Z Got exit code 1 2025-12-04T11:45:25.6742388Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.6742516Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.6742676Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-34864d86bebed23f.xml 2025-12-04T11:45:25.6742737Z ============================= test session starts ============================== 2025-12-04T11:45:25.6742851Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.6742895Z cachedir: .pytest_cache 2025-12-04T11:45:25.6743056Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.6743110Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.6743155Z configfile: pytest.ini 2025-12-04T11:45:25.6743374Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.6743455Z collecting ... collected 188 items / 109 deselected / 79 selected 2025-12-04T11:45:25.6743509Z stepcurrent: skipping 109 already run items. 2025-12-04T11:45:25.6743569Z Running 79 items in this shard 2025-12-04T11:45:25.6743571Z 2025-12-04T11:45:25.6744499Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/ls/clsz3pw6df2a34jt6xo3d3gjeuftmwhtidcbm3bwfi6oo5vd4ynt.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6744650Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6744877Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6745034Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6745181Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6745472Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6745608Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6745919Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6746059Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6746318Z E1204 11:26:09.056000 968502 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6746496Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6746768Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6746905Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6747182Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6747376Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6747708Z E1204 11:26:09.056000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6748446Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/r4/cr4lhfrqaioh72mszu7uw33vidvvifl54t7dnzux5u3igorq5u7n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6748594Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6748810Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6748968Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6749115Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6749403Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6749533Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6749792Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6749931Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6750184Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6750354Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6750622Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6750769Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6751043Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6751239Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6751555Z E1204 11:26:09.072000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6752288Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/er/cerkoaafshjcwmveujbgwheu2k6meupgz4u5nuusisbcjmcgwdcs.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6752456Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6752669Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6752824Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6752968Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6753272Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6753404Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6753661Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6753797Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6754055Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6754213Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6754481Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6754632Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6754905Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6755099Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6755424Z E1204 11:26:09.077000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6756152Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/uf/cufd7cduhyda3gsn2mf3vxoda7cjmfh34zd2lh6nl2gaj6lwzlsg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6756311Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6756524Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6756693Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6756837Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6757121Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6757252Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6757507Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6757644Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6757896Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6758050Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6758318Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6758457Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6758735Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6758941Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6759257Z E1204 11:26:09.079000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6759988Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/kg/ckgbjh5i4zvhzrssaf4w3rzo5ia4fltkisg7jjbod3fwgekpzhxz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6760135Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6760348Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6760503Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6760661Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6760958Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6761089Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6761343Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6761478Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6761733Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6761889Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6762158Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6762294Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6762569Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6762763Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6763081Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6763843Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpcyq_f2yk/xd/cxd42t5trhxky44n3336i3cm3qdvfy5q2fmnzsekr4jpjdkdxdpm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6764004Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6764219Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6764373Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6764517Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6764799Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6764944Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6765213Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6765351Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6765607Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6765762Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6766032Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6766165Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6766440Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6766632Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6766947Z E1204 11:26:09.083000 968502 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6767000Z ('RERUN', {'yellow': True}) [3.2986s] [ 1%] 2025-12-04T11:45:25.6767342Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:26:11.019000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6767655Z E1204 11:26:11.019000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6767785Z E1204 11:26:11.019000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6767930Z E1204 11:26:11.021000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6768236Z E1204 11:26:11.021000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6768365Z E1204 11:26:11.021000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6768508Z E1204 11:26:11.023000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6768800Z E1204 11:26:11.023000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6768926Z E1204 11:26:11.023000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6769086Z E1204 11:26:11.080000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6769379Z E1204 11:26:11.080000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6769516Z E1204 11:26:11.080000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6769659Z E1204 11:26:11.082000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6769952Z E1204 11:26:11.082000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6770080Z E1204 11:26:11.082000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6770222Z E1204 11:26:11.084000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6770514Z E1204 11:26:11.084000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6770640Z E1204 11:26:11.084000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6770689Z ('RERUN', {'yellow': True}) [1.6404s] [ 1%] 2025-12-04T11:45:25.6771026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda E1204 11:26:12.470000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6771319Z E1204 11:26:12.470000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6771459Z E1204 11:26:12.470000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6771601Z E1204 11:26:12.472000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6771891Z E1204 11:26:12.472000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6772018Z E1204 11:26:12.472000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6772171Z E1204 11:26:12.474000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6772465Z E1204 11:26:12.474000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6772590Z E1204 11:26:12.474000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6772731Z E1204 11:26:12.515000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6773022Z E1204 11:26:12.515000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6773163Z E1204 11:26:12.515000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6773336Z E1204 11:26:12.517000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6773641Z E1204 11:26:12.517000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6773771Z E1204 11:26:12.517000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6773913Z E1204 11:26:12.519000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6774207Z E1204 11:26:12.519000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.6774332Z E1204 11:26:12.519000 968502 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.6774373Z FAILED [1.4598s] [ 1%] 2025-12-04T11:45:25.6774375Z 2025-12-04T11:45:25.6774431Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.6774591Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.6774638Z Traceback (most recent call last): 2025-12-04T11:45:25.6774796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6774839Z method(*args, **kwargs) 2025-12-04T11:45:25.6774995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6775038Z method(*args, **kwargs) 2025-12-04T11:45:25.6775191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.6775228Z with policy(): 2025-12-04T11:45:25.6775382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.6775442Z raise RuntimeError(msg) 2025-12-04T11:45:25.6775852Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.6775855Z 2025-12-04T11:45:25.6775932Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6776224Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.6776226Z 2025-12-04T11:45:25.6776317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6776395Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6776441Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6776499Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6777060Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6777178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6777215Z graph_break [] 2025-12-04T11:45:25.6777281Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6777355Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6777856Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6777905Z current_size = base.storage().size() 2025-12-04T11:45:25.6777949Z Autotune Choices Stats: 2025-12-04T11:45:25.6778328Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00887999963015318, "best_triton_pos": 0} 2025-12-04T11:45:25.6778398Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6778450Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6778575Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.6778815Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6779054Z triton_mm_33 0.0091 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6779098Z _scaled_mm 0.0096 ms 92.5% 2025-12-04T11:45:25.6779325Z triton_mm_22 0.0106 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6779566Z triton_mm_29 0.0111 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.6779792Z triton_mm_16 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6780032Z triton_mm_21 0.0115 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6780258Z triton_mm_30 0.0116 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6780491Z triton_mm_15 0.0118 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6780721Z triton_mm_23 0.0120 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6780852Z SingleProcess AUTOTUNE benchmarking takes 0.1577 seconds and 1.0218 seconds precompiling for 33 choices 2025-12-04T11:45:25.6781024Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.6781072Z Traceback (most recent call last): 2025-12-04T11:45:25.6781230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6781284Z method(*args, **kwargs) 2025-12-04T11:45:25.6781441Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.6781481Z method(*args, **kwargs) 2025-12-04T11:45:25.6781633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.6781671Z with policy(): 2025-12-04T11:45:25.6781825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.6781866Z raise RuntimeError(msg) 2025-12-04T11:45:25.6782276Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.6782280Z 2025-12-04T11:45:25.6782356Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6782631Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.6782633Z 2025-12-04T11:45:25.6782720Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6782796Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6782842Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6782900Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6783488Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6783605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6783643Z graph_break [] 2025-12-04T11:45:25.6783708Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6783805Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6784288Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6784341Z current_size = base.storage().size() 2025-12-04T11:45:25.6784381Z Autotune Choices Stats: 2025-12-04T11:45:25.6784751Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00887999963015318, "best_triton_pos": 0} 2025-12-04T11:45:25.6784821Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6784872Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6785010Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.6785261Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6785498Z triton_mm_33 0.0091 ms 97.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6785542Z _scaled_mm 0.0096 ms 92.5% 2025-12-04T11:45:25.6785770Z triton_mm_22 0.0106 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6785999Z triton_mm_29 0.0111 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6786228Z triton_mm_16 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6786456Z triton_mm_21 0.0115 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6786681Z triton_mm_30 0.0116 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6786913Z triton_mm_15 0.0118 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6787144Z triton_mm_23 0.0120 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6787293Z SingleProcess AUTOTUNE benchmarking takes 0.1577 seconds and 1.0218 seconds precompiling for 33 choices 2025-12-04T11:45:25.6787367Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6787411Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6787466Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6787579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6788066Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6788104Z graph_break [] 2025-12-04T11:45:25.6788168Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6788241Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6788282Z Autotune Choices Stats: 2025-12-04T11:45:25.6788660Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.6788729Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6788778Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6788899Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 
2025-12-04T11:45:25.6789146Z triton_mm_72 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6789382Z triton_mm_71 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6789424Z _scaled_mm 0.0095 ms 89.9% 2025-12-04T11:45:25.6789656Z triton_mm_67 0.0106 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6789884Z triton_mm_68 0.0108 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6790109Z triton_mm_54 0.0110 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6790335Z triton_mm_60 0.0112 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6790562Z triton_mm_59 0.0114 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6790796Z triton_mm_61 0.0120 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6791037Z triton_mm_53 0.0122 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6791170Z SingleProcess AUTOTUNE benchmarking takes 0.2457 seconds and 0.7874 seconds precompiling for 39 choices 2025-12-04T11:45:25.6791226Z =================================== FAILURES =================================== 2025-12-04T11:45:25.6791399Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.6791447Z Traceback (most recent call last): 2025-12-04T11:45:25.6791607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6791653Z method(*args, **kwargs) 2025-12-04T11:45:25.6791807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.6791851Z method(*args, **kwargs) 2025-12-04T11:45:25.6792002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.6792040Z with policy(): 2025-12-04T11:45:25.6792194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.6792238Z raise RuntimeError(msg) 2025-12-04T11:45:25.6792657Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. 
CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.6792660Z 2025-12-04T11:45:25.6792747Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6793021Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.6793025Z 2025-12-04T11:45:25.6793112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6793187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6793232Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6793323Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6793879Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6793982Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6794020Z graph_break [] 2025-12-04T11:45:25.6794084Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6794157Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6794648Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.6794713Z current_size = base.storage().size() 2025-12-04T11:45:25.6794756Z Autotune Choices Stats: 2025-12-04T11:45:25.6795124Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00887999963015318, "best_triton_pos": 0} 2025-12-04T11:45:25.6795191Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6795256Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6795376Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.6795613Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6795849Z triton_mm_33 0.0091 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6795893Z _scaled_mm 0.0096 ms 92.5% 2025-12-04T11:45:25.6796122Z triton_mm_22 0.0106 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6796368Z triton_mm_29 0.0111 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.6796606Z triton_mm_16 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6796836Z triton_mm_21 0.0115 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6797064Z triton_mm_30 0.0116 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6797295Z triton_mm_15 0.0118 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6797525Z triton_mm_23 0.0120 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6797655Z SingleProcess AUTOTUNE benchmarking takes 0.1577 seconds and 1.0218 seconds precompiling for 33 choices 2025-12-04T11:45:25.6797729Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6797772Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6797833Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6797934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6798424Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), 
('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6798481Z graph_break [] 2025-12-04T11:45:25.6798546Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6798620Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6798661Z Autotune Choices Stats: 2025-12-04T11:45:25.6799028Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.6799106Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6799159Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6799278Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.6799516Z triton_mm_72 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6799748Z triton_mm_71 0.0092 ms 93.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6799794Z _scaled_mm 0.0095 ms 89.9% 2025-12-04T11:45:25.6800034Z triton_mm_67 0.0106 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6800264Z triton_mm_68 0.0108 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6800502Z triton_mm_54 0.0110 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6800727Z triton_mm_60 0.0112 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6800956Z triton_mm_59 0.0114 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6801186Z triton_mm_61 0.0120 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6801419Z triton_mm_53 0.0122 ms 70.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6801549Z SingleProcess AUTOTUNE benchmarking takes 0.2457 seconds and 0.7874 seconds precompiling for 39 choices 2025-12-04T11:45:25.6801625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.6801669Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.6801728Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.6801829Z 
aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.6802318Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.6802369Z graph_break [] 2025-12-04T11:45:25.6802433Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.6802506Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.6802547Z Autotune Choices Stats: 2025-12-04T11:45:25.6802917Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-12-04T11:45:25.6802992Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.6803048Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.6803169Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.6803441Z triton_mm_110 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6803483Z _scaled_mm 0.0092 ms 97.0% 2025-12-04T11:45:25.6803734Z triton_mm_109 0.0094 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6803966Z triton_mm_105 0.0104 ms 86.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6804210Z triton_mm_106 0.0109 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6804442Z triton_mm_97 0.0110 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6804667Z triton_mm_92 0.0111 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6804895Z triton_mm_98 0.0111 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.6805126Z triton_mm_99 0.0118 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6805359Z triton_mm_91 0.0122 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.6805490Z SingleProcess AUTOTUNE benchmarking takes 0.2524 seconds and 0.6407 seconds 
precompiling for 39 choices 2025-12-04T11:45:25.6805685Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-34864d86bebed23f.xml - 2025-12-04T11:45:25.6805747Z =========================== short test summary info ============================ 2025-12-04T11:45:25.6806378Z FAILED [1.4598s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.6806394Z 2025-12-04T11:45:25.6806471Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.6806758Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.6806761Z 2025-12-04T11:45:25.6806850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.6806916Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.6806987Z ================== 1 failed, 109 deselected, 2 rerun in 6.42s ================== 2025-12-04T11:45:25.6807026Z Got exit code 1 2025-12-04T11:45:25.6807066Z Retrying single test... 
2025-12-04T11:45:25.6807214Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c80ef871d8d60908.xml 2025-12-04T11:45:25.6807272Z ============================= test session starts ============================== 2025-12-04T11:45:25.6807385Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.6807427Z cachedir: .pytest_cache 2025-12-04T11:45:25.6807597Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.6807644Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.6807693Z configfile: pytest.ini 2025-12-04T11:45:25.6807866Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.6807944Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.6808214Z stepcurrent: skipping 109 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.6808260Z Running 1 items in this shard 2025-12-04T11:45:25.6808262Z 2025-12-04T11:45:25.6808616Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:21.119086481 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6808620Z 2025-12-04T11:45:25.6808779Z [W1204 11:26:22.465173874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6808782Z 2025-12-04T11:45:25.6809098Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6809398Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6809535Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6810021Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6810303Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6810531Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6810759Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6810965Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6811262Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6811499Z E1204 
11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6811805Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6812039Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6812346Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6812579Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6812871Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6813103Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6813428Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6813660Z E1204 11:26:22.180000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6813953Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6814186Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6814476Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6814690Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6814922Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6815227Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6815424Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6815658Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6815951Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6816197Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6816488Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6816722Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6816928Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6817131Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6817345Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6817514Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6817696Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6818233Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/er/cerkoaafshjcwmveujbgwheu2k6meupgz4u5nuusisbcjmcgwdcs.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6818381Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6818601Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6818769Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6818917Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6819206Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6819348Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6819609Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6819749Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6820008Z 
E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6820169Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6820459Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6820601Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6820886Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6821084Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6821396Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6821692Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6821825Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6822307Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6822563Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6822792Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6823000Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6823210Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6823536Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6823785Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6824080Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6824316Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6824605Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6824849Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6825142Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6825389Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6825678Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6825909Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6826204Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6826436Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6826728Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6826924Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6827158Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6827452Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6827660Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6827890Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6828182Z E1204 11:26:22.180000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6828423Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6828715Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6828934Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6830926Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6831164Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6831389Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6831561Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6831743Z E1204 11:26:22.180000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6831846Z E1204 11:26:22.180000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6832005Z [W1204 11:26:22.483718975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6832008Z 2025-12-04T11:45:25.6832317Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6832617Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6832751Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6833232Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6833525Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6833769Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6833978Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6834194Z E1204 11:26:22.215000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6834486Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6834722Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6835013Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6835246Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6835551Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6835797Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6836090Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6836320Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6836614Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6836846Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6837136Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6837370Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6837662Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6837859Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6838100Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6838391Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6838599Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6838831Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6839124Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6839354Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6839644Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6839876Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6840095Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6840297Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6840508Z E1204 11:26:22.215000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6840673Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6840853Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6841382Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/ls/clsz3pw6df2a34jt6xo3d3gjeuftmwhtidcbm3bwfi6oo5vd4ynt.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6841529Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6841747Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6841904Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6842051Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6842352Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6842487Z E1204 11:26:22.215000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6842746Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6842896Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6843152Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6843346Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6843616Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6843749Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6844039Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6844235Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6844563Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6844857Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6844988Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6845472Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6845726Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6845952Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6846161Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6846361Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6846667Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6846899Z E1204 
11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6847191Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6847442Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6847734Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6847967Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6848256Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6848499Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6848798Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6849031Z E1204 11:26:22.215000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6849320Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6849553Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6849844Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6850040Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6850275Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6850567Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6850763Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6851008Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6851297Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6851527Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6851828Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6852049Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6852254Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6852457Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6852686Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6852853Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6853044Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6853146Z E1204 11:26:22.215000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6853476Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6853771Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6853903Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6854383Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6854635Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6854863Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6855067Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6855281Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6855572Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6855806Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6856111Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6856344Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6856634Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6856864Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6857172Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6857418Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6857709Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6857943Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6858235Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6858467Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6858756Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6858952Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6859185Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6859477Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6859685Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6859917Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6860209Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6860449Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6860741Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6860960Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6861165Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6861376Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6861585Z E1204 11:26:22.217000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6861766Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6861945Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6862469Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/kg/ckgbjh5i4zvhzrssaf4w3rzo5ia4fltkisg7jjbod3fwgekpzhxz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6862618Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6862834Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6862989Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6863134Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6863459Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6863591Z E1204 11:26:22.217000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6863848Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6864013Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6864269Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6864425Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6864707Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6864846Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6865118Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6865311Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6865639Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6865945Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6866077Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6866556Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6866812Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6867038Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6867244Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6867443Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6867734Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6867968Z E1204 
11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6868271Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6868504Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6868796Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6869039Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6869330Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6869560Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6869849Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6870090Z E1204 11:26:22.217000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6870390Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6870622Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6870912Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6871112Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6871347Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6871641Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6871836Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6872067Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6872357Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6872602Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6872891Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6873108Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6873353Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6873556Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6873767Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6873932Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6874110Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6874231Z E1204 11:26:22.217000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6874386Z [W1204 11:26:22.494833755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6874389Z 2025-12-04T11:45:25.6874709Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6875001Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6875134Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6875609Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6875865Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6876092Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6876298Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6876498Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6876803Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6877037Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6877328Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6877572Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6877862Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6878095Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6878387Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6878630Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6878940Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6879171Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6879460Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6879692Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6879982Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6880179Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6880411Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6880708Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6880903Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6881153Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6881444Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6881673Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6881978Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6882199Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6882405Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6882606Z E1204 11:26:22.228000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6882829Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6883000Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6883186Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6883750Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/r4/cr4lhfrqaioh72mszu7uw33vidvvifl54t7dnzux5u3igorq5u7n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6883896Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6884112Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6884267Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6884414Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6884700Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6884832Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6885093Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6885231Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6885500Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6885656Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6885925Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6886078Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6886351Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6886545Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6886854Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6887161Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6887291Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6887785Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6888039Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6888265Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6888471Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6888673Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6888966Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6889199Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6889491Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6889736Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6890029Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6890262Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6890564Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6890796Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6891087Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6891317Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6891618Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6891859Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6892152Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6892350Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6892582Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6892871Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6893066Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6893335Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6893624Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6893857Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6894148Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6894383Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6894591Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6894806Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6895017Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6895184Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.6895361Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6895462Z E1204 11:26:22.228000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6895617Z [W1204 11:26:22.496450591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6895620Z 2025-12-04T11:45:25.6895945Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6896252Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6896382Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6896858Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6897110Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6897337Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6897540Z E1204 11:26:22.229000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6897739Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6898030Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6898263Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6898564Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6898796Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6899285Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6899517Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6899808Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6900037Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6900338Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6900579Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6900870Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6901101Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6901392Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6901589Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6901825Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6902117Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6902311Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6902545Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6902840Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6903080Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6903396Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6903629Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6903834Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6904038Z E1204 11:26:22.229000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6904248Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6904413Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6904603Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6905142Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/uf/cufd7cduhyda3gsn2mf3vxoda7cjmfh34zd2lh6nl2gaj6lwzlsg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6905290Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6905507Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6905663Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6905808Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6906096Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6906228Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6906489Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6906626Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6906880Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6907051Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6907319Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6907452Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6907728Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6907931Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6908244Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6908540Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6908669Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6909169Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6909421Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6909646Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6909853Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6910053Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6910345Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6910579Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6910877Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6911113Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6911402Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6911648Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6911939Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6912190Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6912482Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6912714Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6913005Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6913281Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6913587Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6913783Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6914016Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6914309Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6914504Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6914738Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6915028Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6915262Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6915556Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6915781Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6916002Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6916201Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6916411Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6916596Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.6916774Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6916876Z E1204 11:26:22.229000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6917031Z [W1204 11:26:22.499217942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6917033Z 2025-12-04T11:45:25.6917339Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6917649Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6917780Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6918269Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6918522Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6918746Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6918953Z E1204 11:26:22.234000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6919152Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6919444Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6919680Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6919970Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6920214Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6920505Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6920748Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6921041Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6921273Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6921564Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6921804Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6922094Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6922335Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6922630Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6922826Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6923060Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6923401Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6923596Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6923825Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6924118Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6924350Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6924662Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6924880Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6925102Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6925302Z E1204 11:26:22.234000 974421 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6925513Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6925679Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.6925855Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6926502Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpbb3jd8f0/xd/cxd42t5trhxky44n3336i3cm3qdvfy5q2fmnzsekr4jpjdkdxdpm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.6926664Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.6926878Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.6927034Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.6927180Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.6927471Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.6927606Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.6927865Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.6928002Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.6928255Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.6928412Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.6928683Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.6928839Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.6929114Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.6929307Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.6929633Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.6929926Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6930056Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6930544Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6930797Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6931037Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6931243Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6931442Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6931740Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6931974Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6932266Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6932498Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6932790Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6933021Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6933351Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6933582Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6933887Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6934121Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6934414Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6934645Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6934947Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6935143Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6935388Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6935677Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6935872Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.6936105Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6936398Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6936630Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6936919Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6937139Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6937343Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6937558Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.6937767Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.6937932Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.6938122Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.6938222Z E1204 11:26:22.234000 974421 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.6938275Z ('RERUN', {'yellow': True}) [3.4268s] [100%] 2025-12-04T11:45:25.6938631Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:23.219286125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6938633Z 2025-12-04T11:45:25.6938780Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6939073Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6939379Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6939522Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6939996Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6940250Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6940474Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6940680Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6940881Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6941172Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6941407Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6941699Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6941946Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6942235Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6942477Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6942768Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6942998Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6943325Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6943559Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6943765Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6943981Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6944188Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6944386Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6944619Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6944912Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6945108Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6945338Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6945631Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6945851Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6946061Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6946277Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6946480Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6946690Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6946885Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6947103Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6947308Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6947503Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.6947711Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6947945Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6948247Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6948477Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6948767Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6948987Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6949193Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6949388Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6949596Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6949795Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6950026Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6950336Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6950568Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6950859Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6951100Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6951393Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6951622Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6951915Z 
E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6952159Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6952462Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6952696Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6952987Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6953220Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6953539Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6953771Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6954061Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6954294Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6954585Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6954832Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6955128Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6955371Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6955660Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6955881Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6956081Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6956276Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.6956581Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6956825Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6957116Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6957353Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6957646Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6957877Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6958168Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6958398Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6958690Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6958920Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6959224Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6959420Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6959615Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6959824Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6960031Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6960232Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6960463Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6960769Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6960967Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6961172Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6961369Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6961563Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6961797Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6962089Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6962323Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6962615Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6962809Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6963018Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6963218Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6963499Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6963790Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6964029Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6964233Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6964435Z 
E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6964636Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6964927Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6965177Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6965483Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6965718Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6966010Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6966243Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6966540Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6966774Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6967069Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6967266Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6967465Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6967699Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6967899Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6968097Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6968309Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6968603Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6968837Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6969132Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6969376Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6969668Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6969912Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6970203Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6970436Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6970728Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6970950Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6971154Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6971354Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6971546Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6971755Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.6971972Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6972264Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6972484Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6972696Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6972894Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6973096Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6973420Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6973670Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6973963Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6974212Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6974506Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6974739Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6975034Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6975266Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6975558Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6975788Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6976087Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6976333Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6976624Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6976857Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6977168Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6977400Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6977692Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6977924Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6978227Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6978436Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6978635Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6978868Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6979162Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6979395Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6979690Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6979990Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6980286Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6980522Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6980833Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6981068Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6981361Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6981568Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6981801Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6982093Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6982324Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6982628Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6982843Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6983057Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6983275Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6983476Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6983770Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6983984Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6984187Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6984389Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6984588Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6984880Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6985119Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.6985321Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6985521Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6985727Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6985875Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.6986073Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6986292Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6986499Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6986711Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6986932Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6987154Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6987352Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6987573Z E1204 
11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6987787Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.6987985Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6988206Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.6988413Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.6988608Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.6988807Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6989020Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6989235Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6989435Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6989634Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6989943Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6990158Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6990362Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6990559Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6990751Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.6990958Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.6991181Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6991385Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.6991583Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6991783Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6992080Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6992295Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.6992495Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6992692Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6992894Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6993185Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6993445Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.6993645Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.6993846Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.6994068Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6994364Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6994563Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.6994764Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.6994953Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.6995164Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.6995391Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.6995596Z E1204 11:26:23.958000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.6995792Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.6995981Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.6996162Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.6996332Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.6996461Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.6996564Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.6996691Z E1204 11:26:23.958000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.6996850Z [W1204 11:26:23.227499527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.6996853Z 2025-12-04T11:45:25.6996998Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.6997293Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.6997602Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.6997734Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.6998213Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.6998476Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.6998703Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.6998910Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.6999122Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.6999414Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.6999661Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.6999952Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7000184Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7000476Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7000708Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7001000Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7001232Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7001524Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7001744Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7001959Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7002155Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7002362Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7002571Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7002805Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7003096Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7003328Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7003576Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7003958Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7004180Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7004379Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7004599Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7004804Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7005001Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7005197Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7005418Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7005620Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7005817Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7006013Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7006263Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7006554Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7006798Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7007091Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7007311Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7007518Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7007712Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7007931Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7008148Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7008382Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7008673Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7008904Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7009195Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7009426Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7009716Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7009947Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7010237Z 
E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7010478Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7010772Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7011004Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7011306Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7011536Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7011827Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7012057Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7012360Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7012600Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7012891Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7013123Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7013453Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7013684Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7013974Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7014193Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7014395Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7014590Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7014899Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7015129Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7015424Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7015670Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7015961Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7016193Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7016484Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7016728Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7017033Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7017264Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7017554Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7017753Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7017951Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7018147Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7018353Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7018551Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7018783Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7019074Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7019295Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7019489Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7019684Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7019894Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7020127Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7020419Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7020649Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7020952Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7021147Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7021365Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7021566Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7021800Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7022095Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7022320Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7022523Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7022720Z 
E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7022921Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7023215Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7023496Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7023788Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7024020Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7024332Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7024565Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7024860Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7025094Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7025406Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7025617Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7025813Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7026033Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7026235Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7026435Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7026637Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7026930Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7027165Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7027458Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7027692Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7028001Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7028236Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7028540Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7028771Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7029065Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7029284Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7029500Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7029704Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7029905Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7030116Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7030315Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7030608Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7030828Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7031032Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7031229Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7031429Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7031725Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7031958Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7032261Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7032493Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7032797Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7033029Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7033352Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7033585Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7033890Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7034125Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7034430Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7034664Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7034957Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7035189Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7035484Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7035716Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7036007Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7036240Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7036548Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7036746Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7036942Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7037187Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7037478Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7037711Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7038002Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7038247Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7038540Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7038784Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7039077Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7039309Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7039602Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7039801Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7040032Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7040325Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7040558Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7040850Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7041075Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7041278Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7041493Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7041694Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7041989Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7042202Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7042403Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7042614Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7042815Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7043120Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7043365Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7043569Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7043769Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7043964Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7044111Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7044306Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7044525Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7044733Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7044930Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7045169Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7045373Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7045570Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7045808Z E1204 
11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7046013Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7046209Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7046429Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7046632Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7046844Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7047053Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7047267Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7047467Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7047665Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7047868Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7048166Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7048379Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7048579Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7048779Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7048969Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7049175Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7049386Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7049588Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7049798Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7049997Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7050294Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7050505Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7050706Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7050915Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7051133Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7051426Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7051639Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7051842Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7052040Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7052240Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7052537Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7052732Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7052935Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7053127Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7053363Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7053575Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7053782Z E1204 11:26:23.960000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7053994Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7054182Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7054363Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7054535Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7054661Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7054766Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7054906Z E1204 11:26:23.960000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7055061Z [W1204 11:26:23.229643326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7055063Z 2025-12-04T11:45:25.7055221Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7055516Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7055815Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7055947Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7056427Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7056681Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7056906Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7057116Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7057315Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7057630Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7057862Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7058154Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7058399Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7058689Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7058923Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7059224Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7059459Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7059758Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7059978Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7060182Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7060381Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7060589Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7060791Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7061021Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7061310Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7061508Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7061743Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7062044Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7062264Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7062473Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7062691Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7062896Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7063092Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7063305Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7063538Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7063744Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7063954Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7064151Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7064383Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7064677Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7064911Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7065202Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7065424Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7065630Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7065828Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7066049Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7066252Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7066484Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7066787Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7067020Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7067310Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7067541Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7067841Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7068071Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7068373Z 
E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7068607Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7068899Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7069129Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7069422Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7069652Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7069945Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7070182Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7070483Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7070714Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7071009Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7071251Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7071541Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7071773Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7072064Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7072295Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7072514Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7072709Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7073001Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7073236Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7073553Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7073786Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7074075Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7074307Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7074597Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7074841Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7075130Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7075360Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7075667Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7075865Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7076060Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7076254Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7076473Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7076673Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7076919Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7077212Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7077406Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7077601Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7077794Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7077992Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7078226Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7078514Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7078749Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7079040Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7079247Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7079453Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7079670Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7079905Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7080200Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7080422Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7080624Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7080835Z 
E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7081034Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7081338Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7081571Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7081864Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7082096Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7082388Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7082624Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7082917Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7083150Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7083472Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7083669Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7083867Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7084103Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7084305Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7084504Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7084705Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7085001Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7085251Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7085558Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7085791Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7086086Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7086319Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7086611Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7086845Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7087135Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7087358Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7087559Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7087774Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7087965Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7088176Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7088390Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7088681Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7088902Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7089103Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7089301Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7089519Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7089823Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7090056Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7090348Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7090583Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7090874Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7091107Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7091396Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7091631Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7091930Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7092173Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7092465Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7092708Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7093001Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7093235Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7093545Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7093794Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7094087Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7094336Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7094628Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7094825Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7095025Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7095257Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7095551Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7095782Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7096077Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7096310Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7096619Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7096855Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7097161Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7097393Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7097687Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7097883Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7098128Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7098424Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7098669Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7098963Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7099178Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7099383Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7099583Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7099785Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7100081Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7100298Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7100500Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7100711Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7100910Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7101205Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7101438Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7101642Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7101843Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7102033Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7102181Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7102387Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7102607Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7102825Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7103022Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7103241Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7103483Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7103683Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7103908Z E1204 
11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7104113Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7104307Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7104530Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7104734Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7104949Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7105145Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7105357Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7105579Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7105779Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7105984Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7106278Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7106493Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7106708Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7106918Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7107113Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7107307Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7107520Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7107722Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7107926Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7108127Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7108423Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7108639Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7108839Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7109051Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7109251Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7109543Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7109766Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7109969Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7110168Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7110369Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7110664Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7110870Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7111087Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7111278Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7111474Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7111687Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7111892Z E1204 11:26:23.963000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7112092Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7112282Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7112465Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7112637Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7112766Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7112869Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7112996Z E1204 11:26:23.963000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7113164Z [W1204 11:26:24.270216729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7113166Z 2025-12-04T11:45:25.7113332Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7113627Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7113939Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7114069Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7114550Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7114818Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7115048Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7115268Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7115467Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7115757Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7115993Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7116283Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7116516Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7116808Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7117039Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7117332Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7117577Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7117867Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7118099Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7118307Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7118505Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7118712Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7118911Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7119153Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7119447Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7119653Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7119887Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7120176Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7120398Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7120595Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7120815Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7121020Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7121216Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7121409Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7121627Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7121849Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7122045Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7122251Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7122486Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7122781Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7123013Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7123326Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7123564Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7123781Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7123978Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7124188Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7124390Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7124625Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7124915Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7125149Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7125439Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7125673Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7125965Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7126207Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7126501Z 
E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7126747Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7127040Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7127274Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7127562Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7127802Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7128102Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7128335Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7128627Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7128861Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7129153Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7129386Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7129678Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7129908Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7130201Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7130433Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7130633Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7130829Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7131130Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7131364Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7131656Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7131887Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7132186Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7132417Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7132720Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7132951Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7133243Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7133501Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7133797Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7133993Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7134187Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7134383Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7134589Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7134805Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7135035Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7135328Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7135541Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7135738Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7135937Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7136131Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7136379Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7136671Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7136921Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7137213Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7137406Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7137617Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7137818Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7138052Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7138349Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7138572Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7138775Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7138986Z 
E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7139187Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7139481Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7139726Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7140019Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7140254Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7140550Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7140795Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7141098Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7141330Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7141623Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7141821Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7142017Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7143808Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7144015Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7144215Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7144420Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7144716Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7144972Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7145266Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7145499Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7145809Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7146043Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7146335Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7146567Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7146878Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7147114Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7147316Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7147514Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7147708Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7147918Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7148120Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7148411Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7148631Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7148838Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7149037Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7149252Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7149543Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7149777Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7150080Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7150314Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7150607Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7150839Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7151142Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7151385Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7151680Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7151914Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7152206Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7152438Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7152729Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7152961Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7153290Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7153523Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7153829Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7154062Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7154369Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7154565Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7154763Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7154993Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7155284Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7155537Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7155841Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7156074Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7156367Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7156601Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7156893Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7157126Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7157500Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7157698Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7157932Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7158240Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7158473Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7158769Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7158995Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7159199Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7159399Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7159600Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7159903Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7160127Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7160330Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7160526Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7160725Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7161020Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7161242Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7161443Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7161643Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7161835Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7161983Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7162180Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7162411Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7162618Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7162812Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7163044Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7163272Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7163469Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7163689Z E1204 
11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7163914Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7164111Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7164344Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7164552Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7164749Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7164945Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7165156Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7165360Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7165561Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7165760Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7166055Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7166267Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7166481Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7166679Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7166870Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7167079Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7167291Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7167494Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7167694Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7167898Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7168199Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7168422Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7168624Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7168822Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7169021Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7169315Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7169528Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7169727Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7169926Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7170128Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7170422Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7170629Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7170829Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7171017Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7171228Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7171440Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7171645Z E1204 11:26:24.003000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7171841Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7172030Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7172223Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7172395Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7172533Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7172639Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7172765Z E1204 11:26:24.003000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7172922Z [W1204 11:26:24.272356458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7172925Z 2025-12-04T11:45:25.7173072Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7173397Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7173694Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7173826Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7174307Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7174562Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7174804Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7175010Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7175211Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7175520Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7175755Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7176047Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7176278Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7176585Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7176832Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7177125Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7177357Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7177651Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7177875Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7178080Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7178277Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7178485Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7178685Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7178918Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7179221Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7179418Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7179659Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7179950Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7180170Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7180366Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7180583Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7180798Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7180993Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7181198Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7181418Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7181624Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7181821Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7182014Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7182246Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7182537Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7182769Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7183061Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7183317Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7183521Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7183718Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7183943Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7184142Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7184374Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7184663Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7184893Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7185199Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7185444Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7185736Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7185966Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7186261Z 
E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7186494Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7186783Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7187013Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7187306Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7187536Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7187845Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7188075Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7188380Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7188612Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7188902Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7189132Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7189433Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7189664Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7189964Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7190183Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7190385Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7190581Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7190880Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7191113Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7191406Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7191639Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7191929Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7192172Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7192461Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7192703Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7192993Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7193227Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7193548Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7193757Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7193953Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7194163Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7194370Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7194567Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7194800Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7195089Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7195286Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7195482Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7195678Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7195874Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7196104Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7196409Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7196638Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7196928Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7197137Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7197345Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7197546Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7197780Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7198086Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7198317Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7198519Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7198717Z 
E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7198917Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7199211Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7199446Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7199738Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7199970Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7200265Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7200499Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7200800Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7201032Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7201334Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7201533Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7201729Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7201950Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7202151Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7202360Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7202582Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7202875Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7203109Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7203430Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7203663Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7203956Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7204188Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7204479Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7204711Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7205020Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7205241Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7205442Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7205655Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7205847Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7206056Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7206255Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7206559Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7206779Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7207006Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7207207Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7207407Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7207702Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7207934Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7208228Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7208459Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7208751Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7208983Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7209287Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7209521Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7209812Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7210055Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7210348Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7210581Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7210873Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7211116Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7211419Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7211653Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7211949Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7212185Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7212475Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7212673Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7212868Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7213101Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7213423Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7213674Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7213967Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7214201Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7214509Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7214743Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7215037Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7215268Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7215574Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7215784Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7216016Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7216308Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7216541Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7216835Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7217049Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7217251Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7217448Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7217649Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7217944Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7218170Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7218370Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7218578Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7218778Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7219077Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7219297Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7219499Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7219713Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7219905Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7220064Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7220261Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7220481Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7220689Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7220884Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7221108Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7221315Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7221512Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7221735Z E1204 
11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7221940Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7222148Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7222368Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7222573Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7222783Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7222977Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7223191Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7223424Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7223626Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7223843Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7224150Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7224363Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7224562Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7224760Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7224950Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7225147Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7225359Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7225558Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7225761Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7225960Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7226254Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7226484Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7226686Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7226898Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7227098Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7227395Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7227606Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7227808Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7228019Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7228229Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7228521Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7228716Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7228918Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7229107Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7229303Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7229515Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7229720Z E1204 11:26:24.005000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7229916Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7230106Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7230288Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7230470Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7230596Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7230698Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7230834Z E1204 11:26:24.005000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7230990Z [W1204 11:26:24.274524156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7230993Z 2025-12-04T11:45:25.7231138Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7231435Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7231733Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7231864Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7232364Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7232620Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7232845Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7233052Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7233281Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7233574Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7233808Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7234099Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7234332Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7234638Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7234871Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7235161Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7235415Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7235707Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7235928Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7236134Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7236344Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7236551Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7236763Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7236998Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7237292Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7237488Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7237721Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7238011Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7238231Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7238426Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7238644Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7238864Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7239058Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7239254Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7239489Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7239693Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7239889Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7240083Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7240313Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7240622Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7240864Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7241156Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7241373Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7241578Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7241776Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7241985Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7242183Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7242414Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7242706Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7242936Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7243237Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7243510Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7243799Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7244048Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7244343Z 
E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7244572Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7244877Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7245108Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7245411Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7245642Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7245931Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7246163Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7246459Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7246693Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7246983Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7247215Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7247506Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7247749Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7248040Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7248269Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7248470Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7248666Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7248961Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7249192Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7249494Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7249737Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7250027Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7250258Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7250549Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7250781Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7251072Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7251306Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7251599Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7251796Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7252007Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7252202Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7252409Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7252620Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7252851Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7253143Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7253448Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7253662Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7253858Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7254066Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7254298Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7254589Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7254822Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7255112Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7255309Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7255514Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7255717Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7255958Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7256251Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7256486Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7256687Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7256901Z 
E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7257101Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7257396Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7257630Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7257932Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7258166Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7258472Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7258706Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7258998Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7259233Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7259526Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7259725Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7259921Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7260143Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7260346Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7260556Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7260759Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7261053Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7261299Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7261593Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7261826Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7262118Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7262362Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7262665Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7262901Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7263195Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7263459Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7263661Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7263862Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7264054Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7264267Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7264469Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7264762Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7264997Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7265198Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7265401Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7265616Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7265910Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7266145Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7266437Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7266683Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7266993Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7267230Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7267522Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7267757Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7268052Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7268288Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7268582Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7268814Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7269108Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7269353Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7269645Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7269877Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7270184Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7270420Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7270712Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7270910Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7271119Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7271361Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7271655Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7271887Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7272182Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7272415Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7272711Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7272943Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7273237Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7273514Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7273819Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7274017Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7274251Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7274557Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7274793Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7275089Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7275303Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7275518Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7275719Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7275933Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7276226Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7276439Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7276643Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7276843Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7277043Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7277341Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7277564Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7277765Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7277974Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7278165Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7278312Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7278507Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7278739Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7278947Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7279148Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7279369Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7279588Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7279784Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7280015Z E1204 
11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7280222Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7280417Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7280639Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7280844Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7281043Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7281238Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7281451Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7281657Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7281858Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7282070Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7282365Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7282577Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7282794Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7282991Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7283187Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7283432Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7283645Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7283872Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7284076Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7284290Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7284584Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7284795Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7284997Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7285196Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7285398Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7285691Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7285905Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7286108Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7286321Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7286521Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7286815Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7287026Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7287228Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7287420Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7287617Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7287830Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7288047Z E1204 11:26:24.008000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7288247Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7288448Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7288631Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7288801Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7288929Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7289032Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7289159Z E1204 11:26:24.008000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7289213Z ('RERUN', {'yellow': True}) [1.6019s] [100%] 2025-12-04T11:45:25.7289568Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:25.633745502 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.7289571Z 2025-12-04T11:45:25.7289715Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7290013Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7290309Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7290450Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7290936Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7291201Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7291426Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7291633Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7291831Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7292137Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7292372Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7292679Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7292912Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7293209Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7293474Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7293857Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7294089Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7294383Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7294606Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7294815Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7295035Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7295242Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7295441Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7295693Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7295987Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7296184Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7296416Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7296721Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7296954Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7297152Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7297371Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7297578Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7297777Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7297974Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7298195Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7298405Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7298600Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7298799Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7299031Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7299338Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7299569Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7299878Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7300098Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7300304Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7300499Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7300704Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7300915Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7301158Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7301453Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7301687Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7301980Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7302213Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7302508Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7302740Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7303032Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7303296Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7303601Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7303830Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7304120Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7304364Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7304657Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7304892Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7305185Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7305430Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7305732Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7305964Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7306255Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7306487Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7306779Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7306999Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7307202Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7307400Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7307694Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7307938Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7308229Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7308459Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7308760Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7308993Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7309282Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7309513Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7309817Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7310058Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7310349Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7310543Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7310739Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7310934Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7311143Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7311343Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7311572Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7311868Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7312065Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7312270Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7312464Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7312659Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7312903Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7313195Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7313466Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7313755Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7313966Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7314174Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7314394Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7314628Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7314922Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7315145Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7315347Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7315548Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7315747Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7316043Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7316276Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7316593Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7316828Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7317122Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7317367Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7317660Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7317893Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7318186Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7318395Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7318603Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7318825Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7319028Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7319227Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7319429Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7319723Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7319956Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7320248Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7320481Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7320773Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7321017Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7321314Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7321559Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7321850Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7322073Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7322274Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7322474Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7322677Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7322902Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7323105Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7323434Z E1204 11:26:25.367000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7323657Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7323860Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7324064Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7324263Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7324555Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7324791Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7325084Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7325330Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7325621Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7325868Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7326162Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7326396Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7326690Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7326935Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7327242Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7327474Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7327766Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7328001Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7328295Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7328530Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7328823Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7329056Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7329348Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7329558Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7329755Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7329986Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7330293Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7330526Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7330823Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7331055Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7331358Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7331590Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7331903Z E1204 
11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7332136Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7332429Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7332626Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7332859Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7333154Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7333425Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7333719Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7333948Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7334149Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7334348Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7334564Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7334856Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7335071Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7335271Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7335471Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7335685Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7335990Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7336210Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7336411Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7336608Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7336801Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7336948Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7337145Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7337365Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7337571Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7337771Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7337991Z E1204 
11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7338210Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7338405Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7338623Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7338839Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7339034Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7339254Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7339457Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7339655Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7339864Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7340093Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7340297Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7340495Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7340697Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7340993Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7341207Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7341407Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7341604Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7341796Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7341991Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7342217Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7342418Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7342617Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7342830Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7343123Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7343365Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7343565Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7343763Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7343977Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7344284Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7344499Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7344702Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7344901Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7345101Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7345393Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7345589Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7345791Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7345980Z E1204 11:26:25.367000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7346175Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7346400Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7346607Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7346813Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7347014Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7347194Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7347366Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7347493Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7347595Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7347722Z E1204 11:26:25.367000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7347878Z [W1204 11:26:25.636124708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7347881Z 2025-12-04T11:45:25.7348048Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7348355Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7348650Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7348781Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7349263Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7349518Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7349746Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7349951Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.7350152Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7350443Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7350691Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7350982Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7351230Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7351524Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7351757Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7352051Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7352292Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7352584Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7352814Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7353019Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7353218Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7353448Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7353649Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7353883Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7354175Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7354370Z E1204 
11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7354604Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7354899Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7355131Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7355325Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7355557Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7355762Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7355961Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7356156Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7356373Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7356590Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7356786Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7356994Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7357226Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7357517Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7357751Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7358042Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7358264Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7358470Z 
E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7358666Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7358877Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7359087Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7359322Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7359611Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7359854Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7360145Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7360378Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.7360675Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7360915Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7361215Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7361446Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7361735Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7361969Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7362260Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7362491Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7362779Z E1204 
11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7363014Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7363333Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7363579Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7363870Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7364100Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7364412Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7364645Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7364936Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7365155Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7365374Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7365583Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7365875Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7366109Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7366401Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7366632Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7366925Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7367154Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7367448Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7367682Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7367987Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7368294Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7368586Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7368795Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7368990Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7369190Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7369396Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7369594Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7369836Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7370142Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7370339Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7370533Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7370730Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7370924Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7371156Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7371448Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7371679Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7371970Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7372178Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7372389Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7372593Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7372830Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7373134Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7373394Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7373598Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7373795Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7374013Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7374318Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7374554Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7374851Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7375087Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7375383Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7375617Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7375909Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7376142Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7376437Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7376647Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7376847Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7377069Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7377285Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7377487Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7377690Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7377983Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7378216Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7378521Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7378764Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7379055Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7379287Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7379584Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7379818Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7380113Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7380333Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7380540Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7380740Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7380951Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7381162Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7381361Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7381666Z E1204 11:26:25.369000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7381890Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7382092Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7382291Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7382490Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7382794Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7383042Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7383356Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7383590Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7383886Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7384124Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7384418Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7384648Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7384943Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7385177Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7385486Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7385718Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7386026Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7386260Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7386557Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7386792Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7387098Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7387331Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7387637Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7387834Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7388032Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7388267Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7388562Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7388800Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7389096Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7389330Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7389623Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7389867Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7390160Z E1204 
11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7390403Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7390697Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7390893Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7391126Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7391430Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7391664Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7391967Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7392181Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7392384Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7392585Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7392786Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7393078Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7393324Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7393531Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7393732Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7393949Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7394240Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7394461Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7394677Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7394875Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7395070Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7395219Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7395417Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7395650Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7395860Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7396078Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7396300Z E1204 
11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7396506Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7396705Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7396925Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7397133Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7397327Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7397546Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7397753Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7397950Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7398158Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7398370Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7398572Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7398783Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7398982Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7399279Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7399491Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7399692Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7399901Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7400105Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7400311Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7400525Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7400728Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7400927Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7401127Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7401422Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7401636Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7401839Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7402036Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7402248Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7402543Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7402760Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7402974Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7403175Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7403403Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7403697Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7403893Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7404113Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7404322Z E1204 11:26:25.369000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7404517Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7404730Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7404937Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7405135Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7405327Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7405509Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7405683Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7405808Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7405913Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7406039Z E1204 11:26:25.369000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7406197Z [W1204 11:26:25.638286207 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7406212Z 2025-12-04T11:45:25.7406356Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7406651Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7406946Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7407090Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7407573Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7407826Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7408062Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7408267Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.7408479Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7408772Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7409006Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7409301Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7409534Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7409826Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7410057Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7410352Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7410584Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7410886Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7411105Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7411320Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7411516Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7411726Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7411926Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7412157Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7412463Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7412658Z E1204 
11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7412901Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7413191Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7413534Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7413731Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7413950Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7414156Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7414352Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7414546Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7414767Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7414972Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7415184Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7415380Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7415612Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7415921Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7416154Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7416448Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7416667Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7416888Z 
E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7417096Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7417303Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7417505Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7417735Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7418028Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7418262Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7418556Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7418791Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.7419082Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7419315Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7419614Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7419845Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7420145Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7420379Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7420671Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7420900Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7421205Z E1204 
11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7421447Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7421739Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7421968Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7422261Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7422493Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7422783Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7423015Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7423338Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7423559Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7423781Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7423975Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7424266Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7424511Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7424803Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7425036Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7425328Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7425573Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7425876Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7426110Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7426400Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7426635Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7426922Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7427123Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7427320Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7427514Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7427723Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7427920Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7428166Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7428458Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7428652Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7428865Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7429061Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7429258Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7429488Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7429793Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7430025Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7430325Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7430525Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7430730Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7430934Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7431167Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7431461Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7431682Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7431887Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7432085Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7432297Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7432591Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7432823Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7433131Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7433396Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7433692Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7433927Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7434234Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7434481Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7434774Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7434971Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7435170Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7435396Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7435601Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7435797Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7436000Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7436294Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7436527Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7436833Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7437067Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7437359Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7437608Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7437902Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7438134Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7438437Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7438658Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7438874Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7439074Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7439265Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7439476Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7439675Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7439974Z E1204 11:26:25.371000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7440193Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7440395Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7440595Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7440794Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7441100Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7441334Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7441628Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7441871Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7442167Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7442402Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7442694Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7442937Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7443241Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7443501Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7443793Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7444026Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7444319Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7444555Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7444852Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7445084Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7445379Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7445632Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7445922Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7446132Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7446327Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7446564Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7446855Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7447091Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7447401Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7447646Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7447938Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7448169Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7448464Z E1204 
11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7448695Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7448988Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7449186Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7449421Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7449717Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7449958Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7450250Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7450478Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7450679Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7450880Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7451079Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7451372Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7451596Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7451800Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7452010Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7453753Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7454049Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7454273Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7454476Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7454677Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7454869Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7455017Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7455216Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7455439Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7455667Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7455863Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7456082Z E1204 
11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7456303Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7456498Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7456723Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7456929Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7457125Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7457359Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7457577Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7457777Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7457972Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7458185Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7458387Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7458586Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7458787Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7459079Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7459296Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7459496Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7459708Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7459898Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7460095Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7460320Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7460527Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7460727Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7460925Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7461217Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7461443Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7461665Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7461864Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7462064Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7462360Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7462572Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7462775Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7462971Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7463172Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7463493Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7463689Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7463912Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7464106Z E1204 11:26:25.371000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7464306Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7464533Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7464740Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7464938Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7465127Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7465307Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7465491Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7465619Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7465722Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7465862Z E1204 11:26:25.371000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7466019Z [W1204 11:26:25.679862995 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7466022Z 2025-12-04T11:45:25.7466168Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7466462Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7466762Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7466896Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7467378Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7467634Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7467860Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7468077Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.7468276Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7468567Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7468812Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7469107Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7469343Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7469632Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7469873Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7470176Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7470410Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7470700Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7470921Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7471128Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7471324Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7471536Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7471735Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7471969Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7472260Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7472466Z E1204 
11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7472699Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7473000Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7473219Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7473441Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7473662Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7473871Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7474082Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7474277Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7474510Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7474715Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7474909Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7475107Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7475337Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7475629Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7475861Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7476154Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7476374Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7476592Z 
E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7476786Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7476996Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7477209Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7477441Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7477734Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7477964Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7478274Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7478512Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.7478813Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7479045Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7479335Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7479566Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7479858Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7480087Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7480377Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7480610Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7480910Z E1204 
11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7481150Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7481440Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7481682Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7481972Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7482202Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7482495Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7482737Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7483044Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7483289Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7483490Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7483685Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7483978Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7484212Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7484501Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7484731Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7485029Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7485269Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7485579Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7485810Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7486114Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7486345Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7486635Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7486829Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7487037Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7487238Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7487462Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7487662Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7487894Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7488188Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7488382Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7488578Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7488771Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7488964Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7489197Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7489487Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7489730Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7490021Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7490232Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7490440Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7490644Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7490877Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7491170Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7491404Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7491616Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7491816Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7492015Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7492312Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7492548Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7492841Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7493073Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7493403Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7493640Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7493945Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7494201Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7494507Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7494729Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7494928Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7495150Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7495351Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7495550Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7495775Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7496085Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7496318Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7496612Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7496848Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7497143Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7497377Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7497669Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7497905Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7498195Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7498429Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7498630Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7498829Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7499031Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7499243Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7499447Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7499740Z E1204 11:26:25.413000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7499971Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7500174Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7500384Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7500584Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7500876Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7501116Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7501408Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7501640Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7501940Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7502177Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7502467Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7502715Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7503007Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7503244Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7503565Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7503797Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7504093Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7504330Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7504653Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7504899Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7505190Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7505423Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7505715Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7505915Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7506113Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7506346Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7506642Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7506876Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7507190Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7507421Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7507714Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7507959Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7508251Z E1204 
11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7508484Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7508784Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7508984Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7509226Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7509518Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7509750Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7510042Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7510256Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7510457Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7510657Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7510858Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7511155Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7511389Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7511589Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7511787Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7511998Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7512289Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7512511Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7512711Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7512908Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7513110Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7513288Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7513498Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7513721Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7513926Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7514123Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7514344Z E1204 
11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7514552Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7514745Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7514964Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7515170Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7515364Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7515600Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7515806Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7516076Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7516286Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7516497Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7516701Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7516900Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7517100Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7517410Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7517634Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7517837Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7518035Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7518229Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7518425Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7518638Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7518838Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7519035Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7519234Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7519528Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7519753Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7519953Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7520151Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7520362Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7520657Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7520871Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7521071Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7521270Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7521480Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7521783Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7521979Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7522181Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7522370Z E1204 11:26:25.413000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7522569Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7522782Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7522988Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7523184Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7523399Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7523581Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7523750Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7523893Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7523996Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7524122Z E1204 11:26:25.413000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7524278Z [W1204 11:26:25.682019194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7524294Z 2025-12-04T11:45:25.7524438Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7524734Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7525031Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7525162Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7525652Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7525925Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7526151Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7526356Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.7526560Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7526850Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7527088Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7527380Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7527614Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7527906Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7528148Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7528438Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7528668Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7528970Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7529190Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7529400Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7529597Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7529818Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7530019Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7530260Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7530552Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7530748Z E1204 
11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7530981Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7531271Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7531491Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7531688Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7531908Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7532113Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7532321Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7532515Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7532733Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7532947Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7533143Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7533366Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7533601Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7533897Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7534147Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7534453Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7534672Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7534876Z 
E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7535071Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7535281Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7535479Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7535712Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7536003Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7536238Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7536532Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7536775Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.7537065Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7537308Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7537598Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7537832Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7538120Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7538362Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7538656Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7538899Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7539188Z E1204 
11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7539418Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7539712Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7539945Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7540238Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7540467Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7540758Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7540991Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7541293Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7541512Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7541723Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7541920Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7542212Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7542442Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7542748Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7542979Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7543309Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7543541Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7543832Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7544063Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7544354Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7544585Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7544874Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7545072Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7545266Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7545476Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7545684Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7545887Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7546132Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7546424Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7546620Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7546813Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7547008Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7547215Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7547458Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7547749Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7547982Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7548278Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7548473Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7548682Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7548882Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7549119Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7549412Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7549645Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7549846Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7550044Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7550260Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7550554Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7550789Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7551082Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7551325Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7551618Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7551861Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7552154Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7552386Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7552681Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7552881Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7553077Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7553323Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7553525Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7553723Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7553936Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7554228Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7554461Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7554767Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7555002Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7555296Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7555529Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7555833Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7556079Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7556372Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7556592Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7556794Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7556993Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7557186Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7557397Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7557598Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7557892Z E1204 11:26:25.415000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7558111Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7558325Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7558522Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7558721Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7559030Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7559263Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7559558Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7559791Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7560095Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7560336Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7560630Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7560862Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7561156Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7561387Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7561681Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7561912Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7562206Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7562439Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7562742Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7562973Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7563291Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7563538Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7563832Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7564027Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7564224Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7564472Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7564776Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7565009Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7565300Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7565533Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7565824Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7566056Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7566349Z E1204 
11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7566581Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7566875Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7567086Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7567318Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7567610Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7567855Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7568148Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7568360Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7568561Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7568769Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7568969Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7569275Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7569489Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7569691Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7569890Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7570089Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7570382Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7570602Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7570805Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7571002Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7571210Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7571358Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7571555Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7571774Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7571993Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7572189Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7572410Z E1204 
11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7572614Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7572808Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7573042Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7573293Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7573487Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7573709Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7573916Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7574112Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7574310Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7574523Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7574724Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7574925Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7575125Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7575438Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7575649Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7575853Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7576067Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7576258Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7576458Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7576671Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7576873Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7577083Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7577283Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7577586Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7577799Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7578000Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7578200Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7578402Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7578695Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7578908Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7579110Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7579308Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7579587Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7579880Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7580076Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7580291Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7580483Z E1204 11:26:25.415000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7580682Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7580897Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7581101Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7581309Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7581498Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7581687Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7581858Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7581985Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7582088Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7582215Z E1204 11:26:25.415000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7582371Z [W1204 11:26:25.684163673 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7582374Z 2025-12-04T11:45:25.7582519Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7582815Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7583111Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7583242Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7583753Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7584021Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7584245Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7584464Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.7584666Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7584963Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7585202Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7585507Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7585739Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7586045Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7586276Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7586567Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7586799Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7587090Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7587309Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7587515Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7587713Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7587920Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7588130Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7588362Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7588654Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7588859Z E1204 
11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7589091Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7589385Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7589604Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7589811Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7590045Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7590250Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7590445Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7590639Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7590858Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7591062Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7591259Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7591454Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7591684Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7591979Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7592223Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7592514Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7592733Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7592950Z 
E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7593145Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7593395Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7593594Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7593823Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7594134Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7594385Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7594677Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7594907Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.7595197Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7595429Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7595719Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7595951Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7596242Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7596476Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7596783Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7597013Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7597306Z E1204 
11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7597551Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7597844Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7598079Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7598380Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7598612Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7598915Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7599148Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7599438Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7599659Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7599861Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7600057Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7600348Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7600580Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7600871Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7601115Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7601410Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7601642Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7601942Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7602174Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7602463Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7602694Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7602997Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7603202Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7603430Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7603626Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7603835Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7604032Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7604267Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7604558Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7604754Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7604952Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7605147Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7605360Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7605589Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7605881Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7606126Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7606418Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7606617Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7606824Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7607050Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7607285Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7607592Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7607814Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7608017Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7608220Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7608422Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7608716Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7608948Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7609244Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7609476Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7609781Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7610014Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7610307Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7610550Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7610845Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7611044Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7611240Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7611476Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7611690Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7611890Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7612093Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7612384Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7612618Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7612914Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7613148Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7613467Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7613700Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7613993Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7614241Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7614533Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7614768Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7614972Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7615174Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7615366Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7615579Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7615792Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7616098Z E1204 11:26:25.417000 
974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7616319Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7616521Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7616724Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7616921Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7617215Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7617448Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7617743Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7617976Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7618268Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7618512Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7618803Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7619047Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7619339Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7619576Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7619872Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7620118Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7620420Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7620651Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7620944Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7621177Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7621469Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7621702Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7621994Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7622193Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7622392Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7622641Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7622933Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7623169Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7623508Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7623742Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7624036Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7624269Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7624578Z E1204 
11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7624826Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7625119Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7625316Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7625549Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7625840Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7626073Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7626365Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7626581Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7626785Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7626986Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7627200Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7627493Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7627720Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7627921Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7628122Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7628321Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7628614Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7628846Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7629057Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7629257Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7629450Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7629598Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7629797Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7630017Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7630224Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7630420Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7630638Z E1204 
11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7630847Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7631042Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7631278Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7631485Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7631680Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7631912Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7632117Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7632315Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7632509Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7632723Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7632934Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7633146Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7633381Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7633674Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7633888Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7634090Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7634290Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7634482Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.7634678Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7634893Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7635093Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7635306Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7635505Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7635797Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7636023Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7636227Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7636426Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7636627Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7636919Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7637145Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7637360Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7637558Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7637757Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7638049Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7638245Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7638450Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7638639Z E1204 11:26:25.417000 974421 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7638836Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7639048Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7639254Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7639469Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7639659Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7639839Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7640008Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7640146Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7640249Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7640379Z E1204 11:26:25.417000 974421 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.7640422Z FAILED [1.5613s] [100%] 2025-12-04T11:45:25.7640424Z 2025-12-04T11:45:25.7640481Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.7640644Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.7640695Z Traceback (most recent call last): 2025-12-04T11:45:25.7640856Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.7640901Z method(*args, **kwargs) 2025-12-04T11:45:25.7641064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.7641108Z method(*args, **kwargs) 2025-12-04T11:45:25.7641267Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.7641308Z with policy(): 2025-12-04T11:45:25.7641463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.7641506Z raise RuntimeError(msg) 2025-12-04T11:45:25.7641918Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.7641923Z 2025-12-04T11:45:25.7642001Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.7642280Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.7642285Z 2025-12-04T11:45:25.7642373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.7642455Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.7642499Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7642560Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7643123Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7643227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.7643294Z graph_break [] 2025-12-04T11:45:25.7643363Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7643439Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7643932Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.7644003Z current_size = base.storage().size() 2025-12-04T11:45:25.7644045Z Autotune Choices Stats: 2025-12-04T11:45:25.7644426Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008798999711871147, "best_triton_pos": 0} 2025-12-04T11:45:25.7644495Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7644548Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7644670Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7644914Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7645164Z triton_mm_33 0.0092 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7645408Z triton_mm_29 0.0108 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7645637Z triton_mm_21 0.0109 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7645865Z triton_mm_16 0.0110 ms 79.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7646092Z triton_mm_30 0.0112 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7646319Z triton_mm_22 0.0114 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7646547Z triton_mm_23 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7646775Z triton_mm_15 0.0123 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7647003Z triton_mm_31 0.0126 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7647153Z SingleProcess AUTOTUNE benchmarking takes 0.1567 seconds and 1.1694 seconds precompiling for 33 choices 2025-12-04T11:45:25.7647314Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.7647360Z Traceback (most recent call last): 2025-12-04T11:45:25.7647517Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.7647559Z method(*args, **kwargs) 2025-12-04T11:45:25.7647713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.7647771Z method(*args, **kwargs) 2025-12-04T11:45:25.7647923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.7647961Z with policy(): 2025-12-04T11:45:25.7648115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.7648157Z raise RuntimeError(msg) 2025-12-04T11:45:25.7648566Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 
2025-12-04T11:45:25.7648569Z 2025-12-04T11:45:25.7648645Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.7648931Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.7648933Z 2025-12-04T11:45:25.7649022Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.7649110Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.7649153Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7649210Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7649763Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7649865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.7649902Z graph_break [] 2025-12-04T11:45:25.7649968Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7650044Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7650533Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.7650581Z current_size = base.storage().size() 2025-12-04T11:45:25.7650623Z Autotune Choices Stats: 2025-12-04T11:45:25.7650997Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008798999711871147, "best_triton_pos": 0} 2025-12-04T11:45:25.7651074Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7651127Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7651248Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7651485Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7651718Z triton_mm_33 0.0092 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7651956Z triton_mm_29 0.0108 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7652184Z triton_mm_21 0.0109 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7652410Z triton_mm_16 0.0110 ms 79.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7652646Z triton_mm_30 0.0112 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7652873Z triton_mm_22 0.0114 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7653120Z triton_mm_23 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7653369Z triton_mm_15 0.0123 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7653599Z triton_mm_31 0.0126 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7653731Z SingleProcess AUTOTUNE benchmarking takes 0.1567 seconds and 1.1694 seconds precompiling for 33 choices 2025-12-04T11:45:25.7653807Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.7653852Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7653910Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7654010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 
2025-12-04T11:45:25.7654499Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7654540Z graph_break [] 2025-12-04T11:45:25.7654603Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7654677Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7654717Z Autotune Choices Stats: 2025-12-04T11:45:25.7655100Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:25.7655164Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7655215Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7655334Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7655586Z triton_mm_72 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7655817Z triton_mm_71 0.0092 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7655863Z _scaled_mm 0.0094 ms 92.7% 
2025-12-04T11:45:25.7656088Z triton_mm_54 0.0110 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7656313Z triton_mm_68 0.0112 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7656553Z triton_mm_59 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7656792Z triton_mm_60 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7657019Z triton_mm_67 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7657248Z triton_mm_61 0.0118 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7657480Z triton_mm_53 0.0121 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7657616Z SingleProcess AUTOTUNE benchmarking takes 0.2359 seconds and 0.7903 seconds precompiling for 39 choices 2025-12-04T11:45:25.7657671Z =================================== 
FAILURES =================================== 2025-12-04T11:45:25.7657832Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.7657879Z Traceback (most recent call last): 2025-12-04T11:45:25.7658036Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.7658078Z method(*args, **kwargs) 2025-12-04T11:45:25.7658232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.7658272Z method(*args, **kwargs) 2025-12-04T11:45:25.7658423Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.7658471Z with policy(): 2025-12-04T11:45:25.7658626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.7658666Z raise RuntimeError(msg) 2025-12-04T11:45:25.7659074Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.7659086Z 2025-12-04T11:45:25.7659163Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.7659442Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.7659445Z 2025-12-04T11:45:25.7659534Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.7659607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.7659652Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7659708Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7660271Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7660370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.7660408Z graph_break [] 2025-12-04T11:45:25.7660481Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7660558Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7661043Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.7661092Z current_size = base.storage().size() 2025-12-04T11:45:25.7661135Z Autotune Choices Stats: 2025-12-04T11:45:25.7661511Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008798999711871147, "best_triton_pos": 0} 2025-12-04T11:45:25.7661576Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7661626Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7661748Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7661985Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7662223Z triton_mm_33 0.0092 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7662451Z triton_mm_29 0.0108 ms 81.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7662690Z triton_mm_21 0.0109 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7662914Z triton_mm_16 0.0110 ms 79.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7663156Z triton_mm_30 0.0112 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7663405Z triton_mm_22 0.0114 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7663634Z triton_mm_23 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7663863Z triton_mm_15 0.0123 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7664108Z triton_mm_31 0.0126 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7664243Z SingleProcess AUTOTUNE benchmarking takes 0.1567 seconds and 1.1694 seconds precompiling for 33 choices 2025-12-04T11:45:25.7664332Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.7664375Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7664431Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7664534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 
2025-12-04T11:45:25.7665019Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7665058Z graph_break [] 2025-12-04T11:45:25.7665121Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7665197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7665244Z Autotune Choices Stats: 2025-12-04T11:45:25.7665611Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:25.7665676Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7665728Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7665848Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7666086Z triton_mm_72 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7666334Z triton_mm_71 0.0092 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7666376Z _scaled_mm 0.0094 ms 92.7% 
2025-12-04T11:45:25.7666604Z triton_mm_54 0.0110 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7666845Z triton_mm_68 0.0112 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7667070Z triton_mm_59 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7667298Z triton_mm_60 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7667522Z triton_mm_67 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7667771Z triton_mm_61 0.0118 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7668012Z triton_mm_53 0.0121 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7668143Z SingleProcess AUTOTUNE benchmarking takes 0.2359 seconds and 0.7903 seconds precompiling for 39 choices 2025-12-04T11:45:25.7668219Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T11:45:25.7668260Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.7668318Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.7668417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.7668904Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.7668944Z graph_break [] 2025-12-04T11:45:25.7669009Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.7669082Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.7669124Z Autotune Choices Stats: 2025-12-04T11:45:25.7669491Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.7669560Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.7669609Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.7669732Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.7669984Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.7670218Z triton_mm_109 0.0092 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7670453Z triton_mm_105 0.0108 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7670689Z triton_mm_98 0.0110 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7670917Z triton_mm_92 0.0110 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7671143Z triton_mm_106 0.0112 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7671381Z triton_mm_97 0.0114 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.7671612Z triton_mm_99 0.0118 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7671852Z triton_mm_91 0.0120 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7672081Z triton_mm_107 0.0124 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.7672209Z SingleProcess AUTOTUNE benchmarking takes 0.2567 seconds and 0.6421 seconds precompiling for 39 choices 2025-12-04T11:45:25.7672408Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c80ef871d8d60908.xml - 2025-12-04T11:45:25.7672471Z =========================== short test summary info ============================ 2025-12-04T11:45:25.7673097Z FAILED [1.5613s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.7673101Z 2025-12-04T11:45:25.7673176Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.7673480Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.7673482Z 2025-12-04T11:45:25.7673574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.7673654Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.7673728Z ================== 1 failed, 187 deselected, 2 rerun in 6.61s ================== 2025-12-04T11:45:25.7673765Z Got exit code 1 2025-12-04T11:45:25.7673805Z Retrying single test... 2025-12-04T11:45:25.7673950Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51332554a7ab49c3.xml 2025-12-04T11:45:25.7674010Z ============================= test session starts ============================== 2025-12-04T11:45:25.7674122Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.7674178Z cachedir: .pytest_cache 2025-12-04T11:45:25.7674337Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.7674386Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.7674426Z configfile: pytest.ini 2025-12-04T11:45:25.7674594Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.7674670Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.7674940Z stepcurrent: skipping 109 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.7674983Z Running 1 items in this shard 2025-12-04T11:45:25.7674986Z 2025-12-04T11:45:25.7675351Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:35.934365633 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.7675354Z 2025-12-04T11:45:25.7675684Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7675984Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7676118Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7676603Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7676860Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7677090Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7677298Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7677503Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7677796Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7678044Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7678337Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7678581Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7678872Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7679105Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7679398Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7679639Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7679931Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7680174Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7680464Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7680696Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7680988Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7681186Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7681417Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7681711Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7681912Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7682144Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7682448Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7682677Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7682986Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7683207Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7683448Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7683649Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7683861Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7684041Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.7684221Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7684771Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/r4/cr4lhfrqaioh72mszu7uw33vidvvifl54t7dnzux5u3igorq5u7n.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7684918Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7685140Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7685297Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7685443Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7685736Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7685868Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7686125Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7686265Z E1204 
11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7686523Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7686697Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7686966Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7687104Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7687393Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7687587Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7687905Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7688201Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7688345Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7688840Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7689095Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7689320Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7689528Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7689730Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7690023Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7690257Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7690553Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7690787Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7691090Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7691324Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7691626Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7691858Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7692152Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7692383Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7692686Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7692916Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7693219Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7693442Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7693682Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7693978Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7694174Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7694409Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7694700Z E1204 11:26:35.986000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7694936Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7695228Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7695463Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7695673Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7695874Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7696102Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7696270Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7696450Z E1204 11:26:35.986000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7696553Z E1204 11:26:35.986000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7696713Z [W1204 11:26:36.287628704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7696715Z 2025-12-04T11:45:25.7696869Z [W1204 11:26:36.289208242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7696885Z 2025-12-04T11:45:25.7697195Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7697503Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7697633Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7698116Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7698371Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7698596Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7698803Z E1204 11:26:36.021000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7699003Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7699296Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7699544Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7699838Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7700074Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7700378Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7700615Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7700907Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7701142Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7701541Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7701783Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7702076Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7702305Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7702597Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7702795Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7703033Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7703376Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7703573Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7703805Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7704112Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7704347Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7704638Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7704873Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7705080Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7705282Z E1204 11:26:36.021000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7705496Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7705662Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7705854Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7706392Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/er/cerkoaafshjcwmveujbgwheu2k6meupgz4u5nuusisbcjmcgwdcs.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7706542Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7706757Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7706915Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7707061Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7707348Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7707481Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7707740Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7707886Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7708139Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7708308Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7708578Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7708712Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7709002Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7709194Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7709510Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7709801Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7709933Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7710432Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7710684Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7710913Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7711122Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7711325Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7711619Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7711853Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7712149Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7712383Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7712688Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7712917Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7713208Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7713476Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7713767Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7714001Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7714291Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7714542Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7714853Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7715052Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7715284Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7715576Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7715774Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7716008Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7716301Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7716533Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7716828Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7717068Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7717273Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7717475Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7717699Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7717867Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.7718047Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7718151Z E1204 11:26:36.021000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7718307Z [W1204 11:26:36.295272934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7718312Z 2025-12-04T11:45:25.7718632Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7718927Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7719069Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7719546Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7719800Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7720023Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7720230Z E1204 11:26:36.027000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7720428Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7720719Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7720953Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7721248Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7721493Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7721786Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7722033Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7722324Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7722555Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7722845Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7723091Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7723416Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7723647Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7723939Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7724138Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7724370Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7724663Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7724861Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7725093Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7725386Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7725619Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7725922Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7726142Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7726367Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7726568Z E1204 11:26:36.027000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7726780Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7726946Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7727125Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7727660Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/ls/clsz3pw6df2a34jt6xo3d3gjeuftmwhtidcbm3bwfi6oo5vd4ynt.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7727820Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7728035Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7728191Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7728339Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7728627Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7728762Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7729018Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7729157Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7729411Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7729570Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7729839Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7729987Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7730262Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7730454Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7730781Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7731080Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7731213Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7731707Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7731973Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7732199Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7732405Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7732607Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7732901Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7733137Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7733456Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7733688Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7733983Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7734214Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7734522Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7734752Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7735061Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7735294Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7735585Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7735818Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7736122Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7736318Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7736564Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7736855Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7737053Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7737287Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7737582Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7737814Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7738107Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7738328Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7738534Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7738746Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7738957Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7739125Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.7739314Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7739420Z E1204 11:26:36.027000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7739731Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7740021Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7740154Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7740652Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7740905Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7741128Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7741337Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7741537Z E1204 11:26:36.034000 
980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7741830Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7742064Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7742355Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7742590Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7742884Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7743126Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7743446Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7743690Z E1204 11:26:36.034000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7743983Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7744214Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7744504Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7744750Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7745056Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7745254Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7745485Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7745778Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7745973Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7746206Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7746494Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7748197Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7748493Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7748716Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7748943Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7749143Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7749354Z E1204 11:26:36.034000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7749539Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7749717Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7750246Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/kg/ckgbjh5i4zvhzrssaf4w3rzo5ia4fltkisg7jjbod3fwgekpzhxz.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7750392Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7750621Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7750776Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7750934Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7751223Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7751357Z E1204 11:26:36.034000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7751616Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7751753Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7752011Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7752167Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7752436Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7752571Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7752849Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7753053Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7753404Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7753701Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7753844Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7754322Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7754574Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7754814Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7755021Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7755237Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7755531Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7755767Z E1204 
11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7756062Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7756296Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7756587Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7756819Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7757110Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7757341Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7757645Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7757876Z E1204 11:26:36.034000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7758179Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7758412Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7758702Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7758896Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7759138Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7759429Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7759636Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7759867Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7760157Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7760392Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7760683Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7760903Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7761107Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7761310Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7761520Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7761697Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7761875Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7761976Z E1204 11:26:36.034000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7762135Z [W1204 11:26:36.321513364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7762149Z 2025-12-04T11:45:25.7762457Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7762760Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7762889Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7763418Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7763671Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7763908Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7764114Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7764311Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7764603Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7764838Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7765132Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7765364Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7765654Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7765886Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7766197Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7766428Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7766732Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7766962Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7767253Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7767485Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7767785Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7767981Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7768225Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7768518Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7768712Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7768945Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7769236Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7769470Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7769767Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7769990Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7770195Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7770405Z E1204 11:26:36.054000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7770615Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7770779Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7770967Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7771492Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/xd/cxd42t5trhxky44n3336i3cm3qdvfy5q2fmnzsekr4jpjdkdxdpm.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7771639Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7771854Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7772022Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7772170Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7772466Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7772599Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7772854Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7772993Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7773273Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7773430Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7773698Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7773831Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7774108Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7774300Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7774630Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7774924Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7775052Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7775546Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7775797Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7776022Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7776239Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7776443Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7776750Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7776985Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7777276Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7777508Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7777801Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7778032Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7778323Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7778556Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7778846Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7779091Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7779382Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7779624Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7779915Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7780110Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7780340Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7780641Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7780839Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7781087Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7781384Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7781618Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7781913Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7782135Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7782341Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7782541Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7782750Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7782920Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.7783098Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7786377Z E1204 11:26:36.054000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7786534Z [W1204 11:26:36.326018479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7786537Z 2025-12-04T11:45:25.7786845Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7787155Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7787284Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7787761Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7788024Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7788249Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7788467Z E1204 11:26:36.059000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7788666Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7788956Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7789191Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7789486Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7789717Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7790006Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7790243Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7790532Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7790777Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7791066Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7791311Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7791604Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7791836Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7792131Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7792325Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7792570Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7792869Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7793064Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7793310Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7793602Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7793834Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7794125Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7794344Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7794551Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7794750Z E1204 11:26:36.059000 980345 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7794978Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7795142Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.7795319Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7795843Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmpadogkfir/uf/cufd7cduhyda3gsn2mf3vxoda7cjmfh34zd2lh6nl2gaj6lwzlsg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.7796003Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.7796221Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.7796377Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.7796522Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.7796819Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.7796951Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.7798341Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.7798483Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.7798739Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.7798899Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.7799167Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.7799901Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.7800191Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.7800386Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.7800705Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.7801002Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7801154Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7801634Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7801897Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7802125Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7802330Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7802534Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7802826Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7803062Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7803434Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7803666Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7803957Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7804187Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7804495Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7804724Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7805018Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7805250Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7805555Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7805786Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7806076Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7806286Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7806518Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7806810Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7807009Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.7807243Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7807533Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7807783Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7808073Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7808295Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7808500Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7808715Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.7808925Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.7809090Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.7809268Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.7809370Z E1204 11:26:36.059000 980345 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.7809424Z ('RERUN', {'yellow': True}) [3.7609s] [100%] 2025-12-04T11:45:25.7809790Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:37.175434505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7809793Z 2025-12-04T11:45:25.7809940Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7810232Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7810535Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7810667Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7811144Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7811396Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7811622Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7811835Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7812046Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7812337Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7812569Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7812860Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7813104Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7813424Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7813656Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7813945Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7814191Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7814482Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7814714Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7814921Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7815119Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7815329Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7815527Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7815759Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7816049Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7816258Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7816489Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7816781Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7817000Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7817195Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7817434Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7817639Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7817836Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7818030Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7818260Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7818465Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7818659Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7818868Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7819097Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7819391Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7819621Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7819911Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7820131Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7820335Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7820541Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7820747Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7820946Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7821179Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7821482Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7821716Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7822006Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7822238Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7822540Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7822772Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7823060Z 
E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7823333Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7823625Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7823858Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7824148Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7824378Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7824669Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7824915Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7825203Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7825434Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7825724Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7825971Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7826261Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7826494Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7826783Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7827014Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7827216Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7827411Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7827715Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7827945Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7828243Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7828474Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7828764Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7828996Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7829299Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7829531Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7829821Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7830052Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7830355Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7830551Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7830746Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7830941Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7831151Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7831360Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7831593Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7831883Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7832089Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7832286Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7832482Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7832677Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7832911Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7833202Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7833476Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7833786Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7833981Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7834188Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7834389Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7834640Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7834933Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7835154Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7835357Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7835569Z 
E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7835772Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7836065Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7836312Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7836605Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7836838Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7837131Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7837364Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7837659Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7837903Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7838194Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7838393Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7838589Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7838821Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7839024Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7839223Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7839423Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7839717Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7839965Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7840257Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7840499Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7840791Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7841026Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7841319Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7841551Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7841844Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7842068Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7842286Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7842485Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7842678Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7842887Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7843097Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7843421Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7843642Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7843844Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7844043Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7844262Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7844556Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7844801Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7845093Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7845326Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7845619Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7845850Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7846144Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7846378Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7846683Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7846918Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7847211Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7847446Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7847750Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7847984Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7848276Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7848508Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7848816Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7849047Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7849364Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7849565Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7849763Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7849996Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7850286Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7850519Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7850812Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7851057Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7851350Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7851584Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7851887Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7852119Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7852411Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7852607Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7852840Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7853144Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7853412Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7853718Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7853932Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7854138Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7854336Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7854535Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7854830Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7855043Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7855261Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7855458Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7855659Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7855953Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7856177Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7856396Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7856595Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7856787Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7856935Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7857146Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7857366Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7857573Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7857778Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7857998Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7858206Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7858401Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7858624Z E1204 
11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7858829Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7859024Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7859245Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7859464Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7859662Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7859857Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7860069Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7860281Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7860480Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7860681Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7860976Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7861187Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7861402Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7861600Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7861792Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7862001Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7862212Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7862415Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7862612Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7862812Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7863108Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7863343Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7863559Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7863756Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7863957Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7864250Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7864476Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7864678Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7864875Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7865075Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7865369Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7865590Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7865791Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7865981Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7866195Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7866409Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7866617Z E1204 11:26:37.914000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7866814Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7867005Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7867187Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7867357Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7867488Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7867602Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7867729Z E1204 11:26:37.914000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7867886Z [W1204 11:26:37.183803344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7867889Z 2025-12-04T11:45:25.7868034Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7868327Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7868637Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7868767Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7869244Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7869498Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7869736Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7869945Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7870154Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7870445Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7870682Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7870976Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7871209Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7871500Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7871734Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7872047Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7872281Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7872572Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7872793Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7873014Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7873209Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7873431Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7873630Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7873860Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7874166Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7874362Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7874606Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7874898Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7875121Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7875315Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7875532Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7875737Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7875932Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7876141Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7876360Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7876564Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7876760Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7876954Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7877199Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7877491Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7877722Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7878012Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7878243Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7878446Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7878641Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7878857Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7879057Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7879289Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7879582Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7879815Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7880105Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7880339Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7880638Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7880868Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7881159Z 
E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7881413Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7881704Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7881933Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7882225Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7882466Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7882759Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7882989Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7883314Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7883544Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7883838Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7884068Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7884358Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7884589Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7884896Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7885114Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7885315Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7885509Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7885812Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7886046Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7886343Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7886574Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7886862Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7887107Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7887396Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7887639Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7887928Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7888161Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7888453Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7888650Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7888846Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7889042Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7889260Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7889458Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7889689Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7889978Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7890186Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7890381Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7890575Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7890772Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7891002Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7891305Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7891535Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7891834Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7892029Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7892236Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7892438Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7892671Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7892968Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7893191Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7893439Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7893639Z 
E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7893838Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7894131Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7894376Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7894671Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7894904Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7895197Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7895445Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7895739Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7895973Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7896281Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7896482Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7896681Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7896900Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7897102Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7897300Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7897502Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7897814Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7898050Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7898343Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7898577Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7898883Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7899116Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7899410Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7899642Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7899946Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7900170Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7900373Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7900581Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7900772Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7900985Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7901184Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7901476Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7901697Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7901902Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7902114Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7902315Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7902611Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7902844Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7903152Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7903411Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7903703Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7903938Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7904246Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7904478Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7904771Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7905019Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7905315Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7905546Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7905839Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7906072Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7906365Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7906610Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7906902Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7907138Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7907434Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7907645Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7907841Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7908075Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7908369Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7908611Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7908907Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7909151Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7909444Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7909681Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7909978Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7910211Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7910504Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7910702Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7910948Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7911242Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7911477Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7911767Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7911993Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7912195Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7912395Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7912594Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7912899Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7913114Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7913334Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7913555Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7913755Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7914051Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7914272Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7914474Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7914672Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7914863Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7915012Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7915220Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7915440Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7915646Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7915841Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7916073Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7916279Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7916475Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7916697Z E1204 
11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7916903Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7917114Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7917335Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7917539Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7917746Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7917941Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7918156Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7918357Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7918556Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7918760Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7919052Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7919279Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7919479Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7919678Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7919869Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7920064Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7920286Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7920486Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7920686Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7920885Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7921192Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7921404Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7921609Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7921816Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7922014Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7922309Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7922520Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7922721Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7922919Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7923121Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7923460Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7923747Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7923951Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7924139Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7924336Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7924563Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7924769Z E1204 11:26:37.917000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7924967Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7925156Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7925353Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7925525Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7925652Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7925756Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7925898Z E1204 11:26:37.917000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7926051Z [W1204 11:26:37.186046222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7926053Z 2025-12-04T11:45:25.7926200Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7926495Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7926791Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7926922Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7927401Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7927666Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7927895Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7928103Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7928303Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7928605Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7928840Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7929129Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7929364Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7929671Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7929903Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7930194Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7930436Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7930727Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7930946Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7931152Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7931348Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7931555Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7931755Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7931994Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7932286Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7932483Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7932717Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7933017Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7933237Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7933463Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7933680Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7933899Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7934095Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7934289Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7934520Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7934726Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7934925Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7935119Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7935349Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7935640Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7935872Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7936176Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7936396Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7936601Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7936795Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7937004Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7937218Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7937452Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7937743Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7937975Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7938278Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7938509Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7938810Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7939040Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7939335Z 
E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7939568Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7939859Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7940091Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7940395Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7940625Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7940915Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7941149Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7941453Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7941687Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7941978Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7942209Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7942501Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7942742Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7943032Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7943285Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7943485Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7943685Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.7943976Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7944209Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7944500Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7944733Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7945038Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7945268Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7945561Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7945793Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7946106Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7946336Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7946629Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7946825Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7947034Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7947230Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7947435Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7947647Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7947877Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7948171Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7948365Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7948560Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7948756Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7948952Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7949193Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7949482Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7949716Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7950006Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7950213Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7950421Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7950622Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7950855Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7951162Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7951384Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7951586Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7951794Z 
E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7951994Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7952287Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7952521Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7952812Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7953045Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7953392Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7953628Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7953920Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7954151Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7954456Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7954655Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7954852Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7955072Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7955273Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7955485Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7955686Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7955980Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7956225Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7956518Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7956753Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7957045Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7957280Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7957571Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7957818Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7958110Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7958334Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7958537Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7958745Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7958938Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7959148Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7959348Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7959639Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7959870Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7960073Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7960269Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7960484Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7960779Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7961014Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7961305Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7961543Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7961838Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7962090Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7962383Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7962615Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7962913Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7963159Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7963488Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7963723Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7964014Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7964266Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7964556Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7964802Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7965096Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7965333Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7965628Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7965825Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7966024Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7966256Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7966564Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7966796Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7967088Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7967320Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7967626Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7967860Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7968152Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7968385Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7968692Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7968889Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7969120Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7969420Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7969655Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7969948Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7970163Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7970365Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7970564Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7970776Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7971069Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7971284Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7971484Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7971683Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7971896Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7972189Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7972413Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7972613Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7972822Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7973013Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7973161Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.7973408Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7973628Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7973834Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7974031Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7974252Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7974459Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7974656Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7974877Z E1204 
11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7975095Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.7975293Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7975515Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.7975720Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7975927Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7976124Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7976335Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7976539Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7976737Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7976951Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7977245Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7977457Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7977678Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7977875Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7978071Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.7978265Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7978479Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7978681Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.7978880Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7979093Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7979387Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7979600Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.7979801Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7980001Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7980210Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7980501Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7980714Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.7980914Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.7981126Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.7981326Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7981620Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7981826Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.7982026Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.7982221Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.7982416Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.7982629Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.7982834Z E1204 11:26:37.919000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.7983034Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.7983233Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.7983447Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.7983619Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.7983746Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.7983849Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.7983974Z E1204 11:26:37.919000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.7984145Z [W1204 11:26:37.227788818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.7984147Z 2025-12-04T11:45:25.7984293Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.7984586Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.7984881Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.7985011Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.7985504Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.7985759Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.7986002Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.7986207Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.7986407Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7986697Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7986933Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7987226Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7987473Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7987765Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7987996Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7988287Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7988531Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7988820Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7989040Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7989245Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7989454Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7989662Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7989863Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7990105Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7990397Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7990596Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7990826Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7991117Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7991336Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7991532Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7991760Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7991965Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7992162Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7992358Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7992576Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7992799Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7992995Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.7993190Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.7993468Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7993780Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7994012Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7994303Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7994535Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.7994741Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.7994938Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.7995149Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.7995350Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.7995581Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7995873Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7996117Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7996408Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7996638Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7996928Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7997175Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7997472Z 
E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7997706Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7997994Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7998238Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7998528Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7998770Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7999060Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7999294Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.7999584Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.7999816Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8000108Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8000341Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8000640Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8000872Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8001161Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8001382Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8001598Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8001793Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8002086Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8002319Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8002622Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8002853Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8003152Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8003415Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8003708Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8003943Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8004231Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8004465Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8004759Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8004968Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8005163Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8005359Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8005566Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8005780Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8006015Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8006306Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8006503Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8006698Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8006908Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8007102Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8007333Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8007638Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8007868Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8008159Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8008353Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8008564Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8008765Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8009011Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8009308Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8009529Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8009731Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8009930Z 
E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8010158Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8010451Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8010685Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8010979Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8011224Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8011520Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8011761Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8012054Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8012289Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8012580Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8012779Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8012974Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8013203Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8013441Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8013640Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8013844Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8014135Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8014383Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8014675Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8014908Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8015199Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8015445Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8015739Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8015972Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8016278Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8016498Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8016702Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8016900Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8017093Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8017307Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8017508Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8017815Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8018034Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8018236Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8018434Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8018646Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8018940Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8019172Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8019467Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8019709Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8020005Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8020236Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8020538Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8020771Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8021068Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8021300Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8021591Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8021824Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8022130Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8022361Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8022655Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8022885Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8023191Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8023441Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8023737Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8023936Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8024148Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8024384Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8024677Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8024922Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8025213Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8025447Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8025743Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8025976Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8026269Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8026521Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8026812Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8027011Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8027244Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8027549Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8027781Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8028076Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8028291Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8028506Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8028705Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8028906Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8029210Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8029422Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8029630Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8029828Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8030030Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8030325Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8030545Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8030758Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8030956Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8031148Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8031297Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8031495Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8031725Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8031932Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8032131Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8032352Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8032560Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8032766Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8032987Z E1204 
11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8033191Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8033439Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8033659Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8033867Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8034064Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8034259Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8034474Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8034679Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8034897Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8035097Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8035393Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8035609Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8035826Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8036026Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8036216Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8036415Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8036626Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8036842Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8037042Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8037245Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8037552Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8037765Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8037970Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8038167Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8038369Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8038660Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8038874Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8039087Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8039285Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8039489Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8039780Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8039989Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8040190Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8040380Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8040576Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8040789Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8041005Z E1204 11:26:37.961000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8041201Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8041392Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8041589Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8041759Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8041884Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8041989Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8042114Z E1204 11:26:37.961000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8042271Z [W1204 11:26:37.229983936 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8042273Z 2025-12-04T11:45:25.8042421Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8042714Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8043012Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8043150Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8043656Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8043912Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8044152Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8044358Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8044557Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8044853Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8045100Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8045392Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8045625Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8045929Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8046162Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8046457Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8046687Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8046977Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8047196Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8047497Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8047694Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8047901Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8048099Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8048333Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8048639Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8048896Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8049130Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8049420Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8049652Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8049848Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8050066Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8050280Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8050477Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8050674Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8050893Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8051098Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8051293Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8051489Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8051731Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8052021Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8052256Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8052547Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8052775Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8052980Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8053176Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8053402Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8053599Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8053849Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8054143Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8054374Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8054675Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8054908Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8055199Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8055430Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8055725Z 
E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8055956Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8056261Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8056491Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8056781Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8057009Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8057315Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8057546Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8057840Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8058074Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8058383Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8058615Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8058914Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8059146Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8059440Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8059658Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8059860Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8060057Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8060352Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8060594Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8060886Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8061119Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8063102Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8063391Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8063684Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8063919Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8064209Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8064459Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8064750Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8064960Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8065155Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8065352Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8065561Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8065759Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8065992Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8066283Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8066480Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8066688Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8066884Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8067079Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8067309Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8067612Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8067844Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8068134Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8068331Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8068548Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8068750Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8068984Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8069291Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8069513Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8069717Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8069916Z 
E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8070116Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8070411Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8070643Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8070948Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8071182Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8071478Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8071711Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8072017Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8072249Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8072544Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8072741Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8072952Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8073173Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8073407Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8073621Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8073826Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8074122Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8074356Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8074651Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8074882Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8075191Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8075422Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8075715Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8075951Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8076266Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8076489Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8076691Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8076891Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8077083Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8077309Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8077509Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8077801Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8078031Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8078235Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8078437Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8078636Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8078929Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8079161Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8079477Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8079710Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8080002Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8080236Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8080546Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8080783Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8081076Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8081309Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8081614Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8081848Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8082141Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8082383Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8082676Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8082913Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8083209Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8083485Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8083777Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8083991Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8084188Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8084421Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8084716Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8084963Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8085260Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8085495Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8085791Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8086038Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8086332Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8086566Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8086872Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8087070Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8087303Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8087596Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8087831Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8088125Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8088352Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8088554Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8088754Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8088956Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8089250Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8089474Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8089677Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8089876Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8090076Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8090380Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8090602Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8090805Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8091013Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8091204Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8091356Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8091553Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8091775Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8091982Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8092179Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8092404Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8092627Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8092824Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8093048Z E1204 
11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8093282Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8093492Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8093715Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8093920Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8094120Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8094315Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8094545Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8094749Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8094947Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8095164Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8095457Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8095673Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8095874Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8096074Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8096270Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8096544Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8096774Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8096979Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8097178Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8097379Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8097673Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8097898Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8098099Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8098299Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8098499Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8098805Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8099018Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8099221Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8099429Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8099629Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8099926Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8100120Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8100322Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8100513Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8100708Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8100934Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8101140Z E1204 11:26:37.963000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8101337Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8101528Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8101711Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8101893Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8102023Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8102126Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8102253Z E1204 11:26:37.963000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8102411Z [W1204 11:26:37.232136045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8102413Z 2025-12-04T11:45:25.8102560Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8102867Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8103166Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8103333Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8103831Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8104090Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8104315Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8104524Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8104724Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8105019Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8105268Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8105561Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8105799Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8106110Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8106346Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8106639Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8106872Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8107164Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8107398Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8107608Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8107822Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8108029Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8108230Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8108464Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8108758Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8108956Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8109188Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8109493Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8109713Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8109910Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8110128Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8110336Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8110543Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8110742Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8110965Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8111171Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8111379Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8111575Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8111809Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8112116Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8112348Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8112643Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8112863Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8113071Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8113308Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8113519Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8113734Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8113965Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8114257Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8114491Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8114796Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8115029Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8115321Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8115556Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8115864Z 
E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8116097Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8116389Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8116632Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8116924Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8117156Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8117447Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8117681Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8117977Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8118221Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8118513Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8118744Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8119035Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8119278Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8119570Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8119792Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8119993Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8120206Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8120500Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8120733Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8121033Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8121266Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8121560Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8121790Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8122083Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8122314Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8122622Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8122853Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8123147Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8123413Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8123626Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8123823Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8124030Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8124230Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8124461Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8124780Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8124978Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8125189Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8125386Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8125580Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8125814Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8126104Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8126338Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8126630Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8126840Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8127049Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8127251Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8127488Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8127782Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8128016Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8128220Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8128419Z 
E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8128621Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8128925Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8129159Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8129452Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8129699Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8129997Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8130230Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8130523Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8130755Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8131048Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8131259Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8131457Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8131681Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8131884Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8132088Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8132300Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8132594Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8132827Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8133122Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8133433Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8133729Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8133977Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8134270Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8134510Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8134802Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8135024Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8135230Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8135429Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8135637Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8135847Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8136048Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8136341Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8136577Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8136783Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8136983Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8137186Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8137478Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8137725Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8138019Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8138270Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8138563Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8138798Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8139094Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8139327Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8139621Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8139866Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8140161Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8140395Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8140689Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8140946Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8141240Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8141477Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8141775Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8142018Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8142317Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8142515Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8142723Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8142955Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8143291Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8143525Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8143822Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8144058Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8144366Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8144601Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8144894Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8145127Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8145434Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8145632Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8145868Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8146162Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8146399Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8146707Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8146920Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8147136Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8147335Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8147539Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8147832Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8148045Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8148247Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8148448Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8148664Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8148959Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8149183Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8149385Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8149586Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8149788Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8149939Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8150137Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8150358Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8150565Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8150774Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8150995Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8151201Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8151409Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8151631Z E1204 
11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8151839Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8152036Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8152257Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8152463Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8152661Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8152870Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8153085Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8153329Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8153529Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8153730Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8154039Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8154253Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8154458Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8154657Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8154864Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8155060Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8155275Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8155489Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8155687Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8155890Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8156184Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8156398Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8156599Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8156798Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8157020Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8157314Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8157530Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8157733Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8157933Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8158144Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8158438Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8158636Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8158838Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8159042Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8159238Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8159455Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8159674Z E1204 11:26:37.965000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8159873Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8160070Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8160252Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8160423Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8160551Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8160656Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8160782Z E1204 11:26:37.965000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8160837Z ('RERUN', {'yellow': True}) [1.6586s] [100%] 2025-12-04T11:45:25.8161201Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda [W1204 11:26:39.604496601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.8161204Z 2025-12-04T11:45:25.8161350Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8161644Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8161941Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8162086Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8162568Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8162823Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8163049Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8163299Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8163499Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8163790Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8164038Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8164333Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8164569Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8164860Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8165094Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8165387Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8165633Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8165925Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8166145Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8166351Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8166563Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8166773Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8166972Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8167204Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8167495Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8167707Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8167939Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8168239Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8168460Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8168659Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8168877Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8169083Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8169279Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8169474Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8169704Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8169910Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8170105Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8170301Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8170534Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8170836Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8171069Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8171363Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8171582Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8171798Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8171994Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8172202Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8172416Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8172648Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8172942Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8173173Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8173495Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8173729Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8174034Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8174264Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8174555Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8174786Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8175090Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8175323Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8175613Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8175844Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8176150Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8176384Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8176674Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8176917Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8177208Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8177441Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8177731Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8177962Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8178253Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8178484Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8178686Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8178883Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8179172Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8179416Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8179707Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8179939Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8180229Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8180470Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8180765Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8180996Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8181296Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8181526Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8181819Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8182016Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8182211Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8182406Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8182614Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8182824Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8183056Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8183387Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8183583Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8183792Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8183987Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8184181Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8184413Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8184704Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8184949Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8185240Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8185453Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8185662Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8185864Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8186100Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8186391Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8186615Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8186816Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8187030Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8187232Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8187525Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8187763Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8188065Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8188299Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8188590Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8188826Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8189135Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8189367Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8189660Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8189870Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8190069Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8190293Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8190497Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8190697Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8190899Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8191193Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8191441Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8191737Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8191971Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8192265Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8192511Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8192802Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8193039Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8193362Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8193602Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8193805Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8194002Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8194210Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8194420Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8194624Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8194919Z E1204 11:26:39.338000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8195141Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8195345Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8195544Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8195756Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8196050Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8196285Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8196577Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8196825Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8197120Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8197354Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8197650Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8197893Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8198187Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8198429Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8198722Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8198958Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8199249Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8199481Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8199776Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8200011Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8200314Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8200548Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8200842Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8201040Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8201250Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8201482Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8201775Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8202009Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8202315Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8202550Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8202852Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8203086Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8203408Z E1204 
11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8203643Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8203937Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8204134Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8204370Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8204678Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8204915Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8205209Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8205424Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8205645Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8205843Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8206045Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8206336Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8206564Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8206768Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8206967Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8207182Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8207474Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8207700Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8207901Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8208102Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8208294Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8208445Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8208642Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8208873Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8209083Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8209279Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8209505Z E1204 
11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8209722Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8209917Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8210136Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8210346Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8210543Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8210778Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8210984Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8211182Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8211392Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8211604Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8211809Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8212006Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8212209Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8212506Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8212721Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8212935Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8213133Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8213360Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8213557Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8213774Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8213990Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8214188Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8214392Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8214683Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8214909Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8215111Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8215312Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8215526Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8215821Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8216040Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8216240Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8216437Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8216637Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8216932Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8217142Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8217347Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8217540Z E1204 11:26:39.338000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8217735Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8217950Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8218168Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8218366Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8218556Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8218739Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8218921Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8219047Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8219152Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8219278Z E1204 11:26:39.338000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8219436Z [W1204 11:26:39.606831687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8219448Z 2025-12-04T11:45:25.8219592Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8219888Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8220187Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8220319Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8220803Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8221059Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8221300Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8221506Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.8221707Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8222000Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8222249Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8222547Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8222783Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8223075Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8223364Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8223659Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8223892Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8224197Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8224420Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8224626Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8224823Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8225033Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8225232Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8225465Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8225772Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8225968Z E1204 
11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8226200Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8226491Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8226726Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8226923Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8227143Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8227349Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8227558Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8227754Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8227973Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8228189Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8228386Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8228583Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8228819Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8229112Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8229345Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8229638Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8229869Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8230076Z 
E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8230273Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8230479Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8230679Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8230920Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8231212Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8231444Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8231737Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8231983Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.8232277Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8232520Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8232810Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8233046Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8233367Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8233599Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8233889Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8234140Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8234434Z E1204 
11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8234664Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8234955Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8235199Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8235494Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8235724Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8236014Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8236263Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8236557Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8236779Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8236999Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8237195Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8237490Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8237721Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8238012Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8238244Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8238547Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8238778Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8239071Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8239303Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8239605Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8239838Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8240127Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8240324Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8240520Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8240728Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8240935Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8241133Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8241381Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8241670Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8241868Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8242062Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8242258Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8242453Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8242684Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8242985Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8243216Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8243544Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8243741Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8243964Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8244167Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8244401Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8244694Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8244928Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8245132Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8245330Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8245547Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8245841Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8246078Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8246372Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8246606Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8246900Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8247145Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8247438Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8247672Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8247965Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8248176Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8248375Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8248596Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8248798Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8248996Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8249210Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8249503Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8249737Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8250040Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8250276Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8250570Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8250807Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8251103Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8251334Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8251638Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8251860Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8252065Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8252264Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8252468Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8252680Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8252882Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8253177Z E1204 11:26:39.340000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8253424Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8253648Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8253846Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8254048Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8254357Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8254590Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8254886Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8255118Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8255414Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8255646Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8255953Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8256187Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8256480Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8256715Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8257021Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8257258Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8257553Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8257786Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8258092Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8258326Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8258631Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8258863Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8259157Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8259355Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8259551Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8259786Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8260085Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8260332Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8260624Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8260858Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8261151Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8261397Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8261689Z E1204 
11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8261921Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8262214Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8262424Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8262657Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8262949Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8263191Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8263512Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8263726Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8263929Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8264129Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8264330Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8264638Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8264852Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8265055Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8265255Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8265455Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8265761Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8265984Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8266186Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8266385Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8266591Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8266739Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8266937Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8267171Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8267380Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8267578Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8267800Z E1204 
11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8268007Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8268204Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8268425Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8268632Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8268839Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8269060Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8269269Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8269466Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8269676Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8269892Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8270092Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8270292Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8270491Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8270797Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8271010Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8271212Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8271422Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8271614Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8271813Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8272025Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8272226Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8272424Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8272625Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8272931Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8273143Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8273376Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8273574Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8273776Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8274087Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8274301Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8274503Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8274700Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8274917Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8275210Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8275406Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8275622Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8275811Z E1204 11:26:39.340000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8276010Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8276224Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8276432Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8276628Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8276818Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8277000Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8277184Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8277310Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8277415Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8277540Z E1204 11:26:39.340000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8277697Z [W1204 11:26:39.608984866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8277700Z 2025-12-04T11:45:25.8277844Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8278149Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8278448Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8278579Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8279059Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8279325Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8279551Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8279768Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.8279969Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8280263Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8280498Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8280792Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8281026Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8281329Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8281563Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8281853Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8282087Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8282390Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8282611Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8282819Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8283019Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8283240Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8283474Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8283708Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8284014Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8284212Z E1204 
11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8284447Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8284740Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8284961Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8285158Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8285382Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8285613Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8285811Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8286006Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8286226Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8286431Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8286644Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8286842Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8287072Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8287367Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8287616Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8287911Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8288130Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8288345Z 
E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8288539Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8288748Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8288952Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8289181Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8289476Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8289708Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8290011Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8290245Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.8290536Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8290768Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8291071Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8291303Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8291596Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8291825Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8292131Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8292362Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8292667Z E1204 
11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8292900Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8293192Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8293449Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8293739Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8293973Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8294282Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8294515Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8294810Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8295031Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8295237Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8295448Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8295741Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8295972Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8296263Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8296510Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8296799Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8297049Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8297342Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8297579Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8297873Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8298104Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8298396Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8298595Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8298803Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8298999Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8299211Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8299415Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8299657Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8299951Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8300147Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8300343Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8300538Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8300744Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8300974Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8301266Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8301513Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8301808Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8302005Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8302211Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8302412Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8302645Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8302952Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8303174Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8303407Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8303607Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8303808Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8304118Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8304351Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8304647Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8304879Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8305188Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8305422Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8305726Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8305960Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8306256Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8306453Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8306651Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8306872Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8307075Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8307286Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8307488Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8307780Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8308014Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8308322Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8308555Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8308853Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8309086Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8309392Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8309629Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8309922Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8310153Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8310355Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8310556Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8310747Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8310957Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8311162Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8311457Z E1204 11:26:39.342000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8311690Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8311893Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8312092Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8312291Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8312596Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8312920Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8313214Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8313490Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8313801Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8314036Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8314327Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8314575Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8314871Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8315105Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8315399Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8315632Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8315928Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8316176Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8316469Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8316703Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8316995Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8317245Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8317537Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8317739Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8317937Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8318189Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8318486Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8318719Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8319025Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8319259Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8319555Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8319790Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8320084Z E1204 
11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8320317Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8320625Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8320823Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8321056Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8321350Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8321596Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8321888Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8322108Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8322311Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8322524Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8322727Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8323022Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8323247Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8323484Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8323687Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8323888Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8324183Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8324404Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8324607Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8324819Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8325012Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8325162Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8325361Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8325584Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8325803Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8326001Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8326220Z E1204 
11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8326427Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8326626Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8326867Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8327076Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8327271Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8327508Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8327714Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8327916Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8328112Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8328325Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8328530Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8328727Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8328942Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8329239Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8329454Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8329657Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8329859Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8330064Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8330261Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8330476Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8330676Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8330885Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8331085Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8331381Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8331605Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8331806Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8332009Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8332209Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8332504Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8332716Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8332920Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8333131Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8333358Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8333653Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8333850Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8334053Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8334266Z E1204 11:26:39.342000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8334468Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8334681Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8334890Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8335102Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8335293Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8335474Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8335658Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8335785Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8335886Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8336014Z E1204 11:26:39.342000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8336172Z [W1204 11:26:39.650214670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8336174Z 2025-12-04T11:45:25.8336322Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8336622Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8336919Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8337051Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8337545Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8337800Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8338025Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8338243Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.8338445Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8338736Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8338975Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8339281Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8339516Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8339810Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8340051Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8340341Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8340575Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8340868Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8341087Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8341297Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8341500Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8341717Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8341919Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8342152Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8342446Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8342654Z E1204 
11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8342888Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8343184Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8343427Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8343642Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8343863Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8344069Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8344278Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8344475Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8344694Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8344901Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8345096Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8345291Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8345522Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8345831Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8346066Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8346359Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8346580Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8346809Z 
E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8347006Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8347216Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8347415Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8347649Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8347953Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8348189Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8348481Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8348722Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.8349016Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8349248Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8349543Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8349775Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8350071Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8350318Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8350610Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8350843Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8351134Z E1204 
11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8351378Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8351671Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8351902Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8352193Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8352436Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8352729Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8352974Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8353304Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8353528Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8353730Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8353926Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8354218Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8354451Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8354764Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8354998Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8355292Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8355523Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8355833Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8356064Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8356358Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8356588Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8356895Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8357093Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8357300Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8357496Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8357703Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8357907Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8358139Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8358432Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8358630Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8358826Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8359032Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8359226Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8359458Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8359750Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8359996Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8360290Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8360486Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8360692Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8360904Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8361141Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8361434Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8361666Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8361870Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8362070Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8362277Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8362571Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8362807Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8363101Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8363376Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8363673Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8363907Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8364202Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8364448Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8364745Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8364944Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8365144Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8365381Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8365583Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8365784Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8366004Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8366298Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8366532Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8366826Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8367062Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8367355Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8367601Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8367893Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8368130Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8368422Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8368655Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8368861Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8369060Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8369256Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8369467Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8369681Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8369973Z E1204 11:26:39.383000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8370195Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8370409Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8370607Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8370809Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8371101Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8371335Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8373147Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8373444Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8373741Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8373974Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8374269Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8374515Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8374810Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8375042Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8375335Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8375584Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8375879Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8376113Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8376419Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8376652Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8376947Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8377180Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8377474Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8377671Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8377880Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8378114Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8378407Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8378644Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8378947Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8379181Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8379473Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8379708Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8380010Z E1204 
11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8380245Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8380539Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8380748Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8380981Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8381275Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8381509Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8381802Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8382016Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8382231Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8382430Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8382632Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8382929Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8383144Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8383386Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8383585Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8383787Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8384079Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8384323Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8384525Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8384724Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8384931Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8385080Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8385277Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8385497Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8385703Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8385899Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8386120Z E1204 
11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8386327Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8386536Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8386756Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8386963Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8387159Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8387392Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8387598Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8387795Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8387991Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8388202Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8388417Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8388617Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8388817Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8389120Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8389332Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8389536Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8389735Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8389928Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8390123Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8390335Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8390547Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8390744Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8390943Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8391236Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8391451Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8391664Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8391863Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8392065Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8392357Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8392583Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8392783Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8392981Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8393193Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8393526Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8393724Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8393925Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8394115Z E1204 11:26:39.383000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8394310Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8394523Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8394744Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8394942Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8395130Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8395312Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8395483Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8395624Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8395731Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8395858Z E1204 11:26:39.383000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8396015Z [W1204 11:26:39.652373619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8396018Z 2025-12-04T11:45:25.8396163Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8396459Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8396770Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8396901Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8397382Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8397649Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8397877Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8398082Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.8398282Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8398578Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8398815Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8399121Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8399354Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8399646Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8399895Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8400187Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8400419Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8400711Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8400956Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8401163Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8401360Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8401582Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8401782Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8402018Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8402309Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8402505Z E1204 
11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8402740Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8403032Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8403279Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8403476Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8403693Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8403898Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8404094Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8404304Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8404523Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8404727Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8404922Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8405128Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8405362Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8405656Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8405901Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8406194Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8406414Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8406620Z 
E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8406815Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8407023Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8407222Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8407467Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8407759Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8407993Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8408284Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8408526Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.8408816Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8409048Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8409337Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8409580Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8409870Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8410106Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8410420Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8410652Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8410943Z E1204 
11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8411173Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8411464Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8411694Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8411995Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8412229Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8412522Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8412758Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8413061Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8413395Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8413598Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8413794Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8414099Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8414330Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8414621Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8414864Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8415158Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8415391Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8415682Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8415914Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8416202Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8416459Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8416748Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8416946Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8417142Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8417351Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8417560Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8417760Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8417992Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8418281Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8418492Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8418686Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8418882Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8419086Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8419316Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8419610Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8419842Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8420135Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8420330Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8420553Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8420754Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8420987Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8421283Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8421504Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8421715Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8421914Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8422119Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8422412Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8422655Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8422948Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8423179Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8423510Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8423744Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8424037Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8424275Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8424569Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8424767Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8424979Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8425199Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8425401Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8425599Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8425800Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8426104Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8426339Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8426635Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8426870Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8427174Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8427409Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8427715Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8427946Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8428240Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8428461Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8428663Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8428863Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8429058Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8429280Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8429479Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8429772Z E1204 11:26:39.385000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8429990Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8430202Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8430400Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8430600Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8430893Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8431125Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8431432Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8431664Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8431974Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8432208Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8432504Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8432738Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8433030Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8433299Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8433604Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8433840Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8434133Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8434366Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8434673Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8434905Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8435199Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8435432Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8435738Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8435938Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8436135Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8436382Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8436675Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8436910Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8437203Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8437436Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8437729Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8437972Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8438264Z E1204 
11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8438499Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8438792Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8439001Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8439233Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8439525Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8439757Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8440049Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8440275Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8440478Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8440687Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8440889Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8441186Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8441398Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8441602Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8441800Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8442000Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8442305Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8442525Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8442728Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8442925Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8443119Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8443311Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8443508Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8443728Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8443934Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8444129Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8444364Z E1204 
11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8444570Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8444764Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8444997Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8445203Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8445401Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8445621Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8445827Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8446024Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8446218Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8446444Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8446644Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8446843Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8447043Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8447348Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8447564Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8447769Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8447968Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8448159Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8448374Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8448586Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8448789Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8448996Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8449196Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8449491Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8449702Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8449907Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8450105Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8450307Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8450611Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8450824Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8451027Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8451224Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8451425Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8451730Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8451926Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8452129Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8452322Z E1204 11:26:39.385000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8452529Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8452744Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8452949Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8453156Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8453375Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8453557Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8453729Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8453854Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8453960Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8454087Z E1204 11:26:39.385000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8454245Z [W1204 11:26:39.654533257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8454248Z 2025-12-04T11:45:25.8454396Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8454707Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8455005Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8455138Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8455631Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8455888Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8456114Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8456321Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.8456522Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8456831Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8457065Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8457370Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8457603Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8457895Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8458128Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8458418Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8458651Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8458954Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8459175Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8459385Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8459581Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8459788Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8459999Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8460232Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8460522Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8460718Z E1204 
11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8460960Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8461254Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8461475Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8461680Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8461899Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8462107Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8462305Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8462500Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8462718Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8462923Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8463127Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8463352Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8463585Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8463879Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8464112Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8464424Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8464645Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8464851Z 
E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8465049Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8465269Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8465469Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8465699Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8466007Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8466240Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8466532Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8466763Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.8467055Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8467286Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8467591Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8467821Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8468113Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8468346Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8468650Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8468881Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8469173Z E1204 
11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8469405Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8469706Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8469939Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8470228Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8470471Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8470767Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8471000Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8471291Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8471514Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8471716Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8471927Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8472219Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8472452Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8472742Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8472988Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8473318Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8473550Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8473843Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8474091Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8474382Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8474613Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8474918Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8475116Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8475314Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8475510Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8475718Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8475917Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8476148Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8476452Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8476647Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8476844Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8477038Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8477246Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8477481Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8477773Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8478005Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8478295Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8478504Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8478710Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8478924Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8479159Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8479453Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8479675Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8479877Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8480079Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8480279Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8480594Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8480827Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8481120Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8481353Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8481656Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8481890Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8482183Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8482420Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8482726Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8482923Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8483124Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8483383Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8483587Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8483788Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8483988Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8484283Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8484517Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8484818Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8485065Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8485360Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8485595Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8485887Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8486134Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8486428Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8486653Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8486854Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8487070Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8487262Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8487472Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8487685Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8487978Z E1204 11:26:39.388000 
980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8488202Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8488403Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8488603Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8488802Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8489098Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8489342Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8489636Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8489871Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8490164Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8490408Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8490702Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8490936Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8491228Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8491474Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8491770Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8492012Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8492307Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8492541Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8492832Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8493065Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8493420Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8493671Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8493964Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8494164Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8494362Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8494594Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8494900Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8495132Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8495425Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8495657Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8495962Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8496196Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8496508Z E1204 
11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8496744Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8497037Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8497234Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8497468Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8497759Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8497993Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8498297Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8498511Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8498714Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8498915Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8499128Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8499421Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8499635Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8499836Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8500046Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8500246Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8500538Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8500769Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8500971Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8501174Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8501367Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8501514Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8501711Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8501931Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8502137Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8502344Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8502565Z E1204 
11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8502771Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8502967Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8503199Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8503436Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8503634Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8503856Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8504060Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8504274Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8504471Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8504682Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8504897Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8505094Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8505296Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8505590Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8505805Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8506008Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8506206Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8506410Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.8506604Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8506816Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8507017Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8507216Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8507428Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8507723Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8507938Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8508140Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8508350Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8508550Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8508845Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8509068Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8509270Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8509470Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8509670Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8509963Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8510160Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8510364Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8510565Z E1204 11:26:39.388000 980345 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8510762Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8510974Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8511180Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8511377Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8511578Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8511761Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8511932Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8512060Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8512163Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8512308Z E1204 11:26:39.388000 980345 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8512350Z FAILED [1.5367s] [100%] 2025-12-04T11:45:25.8512352Z 2025-12-04T11:45:25.8512411Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.8512572Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8512621Z Traceback (most recent call last): 2025-12-04T11:45:25.8512787Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8512843Z method(*args, **kwargs) 2025-12-04T11:45:25.8512998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8513039Z method(*args, **kwargs) 2025-12-04T11:45:25.8513191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8513230Z with policy(): 2025-12-04T11:45:25.8513413Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8513456Z raise RuntimeError(msg) 2025-12-04T11:45:25.8513869Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.8513872Z 2025-12-04T11:45:25.8513951Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8514230Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.8514233Z 2025-12-04T11:45:25.8514323Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8514420Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8514464Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8514527Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8515089Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8515193Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8515234Z graph_break [] 2025-12-04T11:45:25.8515302Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8515392Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8515888Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8515938Z current_size = base.storage().size() 2025-12-04T11:45:25.8515979Z Autotune Choices Stats: 2025-12-04T11:45:25.8516361Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.8516443Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8516498Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8516621Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8516868Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8517123Z triton_mm_33 0.0090 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8517354Z triton_mm_16 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8517585Z triton_mm_29 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8517813Z triton_mm_22 0.0108 ms 79.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8518042Z triton_mm_21 0.0111 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8518270Z triton_mm_30 0.0113 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8518516Z triton_mm_23 0.0114 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8518747Z triton_mm_15 0.0120 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8518978Z triton_mm_31 0.0126 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8519112Z SingleProcess AUTOTUNE benchmarking takes 0.1466 seconds and 1.3969 seconds precompiling for 33 choices 2025-12-04T11:45:25.8519283Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8519333Z Traceback (most recent call last): 2025-12-04T11:45:25.8519493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:25.8519535Z method(*args, **kwargs) 2025-12-04T11:45:25.8519688Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8519730Z method(*args, **kwargs) 2025-12-04T11:45:25.8519880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8519918Z with policy(): 2025-12-04T11:45:25.8520072Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8520124Z raise RuntimeError(msg) 2025-12-04T11:45:25.8520532Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 
2025-12-04T11:45:25.8520537Z 2025-12-04T11:45:25.8520611Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8520900Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.8520902Z 2025-12-04T11:45:25.8520990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8521068Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8521110Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8521170Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8521725Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8521829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8521867Z graph_break [] 2025-12-04T11:45:25.8521935Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8522012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8522511Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8522562Z current_size = base.storage().size() 2025-12-04T11:45:25.8522602Z Autotune Choices Stats: 2025-12-04T11:45:25.8522980Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.8523047Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8523099Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8523230Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8523512Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8523746Z triton_mm_33 0.0090 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8523976Z triton_mm_16 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8524223Z triton_mm_29 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8524451Z triton_mm_22 0.0108 ms 79.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8524682Z triton_mm_21 0.0111 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8524920Z triton_mm_30 0.0113 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8525153Z triton_mm_23 0.0114 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8525387Z triton_mm_15 0.0120 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8525617Z triton_mm_31 0.0126 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8525748Z SingleProcess AUTOTUNE benchmarking takes 0.1466 seconds and 1.3969 seconds precompiling for 33 choices 2025-12-04T11:45:25.8525822Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8525869Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8525926Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8526028Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 
2025-12-04T11:45:25.8526529Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8526570Z graph_break [] 2025-12-04T11:45:25.8526633Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8526708Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8526748Z Autotune Choices Stats: 2025-12-04T11:45:25.8527135Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008798999711871147, "best_triton_pos": 0} 2025-12-04T11:45:25.8527202Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8527253Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8527374Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8527612Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8527656Z _scaled_mm 0.0094 ms 94.0% 2025-12-04T11:45:25.8527901Z triton_mm_71 0.0095 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.8528130Z triton_mm_67 0.0107 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8528353Z triton_mm_60 0.0112 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8528591Z triton_mm_54 0.0113 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8528820Z triton_mm_59 0.0114 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8529047Z triton_mm_68 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8529281Z triton_mm_61 0.0119 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8529512Z triton_mm_53 0.0124 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8529644Z SingleProcess AUTOTUNE benchmarking takes 0.2424 seconds and 0.8114 seconds precompiling for 39 choices 2025-12-04T11:45:25.8529701Z =================================== 
FAILURES =================================== 2025-12-04T11:45:25.8529871Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8529917Z Traceback (most recent call last): 2025-12-04T11:45:25.8530075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8530118Z method(*args, **kwargs) 2025-12-04T11:45:25.8530275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8530314Z method(*args, **kwargs) 2025-12-04T11:45:25.8530467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8530505Z with policy(): 2025-12-04T11:45:25.8530660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8530711Z raise RuntimeError(msg) 2025-12-04T11:45:25.8531121Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.8531124Z 2025-12-04T11:45:25.8531200Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8531473Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.8531485Z 2025-12-04T11:45:25.8531574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8531648Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8531692Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8531749Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8532305Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8532415Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8532455Z graph_break [] 2025-12-04T11:45:25.8532519Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8532593Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8533082Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8533131Z current_size = base.storage().size() 2025-12-04T11:45:25.8533173Z Autotune Choices Stats: 2025-12-04T11:45:25.8533583Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008559999987483025, "best_triton_pos": 0} 2025-12-04T11:45:25.8533651Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8533702Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8533838Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8534076Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8534313Z triton_mm_33 0.0090 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8534542Z triton_mm_16 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8534786Z triton_mm_29 0.0106 ms 80.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8535016Z triton_mm_22 0.0108 ms 79.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8535245Z triton_mm_21 0.0111 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8535471Z triton_mm_30 0.0113 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8535717Z triton_mm_23 0.0114 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8535947Z triton_mm_15 0.0120 ms 71.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8536193Z triton_mm_31 0.0126 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8536325Z SingleProcess AUTOTUNE benchmarking takes 0.1466 seconds and 1.3969 seconds precompiling for 33 choices 2025-12-04T11:45:25.8536402Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8536443Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8536502Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8536602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 
2025-12-04T11:45:25.8537085Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8537123Z graph_break [] 2025-12-04T11:45:25.8537189Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8537263Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8537304Z Autotune Choices Stats: 2025-12-04T11:45:25.8537679Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008798999711871147, "best_triton_pos": 0} 2025-12-04T11:45:25.8537745Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8537797Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8537915Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8538152Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8538195Z _scaled_mm 0.0094 ms 94.0% 2025-12-04T11:45:25.8538437Z triton_mm_71 0.0095 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.8538663Z triton_mm_67 0.0107 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8538891Z triton_mm_60 0.0112 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8539117Z triton_mm_54 0.0113 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8539354Z triton_mm_59 0.0114 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8539583Z triton_mm_68 0.0116 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8539820Z triton_mm_61 0.0119 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8540049Z triton_mm_53 0.0124 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8540182Z SingleProcess AUTOTUNE benchmarking takes 0.2424 seconds and 0.8114 seconds precompiling for 39 choices 2025-12-04T11:45:25.8540258Z ----------------------------- Captured 
stdout call ----------------------------- 2025-12-04T11:45:25.8540300Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8540357Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8540456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8540942Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8540980Z graph_break [] 2025-12-04T11:45:25.8541042Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8541134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8541174Z Autotune Choices Stats: 2025-12-04T11:45:25.8541543Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.8541609Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8541659Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8541778Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8542033Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.8542269Z triton_mm_109 0.0098 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8542500Z triton_mm_105 0.0106 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8542728Z triton_mm_92 0.0110 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8542967Z triton_mm_97 0.0111 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8543197Z triton_mm_106 0.0111 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8543452Z triton_mm_98 0.0112 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8543696Z triton_mm_99 0.0117 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8543926Z triton_mm_91 0.0120 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8544157Z triton_mm_107 0.0126 ms 68.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8544288Z SingleProcess AUTOTUNE benchmarking takes 0.2462 seconds and 0.6320 seconds precompiling for 39 choices 2025-12-04T11:45:25.8544481Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-51332554a7ab49c3.xml - 2025-12-04T11:45:25.8544544Z =========================== short test summary info ============================ 2025-12-04T11:45:25.8545181Z FAILED [1.5367s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.8545185Z 2025-12-04T11:45:25.8545259Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8545532Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.8545535Z 2025-12-04T11:45:25.8545622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8545686Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:25.8545755Z ================== 1 failed, 187 deselected, 2 rerun in 6.98s ================== 2025-12-04T11:45:25.8545794Z Got exit code 1 2025-12-04T11:45:25.8546031Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:25.8546158Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:25.8546303Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8213eb35be992653.xml 2025-12-04T11:45:25.8546363Z ============================= test session starts ============================== 2025-12-04T11:45:25.8546476Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.8546517Z cachedir: .pytest_cache 2025-12-04T11:45:25.8546691Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.8546740Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.8546782Z configfile: pytest.ini 2025-12-04T11:45:25.8546948Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.8547026Z collecting ... collected 188 items / 110 deselected / 78 selected 2025-12-04T11:45:25.8547082Z stepcurrent: skipping 110 already run items. 2025-12-04T11:45:25.8547126Z Running 78 items in this shard 2025-12-04T11:45:25.8547138Z 2025-12-04T11:45:25.8548065Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/6t/c6t2x5i6uctdtle4hcsk2rcyyop7477fnwyvehlv2shtyuaatjjx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8548217Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8548438Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8548597Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8548741Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8549044Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8549179Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8549436Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8549576Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8549832Z E1204 11:26:49.405000 986269 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8550001Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8550270Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8550405Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8550681Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8550876Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8551203Z E1204 11:26:49.405000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8551937Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/ul/culwnzaovofzhydoydyza7xxtefoovd3x5vkl5t5hdiixr6f5afg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8552094Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8552309Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8552464Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8552607Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8552895Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8553026Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8553317Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8553469Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8553725Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8553882Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8554150Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8554286Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8554575Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8554769Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8555085Z E1204 11:26:49.431000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8555814Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/vm/cvmfw2cu3sn7jlq753yuf4uiakz2uu6vimioiewglubd2dhg5422.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8555976Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8556214Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8556367Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8556515Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8556800Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8556931Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8557186Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8557323Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8557579Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8557747Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8558013Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8558149Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8558424Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8558626Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8558943Z E1204 11:26:49.463000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8559666Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/e4/ce46a72ekvnxka62ueaol4spu6t6ojo2hgonrzg2phswjjjaqfgx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8559825Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8560039Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8560192Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8560348Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8560633Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8560766Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8561023Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8561159Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8561416Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8561571Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8561843Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8561985Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8562260Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8562452Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8562767Z E1204 11:26:49.470000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8563548Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/wy/cwyhxrqkddzmghmnuj4hvl5g5cmipegyln2gxgzm24erxh33v53a.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8563695Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8563911Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8564078Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8564224Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8564508Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8564652Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8564908Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8565044Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8565300Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8565454Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8565724Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8565857Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8566132Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8566339Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8566652Z E1204 11:26:49.472000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8567388Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
for benchmark choice TritonTemplateCaller(/tmp/tmpnibohr57/e5/ce5u2q2qq3brrt6wbjdzmerjriiasieoloynfx47cjz34r4szh6d.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8567536Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8567750Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8567904Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8568048Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8568343Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8568474Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8568730Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8568876Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8569128Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8569283Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8569552Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8569685Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8569961Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8570154Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8570467Z E1204 11:26:49.474000 986269 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8570531Z ('RERUN', {'yellow': True}) [3.3083s] [ 1%] 2025-12-04T11:45:25.8570868Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:26:51.439000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8571165Z E1204 11:26:51.439000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8571292Z E1204 11:26:51.439000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8571437Z E1204 11:26:51.442000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8571742Z E1204 11:26:51.442000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8571869Z E1204 11:26:51.442000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8572013Z E1204 11:26:51.444000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8572306Z E1204 11:26:51.444000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8572449Z E1204 11:26:51.444000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8572593Z E1204 11:26:51.507000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8572889Z E1204 11:26:51.507000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8573025Z E1204 11:26:51.507000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8573167Z E1204 11:26:51.509000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8573485Z E1204 11:26:51.509000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8573613Z E1204 11:26:51.509000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8573759Z E1204 11:26:51.511000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8574049Z E1204 11:26:51.511000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8574176Z E1204 11:26:51.511000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8574223Z ('RERUN', {'yellow': True}) [1.7615s] [ 1%] 2025-12-04T11:45:25.8574561Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda E1204 11:26:52.987000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8574869Z E1204 11:26:52.987000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8574995Z E1204 11:26:52.987000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8575139Z E1204 11:26:52.989000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8575429Z E1204 11:26:52.989000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8575556Z E1204 11:26:52.989000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8575709Z E1204 11:26:52.991000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8576003Z E1204 11:26:52.991000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8576130Z E1204 11:26:52.991000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8576272Z E1204 11:26:53.032000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8576562Z E1204 11:26:53.032000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8576703Z E1204 11:26:53.032000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8576847Z E1204 11:26:53.034000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8577139Z E1204 11:26:53.034000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8577279Z E1204 11:26:53.034000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8577420Z E1204 11:26:53.036000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8577714Z E1204 11:26:53.036000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help.. 2025-12-04T11:45:25.8577841Z E1204 11:26:53.036000 986269 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.8577882Z FAILED [1.5592s] [ 1%] 2025-12-04T11:45:25.8577884Z 2025-12-04T11:45:25.8577938Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.8578099Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8578146Z Traceback (most recent call last): 2025-12-04T11:45:25.8578304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8578347Z method(*args, **kwargs) 2025-12-04T11:45:25.8578502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8578543Z method(*args, **kwargs) 2025-12-04T11:45:25.8578707Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8578744Z with policy(): 2025-12-04T11:45:25.8578898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8578938Z raise RuntimeError(msg) 2025-12-04T11:45:25.8579353Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.8579357Z 2025-12-04T11:45:25.8579433Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8579720Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.8579722Z 2025-12-04T11:45:25.8579811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8579886Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8579931Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8579989Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8580545Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8580659Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8580697Z graph_break [] 2025-12-04T11:45:25.8580762Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8580838Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8581329Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8581389Z current_size = base.storage().size() 2025-12-04T11:45:25.8581433Z Autotune Choices Stats: 2025-12-04T11:45:25.8581807Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.8581876Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8581929Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8582053Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8582289Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8582521Z triton_mm_33 0.0095 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8582757Z triton_mm_29 0.0110 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8582980Z triton_mm_21 0.0112 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8583208Z triton_mm_22 0.0115 ms 75.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8583481Z triton_mm_16 0.0115 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8583709Z triton_mm_30 0.0116 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8583938Z triton_mm_23 0.0119 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8584169Z triton_mm_15 0.0123 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8584410Z triton_mm_31 0.0126 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8584541Z SingleProcess AUTOTUNE benchmarking takes 0.1591 seconds and 1.0135 seconds precompiling for 33 choices 2025-12-04T11:45:25.8584701Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8584746Z Traceback (most recent call last): 2025-12-04T11:45:25.8584921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8584961Z 
method(*args, **kwargs) 2025-12-04T11:45:25.8585115Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8585158Z method(*args, **kwargs) 2025-12-04T11:45:25.8585314Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8585352Z with policy(): 2025-12-04T11:45:25.8585510Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8585551Z raise RuntimeError(msg) 2025-12-04T11:45:25.8585959Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.8585962Z 2025-12-04T11:45:25.8586037Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8586309Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.8586312Z 2025-12-04T11:45:25.8586402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8586489Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8586535Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8586593Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8587143Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), 
('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8587245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8587284Z graph_break [] 2025-12-04T11:45:25.8587366Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8587444Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8587930Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8587979Z current_size = base.storage().size() 2025-12-04T11:45:25.8588022Z Autotune Choices Stats: 2025-12-04T11:45:25.8588392Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.8588474Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8588525Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8588649Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8588884Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8589126Z triton_mm_33 0.0095 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8589353Z triton_mm_29 0.0110 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8589580Z triton_mm_21 0.0112 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8589808Z triton_mm_22 0.0115 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8590034Z triton_mm_16 0.0115 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8590268Z triton_mm_30 0.0116 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8590494Z triton_mm_23 0.0119 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8590722Z triton_mm_15 0.0123 ms 70.4% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8590948Z triton_mm_31 0.0126 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8591081Z SingleProcess AUTOTUNE benchmarking takes 0.1591 seconds and 1.0135 seconds precompiling for 33 choices 2025-12-04T11:45:25.8591170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8591214Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8591275Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8591374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8591865Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8591913Z graph_break [] 2025-12-04T11:45:25.8591977Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8592050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8592094Z Autotune Choices Stats: 2025-12-04T11:45:25.8592461Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4", "best_time": 0.008840000256896019, "best_triton_pos": 0} 2025-12-04T11:45:25.8592540Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8592590Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8592709Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8592943Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8593176Z triton_mm_71 0.0095 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8593438Z triton_mm_67 0.0107 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8593661Z triton_mm_60 0.0108 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8593888Z triton_mm_54 0.0111 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8594127Z triton_mm_68 0.0111 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8594351Z triton_mm_59 0.0114 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8594580Z triton_mm_61 0.0114 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8594806Z triton_mm_53 0.0119 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8595047Z triton_mm_69 0.0124 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8595177Z SingleProcess AUTOTUNE benchmarking takes 0.2544 seconds and 0.8116 seconds precompiling for 39 choices 2025-12-04T11:45:25.8595232Z =================================== FAILURES =================================== 2025-12-04T11:45:25.8595390Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.8595439Z Traceback (most recent call last): 2025-12-04T11:45:25.8595596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8595653Z method(*args, **kwargs) 2025-12-04T11:45:25.8595808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.8595850Z method(*args, **kwargs) 2025-12-04T11:45:25.8596002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.8596041Z with policy(): 
2025-12-04T11:45:25.8596194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.8596250Z raise RuntimeError(msg) 2025-12-04T11:45:25.8596653Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.8596658Z 2025-12-04T11:45:25.8596735Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8597008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.8597010Z 2025-12-04T11:45:25.8597098Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8597174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8597218Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8597277Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8597825Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8597937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8597975Z graph_break [] 
2025-12-04T11:45:25.8598041Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8598119Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8598612Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.8598662Z current_size = base.storage().size() 2025-12-04T11:45:25.8598702Z Autotune Choices Stats: 2025-12-04T11:45:25.8599083Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.8599152Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8599205Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8599326Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8599566Z triton_mm_34 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8599807Z triton_mm_33 0.0095 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8600033Z triton_mm_29 0.0110 ms 78.8% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8600270Z triton_mm_21 0.0112 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8600495Z triton_mm_22 0.0115 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8600726Z triton_mm_16 0.0115 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8600952Z triton_mm_30 0.0116 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8601182Z triton_mm_23 0.0119 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8601411Z triton_mm_15 0.0123 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8601655Z triton_mm_31 0.0126 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8601786Z 
SingleProcess AUTOTUNE benchmarking takes 0.1591 seconds and 1.0135 seconds precompiling for 33 choices 2025-12-04T11:45:25.8601861Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8601907Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8601962Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8602063Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8602559Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8602600Z graph_break [] 2025-12-04T11:45:25.8602663Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8602736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8602777Z Autotune Choices Stats: 2025-12-04T11:45:25.8603147Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008840000256896019, "best_triton_pos": 0} 2025-12-04T11:45:25.8603214Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8603307Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8603427Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8603662Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8603892Z triton_mm_71 0.0095 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8604132Z triton_mm_67 0.0107 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8604359Z triton_mm_60 0.0108 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8604584Z triton_mm_54 0.0111 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8604808Z triton_mm_68 0.0111 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8605036Z triton_mm_59 0.0114 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8605262Z triton_mm_61 0.0114 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8605502Z triton_mm_53 0.0119 ms 
74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8605728Z triton_mm_69 0.0124 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8605862Z SingleProcess AUTOTUNE benchmarking takes 0.2544 seconds and 0.8116 seconds precompiling for 39 choices 2025-12-04T11:45:25.8605935Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.8605980Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.8606037Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.8606153Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.8606640Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.8606680Z graph_break [] 2025-12-04T11:45:25.8606745Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.8606818Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.8606863Z Autotune Choices Stats: 2025-12-04T11:45:25.8607229Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00863999966531992, "best_triton_pos": 0} 2025-12-04T11:45:25.8607309Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.8607358Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.8607480Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.8607727Z triton_mm_110 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8607957Z triton_mm_109 0.0094 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8608185Z triton_mm_105 0.0107 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8608412Z triton_mm_92 0.0110 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8608637Z triton_mm_97 0.0111 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8608860Z triton_mm_98 0.0112 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8609104Z triton_mm_106 0.0113 ms 76.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.8609330Z triton_mm_99 0.0115 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8609556Z triton_mm_91 0.0118 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8609785Z triton_mm_107 0.0126 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.8609924Z SingleProcess AUTOTUNE benchmarking takes 0.2548 seconds and 0.6335 seconds precompiling for 39 choices 2025-12-04T11:45:25.8610118Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8213eb35be992653.xml - 2025-12-04T11:45:25.8610179Z =========================== short test summary info ============================ 2025-12-04T11:45:25.8610800Z FAILED [1.5592s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.8610816Z 2025-12-04T11:45:25.8610889Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.8611164Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.8611166Z 2025-12-04T11:45:25.8611255Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.8611319Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.8611399Z ================== 1 failed, 110 deselected, 2 rerun in 6.65s ================== 2025-12-04T11:45:25.8611437Z Got exit code 1 2025-12-04T11:45:25.8611478Z Retrying single test... 2025-12-04T11:45:25.8611622Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a296e5305c319eb1.xml 2025-12-04T11:45:25.8611682Z ============================= test session starts ============================== 2025-12-04T11:45:25.8611796Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.8611842Z cachedir: .pytest_cache 2025-12-04T11:45:25.8612002Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.8612051Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.8612091Z configfile: pytest.ini 2025-12-04T11:45:25.8612256Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.8612331Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.8612602Z stepcurrent: skipping 110 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.8612648Z Running 1 items in this shard 2025-12-04T11:45:25.8612650Z 2025-12-04T11:45:25.8613014Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:03.457377968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8613017Z 2025-12-04T11:45:25.8613171Z [W1204 11:27:03.831327875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8613179Z 2025-12-04T11:45:25.8613528Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8613829Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8613976Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8614465Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8614725Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8614966Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8615178Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8615379Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8615696Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8615933Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8616231Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8616464Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8616755Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8616987Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8617293Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8617527Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8617818Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8618049Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8618355Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8618587Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8618880Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8619077Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8619322Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8619613Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8619809Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8620051Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8620341Z E1204 11:27:03.528000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8620579Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8620870Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8621094Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8621298Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8621503Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8621728Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8621896Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8622078Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8622605Z E1204 11:27:03.528000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/ul/culwnzaovofzhydoydyza7xxtefoovd3x5vkl5t5hdiixr6f5afg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8622767Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8622988Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8623149Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8623319Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8623607Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8623759Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8624018Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8624157Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8624425Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8624580Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8624851Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8624989Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8625269Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8625462Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8625782Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8626090Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8626223Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8626701Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8626970Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8627196Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8627404Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8627608Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8627901Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8628146Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8628441Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8628682Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8628974Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8629207Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8629498Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8629731Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8630026Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8630271Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8630562Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8630797Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8631088Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8631296Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8631527Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8631817Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8632013Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8632247Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8632560Z E1204 11:27:03.528000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8632791Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8633095Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8633346Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8633557Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8633758Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8633967Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8634134Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8634313Z E1204 11:27:03.528000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8634418Z E1204 11:27:03.528000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8634746Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8635040Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8635173Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8635665Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8635919Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8636142Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8636349Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8636549Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8636857Z E1204 11:27:03.569000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8637094Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8637398Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8637629Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8637923Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8638156Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8638451Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8638682Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8638986Z E1204 11:27:03.569000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8639216Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8639505Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8639737Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8640040Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8640237Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8640468Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8640762Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8640969Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8641205Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8641495Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8641740Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8642031Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8642253Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8642462Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8642662Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8642873Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8643038Z E1204 11:27:03.569000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8643227Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8643793Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/6t/c6t2x5i6uctdtle4hcsk2rcyyop7477fnwyvehlv2shtyuaatjjx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8643941Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8644160Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8644334Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8644482Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8644767Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8644900Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8645159Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8645311Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8645567Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8645723Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8646009Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8646143Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8646423Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8646615Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8646930Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8647223Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8647354Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8647846Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8648102Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8648330Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8648555Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8648756Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8650513Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8650750Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8651043Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8651293Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8651583Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8651827Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8652122Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8652357Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8652646Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8652879Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8653168Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8653453Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8653745Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8653941Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8654175Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8654482Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8654679Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8654912Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8655203Z E1204 11:27:03.569000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8655434Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8655740Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8655960Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8656179Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8656379Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8656593Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8656762Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8656940Z E1204 11:27:03.569000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8657043Z E1204 11:27:03.569000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8657202Z [W1204 11:27:03.839178002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8657204Z 2025-12-04T11:45:25.8657356Z [W1204 11:27:03.841085784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8657359Z 2025-12-04T11:45:25.8657685Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8657979Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8658111Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8658588Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8658850Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8659078Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8659285Z E1204 11:27:03.572000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8659484Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8659791Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8660023Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8660314Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8660556Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8660847Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8661078Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8661374Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8661607Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8661900Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8662144Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8662437Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8662669Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8662957Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8663166Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8663423Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8663718Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8663914Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8664158Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8664452Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8664683Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8664985Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8665207Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8665413Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8665613Z E1204 11:27:03.572000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8665826Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8665994Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8666173Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8666719Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/wy/cwyhxrqkddzmghmnuj4hvl5g5cmipegyln2gxgzm24erxh33v53a.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8666867Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8667081Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8667237Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8667394Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8667680Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8667813Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8668071Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8668210Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8668480Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8668635Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8668902Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8669051Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8669325Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8669520Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8669834Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8670128Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8670260Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8670753Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8671005Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8671231Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8671436Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8671648Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8671938Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8672173Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8672464Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8672708Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8673000Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8673233Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8673573Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8673804Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8674096Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8674326Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8674616Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8674849Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8675157Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8675356Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8675587Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8675876Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8676086Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8676317Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8676607Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8676838Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8677150Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8677369Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8677575Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8677788Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8677999Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8678168Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.8678345Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8678447Z E1204 11:27:03.572000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8678604Z [W1204 11:27:03.848638825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8678606Z 2025-12-04T11:45:25.8678914Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8679218Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8679349Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8679828Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8680081Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8680318Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8680522Z E1204 11:27:03.575000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8680720Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8681011Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8681257Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8681549Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8681780Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8682094Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8682326Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8682618Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8682847Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8683138Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8683402Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8683707Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8683938Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8684227Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8684428Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8684677Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8684968Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8685167Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8685395Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8685701Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8685931Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8686221Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8686455Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8686660Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8686864Z E1204 11:27:03.575000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8687073Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8687239Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8687417Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8687953Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/e5/ce5u2q2qq3brrt6wbjdzmerjriiasieoloynfx47cjz34r4szh6d.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8688101Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8688316Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8688472Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8688616Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8688915Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8689048Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8689307Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8689444Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8689699Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8689866Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8690135Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8690269Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8690553Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8690746Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8691061Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8691353Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8691485Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8691961Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8692227Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8692450Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8692658Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8692858Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8693159Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8693432Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8693725Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8693961Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8694265Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8694498Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8694792Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8695036Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8695327Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8695562Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8695852Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8696085Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8696378Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8696590Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8696820Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8697112Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8697306Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8697550Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8697841Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8698072Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8698363Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8698599Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8698807Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8699007Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8699227Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8699391Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.8699570Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8699673Z E1204 11:27:03.575000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8699982Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8700275Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8700405Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8700895Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8701147Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8701374Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8701578Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8701788Z E1204 11:27:03.582000 
992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8702082Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8702314Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8702605Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8702849Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8703144Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8703423Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8703724Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8703957Z E1204 11:27:03.582000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8704249Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8704480Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8704769Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8704999Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8705302Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8705497Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8705731Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8706020Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8706231Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8706464Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8706754Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8706987Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8707291Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8707512Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8707716Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8707932Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8708142Z E1204 11:27:03.582000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8708309Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8708487Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8709008Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/vm/cvmfw2cu3sn7jlq753yuf4uiakz2uu6vimioiewglubd2dhg5422.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8709156Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8709371Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8709536Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8709682Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8709966Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8710100Z E1204 11:27:03.582000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8710358Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8710508Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8710762Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8710918Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8711186Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8711321Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8711608Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8711800Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8712114Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8712420Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8712552Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8713029Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8713304Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8713530Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8713751Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8713953Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8714244Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8714480Z E1204 
11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8714794Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8715027Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8715317Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8715548Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8715853Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8716085Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8716375Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8716618Z E1204 11:27:03.582000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8716906Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8717141Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8717431Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8717628Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8717860Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8718165Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8718361Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8718591Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8718881Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8719123Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8719417Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8719639Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8719845Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8720045Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8720267Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8720432Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8720608Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8720722Z E1204 11:27:03.582000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8720877Z [W1204 11:27:03.851273437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8720879Z 2025-12-04T11:45:25.8721190Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8721485Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8721615Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8722095Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8722359Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8722584Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8722790Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8722988Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8723323Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8723555Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8723847Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8724080Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8724373Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8724621Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8724910Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8725239Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8725530Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8725763Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8726052Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8726285Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8726577Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8726790Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8727021Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8727311Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8727508Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8727738Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8728041Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8728272Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8728563Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8728782Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8728999Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8729200Z E1204 11:27:03.584000 992188 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8729409Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8729588Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.8729766Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8730294Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp8am03hqu/e4/ce46a72ekvnxka62ueaol4spu6t6ojo2hgonrzg2phswjjjaqfgx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.8730442Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.8730658Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.8730812Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.8730960Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.8731265Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.8731398Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.8731656Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.8731795Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.8732062Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.8732218Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.8732486Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.8732621Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.8732895Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.8733100Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.8733446Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.8733740Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8733885Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8734362Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8734615Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8734841Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8735049Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8735251Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8735556Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8735794Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8736088Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8736321Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8736624Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8736854Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8737146Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8737376Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8737680Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8737911Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8738212Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8738444Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8738736Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8738933Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8739164Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8739455Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8739652Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.8739894Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8740184Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8740417Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8740710Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8740944Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8741150Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8741352Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.8741561Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.8741739Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.8741917Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.8742019Z E1204 11:27:03.584000 992188 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.8742069Z ('RERUN', {'yellow': True}) [3.6045s] [100%] 2025-12-04T11:45:25.8742431Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:05.667753994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8742434Z 2025-12-04T11:45:25.8742578Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8742877Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8743170Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8743343Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8743817Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8744083Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8744308Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8744514Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8744713Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8745018Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8745252Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8745542Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8745774Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8746080Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8746312Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8746601Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8746852Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8747140Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8747362Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8747569Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8747768Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8747976Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8748175Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8748415Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8748707Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8748904Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8749134Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8749437Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8749657Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8749853Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8750070Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8750284Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8750482Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8750675Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8750905Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8751108Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8751305Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8751500Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8751730Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8752024Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8752255Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8752560Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8752777Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8752982Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8753177Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8753409Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8753623Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8753853Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8754144Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8754377Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8754685Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8754916Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8755205Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8755448Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8755740Z 
E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8755970Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8756261Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8756492Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8756787Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8757031Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8757320Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8757551Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8757840Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8758085Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8758374Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8758605Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8758894Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8759139Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8759429Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8759658Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8759859Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8760055Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8760347Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8760576Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8760867Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8761098Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8761402Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8761635Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8761925Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8762155Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8762457Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8762689Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8762981Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8763176Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8763420Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8763615Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8763823Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8764034Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8764265Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8764557Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8764751Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8764946Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8765140Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8765334Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8765577Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8765868Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8766101Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8766392Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8766604Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8766812Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8767013Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8767246Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8767540Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8767774Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8767976Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8768186Z 
E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8768390Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8768686Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8768919Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8769213Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8769446Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8769741Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8769985Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8770276Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8770508Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8770803Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8771013Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8771210Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8771433Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8771634Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8771843Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8772043Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8772335Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8772578Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8772870Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8773107Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8773424Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8773656Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8773950Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8774200Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8774493Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8774715Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8774916Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8775117Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8775323Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8775536Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8775736Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8776027Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8776261Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8776463Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8776662Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8776874Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8777167Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8777402Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8777698Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8777932Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8778226Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8778471Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8778765Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8778998Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8779290Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8779543Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8779836Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8780074Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8780375Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8780617Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8780909Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8781141Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8781444Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8781681Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8781973Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8782170Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8782368Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8782604Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8782909Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8783141Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8783468Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8783701Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8784012Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8784245Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8784537Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8784771Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8785082Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8785279Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8785510Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8785815Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8786048Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8786342Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8786556Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8786758Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8786958Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8787159Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8787468Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8787680Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8787882Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8788081Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8788291Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8788585Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8788806Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8789008Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8789216Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8789408Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8789557Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8789754Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8789984Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8790190Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8790388Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8790608Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8790815Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8791011Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8791234Z E1204 
11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8791451Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8791646Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8791869Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8792073Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8792271Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8792475Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8792688Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8792889Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8793090Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8793335Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8793629Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8793841Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8794058Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8794260Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8794453Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8794649Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8794862Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8795063Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8795261Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8795461Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8795777Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8795989Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8796190Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8796390Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8796603Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8796897Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8797109Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8797310Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8797520Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8797723Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8798015Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8798223Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8798424Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8798618Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8798816Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8799028Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8799233Z E1204 11:27:05.406000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8799429Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8799620Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8799815Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8799986Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8800113Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8800216Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8800342Z E1204 11:27:05.406000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8800499Z [W1204 11:27:05.676078513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8800501Z 2025-12-04T11:45:25.8800657Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8800954Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8801252Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8801380Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8801872Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8802125Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8802360Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8802566Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8802768Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8803061Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8803333Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8803625Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8803871Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8804162Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8804395Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8804686Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8804933Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8805227Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8805449Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8805660Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8805856Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8806077Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8806274Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8806505Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8806813Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8807010Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8807244Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8807535Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8807757Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8807954Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8808185Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8808390Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8808586Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8808781Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8808999Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8809222Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8809418Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8809613Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8809845Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8810140Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8810386Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8810675Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8810913Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8811117Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8811314Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8811521Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8811718Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8811951Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8812241Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8812489Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8812783Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8813016Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8813341Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8813587Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8813879Z 
E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8814109Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8814399Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8814644Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8814940Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8815173Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8815475Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8815707Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8815999Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8816230Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8816521Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8816751Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8817054Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8817287Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8817578Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8817797Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8818010Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8818207Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8818498Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8818729Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8819031Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8819262Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8819556Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8819798Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8820090Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8820320Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8820610Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8820841Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8821133Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8821339Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8821536Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8821733Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8821941Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8822141Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8822384Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8822676Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8822872Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8823068Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8823301Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8823496Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8823727Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8824030Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8824264Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8824555Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8824750Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8824957Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8825157Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8825392Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8825697Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8825918Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8826120Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8826322Z 
E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8826543Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8826837Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8827071Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8827363Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8827617Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8827909Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8828141Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8828447Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8828680Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8828977Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8829173Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8829371Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8829591Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8829804Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8830003Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8830203Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8830496Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8830729Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8831034Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8831268Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8831563Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8831795Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8832099Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8832331Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8832633Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8832854Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8833057Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8833283Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8833477Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8833688Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8833888Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8834193Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8834415Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8834615Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8834814Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8835016Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8835321Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8835557Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8835851Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8836084Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8836390Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8836623Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8836928Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8837159Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8837453Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8837684Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8837978Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8838211Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8838516Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8838748Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8839039Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8839271Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8839572Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8839805Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8840100Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8840297Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8840507Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8840744Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8841038Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8841281Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8841572Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8841808Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8842099Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8842332Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8842622Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8842869Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8843163Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8843387Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8843622Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8843931Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8844164Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8844457Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8844674Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8844889Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8845088Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8845292Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8845601Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8845813Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8846016Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8846214Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8846413Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8846707Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8846928Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8847145Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8847343Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8847536Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8847686Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8847883Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8848115Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8848324Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8848519Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8848740Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8848945Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8849153Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8849372Z E1204 
11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8849579Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8849785Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8850007Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8850215Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8850411Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8850608Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8850820Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8851023Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8851233Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8851435Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8851730Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8851944Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8852149Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8852356Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8852550Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8852746Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8852958Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8853168Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8853394Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8853595Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8853902Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8854115Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8854318Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8854520Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8854719Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8855012Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8855225Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8855440Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8855638Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8855837Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8856132Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8856329Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8856545Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8856737Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8856934Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8857147Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8857366Z E1204 11:27:05.409000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8857563Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8857752Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8857944Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8858113Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8858240Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8858345Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8858472Z E1204 11:27:05.409000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8858629Z [W1204 11:27:05.678272002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8858632Z 2025-12-04T11:45:25.8858779Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8859073Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8859369Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8859516Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8859993Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8860248Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8860486Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8860692Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8860892Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8861187Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8861422Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8861725Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8861959Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8862260Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8862490Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8862783Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8863013Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8863333Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8863556Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8863766Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8863979Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8864186Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8864386Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8864617Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8864922Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8865117Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8865348Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8865639Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8865871Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8866069Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8866286Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8866507Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8866702Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8866897Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8867116Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8867319Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8867518Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8867711Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8867944Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8868248Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8868480Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8868773Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8868991Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8869206Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8869401Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8869610Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8869807Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8870051Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8870346Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8870577Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8870882Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8871113Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8871409Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8871639Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8871931Z 
E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8872161Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8872461Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8872695Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8872989Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8873220Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8873551Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8873780Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8874071Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8874300Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8874606Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8874836Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8875130Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8875381Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8875673Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8875894Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8876097Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8876293Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8876584Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8876830Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8877121Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8877353Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8877647Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8877893Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8878186Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8878419Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8878708Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8878952Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8879242Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8879439Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8879646Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8879843Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8880054Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8880253Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8880484Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8880776Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8880973Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8881178Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8881374Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8881568Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8881798Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8882103Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8882336Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8882628Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8882824Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8883031Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8883245Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8883508Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8883812Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8884032Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8884237Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8884438Z 
E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8884644Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8884939Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8885173Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8885482Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8885713Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8886007Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8886238Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8886547Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8886779Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8887073Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8887270Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8887479Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8887701Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8887902Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8888111Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8888309Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8888604Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8888838Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8889134Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8889370Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8889663Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8889905Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8890197Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8890430Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8890734Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8890954Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8891157Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8891356Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8891551Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8891779Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8891979Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8892271Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8892502Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8892704Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8892904Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8893104Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8893438Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8893673Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8893973Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8894220Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8894513Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8894745Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8895052Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8895285Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8895576Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8895809Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8896100Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8896352Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8896645Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8896891Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8897184Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8897417Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8897709Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8897941Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8898233Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8898446Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8898646Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8898879Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8899173Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8899408Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8899713Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8899946Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8900240Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8900471Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8900776Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8901009Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8901313Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8901508Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8901744Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8902035Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8902268Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8902558Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8902773Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8902984Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8903182Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8903426Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8903720Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8903949Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8904151Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8904349Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8904552Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8904843Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8905082Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8905287Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8905497Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8905691Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8905839Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8906036Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8906255Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8906462Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8906658Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8906881Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8907101Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8907296Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8907518Z E1204 
11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8907722Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8907921Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8908159Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8908366Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8908563Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8908758Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8908983Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8909184Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8909382Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8909593Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8909888Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8910104Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8910308Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8910507Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8910698Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8910895Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8911120Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8911322Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8911520Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8911722Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8912016Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8912241Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8912444Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8912643Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8912844Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8913149Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8913396Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8913598Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8913808Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8914009Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8914304Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8914499Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8914704Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8914895Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8915089Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8915316Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8915524Z E1204 11:27:05.411000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8915722Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8915914Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8916094Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8916280Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8916407Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8916513Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8916638Z E1204 11:27:05.411000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8916796Z [W1204 11:27:05.720126797 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8916799Z 2025-12-04T11:45:25.8916949Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.8917256Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.8917555Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.8917685Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.8918174Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.8918431Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.8918657Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.8918864Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.8919062Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8919359Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8919604Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8919895Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8920128Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8920419Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8920664Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8920953Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8921187Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8921477Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8921714Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8921920Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8922126Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8922333Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8922532Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8922767Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8923059Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8923304Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8923535Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8923848Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8924071Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8924267Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8924488Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8924692Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8924902Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8925098Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8925316Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8925521Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8925717Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.8925928Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8926160Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8926454Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8926793Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8927087Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8927310Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8927513Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8927711Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8927918Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8928131Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8928362Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8928659Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8928894Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8929197Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8929433Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8929726Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8929965Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8930267Z 
E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8930501Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8930798Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8931042Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8931336Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8931569Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8931861Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8932093Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8932384Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8932627Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8932918Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8933151Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8933491Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8933742Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8934036Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8934256Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8934459Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8934668Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.8934962Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8935194Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8935497Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8935733Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8936028Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8936260Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8936552Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8936783Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8937088Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8937322Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8937616Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8937812Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8938023Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8938221Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8938431Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8938631Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8938863Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8939167Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8939362Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8939556Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8939768Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8939963Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8940196Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8940492Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8940724Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8941016Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8941215Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8941434Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8941637Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8941872Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8942165Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8942400Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8942601Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8942805Z 
E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8943005Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8943406Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8943641Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8943935Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8944187Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8944479Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8944716Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8945010Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8945248Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8945545Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8945757Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8945956Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8946176Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8946379Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8946580Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8946797Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8947091Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8947326Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8947625Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8947873Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8948168Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8948413Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8948706Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8948939Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8949234Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8949455Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8949658Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8950868Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8951076Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8951290Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.8951490Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8951786Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8952031Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8952235Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8952435Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8952637Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8952932Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8953183Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8953519Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8953752Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8954048Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8954285Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8954579Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8954813Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8955106Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8955387Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8955697Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8955932Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8956226Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8956461Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8956778Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8957011Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8957304Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8957538Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8957847Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8958045Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8958242Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8958476Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8958774Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8959008Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8959302Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8959534Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8959847Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8960091Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8960385Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8960623Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8960933Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8961141Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8966424Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8967037Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8967620Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.8968222Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8968767Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8969223Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8969665Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8970122Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8970654Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8971200Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8971662Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8972098Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8972558Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8973104Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8973713Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.8974174Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8974742Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8975219Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8975599Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.8975979Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8976432Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8976893Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8977344Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8977795Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8978257Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8978693Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8979146Z E1204 
11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8979610Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.8980049Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8980499Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.8980960Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.8981396Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.8981865Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8982312Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8982760Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8983193Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8983688Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8984234Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8984776Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8985227Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8985670Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8986107Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.8986528Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.8986969Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8987419Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.8987853Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8988287Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8988817Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8989360Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.8989811Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8990245Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8990713Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8991243Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8991785Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.8992233Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.8992668Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.8993115Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.8993678Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.8994201Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.8994635Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.8995080Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.8995500Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.8995940Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.8996393Z E1204 11:27:05.453000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.8996831Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.8997254Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.8997662Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.8998050Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.8998383Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.8998649Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.8998917Z E1204 11:27:05.453000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.8999252Z [W1204 11:27:05.722320125 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.8999446Z 2025-12-04T11:45:25.8999607Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9000090Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9000721Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9001188Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9001858Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9002619Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9003131Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9003631Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9004096Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9004625Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9005184Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9005743Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9006302Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9006858Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9007416Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9007971Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9008542Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9009114Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9009658Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9010120Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9010560Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9011013Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9011456Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9011922Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9012483Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9013007Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9013516Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9014074Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9014623Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9015073Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9015524Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9015985Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9016420Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9016844Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9017291Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9017765Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9018212Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9018635Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9019096Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9019653Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9020226Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9020781Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9021326Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9021783Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9022237Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9022674Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9023115Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9023608Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9024167Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9024727Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9025286Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9025842Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9026397Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9026976Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9027545Z 
E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9028103Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9028660Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9029234Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9029792Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9030347Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9030903Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9031472Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9032029Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9032584Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9033142Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9033719Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9034288Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9034849Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9035412Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9035961Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9036447Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9036882Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9037405Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9037961Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9038536Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9039093Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9039651Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9040208Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9040764Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9041337Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9041892Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9042451Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9043009Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9043569Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9043998Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9044424Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9044860Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9045301Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9045797Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9046355Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9046875Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9047300Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9047737Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9048163Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9048624Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9049185Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9049743Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9050315Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9050834Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9051270Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9051713Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9052186Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9052746Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9053334Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9053796Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9054235Z 
E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9054688Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9055230Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9055795Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9056354Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9056930Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9057492Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9058052Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9058611Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9059194Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9059755Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9060282Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9060713Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9061167Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9061627Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9062066Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9062502Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9063031Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9063619Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9064214Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9064773Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9065331Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9065889Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9066465Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9067025Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9067584Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9068133Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9068609Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9069046Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9069474Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9069912Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9070361Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9070892Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9071440Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9071897Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9072333Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9072772Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9073364Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9073924Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9074488Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9075045Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9075624Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9076183Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9076739Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9077297Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9077871Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9078432Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9078996Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9079557Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9080119Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9080677Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9081241Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9081802Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9082379Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9082946Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9083541Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9084072Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9084508Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9084989Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9085550Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9086110Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9086668Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9087245Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9087804Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9088362Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9088922Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9089488Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9090051Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9090574Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9091039Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9091599Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9092187Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9092745Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9093316Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9093767Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9094231Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9094665Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9095191Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9095733Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9096199Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9096635Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9097067Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9097595Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9098145Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9098602Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9099041Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9099464Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9099837Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9100214Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9100690Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9101162Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9101600Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9102050Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9102510Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9103059Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9103571Z E1204 
11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9104032Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9104469Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9104920Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9105400Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9105841Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9106267Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9106709Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9107158Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9107598Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9108031Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9108558Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9109103Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9109553Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9110017Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9110443Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9110865Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9111311Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9111761Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9112210Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9112641Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9113168Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9113738Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9114205Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9114640Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9115076Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9115603Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9116142Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9116593Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9117028Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9117462Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9117994Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9118516Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9118979Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9119403Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9119823Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9120264Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9120717Z E1204 11:27:05.455000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9121174Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9121594Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9122000Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9122384Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9122715Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9122992Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9123285Z E1204 11:27:05.455000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9123605Z [W1204 11:27:05.724444405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9123797Z 2025-12-04T11:45:25.9123943Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9124417Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9125040Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9125500Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9126149Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9126909Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9127441Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9127922Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9128363Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9128889Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9129451Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9130029Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9130589Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9131147Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9131706Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9132279Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9132837Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9133428Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9133973Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9134436Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9134873Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9135313Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9135755Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9136221Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9136792Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9137326Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9137789Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9138348Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9138895Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9139360Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9139810Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9140271Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9140705Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9141144Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9141592Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9142049Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9142486Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9142910Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9143421Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9143980Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9144538Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9145094Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9145639Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9146125Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9146562Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9146997Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9147439Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9147903Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9148477Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9149036Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9149591Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9150146Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9150718Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9151276Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9151831Z 
E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9152387Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9152946Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9153542Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9154097Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9154652Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9155235Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9155791Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9156351Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9156907Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9157477Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9158034Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9158591Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9159146Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9159702Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9160261Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9160714Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9161144Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9161663Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9162220Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9162778Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9163365Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9163921Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9164501Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9165069Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9165625Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9166180Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9166739Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9167312Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9167832Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9168258Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9168684Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9169140Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9169582Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9170047Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9170604Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9171125Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9171555Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9171980Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9172402Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9172863Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9173444Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9174029Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9174586Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9175111Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9175550Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9176010Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9176476Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9177037Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9177586Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9178057Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9178497Z 
E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9178930Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9179458Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9180021Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9180587Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9181146Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9181706Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9182267Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9182856Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9183454Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9184012Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9184535Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9184968Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9185438Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9185898Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9186333Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9186767Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9187315Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9187883Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9188447Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9189004Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9189568Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9190130Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9190688Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9191246Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9191805Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9192381Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9192840Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9193304Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9193729Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9194166Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9194630Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9195161Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9195710Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9196167Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9196627Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9197061Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9197588Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9198149Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9198711Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9199272Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9199829Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9200386Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9200943Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9201532Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9202095Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9202654Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9203220Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9203826Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9204388Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9204950Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9205508Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9206085Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9206644Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9207206Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9207767Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9208300Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9208734Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9209198Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9209756Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9210316Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9210905Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9211465Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9212026Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9212586Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9213163Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9213743Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9214304Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9214831Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9215311Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9215875Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9216435Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9216995Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9217542Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9217998Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9218440Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9218876Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9219405Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9219654Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9219857Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9220058Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9220258Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9220550Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9220789Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9220992Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9221193Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9221386Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9221546Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9221742Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9221962Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9222168Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9222362Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9222581Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9222789Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9222983Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9223202Z E1204 
11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9223436Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9223630Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9223883Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9224168Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9224365Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9224562Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9224777Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9225006Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9225205Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9225406Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9225703Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9225931Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9226135Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9226332Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9226525Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9226721Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9226938Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9227142Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9227341Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9227542Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9227836Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9228079Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9228280Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9228478Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9228677Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9228971Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9229196Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9229398Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9229597Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9229796Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9230100Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9230297Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9230501Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9230693Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9230889Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9231106Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9231312Z E1204 11:27:05.457000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9231511Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9231699Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9231881Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9232066Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9232203Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9232307Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9232432Z E1204 11:27:05.457000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9232486Z ('RERUN', {'yellow': True}) [1.6350s] [100%] 2025-12-04T11:45:25.9232838Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:06.112397183 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:25.9232842Z 2025-12-04T11:45:25.9232987Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9233332Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9233629Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9233759Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9234238Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9234506Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9234733Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9234941Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9235142Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9235440Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9235673Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9235965Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9236197Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9236523Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9236756Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9237046Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9237278Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9237582Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9237802Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9238007Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9238202Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9238411Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9238622Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9238853Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9239145Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9239340Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9239574Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9239864Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9240084Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9240277Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9240495Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9240730Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9240924Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9241119Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9241337Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9241543Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9241753Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9241950Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9242179Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9242470Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9242711Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9243001Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9243219Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9243442Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9243637Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9243845Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9244047Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9244279Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9244571Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9244803Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9245122Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9245354Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9245646Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9245874Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9246180Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9246410Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9246702Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9246932Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9247236Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9247466Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9247754Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9247984Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9248277Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9248509Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9248801Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9249031Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9249336Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9249575Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9249864Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9250083Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9250285Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9250495Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9250784Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9251015Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9251306Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9251549Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9251839Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9252072Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9252363Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9252594Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9252884Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9253113Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9253427Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9253638Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9253849Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9254045Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9254251Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9254450Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9254681Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9254985Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9255181Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9255378Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9255572Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9255782Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9256016Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9256307Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9256539Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9256831Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9257028Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9257234Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9257437Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9257673Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9257993Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9258217Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9258419Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9258620Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9258820Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9259125Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9259358Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9259650Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9259883Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9260189Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9260425Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9260719Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9260951Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9261248Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9261445Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9261641Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9261864Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9262066Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9262288Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9262487Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9262784Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9263017Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9263351Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9263585Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9263877Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9264109Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9264417Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9264653Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9264945Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9265169Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9265372Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9265575Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9265767Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9265976Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9266176Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9266467Z E1204 11:27:06.845000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9266715Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9266916Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9267115Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9267316Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9267623Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9267858Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9268150Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9268382Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9268673Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9268917Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9269209Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9269440Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9269733Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9269969Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9270260Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9270491Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9270783Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9271040Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9271331Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9271563Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9271855Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9272104Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9272398Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9272594Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9272792Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9273042Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9273374Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9273605Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9273896Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9274129Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9274423Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9274657Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9274950Z E1204 
11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9275187Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9275508Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9275707Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9275940Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9276231Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9276477Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9276770Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9276988Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9277189Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9277388Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9277603Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9277893Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9278108Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9278307Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9278512Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9278714Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9279008Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9279230Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9279432Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9279656Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9279847Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9279995Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9280193Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9280414Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9280632Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9280830Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9281052Z E1204 
11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9281258Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9281454Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9281685Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9281891Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9282086Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9282309Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9282516Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9282715Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9285123Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9285347Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9285552Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9285750Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9286000Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9286295Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9286509Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9286713Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9286913Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9287121Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9287317Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9287528Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9287729Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9287945Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9288146Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9288443Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9288655Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9288855Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9289056Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9289256Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9289549Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9289761Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9289963Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9290185Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9290385Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9290679Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9290875Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9291078Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9291286Z E1204 11:27:06.845000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9291483Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9291696Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9291900Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9292109Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9292298Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9292479Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9292651Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9292781Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9292883Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9293013Z E1204 11:27:06.845000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9293171Z [W1204 11:27:06.114716170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9293176Z 2025-12-04T11:45:25.9293350Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9293647Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9293944Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9294095Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9294598Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9294854Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9295083Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9295309Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.9295510Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9295803Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9296041Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9296333Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9296582Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9296872Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9297102Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9297392Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9297627Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9297917Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9298135Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9298342Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9298553Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9298771Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9298971Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9299201Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9299493Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9299699Z E1204 
11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9299931Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9300222Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9300439Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9300646Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9300865Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9301069Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9301263Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9301459Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9301679Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9301883Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9302079Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9302271Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9302503Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9302822Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9303055Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9303383Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9303602Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9303808Z 
E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9304019Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9304227Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9304424Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9304656Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9304960Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9305194Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9305489Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9305718Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.9306010Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9306241Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9306532Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9306762Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9307051Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9307318Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9307610Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9307841Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9308131Z E1204 
11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9308373Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9308663Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9308893Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9309182Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9309425Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9309715Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9309948Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9310239Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9310462Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9310663Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9310858Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9311147Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9311378Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9311691Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9311921Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9312212Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9312443Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9312747Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9312976Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9313296Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9313526Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9313833Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9314028Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9314223Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9314418Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9314626Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9314830Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9315061Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9315350Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9315546Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9315755Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9315972Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9316166Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9316397Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9316686Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9316935Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9317228Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9317421Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9317629Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9317832Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9318078Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9318370Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9318592Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9318794Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9318994Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9319196Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9319490Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9319724Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9320014Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9320272Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9320564Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9320796Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9321087Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9321331Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9321625Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9321824Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9322019Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9322254Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9322455Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9322653Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9322852Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9323144Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9323408Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9323700Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9323932Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9324223Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9324477Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9324783Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9325016Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9325310Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9325543Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9325745Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9325944Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9326135Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9326345Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9326560Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9326854Z E1204 11:27:06.848000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9327074Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9327276Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9327473Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9327675Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9327967Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9328201Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9328497Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9328743Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9329047Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9329277Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9329570Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9329813Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9330108Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9330339Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9330629Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9330873Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9331167Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9331400Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9331693Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9331925Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9332220Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9332453Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9332745Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9332942Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9333162Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9333432Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9333723Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9333955Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9334260Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9334493Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9334784Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9335019Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9335310Z E1204 
11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9335556Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9335850Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9336045Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9336279Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9336573Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9336806Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9337097Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9337310Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9337528Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9337740Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9337942Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9338237Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9338450Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9338666Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9338863Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9339063Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9339354Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9339595Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9339797Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9339994Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9340186Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9340335Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9340533Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9340755Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9340960Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9341154Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9341374Z E1204 
11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9341590Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9341796Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9342017Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9342222Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9342417Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9342641Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9342860Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9343058Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9343269Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9343485Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9343701Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9343900Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9344099Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9344392Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9344604Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9344808Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9345007Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9345198Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9345393Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9345604Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9345831Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9346028Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9346227Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9346520Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9346745Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9346961Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9347160Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9347360Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9347650Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9347876Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9348077Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9348275Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9348474Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9348765Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9348963Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9349164Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9349354Z E1204 11:27:06.848000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9349552Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9349765Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9349982Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9350190Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9350379Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9350558Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9350728Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9350856Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9350970Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9351096Z E1204 11:27:06.848000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9351252Z [W1204 11:27:06.116845329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9351254Z 2025-12-04T11:45:25.9351398Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9351693Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9352004Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9352136Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9352615Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9352867Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9353094Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9353331Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.9353530Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9353822Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9354073Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9354379Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9354610Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9354901Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9355133Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9355442Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9355674Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9355963Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9356184Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9356414Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9356610Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9356816Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9357013Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9357242Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9357536Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9357731Z E1204 
11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9357961Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9358250Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9358493Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9358689Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9358908Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9359111Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9359305Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9359510Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9359729Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9359932Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9360126Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9360320Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9360566Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9360860Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9361091Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9361383Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9361603Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9361808Z 
E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9362004Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9362211Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9362409Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9362664Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9362955Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9363187Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9363500Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9363748Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.9364038Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9364267Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9364555Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9364802Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9365092Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9365324Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9365616Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9365847Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9366136Z E1204 
11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9366366Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9366656Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9366886Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9367205Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9367435Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9367724Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9367955Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9368257Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9368475Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9368678Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9368872Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9369177Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9369408Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9369698Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9369927Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9370218Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9370455Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9370744Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9370974Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9371262Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9371532Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9371824Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9372019Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9372216Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9372424Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9372633Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9372833Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9373062Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9373376Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9373585Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9373780Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9373973Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9374167Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9374397Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9374690Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9374922Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9375213Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9375410Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9375631Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9375846Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9376078Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9376371Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9376590Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9376806Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9377005Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9377207Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9377502Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9377748Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9378041Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9378272Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9378563Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9378796Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9379087Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9379320Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9379616Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9379816Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9380037Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9380259Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9380463Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9380661Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9380862Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9381167Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9381401Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9381692Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9381927Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9382234Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9382466Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9382757Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9382989Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9383320Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9383541Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9383742Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9383941Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9384132Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9384373Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9384572Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9384863Z E1204 11:27:06.850000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9385081Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9385283Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9385495Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9385694Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9385986Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9386217Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9386526Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9386760Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9387051Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9387284Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9387577Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9387809Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9388100Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9388332Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9388642Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9388883Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9389177Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9389408Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9389714Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9389946Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9390239Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9390473Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9390762Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9390973Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9391169Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9391405Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9391697Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9391934Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9392225Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9392456Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9392748Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9392989Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9393325Z E1204 
11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9393557Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9393850Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9394049Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9394294Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9394588Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9394818Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9395111Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9395341Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9395541Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9395739Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9395940Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9396236Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9396448Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9396650Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9396848Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9397047Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9397364Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9397584Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9397784Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9397980Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9398172Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9398333Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9398531Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9398752Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9398956Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9399151Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9399382Z E1204 
11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9399587Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9399780Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9399999Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9400204Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9400401Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9400622Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9400826Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9401022Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9401215Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9401451Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9401651Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9401850Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9402050Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9402341Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9402564Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9402766Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9402965Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9403155Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9403391Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9403605Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9403806Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9404004Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9404202Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9404498Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9404709Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9404910Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9405109Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9405309Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9405635Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9405846Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9406046Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9406243Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9406442Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9406748Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9406943Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9407144Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9407333Z E1204 11:27:06.850000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9407545Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9407759Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9407964Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9408162Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9408352Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9408535Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9408706Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9408832Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9408935Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9409062Z E1204 11:27:06.850000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9409216Z [W1204 11:27:06.157278535 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9409218Z 2025-12-04T11:45:25.9409362Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9409682Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9409978Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9410106Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9410585Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9410849Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9411073Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9411281Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.9411479Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9411786Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9412022Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9412312Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9412545Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9412837Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9413068Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9413491Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9413723Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9414029Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9414260Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9414469Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9414665Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9414873Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9415085Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9415316Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9415608Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9415803Z E1204 
11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9416037Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9416343Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9416566Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9416764Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9416984Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9417192Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9417387Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9417581Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9417798Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9418003Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9418210Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9418415Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9418647Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9418938Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9419168Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9419476Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9419694Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9419897Z 
E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9420093Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9420314Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9420512Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9420745Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9421034Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9421266Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9421558Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9421789Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.9422080Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9422309Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9422623Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9422853Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9423144Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9423406Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9423715Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9423948Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9424238Z E1204 
11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9424470Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9424774Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9425005Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9425293Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9425523Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9425812Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9426047Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9426338Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9426556Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9426757Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9426979Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9427269Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9427501Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9427790Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9428032Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9428324Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9428557Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9428847Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9429088Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9429380Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9429609Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9429899Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9430095Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9430293Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9430488Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9430694Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9430893Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9431123Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9431439Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9431634Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9431828Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9432025Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9432220Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9432461Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9432751Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9432987Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9433310Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9433522Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9433728Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9433928Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9434161Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9434457Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9434678Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9434878Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9435077Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9435277Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9435603Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9435838Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9436130Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9436363Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9436666Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9436902Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9437197Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9437428Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9437739Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9437935Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9438132Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9438353Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9438554Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9438758Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9438959Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9439253Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9439486Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9439779Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9440038Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9440333Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9440565Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9440855Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9441102Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9441395Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9441616Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9441818Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9442028Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9442221Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9442431Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9442631Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9442922Z E1204 11:27:06.890000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9443145Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9443368Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9443567Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9443770Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9444060Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9444323Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9444616Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9444849Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9445139Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9445387Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9445680Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9445910Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9446202Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9446449Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9446740Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9446973Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9447265Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9447500Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9447790Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9448023Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9448314Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9448559Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9448859Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9449056Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9449254Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9449486Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9449789Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9450020Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9450311Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9450543Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9450848Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9451083Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9451373Z E1204 
11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9451605Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9451902Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9452099Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9452331Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9452622Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9452874Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9453176Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9453427Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9453628Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9453826Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9454046Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9454341Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9454555Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9454758Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9454974Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9455174Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9455467Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9455687Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9455887Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9456088Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9456281Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9456430Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9456626Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9456847Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9457067Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9457275Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9457495Z E1204 
11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9457700Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9457897Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9458127Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9458333Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9458528Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9458750Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9458955Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9459163Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9459359Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9459571Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9459772Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9459968Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9460170Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9460463Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9460674Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9460876Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9461073Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9461288Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9461483Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9461694Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9461895Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9462093Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9462306Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9462599Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9462810Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9463011Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9463222Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9463463Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9463756Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9463968Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9464168Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9464369Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9464568Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9464861Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9465056Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9465257Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9465475Z E1204 11:27:06.890000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9465671Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9465884Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9466088Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9466283Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9466487Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9466669Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9466838Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9466966Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9467067Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9467193Z E1204 11:27:06.890000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9467370Z [W1204 11:27:06.159384025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9467372Z 2025-12-04T11:45:25.9467517Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9467812Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9468105Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9468234Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9468713Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9468965Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9469190Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9469405Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.9469615Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9469907Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9470142Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9470432Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9470676Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9470969Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9471199Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9471490Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9471732Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9472022Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9472244Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9472450Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9472649Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9472856Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9473056Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9473318Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9473608Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9473821Z E1204 
11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9474063Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9474353Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9474573Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9474769Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9475001Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9475205Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9475399Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9475592Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9475809Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9476029Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9476222Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9476415Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9476647Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9476942Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9477174Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9477465Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9477681Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9477884Z 
E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9478103Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9478310Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9478506Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9478738Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9479028Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9479273Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9479563Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9479792Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.9480083Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9480326Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9480616Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9480847Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9481135Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9481367Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9481658Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9481888Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9482178Z E1204 
11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9482429Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9482719Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9482950Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9483240Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9483524Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9483817Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9484048Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9484337Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9484569Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9484771Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9484965Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9485256Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9485487Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9485780Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9486010Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9486303Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9486532Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9486849Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9487079Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9487367Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9487598Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9487898Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9488094Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9488287Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9488484Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9488691Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9488903Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9489135Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9489423Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9489618Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9489811Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9490010Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9490202Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9490433Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9490722Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9490970Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9491277Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9491471Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9491677Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9491876Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9492122Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9492415Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9492634Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9492835Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9493044Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9493248Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9493560Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9493793Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9494085Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9494319Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9494612Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9494843Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9495136Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9495395Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9495690Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9495888Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9496083Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9496302Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9496517Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9496715Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9496914Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9497206Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9497454Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9497745Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9497979Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9498269Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9498503Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9498797Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9499029Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9499321Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9499541Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9499776Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9499973Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9500166Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9500375Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9500578Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9500881Z E1204 11:27:06.892000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9501100Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9501301Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9501498Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9501713Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9502005Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9502237Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9502531Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9502764Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9503058Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9503317Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9503612Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9503843Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9504164Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9504398Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9504688Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9504922Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9505229Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9505462Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9505754Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9505985Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9506296Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9506527Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9506819Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9507015Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9507217Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9507453Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9507744Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9507975Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9508267Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9508526Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9508821Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9509052Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9509345Z E1204 
11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9509593Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9509891Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9510087Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9510319Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9510627Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9510859Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9511151Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9511363Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9511567Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9511766Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9511967Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9512261Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9512473Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9512688Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9512893Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9513095Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9513419Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9513638Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9513856Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9514053Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9514245Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9514394Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9514590Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9514824Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9515029Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9515223Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9515448Z E1204 
11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9515656Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9515854Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9516079Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9516286Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9516490Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9516714Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9516960Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9517166Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9517364Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9517581Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9517784Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9517999Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9518199Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9518494Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9518707Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9518923Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9519124Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9519314Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9519511Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9519724Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9519933Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9520135Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9520342Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9520637Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9520849Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9521083Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9521283Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9521487Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9521780Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9521996Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9522213Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9522411Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9522610Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9522901Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9523109Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9523336Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9523531Z E1204 11:27:06.892000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9523726Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9523944Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9524155Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9524354Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9524547Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9524728Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9524901Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9525046Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9525154Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9525295Z E1204 11:27:06.892000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9525452Z [W1204 11:27:06.161481284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9525454Z 2025-12-04T11:45:25.9525600Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9525894Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9526205Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9526335Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9526813Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9527069Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9527310Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9527519Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:25.9527719Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9528022Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9528262Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9528563Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9528802Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9529095Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9529329Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9529642Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9529880Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9530172Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9530397Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9530624Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9530823Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9531037Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9531238Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9531477Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9531790Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9531993Z E1204 
11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9532225Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9532515Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9532737Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9532932Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9533154Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9533394Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9533593Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9533803Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9534038Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9534241Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9534436Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9534632Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9534878Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9535173Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9535405Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9535696Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9535931Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9536135Z 
E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9536331Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9536537Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9536734Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9536968Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9537261Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9537494Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9537786Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9538028Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:25.9538327Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9538558Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9538847Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9539077Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9539379Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9539610Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9539902Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9540131Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9540432Z E1204 
11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9540665Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9540957Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9541188Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9541481Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9541712Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9542003Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9542235Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9542547Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9542766Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9542966Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9543161Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9543481Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9543732Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9544022Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9544255Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9544548Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9544796Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9545086Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9545317Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9545606Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9545840Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9546129Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9546323Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9546517Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9546727Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9546949Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9547147Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9547378Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9547669Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9547881Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9548077Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9548272Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9548467Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9548696Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9549005Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9549240Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9549529Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9549727Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9549934Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9550136Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9550367Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9550658Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9550879Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9551102Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9551301Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9551502Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9551797Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9552029Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9552333Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9552566Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9552858Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9553094Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9553428Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9553662Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9553955Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9554153Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9554354Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9554575Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9554777Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9554976Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9555175Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9555497Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9555732Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9556027Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9556259Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9556568Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9556799Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9557091Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9557324Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9557630Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9557851Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9558054Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9558253Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9558445Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9558659Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9558861Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9559158Z E1204 11:27:06.894000 
992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9559378Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9559579Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9559799Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9559998Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9560293Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9560524Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9560833Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9561068Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9561360Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9561591Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9561897Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9562130Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9562424Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9562656Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9562953Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9563187Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9563512Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9563743Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9564036Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9564303Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9564595Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9564829Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9565119Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9565333Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9565533Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9565768Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9566059Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9566304Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9566596Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9566827Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9567119Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9567351Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9567643Z E1204 
11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9567877Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9568171Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9568369Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9568624Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9568918Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9569152Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9569442Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9569667Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9569868Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9570069Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9570270Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9570564Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9570791Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9570992Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9571190Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9571391Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9571687Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9571907Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9572110Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9572309Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9572501Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9572662Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9572868Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9573090Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9573328Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9573524Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9573761Z E1204 
11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9573972Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9574169Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9574388Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9574597Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9574808Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9575030Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9575234Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9575431Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9575625Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9575840Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9576044Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9576243Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9576445Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9576737Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9576984Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9577186Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9577387Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9577578Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:25.9577772Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9577997Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9578198Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9578399Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9578599Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9578893Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9579119Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9579322Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9579521Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9579721Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9580020Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9580233Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9580436Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9580632Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9580833Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9581153Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9581348Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9581550Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9581740Z E1204 11:27:06.894000 992188 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9581934Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9582161Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9582368Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9582565Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9582754Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9582934Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9583118Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9583246Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9583369Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9583497Z E1204 11:27:06.894000 992188 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:25.9583539Z FAILED [1.5246s] [100%] 2025-12-04T11:45:25.9583542Z 2025-12-04T11:45:25.9583602Z ==================================== RERUNS ==================================== 2025-12-04T11:45:25.9583765Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.9583818Z Traceback (most recent call last): 2025-12-04T11:45:25.9583984Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9584030Z method(*args, **kwargs) 2025-12-04T11:45:25.9584185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9584227Z method(*args, **kwargs) 2025-12-04T11:45:25.9584377Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.9584417Z with policy(): 2025-12-04T11:45:25.9584571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.9584614Z raise RuntimeError(msg) 2025-12-04T11:45:25.9585031Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:25.9585060Z 2025-12-04T11:45:25.9585139Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.9585413Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.9585416Z 2025-12-04T11:45:25.9585506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.9585588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.9585633Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9585693Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9586266Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9586370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9586409Z graph_break [] 2025-12-04T11:45:25.9586476Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9586553Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.9587051Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.9587118Z current_size = base.storage().size() 2025-12-04T11:45:25.9587159Z Autotune Choices Stats: 2025-12-04T11:45:25.9587539Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:25.9587607Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9587661Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9587784Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9589563Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9589803Z triton_mm_33 0.0092 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9590036Z triton_mm_29 0.0106 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9590261Z triton_mm_30 0.0107 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9590501Z triton_mm_16 0.0109 ms 80.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9590746Z triton_mm_22 0.0111 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9590971Z triton_mm_21 0.0113 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9591200Z triton_mm_23 0.0119 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9591443Z triton_mm_15 0.0122 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9591670Z triton_mm_31 0.0125 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9591807Z SingleProcess AUTOTUNE benchmarking takes 0.1607 seconds and 1.1908 seconds precompiling for 33 choices 2025-12-04T11:45:25.9591967Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.9592015Z Traceback (most recent call last): 2025-12-04T11:45:25.9592177Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9592232Z 
method(*args, **kwargs) 2025-12-04T11:45:25.9592386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9592428Z method(*args, **kwargs) 2025-12-04T11:45:25.9592578Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.9592617Z with policy(): 2025-12-04T11:45:25.9592768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.9592809Z raise RuntimeError(msg) 2025-12-04T11:45:25.9593219Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:25.9593224Z 2025-12-04T11:45:25.9593344Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.9593624Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.9593627Z 2025-12-04T11:45:25.9593715Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.9593792Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.9593835Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9593895Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9594450Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), 
('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9594582Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9594620Z graph_break [] 2025-12-04T11:45:25.9594686Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9594761Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.9595251Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.9595302Z current_size = base.storage().size() 2025-12-04T11:45:25.9595343Z Autotune Choices Stats: 2025-12-04T11:45:25.9595729Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:25.9595795Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9595848Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9595970Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9596208Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9596462Z triton_mm_33 0.0092 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9596689Z triton_mm_29 0.0106 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9596913Z triton_mm_30 0.0107 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9597138Z triton_mm_16 0.0109 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9597365Z triton_mm_22 0.0111 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9597587Z triton_mm_21 0.0113 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9597812Z triton_mm_23 0.0119 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9598035Z triton_mm_15 0.0122 ms 72.0% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9598285Z triton_mm_31 0.0125 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9598418Z SingleProcess AUTOTUNE benchmarking takes 0.1607 seconds and 1.1908 seconds precompiling for 33 choices 2025-12-04T11:45:25.9598493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.9598536Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9598592Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9598696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9599197Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9599240Z graph_break [] 2025-12-04T11:45:25.9599303Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9599377Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.9599418Z Autotune Choices Stats: 2025-12-04T11:45:25.9599786Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4", "best_time": 0.008359000086784363, "best_triton_pos": 0} 2025-12-04T11:45:25.9599851Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9599917Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9600038Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9600275Z triton_mm_72 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9600505Z triton_mm_71 0.0092 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9600548Z _scaled_mm 0.0094 ms 88.9% 2025-12-04T11:45:25.9600773Z triton_mm_54 0.0112 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9600998Z triton_mm_60 0.0113 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9601223Z triton_mm_67 0.0114 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9601446Z triton_mm_68 0.0114 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9601670Z triton_mm_59 
0.0116 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9601920Z triton_mm_61 0.0118 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9602146Z triton_mm_53 0.0121 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9602278Z SingleProcess AUTOTUNE benchmarking takes 0.2420 seconds and 0.7899 seconds precompiling for 39 choices 2025-12-04T11:45:25.9602331Z =================================== FAILURES =================================== 2025-12-04T11:45:25.9602494Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:25.9602539Z Traceback (most recent call last): 2025-12-04T11:45:25.9602700Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9602740Z method(*args, **kwargs) 2025-12-04T11:45:25.9602906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:25.9602947Z method(*args, **kwargs) 2025-12-04T11:45:25.9603101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:25.9603138Z with policy(): 2025-12-04T11:45:25.9603382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:25.9603423Z raise 
RuntimeError(msg) 2025-12-04T11:45:25.9603834Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:25.9603853Z 2025-12-04T11:45:25.9603929Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.9604204Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.9604206Z 2025-12-04T11:45:25.9604295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.9604367Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.9604412Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9604468Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9605021Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9605121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9605160Z graph_break [] 2025-12-04T11:45:25.9605223Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9605296Z ----------------------------- Captured stderr call ----------------------------- 
2025-12-04T11:45:25.9605784Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:25.9605847Z current_size = base.storage().size() 2025-12-04T11:45:25.9605889Z Autotune Choices Stats: 2025-12-04T11:45:25.9606270Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:25.9606337Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9606387Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9606510Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9606746Z triton_mm_34 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9606994Z triton_mm_33 0.0092 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9607220Z triton_mm_29 0.0106 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:25.9607445Z triton_mm_30 0.0107 ms 82.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9607668Z triton_mm_16 0.0109 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9607903Z triton_mm_22 0.0111 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9608126Z triton_mm_21 0.0113 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9608352Z triton_mm_23 0.0119 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9608579Z triton_mm_15 0.0122 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9608807Z triton_mm_31 0.0125 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9608937Z SingleProcess AUTOTUNE benchmarking takes 0.1607 seconds and 1.1908 seconds precompiling for 33 choices 2025-12-04T11:45:25.9609012Z ----------------------------- Captured stdout 
call ----------------------------- 2025-12-04T11:45:25.9609053Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9609109Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9609208Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9609704Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9609754Z graph_break [] 2025-12-04T11:45:25.9609817Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9609890Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.9609931Z Autotune Choices Stats: 2025-12-04T11:45:25.9610294Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008359000086784363, "best_triton_pos": 0} 2025-12-04T11:45:25.9610362Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9610411Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9610549Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9610784Z triton_mm_72 0.0084 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:25.9611018Z triton_mm_71 0.0092 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9611060Z _scaled_mm 0.0094 ms 88.9% 2025-12-04T11:45:25.9611285Z triton_mm_54 0.0112 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9611523Z triton_mm_60 0.0113 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9611746Z triton_mm_67 0.0114 ms 73.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9611969Z triton_mm_68 0.0114 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9612191Z triton_mm_59 0.0116 ms 72.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9612422Z triton_mm_61 0.0118 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9612648Z triton_mm_53 0.0121 ms 69.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9612777Z SingleProcess AUTOTUNE benchmarking takes 0.2420 seconds and 0.7899 seconds precompiling for 39 choices 2025-12-04T11:45:25.9612850Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:25.9612891Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:25.9612952Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:25.9613064Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:25.9613610Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:25.9613649Z graph_break [] 2025-12-04T11:45:25.9613712Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:25.9613784Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:25.9613828Z Autotune Choices Stats: 2025-12-04T11:45:25.9614206Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-12-04T11:45:25.9614276Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:25.9614325Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:25.9614445Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, 
torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:25.9614681Z triton_mm_110 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9614910Z triton_mm_109 0.0092 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9615152Z triton_mm_105 0.0103 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9615379Z triton_mm_106 0.0108 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9615603Z triton_mm_92 0.0110 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9615823Z triton_mm_97 0.0111 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9616050Z triton_mm_98 0.0111 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:25.9616276Z triton_mm_99 0.0117 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9616500Z triton_mm_91 0.0118 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9616727Z triton_mm_107 0.0126 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:25.9616872Z SingleProcess AUTOTUNE benchmarking takes 0.2491 seconds and 0.6395 seconds precompiling for 39 choices 2025-12-04T11:45:25.9617077Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a296e5305c319eb1.xml - 2025-12-04T11:45:25.9617138Z =========================== short test summary info ============================ 2025-12-04T11:45:25.9617765Z FAILED [1.5246s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:25.9617770Z 2025-12-04T11:45:25.9617846Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:25.9618132Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.9618134Z 2025-12-04T11:45:25.9618223Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:25.9618286Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:25.9618356Z ================== 1 failed, 187 deselected, 2 rerun in 6.78s ================== 2025-12-04T11:45:25.9618394Z Got exit code 1 2025-12-04T11:45:25.9618435Z Retrying single test... 2025-12-04T11:45:25.9618579Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9f8072257805adf1.xml 2025-12-04T11:45:25.9618638Z ============================= test session starts ============================== 2025-12-04T11:45:25.9618763Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:25.9618805Z cachedir: .pytest_cache 2025-12-04T11:45:25.9618966Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:25.9619014Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:25.9619055Z configfile: pytest.ini 2025-12-04T11:45:25.9619220Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:25.9619296Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:25.9619565Z stepcurrent: skipping 110 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:25.9619609Z Running 1 items in this shard 2025-12-04T11:45:25.9619613Z 2025-12-04T11:45:25.9619964Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:16.014668567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9619966Z 2025-12-04T11:45:25.9620122Z [W1204 11:27:17.372282892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9620124Z 2025-12-04T11:45:25.9620440Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9620743Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9620891Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9621389Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9621648Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9621877Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9622099Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9622299Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9622596Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9622835Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9623146Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9623434Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9623727Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9623960Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9624259Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9624494Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9624787Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9625019Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9625310Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9625580Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9625872Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9626071Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9626302Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9626610Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9626807Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9627039Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9627331Z E1204 11:27:17.086000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9627578Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9627869Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9628089Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9628295Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9628495Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9628709Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9628878Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9629060Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9629588Z E1204 11:27:17.086000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/6t/c6t2x5i6uctdtle4hcsk2rcyyop7477fnwyvehlv2shtyuaatjjx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9629748Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9629978Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9630134Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9630281Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9630570Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9630709Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9630979Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9631121Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9631380Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9631537Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9631824Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9631960Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9632237Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9632430Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9632746Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9633045Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9633177Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9633701Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9633953Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9634212Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9634420Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9634621Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9634914Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9635163Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9635455Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9635687Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9635982Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9636232Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9636523Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9636755Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9637045Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9637281Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9637572Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9637804Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9638094Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9638291Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9638549Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9638839Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9639037Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9639266Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9639570Z E1204 11:27:17.086000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9639803Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9640093Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9640315Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9640534Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9640737Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9640947Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9641116Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9641295Z E1204 11:27:17.086000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9641399Z E1204 11:27:17.086000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9641557Z [W1204 11:27:17.395389048 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9641560Z 2025-12-04T11:45:25.9641869Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9642161Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9642290Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9642784Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9643061Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9643334Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9643541Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9643756Z E1204 11:27:17.121000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9644048Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9644280Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9644573Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9644819Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9645109Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9645341Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9645633Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9645864Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] 
[0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9646157Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9646387Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9646678Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9646908Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9647229Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9647424Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9647656Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9647953Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9648160Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9648394Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9648685Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9648916Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9649218Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9649439Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9649646Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9649845Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9650056Z E1204 11:27:17.121000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9650224Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9650403Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9650925Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/ul/culwnzaovofzhydoydyza7xxtefoovd3x5vkl5t5hdiixr6f5afg.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9651072Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9651298Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9651466Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9651613Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9651898Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9652029Z E1204 11:27:17.121000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9652289Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9652442Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9652698Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9652852Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9653119Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9653284Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9653574Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9653766Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9654081Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9654373Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9654505Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9654987Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9655239Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9655466Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9655701Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9655901Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9656194Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9656427Z E1204 
11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9656733Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9656967Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9657260Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9657491Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9657798Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9658033Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9658323Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9658555Z E1204 11:27:17.121000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9658845Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9659081Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9659373Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9659571Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9659803Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9660117Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9660314Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9660544Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9660835Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9661079Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9661371Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9661590Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9661796Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9661997Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9662219Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9662386Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9662563Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9662665Z E1204 11:27:17.121000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9662820Z [W1204 11:27:17.396323785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9662823Z 2025-12-04T11:45:25.9663130Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9663460Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9663589Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9664065Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9664347Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9664571Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9664776Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9664974Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9665266Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9665515Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9665806Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9666037Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9666327Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9666576Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9666865Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9667095Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9667384Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9667619Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9667910Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9668140Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9668430Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9668648Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9668881Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9669170Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9669365Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9669598Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9669900Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9670132Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9670422Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9670641Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9670861Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9671061Z E1204 11:27:17.129000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9671272Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9671438Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9671615Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9672144Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/wy/cwyhxrqkddzmghmnuj4hvl5g5cmipegyln2gxgzm24erxh33v53a.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9672291Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9672506Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9672660Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9672816Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9673113Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9673245Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9673524Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9673662Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9673939Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9674095Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9674363Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9674498Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9674775Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9674982Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9675295Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9675587Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9675718Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9676197Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9676450Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9676675Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9676879Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9677094Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9677398Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9677631Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9677923Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9678157Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9678457Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9678688Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9678977Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9679208Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9679509Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9679742Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9680031Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9680263Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9680557Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9680753Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9680983Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9681272Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9681480Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9681720Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9682011Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9682240Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9682529Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9682762Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9682969Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9683170Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9683411Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9683593Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.9683772Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9683874Z E1204 11:27:17.129000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9684181Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9684473Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9684605Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9685082Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9685333Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9685556Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9685776Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9685987Z E1204 11:27:17.134000 
998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9686277Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9686511Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9686801Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9687048Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9687340Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9687571Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9687861Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9688104Z E1204 11:27:17.134000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9688395Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9688625Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9688915Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9689150Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9689438Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9689636Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9689868Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9690171Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9690383Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9690616Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9690907Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9691136Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9691439Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9691657Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9691864Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9692064Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9692287Z E1204 11:27:17.134000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9692455Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9692631Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9693156Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/e4/ce46a72ekvnxka62ueaol4spu6t6ojo2hgonrzg2phswjjjaqfgx.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9693333Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9693549Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9693702Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9693847Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9694130Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9694279Z E1204 11:27:17.134000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9694552Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9694689Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9694943Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9695098Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9695366Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9695514Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9695788Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9695981Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. {type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9696294Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. 
OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9696602Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9696733Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9697210Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9697461Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9697688Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9697894Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9698092Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9698384Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9698628Z E1204 
11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9698932Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9699165Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9699454Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9699686Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9699987Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9700218Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9700507Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9700739Z E1204 11:27:17.134000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9701042Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9701274Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9701565Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9701760Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9701994Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9702284Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9702479Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9702709Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9703015Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9703287Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9703577Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9703798Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9704003Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9704219Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9704428Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9704594Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9704771Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from 
./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9704872Z E1204 11:27:17.134000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9705042Z [W1204 11:27:17.410363542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9705044Z 2025-12-04T11:45:25.9705353Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9705646Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9705775Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9706254Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9706507Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9706730Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9706935Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from 
/usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9707154Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9707456Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9707689Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9707979Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9708212Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9708516Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9708749Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9709039Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9709270Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9709572Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9709801Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9710090Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9710320Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9710616Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9710812Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9711042Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9711332Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9711542Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9711783Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9712071Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9712302Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9712591Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9712822Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9713031Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9713231Z E1204 11:27:17.146000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9713477Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9713642Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9713837Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9714357Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/vm/cvmfw2cu3sn7jlq753yuf4uiakz2uu6vimioiewglubd2dhg5422.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9714502Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9714716Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9714873Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9715019Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9715305Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9715436Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9715693Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9715843Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9716112Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9716267Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9716534Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9716667Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9716954Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9717148Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9717465Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9717759Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9717899Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9718377Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9718628Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9718852Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9719061Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9719261Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9719552Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9719786Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9720080Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9720334Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9720624Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9720856Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9721145Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9721389Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9721679Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9721910Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9722202Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9722458Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9722748Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9722943Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9723176Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9723489Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9723688Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9723922Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9724214Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9724444Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9724767Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9724990Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9725195Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9725396Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9725621Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9725788Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.9725967Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9726066Z E1204 11:27:17.146000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9726222Z [W1204 11:27:17.414809818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9726224Z 2025-12-04T11:45:25.9726530Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9726841Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9726971Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9727445Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9727701Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9727928Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9728137Z E1204 11:27:17.148000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9728334Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9728625Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9728878Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9729171Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9729404Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9729693Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9729935Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9730224Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9730454Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9730745Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9730988Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9731277Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9731513Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9731805Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9732001Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9732233Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9732525Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9732720Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9732950Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9733301Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9733534Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9733824Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9734044Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9734264Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9734464Z E1204 11:27:17.148000 998112 
site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9734676Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9734841Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 2025-12-04T11:45:25.9735019Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9735557Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] for benchmark choice TritonTemplateCaller(/tmp/tmp81cx65hh/e5/ce5u2q2qq3brrt6wbjdzmerjriiasieoloynfx47cjz34r4szh6d.py, ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4) 2025-12-04T11:45:25.9735706Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Traceback (most recent call last): 2025-12-04T11:45:25.9735922Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run 2025-12-04T11:45:25.9736076Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] result = self.fn(*self.args, **self.kwargs) 2025-12-04T11:45:25.9736223Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:25.9736512Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 3255, in precompile_with_captured_stdout 2025-12-04T11:45:25.9736645Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] choice.precompile() 2025-12-04T11:45:25.9736901Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py", line 2289, in precompile 2025-12-04T11:45:25.9737040Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self.bmreq.precompile() 2025-12-04T11:45:25.9737294Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/autotune_process.py", line 677, in precompile 2025-12-04T11:45:25.9737477Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] getattr(mod, self.kernel_name).precompile() 2025-12-04T11:45:25.9737746Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 444, in precompile 2025-12-04T11:45:25.9737879Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] self._make_launchers() 2025-12-04T11:45:25.9738155Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 613, in _make_launchers 2025-12-04T11:45:25.9738348Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] raise RuntimeError(f"No valid triton configs. 
{type(exc).__name__}: {exc}") 2025-12-04T11:45:25.9738683Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] RuntimeError: No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:25.9738977Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9739106Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9739583Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9739847Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9740072Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9740276Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9740475Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9740773Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #9 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9741007Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9741298Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9741529Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9741842Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9742072Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9742362Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9742593Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9742895Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #17 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9743129Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #18 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9743446Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #19 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9743678Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #20 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9743985Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #21 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9744181Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #22 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9744413Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9744703Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9744899Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:25.9745133Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9745425Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9745656Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9745945Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #29 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9746193Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9746398Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9746598Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #32 thread_run from /usr/local/src/conda/python-3.12.5/Modules/_threadmodule.c:1114 2025-12-04T11:45:25.9746806Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #33 pythread_wrapper from /usr/local/src/conda/python-3.12.5/Python/thread_pthread.h:237 2025-12-04T11:45:25.9746971Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #34 start_thread from ./nptl/./nptl/pthread_create.c:447 
2025-12-04T11:45:25.9747165Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] #35 clone3 from ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:78 2025-12-04T11:45:25.9747265Z E1204 11:27:17.148000 998112 site-packages/torch/_inductor/select_algorithm.py:3323] [0/0] 2025-12-04T11:45:25.9747319Z ('RERUN', {'yellow': True}) [3.6363s] [100%] 2025-12-04T11:45:25.9747669Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:19.350281748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9747672Z 2025-12-04T11:45:25.9747819Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9748123Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9748419Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9748549Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9749025Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9749281Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9749505Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9749711Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9749909Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9750205Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9750462Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9750753Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9750985Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9751276Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9751519Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9751810Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9752042Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9752332Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9752565Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9752770Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9752966Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9753179Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9753417Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9753655Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9753948Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9754142Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9754374Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9754680Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9754918Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9755114Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9755333Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9755538Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9755748Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9755942Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9756158Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9756362Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9756555Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9756767Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9756997Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9757291Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9757522Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9757814Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9758033Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9758236Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9758431Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9758638Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9758850Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9759103Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9759394Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9759628Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9759931Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9760162Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9760451Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9760682Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9760971Z 
E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9761214Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9761503Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9761733Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9762025Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9762258Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9762547Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9762777Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9763065Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9763358Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9763647Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9763877Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9764171Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9764418Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9764708Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9764926Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9765126Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9765334Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9765627Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9765857Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9766146Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9766377Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9766671Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9766902Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9767192Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9767422Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9767736Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9767966Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9768255Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9768450Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9768648Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9768855Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9769062Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9769261Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9769490Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9769793Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9769987Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9770181Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9770375Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9770569Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9770802Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9771093Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9771327Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9771619Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9771838Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9772056Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9772258Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9772491Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9772783Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9773018Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9773220Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9773455Z 
E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9773657Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9773951Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9774206Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9774499Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9774732Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9775023Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9775261Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9775553Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9775789Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9776084Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9776309Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9776506Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9776729Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9776932Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9777130Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9777346Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9777638Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9777871Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9778166Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9778417Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9778709Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9778942Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9779232Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9779468Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9779761Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9779981Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9780184Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9780384Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9780600Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9780810Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9781010Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9781303Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9781524Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9781737Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9781938Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9782139Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9782432Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9782677Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9782972Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9783204Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9783533Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9783770Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9784064Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9784297Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9784593Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9784842Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9785149Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9785385Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9785678Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9785910Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9786219Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9786454Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9786744Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9786977Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9787291Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9787488Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9787688Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9787923Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9788220Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9788454Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9788748Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9788980Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9789286Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9789530Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9789821Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9790055Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9790348Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9790561Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9790796Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9791087Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9791321Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9791630Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9791847Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9792049Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9792249Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9792452Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9792747Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9792960Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9793162Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9793393Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9793611Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9793918Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9794140Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9794341Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9794539Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9794748Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9794900Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9795097Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9795318Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9795522Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9795737Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9795959Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9796163Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9796357Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9796576Z E1204 
11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9796786Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9796985Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9797207Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9797413Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9797611Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9797830Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9798043Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9798244Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9798441Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9798641Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9798946Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9799159Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9799363Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9799560Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9799763Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9799959Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9800173Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9800372Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9800569Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9800771Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9801064Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9801277Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9801481Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9801680Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9801892Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9802198Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9802414Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9802614Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9802814Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9803031Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9803360Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9803556Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9803760Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9803968Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9804165Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9804379Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9804585Z E1204 11:27:19.090000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9804785Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9804976Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9805157Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9805328Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9805455Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9805558Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9805686Z E1204 11:27:19.090000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9805847Z [W1204 11:27:19.359203149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9805861Z 2025-12-04T11:45:25.9806007Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9806312Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9806606Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9806735Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9807225Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9807481Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9807709Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9807914Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9808127Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9808420Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9808655Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9808946Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9809182Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9809475Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9809705Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9809998Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9810228Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9810541Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9810766Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9810971Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9811169Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9811388Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9811588Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9811819Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9812112Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9812307Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9812550Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9812840Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9813059Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9813284Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9813505Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9813712Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9813906Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9814103Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9814322Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9814545Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9814755Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9814950Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9815184Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9815476Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9815726Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9816016Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9816232Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9816438Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9816646Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9816857Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9817054Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9817288Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9817581Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9817815Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9818106Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9818337Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9818628Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9818870Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9819178Z 
E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9819409Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9819699Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9819936Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9820237Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9820468Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9820758Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9820989Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9821291Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9821520Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9821809Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9822038Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9822335Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9822569Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9822856Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9823078Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9823324Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9823536Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9823825Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9824055Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9824345Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9824589Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9824887Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9825119Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9825408Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9825655Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9825946Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9826177Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9826467Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9826667Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9826862Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9827058Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9827264Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9827462Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9827721Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9828011Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9828207Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9828401Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9828596Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9828801Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9829033Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9829324Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9829556Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9829860Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9830054Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9830261Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9830461Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9830695Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9830992Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9831213Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9831414Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9831613Z 
E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9831828Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9832132Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9832368Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9832663Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9832896Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9833202Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9833466Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9833761Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9833994Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9834304Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9834501Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9834699Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9834922Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9835125Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9835326Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9835525Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9835819Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9836051Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9836378Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9836615Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9836907Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9837142Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9837449Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9837683Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9837977Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9838197Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9838411Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9838613Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9838807Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9839016Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9839217Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9839513Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9839732Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9839935Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9840134Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9840337Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9840656Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9840891Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9841186Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9841420Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9841727Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9841960Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9842254Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9842487Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9842792Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9843026Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9843348Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9843582Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9843879Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9844114Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9844409Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9844642Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9844956Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9845203Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9845500Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9845697Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9845898Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9846148Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9846439Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9846673Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9846965Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9847214Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9847505Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9847738Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9848029Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9848264Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9848559Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9848754Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9848989Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9849282Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9849540Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9849832Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9850045Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9850248Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9850459Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9850662Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9850956Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9851167Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9851370Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9851584Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9851784Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9852075Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9852296Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9852499Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9852701Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9852894Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9853041Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9853238Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9853505Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9853726Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9853922Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9854142Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9854348Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9854545Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9854777Z E1204 
11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9854984Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9855181Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9855401Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9855621Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9855819Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9856014Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9856226Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9856427Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9856630Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9856830Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9857124Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9857337Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9857539Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9857763Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9857954Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9858151Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9858364Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9858566Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9858776Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9858978Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9859271Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9859485Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9859699Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9859899Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9860097Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9860388Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9860602Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9860804Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9861003Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9861204Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9861495Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9861692Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9861924Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9862115Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9862310Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9862527Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9862732Z E1204 11:27:19.092000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9862942Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9863134Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9863344Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9863516Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9863643Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9863765Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9863891Z E1204 11:27:19.092000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9864052Z [W1204 11:27:19.361377498 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9864054Z 2025-12-04T11:45:25.9864199Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9864493Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9864791Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9864923Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9865402Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9865654Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9865880Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9866117Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9866318Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9866610Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9866842Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9867156Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9867389Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9867679Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9867911Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9868213Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9868447Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9868738Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9868959Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9869164Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9869365Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9869572Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9869770Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9870002Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9870304Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9870512Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9870744Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9871037Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9871256Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9871463Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9871681Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9871884Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9872081Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9872275Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9872509Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9872714Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9872909Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9873103Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9873365Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9873664Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9873894Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9874185Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9874404Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9874638Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9874832Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9875038Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9875237Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9875469Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9875777Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9876008Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9876300Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9876531Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9876836Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9877065Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9877354Z 
E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9877585Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9877881Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9878113Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9878402Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9878631Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9878937Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9879182Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9879472Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9879703Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9879995Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9880243Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9880534Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9880766Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9881057Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9881301Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9881501Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9881697Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9881989Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9882223Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9882514Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9882745Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9883036Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9883308Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9883619Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9883854Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9884143Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9884377Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9884681Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9884876Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9885074Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9885269Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9885494Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9885692Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9885925Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9886218Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9886413Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9886611Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9886805Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9886999Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9887230Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9887525Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9887781Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9888074Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9888268Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9888473Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9891002Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9891244Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9891540Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9891763Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9891967Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9892183Z 
E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9892383Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9892681Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9892913Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9893211Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9893482Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9893774Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9894008Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9894320Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9894570Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9894863Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9895060Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9895256Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9895492Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9895694Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9895891Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9896091Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9896381Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9896635Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9896929Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9897163Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9897456Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9897690Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9897982Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9898214Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9898506Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9898750Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9898953Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9899153Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9899346Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9899557Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9899769Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9900062Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9900281Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9900483Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9900694Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9900894Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9901187Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9901424Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9901722Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9901957Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9902250Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9902484Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9902776Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9903041Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9903372Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9903606Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9903899Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9904155Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9904447Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9904679Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9904970Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9905217Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9905511Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9905744Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9906037Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9906237Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9906437Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9906671Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9906961Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9907194Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9907514Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9907745Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9908040Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9908272Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9908579Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9908814Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9909105Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9909302Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9909550Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9909844Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9910076Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9910367Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9910583Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9910787Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9910988Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9911190Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9911483Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9911712Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9911924Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9912123Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9912322Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9912615Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9912847Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9913050Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9913273Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9913467Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9913614Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9913829Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9914050Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9914256Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9914455Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9914678Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9914887Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9915081Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9915301Z E1204 
11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9915509Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9915704Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9915953Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9916158Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9916354Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9916547Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9916762Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9916978Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9917178Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9917377Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9917670Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9917900Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9918101Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9918299Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9918488Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9918686Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9918902Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9919103Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9919302Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9919502Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9919795Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9920021Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9920232Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9920433Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9920633Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9920925Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9921149Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9921351Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9921546Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9921745Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9922036Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9922245Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9922447Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9922635Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9922830Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9923043Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9923277Z E1204 11:27:19.094000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9923475Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9923667Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9923848Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9924032Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9924174Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9924278Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9924407Z E1204 11:27:19.094000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9924565Z [W1204 11:27:19.405035187 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9924568Z 2025-12-04T11:45:25.9924713Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9925006Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9925314Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9925445Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9925927Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9926195Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9926421Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9926627Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9926827Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9927118Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9927355Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9927645Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9927877Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9928166Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9928420Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9928709Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9928940Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9929233Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9929466Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9929672Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9929869Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9930076Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9930273Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9930519Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9930811Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9931006Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9931236Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9931531Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9931751Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9931945Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9932164Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9932368Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9932574Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9932777Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9932994Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9933200Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9933426Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9933641Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9933877Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9934167Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9934400Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9934693Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9934927Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9935130Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9935325Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9935530Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9935729Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9935963Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9936254Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9936486Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9936775Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9937032Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9937322Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9937553Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9937842Z 
E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9938083Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9938379Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9938612Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9938901Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9939144Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9939432Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9939662Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9939952Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9940185Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9940474Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9940707Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9940997Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9941239Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9941541Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9941761Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9941961Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9942156Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:25.9942458Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9942690Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9942981Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9943216Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9943553Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9943783Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9944074Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9944303Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9944596Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9944826Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9945117Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9945317Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9945512Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9945738Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9945944Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9946142Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9946372Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9946677Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9946874Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9947070Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9947266Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9947460Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9947707Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9947998Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9948228Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9948517Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9948713Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9948922Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9949121Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9949353Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9949644Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9949880Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9950097Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9950297Z 
E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9950495Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9950787Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9951032Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9951324Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9951557Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9951848Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9952097Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9952391Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9952621Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9952913Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9953111Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9953344Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9953563Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9953765Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9953962Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9954176Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9954487Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9954720Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9955011Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9955257Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9955551Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9955784Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9956073Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9956320Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9956611Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9956834Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9957035Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9957234Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9957427Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9957637Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9957836Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9958125Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9958345Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9958568Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9958766Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9958966Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9959263Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9959511Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9959803Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9960035Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9960328Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9960576Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9960871Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9961103Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9961397Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9961631Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9961925Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9962156Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9962446Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9962678Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9962993Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9963226Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9963548Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9963780Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9964094Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9964291Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9964487Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9964717Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9965008Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9965255Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9965546Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9965780Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9966070Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9966307Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9966599Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9966831Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9967120Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9967351Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9967585Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9967876Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9968107Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9968410Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9968629Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9968836Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9969035Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9969236Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9969539Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9969753Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9969955Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9970152Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9970353Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9970646Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9970871Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9971076Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9971273Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9971477Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9971637Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:25.9971833Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9972053Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9972259Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9972456Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9972686Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9972891Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9973089Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9973334Z E1204 
11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9973557Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:25.9973752Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9973973Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:25.9974177Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9974372Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9974570Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9974783Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9974984Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9975182Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9975386Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9975707Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9975919Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9976119Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9976317Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9976509Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:25.9976720Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9976934Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9977136Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:25.9977334Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9977534Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9977842Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9978055Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:25.9978256Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9978453Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9978654Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9978948Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9979161Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:25.9979359Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:25.9979556Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:25.9979771Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9980078Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9980273Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:25.9980474Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:25.9980665Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:25.9980872Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:25.9981084Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:25.9981287Z E1204 11:27:19.138000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:25.9981483Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:25.9981671Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:25.9981870Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:25.9982040Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:25.9982167Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:25.9982271Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:25.9982396Z E1204 11:27:19.138000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:25.9982551Z [W1204 11:27:19.407283005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:25.9982555Z 2025-12-04T11:45:25.9982699Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:25.9982994Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:25.9983326Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:25.9983457Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:25.9983951Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:25.9984219Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:25.9984446Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:25.9984652Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:25.9984852Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9985155Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9985389Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9985681Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9985913Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9986220Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9986450Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9986742Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9986976Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9987268Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9987488Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9987692Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9987889Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9988107Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9988317Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9988548Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9988839Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9989036Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9989280Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9989573Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9989790Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9989985Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9990201Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9990420Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9990614Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9990806Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9991023Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9991226Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9991424Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:25.9991618Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:25.9991849Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9992138Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9992381Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9992682Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9992900Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9993104Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:25.9993351Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:25.9993576Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:25.9993776Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:25.9994008Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9994299Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9994544Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9994835Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9995065Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9995355Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9995585Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9995875Z 
E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9996108Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9996399Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9996628Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9996944Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9997175Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9997465Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9997693Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9998002Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9998232Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9998525Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9998755Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9999058Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9999289Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:25.9999578Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:25.9999798Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:25.9999999Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0000197Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0000487Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0000721Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0001014Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0001265Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0001555Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0001785Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0002075Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0002318Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0002609Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0002840Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0003133Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0003358Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0003553Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0003751Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0003956Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0004156Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0004389Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0004678Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0004874Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0005067Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0005262Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0005489Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0005721Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0006011Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0006240Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0006545Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0006740Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0006948Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0007148Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0007382Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0007694Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0007915Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0008116Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0008314Z 
E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0008515Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0008807Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0009040Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0009332Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0009563Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0009877Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0010110Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0010406Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0010636Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0010941Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0011138Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0011334Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0011555Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0011771Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0011972Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0012170Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0012465Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0012699Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0012993Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0013225Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0013562Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0013795Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0014101Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0014351Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0014644Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0014865Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0015068Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0015279Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0015472Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0015681Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0015880Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0016188Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0016407Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0016610Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0016808Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0017009Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0017308Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0017542Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0017834Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0018065Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0018372Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0018616Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0018910Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0019143Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0019451Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0019687Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0019978Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0020210Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0020500Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0020746Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0021038Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0021269Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0021560Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0021797Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0022095Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0022292Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0022487Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0022732Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0023034Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0023293Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0023584Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0023834Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0024127Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0024359Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0024651Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0024896Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0025190Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0025385Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0025617Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0025908Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0026143Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0026434Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0026651Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0026856Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0027067Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0027282Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0027575Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0027788Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0027991Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0028201Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0028406Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0028698Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0028921Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0029135Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0029333Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0029524Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0029671Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0029868Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0030089Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0030298Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0030496Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0030716Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0030921Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0031116Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0031377Z E1204 
11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0031582Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0031776Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0031995Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0032201Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0032409Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0032602Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0032816Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0033016Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0033225Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0033458Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0033753Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0033965Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0034166Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0034472Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0034664Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0034862Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0035076Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0035277Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:26.0035496Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0035712Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0036010Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0036223Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0036424Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0036636Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0036836Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0037128Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0037340Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:26.0037557Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0037755Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0037954Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0038250Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0038444Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0038648Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0038838Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0039034Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0039249Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0039452Z E1204 11:27:19.140000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0039661Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0039860Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0040040Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0040210Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0040336Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0040438Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0040565Z E1204 11:27:19.140000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:26.0040733Z [W1204 11:27:19.409410364 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0040736Z 2025-12-04T11:45:26.0040880Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0041173Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 
2025-12-04T11:45:26.0041473Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0041614Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0042096Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0042350Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0042579Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0042786Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:26.0042987Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0043318Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0043551Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0043859Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0044104Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0044398Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0044630Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0044923Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0045172Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0045464Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0045684Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0045891Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0046114Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0046322Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0046520Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0046753Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0047042Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0047247Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0047479Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0047769Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0047987Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0048194Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0048421Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0048625Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0048819Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0049012Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0049244Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0049452Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0049647Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 
2025-12-04T11:45:26.0049842Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0050072Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0050375Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0050607Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0050897Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0051116Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0051321Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0051517Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0051721Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from 
/usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0051921Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0052151Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0052465Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0052697Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0052986Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0053216Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0053568Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0053799Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0054087Z 
E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0054322Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0054626Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0054856Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0055145Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0055374Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0055662Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0055894Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0056186Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0056416Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0056711Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0056968Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0057256Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0057486Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0057775Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0058005Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0058206Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0058400Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0058691Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0058936Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0059230Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0059462Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0059750Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0059982Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0060275Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0060505Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0060794Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0061025Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0061340Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0061535Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0061730Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0061924Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0062131Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0062346Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0062577Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0062867Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0063060Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0063297Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0063496Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0063697Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0063929Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0064219Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0064454Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0064743Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0064937Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0065143Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0065361Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0065606Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0065901Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0066124Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0066323Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0066539Z 
E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0066739Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0067032Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0067264Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0067573Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0067807Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0068099Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0068335Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0068627Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0068862Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0069153Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0069351Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0069548Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0069797Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0070000Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0070199Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0070400Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0070695Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0070942Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0071234Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0071465Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0071759Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0072004Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0072297Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0072528Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0072821Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault 
from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0073046Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0073282Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0073482Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0073673Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0073884Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0074115Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0074407Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0074628Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0074828Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0075029Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0075244Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0075537Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0075768Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0076060Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0076309Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0076599Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0076834Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0077124Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0077360Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0077654Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0077891Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0078187Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0078434Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0078742Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0078977Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0079268Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0079503Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0079806Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0080041Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0080334Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0080545Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0080743Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0080975Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0081270Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0081501Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0081797Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0082032Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0082326Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0082560Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0082869Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0083112Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0083438Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0083634Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0083867Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0084176Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0084411Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0084703Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0084921Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0085138Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0085339Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0085538Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0085832Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0086048Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0086249Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0086448Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0086647Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0086940Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0090690Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0090908Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0091105Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0091297Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0091446Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0091655Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0091877Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0092082Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0092277Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0092496Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0092715Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0092912Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0093132Z E1204 
11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0093371Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0093565Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0093789Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0093994Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0094194Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0094388Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0094600Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0094815Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0095025Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from 
/usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0095228Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0095520Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0095732Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0095945Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0096144Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0096335Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0096529Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0096743Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0096957Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 
2025-12-04T11:45:26.0097155Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0097355Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0097650Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0097863Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0098066Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0098263Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0098461Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0098755Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0098994Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 
2025-12-04T11:45:26.0099195Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0099391Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0099592Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0099886Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0100092Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0100294Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0100482Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0100677Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0100888Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0101107Z E1204 11:27:19.142000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0101303Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0101492Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0101673Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0101845Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0101973Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0102075Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0102201Z E1204 11:27:19.142000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 2025-12-04T11:45:26.0102253Z ('RERUN', {'yellow': True}) [1.6066s] [100%] 2025-12-04T11:45:26.0102685Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda [W1204 11:27:20.750137094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T11:45:26.0102688Z 2025-12-04T11:45:26.0102832Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0103148Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0103477Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0103605Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0104085Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0104355Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0104580Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0104785Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 2025-12-04T11:45:26.0104983Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0105290Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0105523Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0105814Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0106046Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0106338Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0106572Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0106863Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0107094Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0107382Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0107628Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0107833Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0108029Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0108236Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0108445Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0108677Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0108970Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0109165Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 
PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0109395Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0109701Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0109922Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0110116Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0110334Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0110541Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0110738Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0110931Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0111150Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0111354Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0111559Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0111765Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0111996Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0112285Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0112516Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0112821Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0113039Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0113242Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] 
#43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0113474Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0113703Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0113904Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0114135Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0114427Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0114658Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0114955Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0115187Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0115478Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0115712Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0116034Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0116266Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0116556Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0116786Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0117089Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0117319Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0117608Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0117837Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0118131Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0118379Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0118669Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0118900Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0119188Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0119421Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0119709Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0119929Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0120130Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0120336Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0120641Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0120873Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0121164Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0121396Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0121695Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0121926Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0122215Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0122445Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0122745Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0122977Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0123302Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0123496Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0123695Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0123889Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0124097Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0124294Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0124523Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0124840Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0125036Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0125234Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0125427Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0125624Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0125875Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0126166Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0126399Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0126689Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0126901Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0127107Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0127310Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0127545Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0127836Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0128060Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0128263Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0128462Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0128661Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0128976Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0129208Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0129504Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0129737Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0130053Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0130288Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0130579Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0130812Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0131102Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0131313Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0131510Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0131728Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0131930Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0132132Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0132336Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0132627Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0132860Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0133151Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0133455Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0133749Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0133981Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0134273Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0134526Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0134825Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0135046Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0135247Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0135461Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0135653Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0135863Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0136062Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0136354Z E1204 11:27:20.483000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0136577Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0136780Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0136979Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0137178Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0137469Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0137727Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0138020Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0138253Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0138546Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0138792Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0139085Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0139324Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0139616Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0139864Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0140156Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0140390Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0140683Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0140919Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0141210Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0141442Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0141738Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0141984Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0142291Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0142489Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0142684Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0142919Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0143222Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0143481Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0143774Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0144008Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0144321Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0144554Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0144846Z E1204 
11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0145079Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0145375Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0145572Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0145805Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0146098Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0146343Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0146658Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0146874Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0147074Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0147275Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0147490Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0147785Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0147998Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0148198Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0148397Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0148608Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0150029Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0150255Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0150458Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0150687Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0150882Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0151030Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0151236Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0151460Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0151684Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0151893Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0152116Z E1204 
11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0152326Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0152521Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0152746Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0152953Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0153147Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0153397Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0153603Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0153823Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0154018Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0154280Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0154483Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0154684Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0154884Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0155180Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0155395Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0155596Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0155796Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0156008Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0156218Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0156431Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0156635Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0156833Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0157034Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0157329Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0157545Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0157748Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0157946Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0158159Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0158471Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0158687Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0158889Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0159090Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0159293Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0159585Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0159782Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0159983Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0160184Z E1204 11:27:20.483000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0160392Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0160604Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0160808Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0161006Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0161199Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0161380Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0161551Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0161676Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0161779Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0161905Z E1204 11:27:20.483000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0162074Z [W1204 11:27:20.752472520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0162076Z 2025-12-04T11:45:26.0162222Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0162532Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0162830Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0162959Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0163487Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0163744Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0163968Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0164174Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:26.0164404Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0164700Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0164935Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0165230Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0165466Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0165762Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0166003Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0166294Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0166539Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0166831Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0167063Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0167271Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0167469Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0167681Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0167881Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0168117Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0168408Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0168615Z E1204 
11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0168861Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0169152Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0169372Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0169567Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0169790Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0169993Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0170188Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0170382Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0170602Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0170838Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0171033Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0171238Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0171469Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0171761Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0171993Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0172286Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0172508Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0172713Z 
E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0172934Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0173142Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0173366Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0173597Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0173889Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0174124Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0174415Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0174647Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:26.0174937Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0175190Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0175500Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0175730Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0176021Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0176251Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0176540Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0176771Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0177060Z E1204 
11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0177302Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0177607Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0177840Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0178130Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0178365Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0178655Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0178888Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0179179Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0179418Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0179621Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0179818Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0180125Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0180355Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0180650Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0180882Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0181172Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0181405Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0181708Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0181948Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0182243Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0182473Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0182766Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0182961Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0183158Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0183385Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0183592Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0183805Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0184035Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0184341Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0184537Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0184737Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0184935Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0185130Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0185361Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0185651Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0185897Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0186197Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0186394Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0186600Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0186802Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0187041Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0187338Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0187559Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0187760Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0187959Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0188173Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0188476Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0188708Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0189002Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0189242Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0189535Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0189770Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0190061Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0190308Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0190610Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0190809Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0191007Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0191226Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0191430Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0191631Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0191833Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0192125Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0192370Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0192665Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0192905Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0193199Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0193460Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0193755Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0193993Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0194288Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0194508Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0194744Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0194943Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0195136Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0195347Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0195551Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0195849Z E1204 11:27:20.485000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0196073Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0196274Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0196476Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0196687Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0196981Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0197227Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0197519Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0197752Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0198050Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0198284Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0198578Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0198813Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0199131Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0199364Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0199656Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0199887Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0200183Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0200414Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0200707Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0200942Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0201251Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0201483Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0201785Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0201985Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0202183Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0202415Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0202708Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0202938Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0205367Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0205651Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0205953Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0206188Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0206480Z E1204 
11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0206718Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0207009Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0207206Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0207438Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0207747Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0207982Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0208290Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0208504Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0208707Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0208907Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0209107Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0209399Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0209612Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0209826Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0210033Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0210240Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0210535Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0210756Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0210959Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0211157Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0211350Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0211499Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0211694Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0211925Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0212131Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0212338Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0212560Z E1204 
11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0212765Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0212964Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0213183Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0213419Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0213615Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0213835Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0214072Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0214269Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0214464Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0214678Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0214881Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0215081Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0215281Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0215573Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0215786Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0216001Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0216201Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0216392Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0216601Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0216813Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0217016Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0217215Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0217414Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0217707Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0217918Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0218129Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0218340Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0218540Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0218834Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0219046Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0219252Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0219451Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0219653Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0219946Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0220152Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0220353Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0220541Z E1204 11:27:20.485000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0220748Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0220962Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0221169Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0221366Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0221558Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0221740Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0221909Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0222037Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0222152Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0222289Z E1204 11:27:20.485000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0222447Z [W1204 11:27:20.754641909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0222450Z 2025-12-04T11:45:26.0222597Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0222891Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0223189Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0223357Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0223844Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0224099Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0224342Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0224551Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:26.0224772Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0225064Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0225298Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0225593Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0225826Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0226117Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0226354Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0226671Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0226901Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0227196Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0227414Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0227620Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0227816Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0228024Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0228223Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0228456Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0228760Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0228955Z E1204 
11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0229203Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0229493Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0229713Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0229909Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0230128Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0230336Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0230531Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0230740Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0230974Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0231179Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0231375Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0231569Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0231801Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0232092Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0232323Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0232613Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0232842Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0233050Z 
E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0233274Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0233481Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0233680Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0233912Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0234205Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0234437Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0234727Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0234958Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:26.0235279Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0235510Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0235801Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0236029Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0236321Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0236553Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0236842Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0237073Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0237379Z E1204 
11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0237611Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0237915Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0238148Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0238440Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0238669Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0238961Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0239189Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0239492Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0239724Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0239924Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0240120Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0240411Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0240644Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0240934Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0241164Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0241453Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0241693Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0241982Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0242223Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0242517Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0242749Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0243040Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0243237Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0243475Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0243670Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0243900Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0244101Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0244331Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0244626Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0244823Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0245019Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0245216Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0245409Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0245639Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0245947Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0246178Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0246481Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0246676Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0246885Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0247090Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0247326Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0247618Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0247838Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0248051Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0248261Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0248461Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0248752Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0248984Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0249278Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0249513Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0249806Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0250037Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0250342Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0250572Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0250874Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0251071Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0251268Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0251489Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0251691Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0251895Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0252097Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0252411Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0252644Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0252937Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0253169Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0253493Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0253725Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0254017Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0254252Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0254563Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0254784Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0255003Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0255201Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0255392Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0255604Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0255806Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0256100Z E1204 11:27:20.488000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0256320Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0256524Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0256748Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0256948Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0257240Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0257471Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0257763Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0257996Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0258289Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0258519Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0258827Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0259061Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0259365Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0259596Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0259888Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0260121Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0260413Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0260645Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0260935Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0261200Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0261495Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0261727Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0262019Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0262217Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0262414Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0262648Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0262941Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0263185Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0263513Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0263764Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0264054Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0264286Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0264580Z E1204 
11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0264818Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0265109Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0265305Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0265563Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0265855Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0266089Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0266382Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0266599Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0266800Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0266998Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0267197Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0267488Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0267718Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0267918Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0268129Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0268330Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0268621Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0268844Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0269045Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0269242Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0269432Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0269592Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0269798Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0270018Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0270224Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0270418Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0270643Z E1204 
11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0270849Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0271045Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0271264Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0271469Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0271673Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0271892Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0272109Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0272306Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0272503Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0272718Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0272922Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0273120Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0273346Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0273639Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0273879Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0274081Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0274279Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0274471Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0274665Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0274882Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0275084Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0275282Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0275481Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0275773Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0275999Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0276198Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0276409Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0276609Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0276901Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0277114Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0277317Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0277515Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0277713Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0278031Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0278226Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0278429Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0278618Z E1204 11:27:20.488000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0278812Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0279028Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0279231Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0279428Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0279620Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0279802Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0279985Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0280115Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0280218Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0280352Z E1204 11:27:20.488000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0280509Z [W1204 11:27:20.795539448 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0280511Z 2025-12-04T11:45:26.0280654Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0280949Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0281246Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0281379Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0281856Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0282131Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0282359Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0282564Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:26.0282764Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0283055Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0283315Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0283611Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0283843Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0284134Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0284379Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0284683Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0284914Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0285203Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0285424Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0285628Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0285825Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0286033Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0286246Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0286488Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0286781Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0286975Z E1204 
11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0287204Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0287498Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0287715Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0287910Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0288130Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0288333Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0288543Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0288736Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0288965Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0289168Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0289364Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0289560Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0289790Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0290082Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0290313Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0290752Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0290969Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0291175Z 
E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0291369Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0291576Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0291776Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0292006Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0292298Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0292527Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0292830Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0293061Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:26.0293417Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0293647Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0293937Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0294169Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0294458Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0294689Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0294980Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0295239Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0295532Z E1204 
11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0295761Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0296051Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0296282Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0296575Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0296806Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0297095Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0297341Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0297633Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0297864Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0298064Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0298259Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0298550Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0298780Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0299071Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0299300Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0299613Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0299844Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0300136Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0300370Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0300663Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0300895Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0301183Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0301379Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0301586Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0301781Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0301997Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0302196Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0302429Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0302720Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0302915Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0303108Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0303336Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0303531Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0303789Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0304079Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0304310Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0304603Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0304799Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0305006Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0305209Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0305441Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0305733Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0305965Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0306167Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0306377Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0306581Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0306876Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0307110Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0307402Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0307633Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0307924Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0308184Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0308478Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0308711Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0309002Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0309203Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0309399Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0309621Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0309821Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0310041Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0310242Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0310543Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0310777Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0311068Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0311301Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0311596Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0311831Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0312125Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0312378Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0312670Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0312890Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0313092Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0313315Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0313507Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0313717Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0313919Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0314210Z E1204 11:27:20.529000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0314446Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0314650Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0314862Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0315061Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0315353Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0315586Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0315877Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0316111Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0316408Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0316656Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0316959Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0317194Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0317485Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0317719Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0318010Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0318245Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0318538Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0318780Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0319071Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0319313Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0319604Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0319835Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0320130Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0320328Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0320524Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0320758Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0321075Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0321308Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0321600Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0321832Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0322125Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0322356Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0322651Z E1204 
11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0322882Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0323188Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0323441Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0323684Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0323976Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0324207Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0324501Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0324716Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0324916Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0325115Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0325327Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0325643Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0325858Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0326059Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0326256Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0326459Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0326753Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0326974Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0327175Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0327372Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0327577Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0327723Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0327934Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0328155Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0328361Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0328559Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0328778Z E1204 
11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0328985Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0329180Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0329400Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0329627Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0329823Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0330044Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0330251Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0330448Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0330647Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0330861Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0331064Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0331262Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0331463Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0331770Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0331994Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0332195Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0332395Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0332588Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0332785Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0332998Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0333199Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0333488Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0333702Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0334012Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0334225Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0334425Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0334623Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0334828Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0335122Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0335335Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0335536Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0335745Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0335945Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0336249Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0336444Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0336644Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0336835Z E1204 11:27:20.529000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0337033Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0337246Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0337452Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0337648Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0337848Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0338038Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0338210Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0338337Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0338440Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0338566Z E1204 11:27:20.529000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0338722Z [W1204 11:27:20.797691727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0338724Z 2025-12-04T11:45:26.0338869Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0339165Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0339462Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0339592Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0340084Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0340347Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0340572Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0340778Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:26.0340978Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0341271Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0341509Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0341804Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0342053Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0342354Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0342587Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0342877Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0343111Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0343443Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0343667Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0343874Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0344072Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0344295Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0344494Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0344740Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0345031Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0345230Z E1204 
11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0345465Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0345755Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0345976Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0346172Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0346406Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0346622Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0346829Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0347024Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0347241Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0347449Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0347645Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0347841Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0348072Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0348367Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0348615Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0348915Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0349135Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0349338Z 
E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0349536Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0349743Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0349942Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0350174Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0350465Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0350722Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0351014Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0351247Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:26.0351539Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0351775Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0352066Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0352297Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0352588Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0352829Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0353127Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0353406Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0353696Z E1204 
11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0353929Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0354220Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0354452Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0354742Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0354972Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0355292Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0355525Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0355817Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0356035Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0356239Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0356433Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0356726Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0356957Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0357262Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0357494Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0357806Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0358042Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0358332Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0358570Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0358863Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0359095Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0359388Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0359612Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0359808Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0360004Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0360212Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0360415Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0360646Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0360940Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0361134Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0361328Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0361536Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0361732Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0361977Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0362270Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0362503Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0362798Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0362994Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0363202Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0363444Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0363693Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0364002Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0364225Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0364426Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0364626Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0364828Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0365127Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0365362Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0365653Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0365903Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0366195Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0366442Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0366734Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0366969Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0367266Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0367466Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0367663Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0367885Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0368099Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0368308Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0368511Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0368805Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0369037Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0369332Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0369568Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0369863Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0370095Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0370401Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0370647Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0370939Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0371161Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0371364Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0371564Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0371759Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0371971Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0372172Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0372487Z E1204 11:27:20.531000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0372709Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0372911Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0373110Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0373350Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0373647Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0373881Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0374175Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0374409Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0374725Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0374974Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0375268Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0375502Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0375799Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0376032Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0376327Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0376563Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0376883Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0377117Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0377412Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0377649Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0377944Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0378180Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0378472Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0378670Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0378869Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0379117Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0379421Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0379656Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0379949Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0380185Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0380478Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0380712Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0381004Z E1204 
11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0381261Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0381556Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0381755Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0381987Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0382281Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0382514Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0382807Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0383021Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0383222Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0383476Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0383677Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0383987Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0384202Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0384404Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0384604Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0384804Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0385099Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0385320Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0385537Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0385749Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0385943Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0386092Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0386288Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0386511Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0386717Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0386914Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0387134Z E1204 
11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0387341Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0387551Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0387773Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0387992Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0388187Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0388410Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0388617Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0388816Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0389012Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0389225Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0389426Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0389636Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0389853Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0390147Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0390360Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0390564Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0390768Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0390960Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0391156Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0391369Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0391570Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0391781Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0391980Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0392286Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0392499Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0392703Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0392902Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0393103Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0393441Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0393654Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0393871Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0394082Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0394283Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0394575Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0394773Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0394976Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0395167Z E1204 11:27:20.531000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0395364Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0395577Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0395781Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0395991Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0396181Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0396374Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0396545Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0396671Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0396774Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0396901Z E1204 11:27:20.531000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0397057Z [W1204 11:27:20.799859226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T11:45:26.0397061Z 2025-12-04T11:45:26.0397206Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Runtime error during autotuning: 2025-12-04T11:45:26.0397501Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] No valid triton configs. OutOfMemoryError: out of resource: triton_mm Required: 65536 Hardware limit:65536 Reducing block sizes or `num_stages` may help. 2025-12-04T11:45:26.0397797Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Exception raised from loadKernel at /var/lib/jenkins/workspace/torch/csrc/inductor/static_cuda_launcher.cpp:147 (most recent call first): 2025-12-04T11:45:26.0397950Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] C++ CapturedTraceback: 2025-12-04T11:45:26.0398433Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #4 std::_Function_handler, std::allocator > > const> (), c10::SetStackTraceFetcher(std::function, std::allocator > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 2025-12-04T11:45:26.0398688Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) from ??:0 2025-12-04T11:45:26.0398915Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #6 (anonymous namespace)::load_kernel(_object*, _object*) [clone .cold] from static_cuda_launcher.cpp:0 2025-12-04T11:45:26.0399122Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #7 cfunction_call from /usr/local/src/conda/python-3.12.5/Objects/methodobject.c:548 
2025-12-04T11:45:26.0399323Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #8 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0399616Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #9 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0399853Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #10 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0400156Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #11 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0400388Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #12 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0400691Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #13 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0400921Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #14 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0401214Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #15 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0401445Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #16 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0401735Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #17 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0401954Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #18 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0402173Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #19 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0402381Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #20 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0402590Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #21 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0402791Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #22 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0403022Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #23 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0403350Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #24 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0403547Z E1204 
11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #25 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0403779Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0404069Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #27 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0404291Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #28 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0404503Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #29 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0404723Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #30 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0404941Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #31 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0405136Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #32 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0405332Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #33 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0405551Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0405755Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #35 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0405952Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #36 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0406145Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #37 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0406400Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0406705Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #39 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0406939Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0407230Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #41 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0407450Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0407657Z 
E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #43 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0407851Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #44 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0408061Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #45 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0408259Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #46 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0408492Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0408795Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #48 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0409041Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0409333Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #50 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0409562Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 
2025-12-04T11:45:26.0409857Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #52 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0410088Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0410380Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #54 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0410611Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #55 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0410921Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #56 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0411153Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0411447Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #58 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0411679Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0411971Z E1204 
11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #60 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0412204Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #61 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0412499Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #62 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0412730Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0413032Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #64 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0413393Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #65 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0413704Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #66 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0413937Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0414229Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #68 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0414452Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0414652Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #70 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0414848Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #71 type_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:1677 2025-12-04T11:45:26.0415138Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #72 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0415398Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0415688Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #74 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0415921Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0416215Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #76 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0416447Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #77 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0416739Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #78 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0416971Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0417263Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #80 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0417512Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0417805Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #82 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0418013Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #83 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0418208Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #84 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 
2025-12-04T11:45:26.0418407Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #85 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0418617Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #86 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0418818Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #87 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0419048Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #88 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0419340Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #89 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0419538Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #90 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0419762Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #91 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0419958Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #92 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0420153Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #93 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0420387Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #94 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0420681Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #95 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0420918Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #96 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0421209Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #97 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0421403Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #98 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0421611Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #99 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0421826Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #100 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0422059Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0422367Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #102 _PyEval_EvalFrameDefault from 
/home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0422589Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0422792Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #104 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0422991Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #105 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0423195Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #106 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0423523Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #107 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0423758Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0424079Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #109 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0424315Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #110 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0424609Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #111 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0424842Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #112 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0425137Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #113 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0425370Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0425668Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #115 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0425868Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #116 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0426077Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #117 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0426302Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0426502Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #119 _PyObject_Call_Prepend from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0426714Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #120 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0426915Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #121 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0427210Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #122 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0427443Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0427740Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #124 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0427979Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #125 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0428274Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #126 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0428533Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #127 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0428827Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #128 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0429061Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #129 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0429355Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #130 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0429577Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0429782Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #132 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0429983Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #133 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0430180Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #134 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0430402Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #135 partial_call from /usr/local/src/conda/python-3.12.5/Modules/_functoolsmodule.c:331 2025-12-04T11:45:26.0430604Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #136 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0430917Z E1204 11:27:20.533000 
998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #137 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0431137Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #138 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0431339Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #139 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0431538Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #140 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0431741Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #141 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0432034Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #142 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0432267Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #143 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0432563Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #144 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0432819Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #145 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0433116Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #146 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0433382Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #147 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0433677Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #148 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0433911Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #149 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0434206Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #150 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0434441Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #151 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0434732Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #152 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0434987Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #153 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0435294Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #154 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0435529Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #155 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0435823Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #156 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0436058Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #157 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0436351Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #158 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0436585Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #159 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0436879Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #160 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0437090Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #161 _PyVectorcall_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0437302Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #162 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0437536Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #163 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0437830Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #164 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0438064Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #165 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0438358Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #166 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0438591Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #167 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0438884Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #168 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0439117Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #169 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0439431Z E1204 
11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #170 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0439676Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #171 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0439970Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #172 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0440167Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #173 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0440402Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #174 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0440695Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #175 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0440928Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #176 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_call.h:92 2025-12-04T11:45:26.0441223Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #177 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0441450Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #178 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0441666Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #179 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0441866Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #180 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0442072Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #181 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0442364Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #182 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0442582Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #183 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0442783Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #184 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0442981Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #185 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0443182Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #186 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0443509Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #187 
_PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0443752Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #188 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:91 2025-12-04T11:45:26.0443953Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #189 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0444169Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #190 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0444365Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #191 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 2025-12-04T11:45:26.0444514Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #192 dynamo__custom_eval_frame from :0 2025-12-04T11:45:26.0444713Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #193 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0444933Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #194 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0445140Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #195 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0445337Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #196 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0445559Z E1204 
11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #197 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0445788Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #198 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0445986Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #199 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0446209Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #200 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0446415Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #201 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:69 2025-12-04T11:45:26.0446616Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #202 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0446837Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #203 _PyEval_EvalFrame from /usr/local/src/conda/python-3.12.5/Include/internal/pycore_ceval.h:89 2025-12-04T11:45:26.0447048Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #204 method_vectorcall from /usr/local/src/conda/python-3.12.5/Objects/classobject.c:61 2025-12-04T11:45:26.0447246Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #205 _PyVectorcall_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:283 2025-12-04T11:45:26.0447442Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #206 PyCFunction_Call from 
/usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0447666Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #207 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0447868Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #208 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0448067Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #209 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0448278Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #210 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0448575Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #211 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0448791Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #212 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0448996Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #213 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0449194Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #214 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0449386Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #215 _PyObject_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:367 
2025-12-04T11:45:26.0449583Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #216 PyCFunction_Call from /usr/local/src/conda/python-3.12.5/Objects/call.c:387 2025-12-04T11:45:26.0449808Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #217 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0450020Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #218 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0450218Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #219 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0450420Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #220 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0450713Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #221 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0450928Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #222 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0451132Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #223 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0451329Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #224 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0451531Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #225 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0451821Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #226 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0452047Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #227 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.12.5/Objects/call.c:144 2025-12-04T11:45:26.0452249Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #228 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.12.5/Objects/call.c:508 2025-12-04T11:45:26.0452461Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #229 slot_tp_call from /usr/local/src/conda/python-3.12.5/Objects/typeobject.c:8779 2025-12-04T11:45:26.0452660Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #230 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.12.5/Objects/call.c:240 2025-12-04T11:45:26.0452955Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #231 _PyEval_EvalFrameDefault from /home/conda/feedstock_root/build_artifacts/python-split_1723141048588/work/build-static/Python/bytecodes.c:2714 2025-12-04T11:45:26.0453152Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #232 PyEval_EvalCode from /usr/local/src/conda/python-3.12.5/Python/ceval.c:578 2025-12-04T11:45:26.0453394Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #233 run_eval_code_obj from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1722 2025-12-04T11:45:26.0453584Z E1204 11:27:20.533000 998112 
site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #234 run_mod from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1743 2025-12-04T11:45:26.0453778Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #235 pyrun_file from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:1643 2025-12-04T11:45:26.0454009Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #236 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:433 2025-12-04T11:45:26.0454234Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #237 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.12.5/Python/pythonrun.c:78 2025-12-04T11:45:26.0454435Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #238 pymain_run_file_obj from /usr/local/src/conda/python-3.12.5/Modules/main.c:360 2025-12-04T11:45:26.0454624Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #239 Py_BytesMain from /usr/local/src/conda/python-3.12.5/Modules/main.c:767 2025-12-04T11:45:26.0454804Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #240 __libc_start_call_main from ./csu/../sysdeps/x86/libc-start.c:58 2025-12-04T11:45:26.0454978Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #241 __libc_start_main_impl from ./csu/../csu/libc-start.c:360 2025-12-04T11:45:26.0455104Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] #242 _start from ??:0 2025-12-04T11:45:26.0455208Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] . 2025-12-04T11:45:26.0455333Z E1204 11:27:20.533000 998112 site-packages/torch/_inductor/select_algorithm.py:3696] [0/0] Ignoring this choice. 
2025-12-04T11:45:26.0455376Z FAILED [1.4136s] [100%] 2025-12-04T11:45:26.0455378Z 2025-12-04T11:45:26.0455436Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0455600Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0455648Z Traceback (most recent call last): 2025-12-04T11:45:26.0455832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0455876Z method(*args, **kwargs) 2025-12-04T11:45:26.0456030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0456070Z method(*args, **kwargs) 2025-12-04T11:45:26.0456224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0456277Z with policy(): 2025-12-04T11:45:26.0456431Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0456471Z raise RuntimeError(msg) 2025-12-04T11:45:26.0456885Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1954545664. 
2025-12-04T11:45:26.0456889Z 2025-12-04T11:45:26.0456969Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0457245Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0457247Z 2025-12-04T11:45:26.0457339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0457417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0457462Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0457522Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0458100Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0458214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0458255Z graph_break [] 2025-12-04T11:45:26.0458323Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0458402Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0458896Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0458947Z current_size = base.storage().size() 2025-12-04T11:45:26.0458989Z Autotune Choices Stats: 2025-12-04T11:45:26.0459368Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008919999934732914, "best_triton_pos": 0} 2025-12-04T11:45:26.0459436Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0459488Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0459612Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0459853Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0460105Z triton_mm_33 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0460346Z triton_mm_29 0.0108 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0460572Z triton_mm_30 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0460796Z triton_mm_22 0.0114 ms 78.5% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0461021Z triton_mm_16 0.0114 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0461248Z triton_mm_21 0.0116 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0461475Z triton_mm_23 0.0119 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0461702Z triton_mm_15 0.0122 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0461947Z triton_mm_31 0.0129 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0462080Z SingleProcess AUTOTUNE benchmarking takes 0.1565 seconds and 1.3513 seconds precompiling for 33 choices 2025-12-04T11:45:26.0462243Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0462290Z Traceback (most recent call last): 2025-12-04T11:45:26.0462448Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0462490Z 
method(*args, **kwargs) 2025-12-04T11:45:26.0462646Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0462685Z method(*args, **kwargs) 2025-12-04T11:45:26.0462839Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0462877Z with policy(): 2025-12-04T11:45:26.0463034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0463077Z raise RuntimeError(msg) 2025-12-04T11:45:26.0463522Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1954545664 and is now 2921332736. 2025-12-04T11:45:26.0463543Z 2025-12-04T11:45:26.0463617Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0463891Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0463893Z 2025-12-04T11:45:26.0463983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0464058Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0464116Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0464176Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0464734Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), 
('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0464836Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0464877Z graph_break [] 2025-12-04T11:45:26.0464941Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0465016Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0465504Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0465554Z current_size = base.storage().size() 2025-12-04T11:45:26.0465611Z Autotune Choices Stats: 2025-12-04T11:45:26.0465996Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008919999934732914, "best_triton_pos": 0} 2025-12-04T11:45:26.0466064Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0466116Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0466237Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0466473Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0466709Z triton_mm_33 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0466934Z triton_mm_29 0.0108 ms 82.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0467160Z triton_mm_30 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0467386Z triton_mm_22 0.0114 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0467619Z triton_mm_16 0.0114 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0467846Z triton_mm_21 0.0116 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0468084Z triton_mm_23 0.0119 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0468313Z triton_mm_15 0.0122 ms 73.1% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0468541Z triton_mm_31 0.0129 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0468673Z SingleProcess AUTOTUNE benchmarking takes 0.1565 seconds and 1.3513 seconds precompiling for 33 choices 2025-12-04T11:45:26.0468751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0468795Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0468856Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0468955Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0469445Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0469501Z graph_break [] 2025-12-04T11:45:26.0469575Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0469649Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0469692Z Autotune Choices Stats: 2025-12-04T11:45:26.0470059Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4", "best_time": 0.00860000029206276, "best_triton_pos": 0} 2025-12-04T11:45:26.0470126Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0470177Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0470304Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0470539Z triton_mm_72 0.0086 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0470772Z triton_mm_71 0.0090 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0470999Z triton_mm_67 0.0107 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0471223Z triton_mm_54 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0471461Z triton_mm_60 0.0111 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0471696Z triton_mm_59 0.0112 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0471923Z triton_mm_68 0.0116 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0472151Z triton_mm_61 0.0119 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0472380Z triton_mm_53 0.0120 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0472609Z triton_mm_69 0.0127 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0472739Z SingleProcess AUTOTUNE benchmarking takes 0.2466 seconds and 0.7895 seconds precompiling for 39 choices 2025-12-04T11:45:26.0472794Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0472953Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0473017Z Traceback (most recent call last): 2025-12-04T11:45:26.0473175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0473226Z method(*args, **kwargs) 2025-12-04T11:45:26.0473426Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0473470Z method(*args, **kwargs) 2025-12-04T11:45:26.0473623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0473661Z with policy(): 
2025-12-04T11:45:26.0473815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0473860Z raise RuntimeError(msg) 2025-12-04T11:45:26.0474269Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 2025-12-04T11:45:26.0474275Z 2025-12-04T11:45:26.0474350Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0474624Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0474626Z 2025-12-04T11:45:26.0474713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0474790Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0474833Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0474892Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0475465Z inductor [('triton_bundler_save_kernel', 312), ('generated_module_cache_miss', 38), ('benchmarking.InductorBenchmarker.benchmark_gpu', 33), ('select_algorithm_num_precompiles', 32), ('select_algorithm_num_precompilation_exceptions', 6), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0475579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0475617Z graph_break [] 
2025-12-04T11:45:26.0475686Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0475759Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0476247Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0476298Z current_size = base.storage().size() 2025-12-04T11:45:26.0476340Z Autotune Choices Stats: 2025-12-04T11:45:26.0476712Z {"num_choices": 33, "num_triton_choices": 32, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008919999934732914, "best_triton_pos": 0} 2025-12-04T11:45:26.0476776Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0476828Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0476949Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0477202Z triton_mm_34 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0477445Z triton_mm_33 0.0090 ms 99.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0477672Z triton_mm_29 0.0108 ms 82.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0477896Z triton_mm_30 0.0112 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0478120Z triton_mm_22 0.0114 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0478346Z triton_mm_16 0.0114 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0478571Z triton_mm_21 0.0116 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0478796Z triton_mm_23 0.0119 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0479036Z triton_mm_15 0.0122 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0479266Z triton_mm_31 0.0129 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.0479410Z SingleProcess AUTOTUNE benchmarking takes 0.1565 seconds and 1.3513 seconds precompiling for 33 choices 2025-12-04T11:45:26.0479484Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0479527Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0479584Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0479686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0480174Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0480213Z graph_break [] 2025-12-04T11:45:26.0480277Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0480353Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0480394Z Autotune Choices Stats: 2025-12-04T11:45:26.0480761Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00860000029206276, "best_triton_pos": 0} 2025-12-04T11:45:26.0480839Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0480900Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0481021Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0481257Z triton_mm_72 0.0086 ms 100.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0481489Z triton_mm_71 0.0090 ms 96.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0481713Z triton_mm_67 0.0107 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0481945Z triton_mm_54 0.0110 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0482170Z triton_mm_60 0.0111 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0482394Z triton_mm_59 0.0112 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0482619Z triton_mm_68 0.0116 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0482856Z triton_mm_61 0.0119 ms 72.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.0483100Z triton_mm_53 0.0120 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0483357Z triton_mm_69 0.0127 ms 67.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0483487Z SingleProcess AUTOTUNE benchmarking takes 0.2466 seconds and 0.7895 seconds precompiling for 39 choices 2025-12-04T11:45:26.0483563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0483609Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0483666Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0483766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0484254Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0484293Z graph_break [] 2025-12-04T11:45:26.0484358Z aten_mm_info [('aten._scaled_mm.default_1024_512_1024', 1)] 2025-12-04T11:45:26.0484445Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0484489Z Autotune Choices Stats: 2025-12-04T11:45:26.0484867Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_110", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008840000256896019, "best_triton_pos": 0} 2025-12-04T11:45:26.0484935Z AUTOTUNE scaled_mm(1024x1024, 1024x512, 1024x1, 1x512, 512) 2025-12-04T11:45:26.0484985Z strides: [1024, 1], [1, 1024], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0485105Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0485342Z triton_mm_110 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0485575Z triton_mm_109 0.0092 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0485617Z _scaled_mm 0.0092 ms 95.7% 2025-12-04T11:45:26.0485843Z triton_mm_105 0.0107 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0486067Z triton_mm_92 0.0111 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0486291Z triton_mm_98 0.0114 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0486532Z triton_mm_97 0.0115 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0486771Z triton_mm_106 0.0118 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0486996Z triton_mm_99 0.0121 ms 73.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0487222Z triton_mm_91 0.0121 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0487354Z SingleProcess AUTOTUNE benchmarking takes 0.2549 seconds and 0.6195 seconds precompiling for 39 choices 2025-12-04T11:45:26.0487549Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9f8072257805adf1.xml - 2025-12-04T11:45:26.0487609Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0488235Z FAILED [1.4136s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 2921332736 and is now 3888119808. 
2025-12-04T11:45:26.0488250Z 2025-12-04T11:45:26.0488323Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0488607Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0488610Z 2025-12-04T11:45:26.0488699Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0488764Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.0488831Z ================== 1 failed, 187 deselected, 2 rerun in 6.67s ================== 2025-12-04T11:45:26.0488876Z Got exit code 1 2025-12-04T11:45:26.0489100Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0489231Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0489381Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf956d39bf641063.xml 2025-12-04T11:45:26.0489441Z ============================= test session starts ============================== 2025-12-04T11:45:26.0489556Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0489597Z cachedir: .pytest_cache 2025-12-04T11:45:26.0489759Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0489807Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0489848Z configfile: pytest.ini 2025-12-04T11:45:26.0490012Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 
2025-12-04T11:45:26.0490103Z collecting ... collected 188 items / 111 deselected / 77 selected 2025-12-04T11:45:26.0490157Z stepcurrent: skipping 111 already run items. 2025-12-04T11:45:26.0490204Z Running 77 items in this shard 2025-12-04T11:45:26.0490207Z 2025-12-04T11:45:26.0490439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6005s] [ 1%] 2025-12-04T11:45:26.0490677Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2756s] [ 1%] 2025-12-04T11:45:26.0490879Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.2239s] [ 1%] 2025-12-04T11:45:26.0490881Z 2025-12-04T11:45:26.0490937Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0491089Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0491135Z Traceback (most recent call last): 2025-12-04T11:45:26.0491295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0491340Z method(*args, **kwargs) 2025-12-04T11:45:26.0491492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0491537Z method(*args, **kwargs) 2025-12-04T11:45:26.0491691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0491729Z with policy(): 2025-12-04T11:45:26.0491884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in 
__exit__ 2025-12-04T11:45:26.0491938Z raise RuntimeError(msg) 2025-12-04T11:45:26.0492350Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0492352Z 2025-12-04T11:45:26.0492426Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0492695Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0492697Z 2025-12-04T11:45:26.0492784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0492860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0492904Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0492964Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0493032Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0493133Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0494712Z graph_break [] 2025-12-04T11:45:26.0494781Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0494933Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0494981Z Traceback (most recent call last): 2025-12-04T11:45:26.0495136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0495178Z method(*args, **kwargs) 2025-12-04T11:45:26.0495329Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0495393Z method(*args, **kwargs) 2025-12-04T11:45:26.0495543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0495584Z with policy(): 2025-12-04T11:45:26.0495738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0495779Z raise RuntimeError(msg) 2025-12-04T11:45:26.0496197Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0496200Z 2025-12-04T11:45:26.0496274Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0496544Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0496548Z 2025-12-04T11:45:26.0496635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0496712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0496754Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0496811Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0496878Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0496977Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0497013Z graph_break [] 2025-12-04T11:45:26.0497074Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0497148Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0497205Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0497259Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0497360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0497437Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0497475Z graph_break [] 2025-12-04T11:45:26.0497533Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0497587Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0497736Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0497783Z Traceback (most recent call last): 2025-12-04T11:45:26.0497937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0497979Z method(*args, **kwargs) 2025-12-04T11:45:26.0498131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0498173Z method(*args, **kwargs) 2025-12-04T11:45:26.0498324Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0498363Z with policy(): 2025-12-04T11:45:26.0498515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0498558Z raise RuntimeError(msg) 2025-12-04T11:45:26.0498956Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0498978Z 2025-12-04T11:45:26.0499052Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0499318Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0499320Z 2025-12-04T11:45:26.0499406Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0499480Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0499533Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0499591Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0499658Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0499756Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0499793Z graph_break [] 2025-12-04T11:45:26.0499856Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0499930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0499972Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0500027Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0500123Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0500187Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0500223Z graph_break [] 2025-12-04T11:45:26.0500284Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0500358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0500399Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.0500455Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0500549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0500627Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0500663Z graph_break [] 2025-12-04T11:45:26.0500722Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0500925Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cf956d39bf641063.xml - 2025-12-04T11:45:26.0500988Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0501592Z FAILED [0.2239s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0501597Z 2025-12-04T11:45:26.0501669Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0501936Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0501938Z 2025-12-04T11:45:26.0502024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0502087Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0502157Z ================== 1 failed, 111 deselected, 2 rerun in 2.12s ================== 2025-12-04T11:45:26.0502196Z Got exit code 1 2025-12-04T11:45:26.0502236Z Retrying single test... 2025-12-04T11:45:26.0502380Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1258ec67e81494e5.xml 2025-12-04T11:45:26.0502437Z ============================= test session starts ============================== 2025-12-04T11:45:26.0502562Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0502603Z cachedir: .pytest_cache 2025-12-04T11:45:26.0502762Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0502808Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0502848Z configfile: pytest.ini 2025-12-04T11:45:26.0503020Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0503098Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0503392Z stepcurrent: skipping 111 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0503438Z Running 1 items in this shard 2025-12-04T11:45:26.0503440Z 2025-12-04T11:45:26.0503665Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6160s] [100%] 2025-12-04T11:45:26.0503890Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2677s] [100%] 2025-12-04T11:45:26.0504091Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.2328s] [100%] 2025-12-04T11:45:26.0504094Z 2025-12-04T11:45:26.0504145Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0504294Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0504341Z Traceback (most recent call last): 2025-12-04T11:45:26.0504516Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0504557Z method(*args, **kwargs) 2025-12-04T11:45:26.0504723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0504763Z method(*args, **kwargs) 2025-12-04T11:45:26.0504914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0504951Z with policy(): 2025-12-04T11:45:26.0505104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0505145Z raise RuntimeError(msg) 2025-12-04T11:45:26.0505546Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0505550Z 2025-12-04T11:45:26.0505624Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0505889Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0505891Z 2025-12-04T11:45:26.0505981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0506053Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0506095Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0506152Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0506219Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0506331Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0506368Z graph_break [] 2025-12-04T11:45:26.0506428Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0506581Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0506627Z Traceback (most recent call last): 2025-12-04T11:45:26.0506792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0506832Z method(*args, **kwargs) 2025-12-04T11:45:26.0506983Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0507022Z method(*args, **kwargs) 2025-12-04T11:45:26.0507173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0507211Z with policy(): 2025-12-04T11:45:26.0507364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0507406Z raise RuntimeError(msg) 2025-12-04T11:45:26.0507805Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0507807Z 2025-12-04T11:45:26.0507880Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0508146Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0508160Z 2025-12-04T11:45:26.0508249Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0508321Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0508374Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0508431Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0508499Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0508597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0508635Z graph_break [] 2025-12-04T11:45:26.0508695Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0508769Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0508810Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0508865Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0508960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0509032Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0509069Z graph_break [] 2025-12-04T11:45:26.0509129Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0509181Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0509332Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0509378Z Traceback (most recent call last): 2025-12-04T11:45:26.0509535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0509577Z method(*args, **kwargs) 2025-12-04T11:45:26.0509726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0509779Z method(*args, **kwargs) 2025-12-04T11:45:26.0509930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0509966Z with policy(): 2025-12-04T11:45:26.0510119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0510160Z raise RuntimeError(msg) 2025-12-04T11:45:26.0510565Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0510567Z 2025-12-04T11:45:26.0510640Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0510905Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0510909Z 2025-12-04T11:45:26.0510997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0511071Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0511113Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0511169Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0511236Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0511333Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0511370Z graph_break [] 2025-12-04T11:45:26.0511428Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0511501Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0511542Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0511609Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0511704Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0511784Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0511820Z graph_break [] 2025-12-04T11:45:26.0511880Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0511952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0511995Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.0512049Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0512147Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0512210Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0512246Z graph_break [] 2025-12-04T11:45:26.0512303Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0512499Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1258ec67e81494e5.xml - 2025-12-04T11:45:26.0512558Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0513163Z FAILED [0.2328s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0513166Z 2025-12-04T11:45:26.0513238Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0513533Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0513551Z 2025-12-04T11:45:26.0513637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0513701Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0513769Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:26.0513806Z Got exit code 1 2025-12-04T11:45:26.0513846Z Retrying single test... 2025-12-04T11:45:26.0514008Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-afc6dcbf238159f8.xml 2025-12-04T11:45:26.0514068Z ============================= test session starts ============================== 2025-12-04T11:45:26.0514179Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0514221Z cachedir: .pytest_cache 2025-12-04T11:45:26.0514380Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0514426Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0514466Z configfile: pytest.ini 2025-12-04T11:45:26.0514629Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0514703Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0514967Z stepcurrent: skipping 111 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0515010Z Running 1 items in this shard 2025-12-04T11:45:26.0515012Z 2025-12-04T11:45:26.0515241Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7081s] [100%] 2025-12-04T11:45:26.0515480Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3688s] [100%] 2025-12-04T11:45:26.0515693Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.3621s] [100%] 2025-12-04T11:45:26.0515695Z 2025-12-04T11:45:26.0515747Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0515897Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0515942Z Traceback (most recent call last): 2025-12-04T11:45:26.0516101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0516144Z method(*args, **kwargs) 2025-12-04T11:45:26.0516296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0516338Z method(*args, **kwargs) 2025-12-04T11:45:26.0516489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0516526Z with policy(): 2025-12-04T11:45:26.0516677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0516722Z raise RuntimeError(msg) 2025-12-04T11:45:26.0517121Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0517136Z 2025-12-04T11:45:26.0517209Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0517474Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0517476Z 2025-12-04T11:45:26.0517563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0517637Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0517691Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0517748Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0517816Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0517913Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0517951Z graph_break [] 2025-12-04T11:45:26.0518012Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0518162Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0518207Z Traceback (most recent call last): 2025-12-04T11:45:26.0518363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0518402Z method(*args, **kwargs) 2025-12-04T11:45:26.0518556Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0518595Z method(*args, **kwargs) 2025-12-04T11:45:26.0518746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0518783Z with policy(): 2025-12-04T11:45:26.0518934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0518987Z raise RuntimeError(msg) 2025-12-04T11:45:26.0519393Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0519395Z 2025-12-04T11:45:26.0519470Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0519734Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0519736Z 2025-12-04T11:45:26.0519823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0519896Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0519941Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0519997Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0520063Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0520164Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0520200Z graph_break [] 2025-12-04T11:45:26.0520260Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0520334Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0520375Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0520431Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0520525Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0520590Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0520637Z graph_break [] 2025-12-04T11:45:26.0520696Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0520748Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0520896Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0520941Z Traceback (most recent call last): 2025-12-04T11:45:26.0521094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0521145Z method(*args, **kwargs) 2025-12-04T11:45:26.0521296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0521336Z method(*args, **kwargs) 2025-12-04T11:45:26.0521488Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0521526Z with policy(): 2025-12-04T11:45:26.0521680Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0521721Z raise RuntimeError(msg) 2025-12-04T11:45:26.0522119Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0522122Z 2025-12-04T11:45:26.0522196Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0522459Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0522461Z 2025-12-04T11:45:26.0522560Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0522634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0522676Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0522742Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0522808Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0522904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0522943Z graph_break [] 2025-12-04T11:45:26.0523002Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0523076Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0523117Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0523172Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0523297Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0523364Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0523400Z graph_break [] 2025-12-04T11:45:26.0523458Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0523531Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0523573Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.0523627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0523723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0523786Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0523822Z graph_break [] 2025-12-04T11:45:26.0523879Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0524071Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-afc6dcbf238159f8.xml - 2025-12-04T11:45:26.0524145Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0524747Z FAILED [0.3621s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0524762Z 2025-12-04T11:45:26.0524836Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0525097Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0525100Z 2025-12-04T11:45:26.0525187Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0525248Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0525316Z ================== 1 failed, 187 deselected, 2 rerun in 2.46s ================== 2025-12-04T11:45:26.0525353Z Got exit code 1 2025-12-04T11:45:26.0525567Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0525695Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0525841Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bb62c0abeda55ac.xml 2025-12-04T11:45:26.0525897Z ============================= test session starts ============================== 2025-12-04T11:45:26.0526011Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0526070Z cachedir: .pytest_cache 2025-12-04T11:45:26.0526228Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0526288Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0526332Z configfile: pytest.ini 2025-12-04T11:45:26.0526491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0526569Z collecting ... collected 188 items / 112 deselected / 76 selected 2025-12-04T11:45:26.0526624Z stepcurrent: skipping 112 already run items. 
2025-12-04T11:45:26.0526669Z Running 76 items in this shard 2025-12-04T11:45:26.0526671Z 2025-12-04T11:45:26.0526898Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5823s] [ 1%] 2025-12-04T11:45:26.0527121Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2565s] [ 1%] 2025-12-04T11:45:26.0527319Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2139s] [ 1%] 2025-12-04T11:45:26.0527321Z 2025-12-04T11:45:26.0527371Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0527523Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0527568Z Traceback (most recent call last): 2025-12-04T11:45:26.0527726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0527766Z method(*args, **kwargs) 2025-12-04T11:45:26.0527919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0527971Z method(*args, **kwargs) 2025-12-04T11:45:26.0528124Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0528162Z with policy(): 2025-12-04T11:45:26.0528315Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0528356Z raise RuntimeError(msg) 2025-12-04T11:45:26.0528760Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0528763Z 2025-12-04T11:45:26.0528837Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0529100Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0529103Z 2025-12-04T11:45:26.0529191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0529264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0529306Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0529363Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0529429Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0529526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0529564Z graph_break [] 2025-12-04T11:45:26.0529623Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0529785Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0529830Z Traceback (most recent call last): 2025-12-04T11:45:26.0529996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0530036Z method(*args, **kwargs) 2025-12-04T11:45:26.0530189Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0530229Z method(*args, **kwargs) 
2025-12-04T11:45:26.0530382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0530418Z with policy(): 2025-12-04T11:45:26.0530572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0530613Z raise RuntimeError(msg) 2025-12-04T11:45:26.0531008Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0531010Z 2025-12-04T11:45:26.0531082Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0531346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0531348Z 2025-12-04T11:45:26.0531436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0531508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0531551Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0531619Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0531686Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0531785Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0531821Z graph_break [] 2025-12-04T11:45:26.0531880Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0531955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0531996Z frames 
[('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0532064Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0532159Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0532226Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0532262Z graph_break [] 2025-12-04T11:45:26.0532320Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0532375Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0532525Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0532571Z Traceback (most recent call last): 2025-12-04T11:45:26.0532726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0532766Z method(*args, **kwargs) 2025-12-04T11:45:26.0532918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0532959Z method(*args, **kwargs) 2025-12-04T11:45:26.0533108Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0533146Z with policy(): 2025-12-04T11:45:26.0533331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0533388Z raise RuntimeError(msg) 2025-12-04T11:45:26.0533793Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:26.0533795Z 2025-12-04T11:45:26.0533870Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0534133Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0534135Z 2025-12-04T11:45:26.0534222Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0534296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0534340Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0534395Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0534462Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0534558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0534597Z graph_break [] 2025-12-04T11:45:26.0534656Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0534731Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0534772Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0534827Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0534921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0534985Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0535037Z graph_break [] 2025-12-04T11:45:26.0535095Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0535167Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0535210Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0535264Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0535361Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0535424Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0535475Z graph_break [] 2025-12-04T11:45:26.0535532Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0535723Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7bb62c0abeda55ac.xml - 2025-12-04T11:45:26.0535784Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0536383Z FAILED [0.2139s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0536385Z 2025-12-04T11:45:26.0536458Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0536718Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0536720Z 2025-12-04T11:45:26.0536806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0536878Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.0536946Z ================== 1 failed, 112 deselected, 2 rerun in 2.07s ================== 2025-12-04T11:45:26.0536984Z Got exit code 1 2025-12-04T11:45:26.0537037Z Retrying single test... 
2025-12-04T11:45:26.0537182Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78d874d805889162.xml 2025-12-04T11:45:26.0537240Z ============================= test session starts ============================== 2025-12-04T11:45:26.0537351Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0537393Z cachedir: .pytest_cache 2025-12-04T11:45:26.0537550Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0537599Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0537639Z configfile: pytest.ini 2025-12-04T11:45:26.0537801Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0537874Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0538136Z stepcurrent: skipping 112 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0538180Z Running 1 items in this shard 2025-12-04T11:45:26.0538182Z 2025-12-04T11:45:26.0538411Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5936s] [100%] 2025-12-04T11:45:26.0538634Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2570s] [100%] 2025-12-04T11:45:26.0538844Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2160s] [100%] 2025-12-04T11:45:26.0538846Z 2025-12-04T11:45:26.0538898Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0539048Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0539094Z Traceback (most recent call last): 2025-12-04T11:45:26.0539265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0539307Z method(*args, **kwargs) 2025-12-04T11:45:26.0539459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0539499Z method(*args, **kwargs) 2025-12-04T11:45:26.0539649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0539688Z with policy(): 2025-12-04T11:45:26.0539840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0539882Z raise RuntimeError(msg) 2025-12-04T11:45:26.0540279Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0540281Z 2025-12-04T11:45:26.0540357Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0540617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0540631Z 2025-12-04T11:45:26.0540718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0540801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0540843Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0540899Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0540964Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0541063Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0541100Z graph_break [] 2025-12-04T11:45:26.0541158Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0541308Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0541354Z Traceback (most recent call last): 2025-12-04T11:45:26.0541509Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0541550Z method(*args, **kwargs) 2025-12-04T11:45:26.0541701Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0541740Z method(*args, **kwargs) 2025-12-04T11:45:26.0541889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0541926Z with policy(): 2025-12-04T11:45:26.0542078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0542119Z raise RuntimeError(msg) 2025-12-04T11:45:26.0542513Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0542526Z 2025-12-04T11:45:26.0542600Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0542863Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0542866Z 2025-12-04T11:45:26.0542963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0543037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0543079Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0543135Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0543201Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0543333Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0543372Z graph_break [] 2025-12-04T11:45:26.0543430Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0543504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0543545Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0543601Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0543695Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0543761Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0543797Z graph_break [] 2025-12-04T11:45:26.0543855Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0543907Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0544054Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0544119Z Traceback (most recent call last): 2025-12-04T11:45:26.0544273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0544325Z method(*args, **kwargs) 2025-12-04T11:45:26.0544478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0544518Z method(*args, **kwargs) 2025-12-04T11:45:26.0544669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0544706Z with policy(): 2025-12-04T11:45:26.0544857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0544899Z raise RuntimeError(msg) 2025-12-04T11:45:26.0545290Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0545294Z 2025-12-04T11:45:26.0545368Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0545632Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0545634Z 2025-12-04T11:45:26.0545720Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0545793Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0545835Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0545890Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0545970Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0546067Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0546103Z graph_break [] 2025-12-04T11:45:26.0546163Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0546237Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0546277Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0546334Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0546442Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0546507Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0546544Z graph_break [] 2025-12-04T11:45:26.0546602Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0546675Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0546719Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0546773Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0546869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0546932Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0546969Z graph_break [] 2025-12-04T11:45:26.0547026Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0547217Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-78d874d805889162.xml - 2025-12-04T11:45:26.0547276Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0547878Z FAILED [0.2160s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0547902Z 2025-12-04T11:45:26.0547975Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0548236Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0548238Z 2025-12-04T11:45:26.0548325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0548385Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0548453Z ================== 1 failed, 187 deselected, 2 rerun in 2.08s ================== 2025-12-04T11:45:26.0548492Z Got exit code 1 2025-12-04T11:45:26.0548532Z Retrying single test... 2025-12-04T11:45:26.0548677Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e411e3875dbe6ca6.xml 2025-12-04T11:45:26.0548734Z ============================= test session starts ============================== 2025-12-04T11:45:26.0548844Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0548885Z cachedir: .pytest_cache 2025-12-04T11:45:26.0549043Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0549090Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0549129Z configfile: pytest.ini 2025-12-04T11:45:26.0549292Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0549365Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0549638Z stepcurrent: skipping 112 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0549683Z Running 1 items in this shard 2025-12-04T11:45:26.0549685Z 2025-12-04T11:45:26.0549910Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6007s] [100%] 2025-12-04T11:45:26.0550144Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2592s] [100%] 2025-12-04T11:45:26.0550339Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2320s] [100%] 2025-12-04T11:45:26.0550341Z 2025-12-04T11:45:26.0550395Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0550541Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0550590Z Traceback (most recent call last): 2025-12-04T11:45:26.0550747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0550788Z method(*args, **kwargs) 2025-12-04T11:45:26.0550941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0550982Z method(*args, **kwargs) 2025-12-04T11:45:26.0551132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0551171Z with policy(): 2025-12-04T11:45:26.0551325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0551379Z raise RuntimeError(msg) 2025-12-04T11:45:26.0551783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0551787Z 2025-12-04T11:45:26.0551860Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0552125Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0552127Z 2025-12-04T11:45:26.0552213Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0552291Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0552332Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0552389Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0552456Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0552554Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0552589Z graph_break [] 2025-12-04T11:45:26.0552650Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0552803Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0552850Z Traceback (most recent call last): 2025-12-04T11:45:26.0553002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0553044Z method(*args, **kwargs) 2025-12-04T11:45:26.0553214Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0553287Z method(*args, **kwargs) 2025-12-04T11:45:26.0553440Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0553477Z with policy(): 2025-12-04T11:45:26.0553630Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0553672Z raise RuntimeError(msg) 2025-12-04T11:45:26.0554078Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0554082Z 2025-12-04T11:45:26.0554158Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0554426Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0554429Z 2025-12-04T11:45:26.0554515Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0554588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0554629Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0554688Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0554754Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0554852Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0554888Z graph_break [] 2025-12-04T11:45:26.0554947Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0555037Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0555081Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0555135Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0555246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0555310Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0555347Z graph_break [] 2025-12-04T11:45:26.0555404Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0555457Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0555605Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0555652Z Traceback (most recent call last): 2025-12-04T11:45:26.0555805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0555848Z method(*args, **kwargs) 2025-12-04T11:45:26.0555999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0556041Z method(*args, **kwargs) 2025-12-04T11:45:26.0556191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0556228Z with policy(): 2025-12-04T11:45:26.0556380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0556422Z raise RuntimeError(msg) 2025-12-04T11:45:26.0556813Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0556831Z 2025-12-04T11:45:26.0556903Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0557166Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0557169Z 2025-12-04T11:45:26.0557254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0557338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0557381Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0557440Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0557505Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0557602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0557640Z graph_break [] 2025-12-04T11:45:26.0557699Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0557771Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0557816Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0557870Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0557966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0558031Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0558071Z graph_break [] 2025-12-04T11:45:26.0558130Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0558205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0558245Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0558299Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0558394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0558469Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0558505Z graph_break [] 2025-12-04T11:45:26.0558562Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0558764Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e411e3875dbe6ca6.xml - 2025-12-04T11:45:26.0558825Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0559424Z FAILED [0.2320s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0559428Z 2025-12-04T11:45:26.0559502Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0559764Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0559766Z 2025-12-04T11:45:26.0559852Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0559916Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0559983Z ================== 1 failed, 187 deselected, 2 rerun in 2.11s ================== 2025-12-04T11:45:26.0560020Z Got exit code 1 2025-12-04T11:45:26.0560231Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0560370Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0560514Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-86c91f8f88e19d48.xml 2025-12-04T11:45:26.0560575Z ============================= test session starts ============================== 2025-12-04T11:45:26.0560686Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0560728Z cachedir: .pytest_cache 2025-12-04T11:45:26.0560896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0560942Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0560983Z configfile: pytest.ini 2025-12-04T11:45:26.0561148Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0561225Z collecting ... collected 188 items / 113 deselected / 75 selected 2025-12-04T11:45:26.0561280Z stepcurrent: skipping 113 already run items. 
2025-12-04T11:45:26.0561324Z Running 75 items in this shard 2025-12-04T11:45:26.0561326Z 2025-12-04T11:45:26.0561556Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5909s] [ 1%] 2025-12-04T11:45:26.0561780Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2631s] [ 1%] 2025-12-04T11:45:26.0561980Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.2375s] [ 1%] 2025-12-04T11:45:26.0561982Z 2025-12-04T11:45:26.0562034Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0562197Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0562243Z Traceback (most recent call last): 2025-12-04T11:45:26.0562407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0562449Z method(*args, **kwargs) 2025-12-04T11:45:26.0562602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0562645Z method(*args, **kwargs) 2025-12-04T11:45:26.0562796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0562833Z with policy(): 2025-12-04T11:45:26.0562987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0563030Z raise RuntimeError(msg) 2025-12-04T11:45:26.0563451Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0563453Z 2025-12-04T11:45:26.0563527Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0563789Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0563793Z 2025-12-04T11:45:26.0563879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0563953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0564012Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0564069Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0564133Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0564237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0564274Z graph_break [] 2025-12-04T11:45:26.0564336Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0564498Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0564547Z Traceback (most recent call last): 2025-12-04T11:45:26.0564699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0564740Z method(*args, **kwargs) 2025-12-04T11:45:26.0564890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0564931Z method(*args, **kwargs) 
2025-12-04T11:45:26.0565080Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0565119Z with policy(): 2025-12-04T11:45:26.0565273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0565317Z raise RuntimeError(msg) 2025-12-04T11:45:26.0565708Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0565710Z 2025-12-04T11:45:26.0565784Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0566044Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0566060Z 2025-12-04T11:45:26.0566173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0566247Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0566291Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0566346Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0566412Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0566509Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0566549Z graph_break [] 2025-12-04T11:45:26.0566607Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0566682Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0566724Z frames 
[('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0566780Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0566875Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0566940Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0566977Z graph_break [] 2025-12-04T11:45:26.0567038Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0567090Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0567241Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0567289Z Traceback (most recent call last): 2025-12-04T11:45:26.0567444Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0567485Z method(*args, **kwargs) 2025-12-04T11:45:26.0567649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0567691Z method(*args, **kwargs) 2025-12-04T11:45:26.0567840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0567879Z with policy(): 2025-12-04T11:45:26.0568029Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0568082Z raise RuntimeError(msg) 2025-12-04T11:45:26.0568476Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:26.0568478Z 2025-12-04T11:45:26.0568554Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0568817Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0568819Z 2025-12-04T11:45:26.0568907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0568980Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0569023Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0569079Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0569145Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0569242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0569279Z graph_break [] 2025-12-04T11:45:26.0569338Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0569427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0569469Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0569525Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0569630Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0569695Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0569731Z graph_break [] 2025-12-04T11:45:26.0569792Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0569864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0569906Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0569960Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0570056Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0570122Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0570159Z graph_break [] 2025-12-04T11:45:26.0570216Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0570411Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-86c91f8f88e19d48.xml - 2025-12-04T11:45:26.0570471Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0571069Z FAILED [0.2375s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0571083Z 2025-12-04T11:45:26.0571157Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0571418Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0571419Z 2025-12-04T11:45:26.0571506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0571567Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.0571644Z ================== 1 failed, 113 deselected, 2 rerun in 2.11s ================== 2025-12-04T11:45:26.0571681Z Got exit code 1 2025-12-04T11:45:26.0571721Z Retrying single test... 
2025-12-04T11:45:26.0571868Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fab66949f0e472c2.xml 2025-12-04T11:45:26.0571926Z ============================= test session starts ============================== 2025-12-04T11:45:26.0572036Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0572080Z cachedir: .pytest_cache 2025-12-04T11:45:26.0572238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0572286Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0572326Z configfile: pytest.ini 2025-12-04T11:45:26.0572491Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0572566Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0572826Z stepcurrent: skipping 113 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0572870Z Running 1 items in this shard 2025-12-04T11:45:26.0572885Z 2025-12-04T11:45:26.0573109Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5994s] [100%] 2025-12-04T11:45:26.0573376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2683s] [100%] 2025-12-04T11:45:26.0573572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.2255s] [100%] 2025-12-04T11:45:26.0573575Z 2025-12-04T11:45:26.0573626Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0573779Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0573828Z Traceback (most recent call last): 2025-12-04T11:45:26.0573988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0574029Z method(*args, **kwargs) 2025-12-04T11:45:26.0574183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0574226Z method(*args, **kwargs) 2025-12-04T11:45:26.0574376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0574415Z with policy(): 2025-12-04T11:45:26.0574567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0574609Z raise RuntimeError(msg) 2025-12-04T11:45:26.0575004Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0575022Z 2025-12-04T11:45:26.0575096Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0575357Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0575359Z 2025-12-04T11:45:26.0575466Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0575541Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0575582Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0575638Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0575704Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0575805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0575841Z graph_break [] 2025-12-04T11:45:26.0575902Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0576050Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0576099Z Traceback (most recent call last): 2025-12-04T11:45:26.0576252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0576294Z method(*args, **kwargs) 2025-12-04T11:45:26.0576444Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0576484Z method(*args, **kwargs) 2025-12-04T11:45:26.0576633Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0576685Z with policy(): 2025-12-04T11:45:26.0576835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0576877Z raise RuntimeError(msg) 2025-12-04T11:45:26.0577284Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0577288Z 2025-12-04T11:45:26.0577361Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0577622Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0577626Z 2025-12-04T11:45:26.0577711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0577785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0577827Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0577884Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0577949Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0578048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0578085Z graph_break [] 2025-12-04T11:45:26.0578145Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0578217Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0578260Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0578315Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0578413Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0578490Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0578527Z graph_break [] 2025-12-04T11:45:26.0578585Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0578638Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0578786Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0578834Z Traceback (most recent call last): 2025-12-04T11:45:26.0578999Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0579040Z method(*args, **kwargs) 2025-12-04T11:45:26.0579190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0579230Z method(*args, **kwargs) 2025-12-04T11:45:26.0579381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0579419Z with policy(): 2025-12-04T11:45:26.0579573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0579616Z raise RuntimeError(msg) 2025-12-04T11:45:26.0580010Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0580013Z 2025-12-04T11:45:26.0580086Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0580350Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0580371Z 2025-12-04T11:45:26.0580458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0580542Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0580584Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0580641Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0580706Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0580806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0580841Z graph_break [] 2025-12-04T11:45:26.0580900Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0580972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0581015Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0581073Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0581169Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0581233Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0581272Z graph_break [] 2025-12-04T11:45:26.0581330Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0581403Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0581443Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0581501Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0581594Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0581658Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0581693Z graph_break [] 2025-12-04T11:45:26.0581751Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0581953Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fab66949f0e472c2.xml - 2025-12-04T11:45:26.0582014Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0582622Z FAILED [0.2255s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0582626Z 2025-12-04T11:45:26.0582699Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0582959Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0582964Z 2025-12-04T11:45:26.0583049Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0583114Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0583180Z ================== 1 failed, 187 deselected, 2 rerun in 2.11s ================== 2025-12-04T11:45:26.0583218Z Got exit code 1 2025-12-04T11:45:26.0583294Z Retrying single test... 2025-12-04T11:45:26.0583443Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ce42825959dd83d0.xml 2025-12-04T11:45:26.0583499Z ============================= test session starts ============================== 2025-12-04T11:45:26.0583608Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0583648Z cachedir: .pytest_cache 2025-12-04T11:45:26.0583822Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0583867Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0583909Z configfile: pytest.ini 2025-12-04T11:45:26.0584082Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0584157Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0584418Z stepcurrent: skipping 113 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0584462Z Running 1 items in this shard 2025-12-04T11:45:26.0584464Z 2025-12-04T11:45:26.0584689Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5966s] [100%] 2025-12-04T11:45:26.0584913Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2550s] [100%] 2025-12-04T11:45:26.0585110Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.2157s] [100%] 2025-12-04T11:45:26.0585112Z 2025-12-04T11:45:26.0585162Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0585312Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0585358Z Traceback (most recent call last): 2025-12-04T11:45:26.0585514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0585554Z method(*args, **kwargs) 2025-12-04T11:45:26.0585722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0585761Z method(*args, **kwargs) 2025-12-04T11:45:26.0585915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0585952Z with policy(): 2025-12-04T11:45:26.0586104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0586158Z raise RuntimeError(msg) 2025-12-04T11:45:26.0586553Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0586555Z 2025-12-04T11:45:26.0586629Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0586893Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0586895Z 2025-12-04T11:45:26.0586982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0587055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0587099Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0587155Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0587222Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0587320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0587358Z graph_break [] 2025-12-04T11:45:26.0587417Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0587577Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0587623Z Traceback (most recent call last): 2025-12-04T11:45:26.0587788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0587828Z method(*args, **kwargs) 2025-12-04T11:45:26.0587979Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0588020Z method(*args, **kwargs) 2025-12-04T11:45:26.0588171Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0588207Z with policy(): 2025-12-04T11:45:26.0588359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0588401Z raise RuntimeError(msg) 2025-12-04T11:45:26.0588793Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0588795Z 2025-12-04T11:45:26.0588870Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0589135Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0589137Z 2025-12-04T11:45:26.0589222Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0589296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0589350Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0589406Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0589472Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0589570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0589607Z graph_break [] 2025-12-04T11:45:26.0589666Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0589740Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0589793Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0589849Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0589945Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0590010Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0590045Z graph_break [] 2025-12-04T11:45:26.0590106Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0590159Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0590308Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0590354Z Traceback (most recent call last): 2025-12-04T11:45:26.0590508Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0590547Z method(*args, **kwargs) 2025-12-04T11:45:26.0590699Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0590738Z method(*args, **kwargs) 2025-12-04T11:45:26.0590888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0590924Z with policy(): 2025-12-04T11:45:26.0591091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0591132Z raise RuntimeError(msg) 2025-12-04T11:45:26.0591535Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0591538Z 2025-12-04T11:45:26.0591612Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0591872Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0591874Z 2025-12-04T11:45:26.0591960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0592035Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0592076Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0592133Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0592199Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0592297Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0592333Z graph_break [] 2025-12-04T11:45:26.0592393Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0592466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0592509Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0592563Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0592660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0592738Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0592773Z graph_break [] 2025-12-04T11:45:26.0592831Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0592905Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0592946Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0593002Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0593098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0593180Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0593218Z graph_break [] 2025-12-04T11:45:26.0593318Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0593509Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ce42825959dd83d0.xml - 2025-12-04T11:45:26.0593568Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0594165Z FAILED [0.2157s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0594168Z 2025-12-04T11:45:26.0594240Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0594504Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0594506Z 2025-12-04T11:45:26.0594593Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0594674Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0594742Z ================== 1 failed, 187 deselected, 2 rerun in 2.08s ================== 2025-12-04T11:45:26.0594783Z Got exit code 1 2025-12-04T11:45:26.0595007Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0595136Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0595282Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-62fdc7c8c9a96cde.xml 2025-12-04T11:45:26.0595338Z ============================= test session starts ============================== 2025-12-04T11:45:26.0595449Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0595493Z cachedir: .pytest_cache 2025-12-04T11:45:26.0595653Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0595698Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0595741Z configfile: pytest.ini 2025-12-04T11:45:26.0595902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0595980Z collecting ... collected 188 items / 114 deselected / 74 selected 2025-12-04T11:45:26.0596035Z stepcurrent: skipping 114 already run items. 
2025-12-04T11:45:26.0596082Z Running 74 items in this shard 2025-12-04T11:45:26.0596084Z 2025-12-04T11:45:26.0596310Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.5985s] [ 1%] 2025-12-04T11:45:26.0596532Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2641s] [ 1%] 2025-12-04T11:45:26.0596744Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2194s] [ 1%] 2025-12-04T11:45:26.0596746Z 2025-12-04T11:45:26.0596798Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0596956Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0597002Z Traceback (most recent call last): 2025-12-04T11:45:26.0597158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0597200Z method(*args, **kwargs) 2025-12-04T11:45:26.0597352Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0597395Z method(*args, **kwargs) 2025-12-04T11:45:26.0597551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0597590Z with policy(): 2025-12-04T11:45:26.0597745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0597785Z raise RuntimeError(msg) 2025-12-04T11:45:26.0598175Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0598178Z 2025-12-04T11:45:26.0598251Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0598515Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0598529Z 2025-12-04T11:45:26.0598625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0598702Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0598743Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0598801Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0598867Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0598966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0599003Z graph_break [] 2025-12-04T11:45:26.0599064Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0599210Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0599259Z Traceback (most recent call last): 2025-12-04T11:45:26.0599412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0599455Z method(*args, **kwargs) 2025-12-04T11:45:26.0599607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0599647Z method(*args, **kwargs) 
2025-12-04T11:45:26.0599798Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0599838Z with policy(): 2025-12-04T11:45:26.0599991Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0600032Z raise RuntimeError(msg) 2025-12-04T11:45:26.0600423Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0600437Z 2025-12-04T11:45:26.0600511Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0600783Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0600785Z 2025-12-04T11:45:26.0600873Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0600951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0600994Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0601053Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0601120Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0601220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0601256Z graph_break [] 2025-12-04T11:45:26.0601317Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0601390Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0601433Z frames 
[('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0601491Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0601588Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0601652Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0601693Z graph_break [] 2025-12-04T11:45:26.0601751Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0601807Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0601965Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0602013Z Traceback (most recent call last): 2025-12-04T11:45:26.0602179Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0602224Z method(*args, **kwargs) 2025-12-04T11:45:26.0602379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0602423Z method(*args, **kwargs) 2025-12-04T11:45:26.0602574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0602613Z with policy(): 2025-12-04T11:45:26.0602765Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0602809Z raise RuntimeError(msg) 2025-12-04T11:45:26.0603203Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 
2025-12-04T11:45:26.0603205Z 2025-12-04T11:45:26.0603304Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0603564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0603566Z 2025-12-04T11:45:26.0603653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0603726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0603783Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0603839Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0603904Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0604003Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0604039Z graph_break [] 2025-12-04T11:45:26.0604099Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0604186Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0604228Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0604283Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0604380Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0604443Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0604481Z graph_break [] 2025-12-04T11:45:26.0604541Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0604614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0604654Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0604711Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0604806Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0604870Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0604906Z graph_break [] 2025-12-04T11:45:26.0604966Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0605156Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-62fdc7c8c9a96cde.xml - 2025-12-04T11:45:26.0605217Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0605823Z FAILED [0.2194s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0605843Z 2025-12-04T11:45:26.0605915Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0606178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0606180Z 2025-12-04T11:45:26.0606266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0606329Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.0606398Z ================== 1 failed, 114 deselected, 2 rerun in 2.10s ================== 2025-12-04T11:45:26.0606440Z Got exit code 1 2025-12-04T11:45:26.0606481Z Retrying single test... 
2025-12-04T11:45:26.0606631Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4b877aa577e4a316.xml 2025-12-04T11:45:26.0606689Z ============================= test session starts ============================== 2025-12-04T11:45:26.0606801Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0606841Z cachedir: .pytest_cache 2025-12-04T11:45:26.0607000Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0607044Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0607085Z configfile: pytest.ini 2025-12-04T11:45:26.0607246Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0607342Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0607601Z stepcurrent: skipping 114 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0607647Z Running 1 items in this shard 2025-12-04T11:45:26.0607650Z 2025-12-04T11:45:26.0607882Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6151s] [100%] 2025-12-04T11:45:26.0608100Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2642s] [100%] 2025-12-04T11:45:26.0608294Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2202s] [100%] 2025-12-04T11:45:26.0608298Z 2025-12-04T11:45:26.0608351Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0608500Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0608547Z Traceback (most recent call last): 2025-12-04T11:45:26.0608706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0608747Z method(*args, **kwargs) 2025-12-04T11:45:26.0608901Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0608942Z method(*args, **kwargs) 2025-12-04T11:45:26.0609093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0609142Z with policy(): 2025-12-04T11:45:26.0609296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0609337Z raise RuntimeError(msg) 2025-12-04T11:45:26.0609742Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0609745Z 2025-12-04T11:45:26.0609819Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0610081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0610085Z 2025-12-04T11:45:26.0610175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0610249Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0610294Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0610351Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0610419Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0610517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0610558Z graph_break [] 2025-12-04T11:45:26.0610617Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0610767Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0610812Z Traceback (most recent call last): 2025-12-04T11:45:26.0610965Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0611017Z method(*args, **kwargs) 2025-12-04T11:45:26.0611169Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0611210Z method(*args, **kwargs) 2025-12-04T11:45:26.0611361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0611398Z with policy(): 2025-12-04T11:45:26.0611564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0611606Z raise RuntimeError(msg) 2025-12-04T11:45:26.0611999Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0612003Z 2025-12-04T11:45:26.0612076Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0612337Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0612340Z 2025-12-04T11:45:26.0612428Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0612502Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0612547Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0612603Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0612672Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0612771Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0612820Z graph_break [] 2025-12-04T11:45:26.0612880Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0612955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0613007Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0613064Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0613161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0613227Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0613298Z graph_break [] 2025-12-04T11:45:26.0613359Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0613410Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0613560Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0613606Z Traceback (most recent call last): 2025-12-04T11:45:26.0613763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0613804Z method(*args, **kwargs) 2025-12-04T11:45:26.0613958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0613998Z method(*args, **kwargs) 2025-12-04T11:45:26.0614151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0614188Z with policy(): 2025-12-04T11:45:26.0614340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0614382Z raise RuntimeError(msg) 2025-12-04T11:45:26.0614773Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0614793Z 2025-12-04T11:45:26.0614868Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0615132Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0615135Z 2025-12-04T11:45:26.0615236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0615310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0615354Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0615411Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0615479Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0615578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0615617Z graph_break [] 2025-12-04T11:45:26.0615676Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0615752Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0615793Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0615850Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0615947Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0616016Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0616051Z graph_break [] 2025-12-04T11:45:26.0616112Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0616185Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0616228Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0616296Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0616392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0616469Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0616508Z graph_break [] 2025-12-04T11:45:26.0616570Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0616765Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4b877aa577e4a316.xml - 2025-12-04T11:45:26.0616828Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0617424Z FAILED [0.2202s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0617428Z 2025-12-04T11:45:26.0617506Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0617765Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0617767Z 2025-12-04T11:45:26.0617856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0617920Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0617992Z ================== 1 failed, 187 deselected, 2 rerun in 2.12s ================== 2025-12-04T11:45:26.0618029Z Got exit code 1 2025-12-04T11:45:26.0618073Z Retrying single test... 2025-12-04T11:45:26.0618232Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4cf48eb407765d82.xml 2025-12-04T11:45:26.0618295Z ============================= test session starts ============================== 2025-12-04T11:45:26.0618407Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0618449Z cachedir: .pytest_cache 2025-12-04T11:45:26.0618608Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0618667Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0618708Z configfile: pytest.ini 2025-12-04T11:45:26.0618870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0618948Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0619209Z stepcurrent: skipping 114 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0619254Z Running 1 items in this shard 2025-12-04T11:45:26.0619256Z 2025-12-04T11:45:26.0619481Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6168s] [100%] 2025-12-04T11:45:26.0619701Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2661s] [100%] 2025-12-04T11:45:26.0619898Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.2261s] [100%] 2025-12-04T11:45:26.0619900Z 2025-12-04T11:45:26.0619952Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0620112Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0620159Z Traceback (most recent call last): 2025-12-04T11:45:26.0620333Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0620377Z method(*args, **kwargs) 2025-12-04T11:45:26.0620528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0620570Z method(*args, **kwargs) 2025-12-04T11:45:26.0620723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0620763Z with policy(): 2025-12-04T11:45:26.0620915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0620961Z raise RuntimeError(msg) 2025-12-04T11:45:26.0621354Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1092616192. 2025-12-04T11:45:26.0621356Z 2025-12-04T11:45:26.0621435Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0621699Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0621701Z 2025-12-04T11:45:26.0621791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0621867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0621922Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0621983Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0622051Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0622155Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0622194Z graph_break [] 2025-12-04T11:45:26.0622259Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0622421Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0622468Z Traceback (most recent call last): 2025-12-04T11:45:26.0622622Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0622661Z method(*args, **kwargs) 2025-12-04T11:45:26.0622814Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0622855Z method(*args, **kwargs) 2025-12-04T11:45:26.0623007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0623046Z with policy(): 2025-12-04T11:45:26.0623200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0623245Z raise RuntimeError(msg) 2025-12-04T11:45:26.0623672Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1117782016. 2025-12-04T11:45:26.0623674Z 2025-12-04T11:45:26.0623750Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0624008Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0624026Z 2025-12-04T11:45:26.0624130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0624207Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0624250Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0624308Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0624374Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0624476Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0624513Z graph_break [] 2025-12-04T11:45:26.0624576Z aten_mm_info 
[('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0624652Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0624695Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0624758Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0624858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0624926Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0624962Z graph_break [] 2025-12-04T11:45:26.0625022Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0625079Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0625228Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0625273Z Traceback (most recent call last): 2025-12-04T11:45:26.0625430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0625469Z method(*args, **kwargs) 2025-12-04T11:45:26.0625635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0625676Z method(*args, **kwargs) 2025-12-04T11:45:26.0625829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0625866Z with policy(): 2025-12-04T11:45:26.0626019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0626072Z raise RuntimeError(msg) 2025-12-04T11:45:26.0626468Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0626471Z 2025-12-04T11:45:26.0626549Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0626808Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0626810Z 2025-12-04T11:45:26.0626901Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0626973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0627017Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0627076Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0627143Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0627244Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0627282Z graph_break [] 2025-12-04T11:45:26.0627340Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0627430Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0627471Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0627528Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0627639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0627704Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0627742Z graph_break [] 2025-12-04T11:45:26.0627805Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0627883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0627926Z frames [('total', 1), ('ok', 
1)] 2025-12-04T11:45:26.0627983Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0628083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0628152Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.0628189Z graph_break [] 2025-12-04T11:45:26.0628249Z aten_mm_info [('aten._scaled_mm.default_16_32_16', 1)] 2025-12-04T11:45:26.0628440Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4cf48eb407765d82.xml - 2025-12-04T11:45:26.0628501Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0629095Z FAILED [0.2261s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1117782016 and is now 1142947840. 2025-12-04T11:45:26.0629107Z 2025-12-04T11:45:26.0629185Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0629445Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0629447Z 2025-12-04T11:45:26.0629541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0629603Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0629685Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:26.0629725Z Got exit code 1 2025-12-04T11:45:26.0629937Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0630063Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0630208Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-768e264f873a61de.xml 2025-12-04T11:45:26.0630267Z ============================= test session starts ============================== 2025-12-04T11:45:26.0630382Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0630424Z cachedir: .pytest_cache 2025-12-04T11:45:26.0630583Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0630628Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0630672Z configfile: pytest.ini 2025-12-04T11:45:26.0630835Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0630915Z collecting ... collected 188 items / 115 deselected / 73 selected 2025-12-04T11:45:26.0630985Z stepcurrent: skipping 115 already run items. 
2025-12-04T11:45:26.0631031Z Running 73 items in this shard 2025-12-04T11:45:26.0631033Z 2025-12-04T11:45:26.0631274Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8064s] [ 1%] 2025-12-04T11:45:26.0631498Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4446s] [ 1%] 2025-12-04T11:45:26.0631700Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.5322s] [ 1%] 2025-12-04T11:45:26.0631703Z 2025-12-04T11:45:26.0631758Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0631910Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0631956Z Traceback (most recent call last): 2025-12-04T11:45:26.0632113Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0632154Z method(*args, **kwargs) 2025-12-04T11:45:26.0632308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0632349Z method(*args, **kwargs) 2025-12-04T11:45:26.0632502Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0632538Z with policy(): 2025-12-04T11:45:26.0632695Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0632736Z raise RuntimeError(msg) 2025-12-04T11:45:26.0633131Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0633145Z 2025-12-04T11:45:26.0633221Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0633531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0633533Z 2025-12-04T11:45:26.0633622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0633698Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0633741Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0633801Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0634295Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0634398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0634437Z graph_break [] 2025-12-04T11:45:26.0634499Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0634575Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0635060Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0635133Z current_size = base.storage().size() 2025-12-04T11:45:26.0635188Z Autotune Choices Stats: 2025-12-04T11:45:26.0635569Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0635624Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0635669Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0635778Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0636015Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0636246Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0636288Z _scaled_mm 0.0073 ms 82.4% 2025-12-04T11:45:26.0636419Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0878 seconds precompiling for 3 choices 2025-12-04T11:45:26.0636570Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0636616Z Traceback (most recent call last): 2025-12-04T11:45:26.0636771Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0638251Z method(*args, **kwargs) 2025-12-04T11:45:26.0638409Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0638452Z method(*args, **kwargs) 2025-12-04T11:45:26.0638606Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0638646Z with policy(): 2025-12-04T11:45:26.0638811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0638857Z raise RuntimeError(msg) 2025-12-04T11:45:26.0639260Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0639265Z 2025-12-04T11:45:26.0639344Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0639613Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0639615Z 2025-12-04T11:45:26.0639708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0639788Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0639831Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0639890Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0640374Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0640501Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0640538Z graph_break [] 2025-12-04T11:45:26.0640598Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0640672Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0641165Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0641215Z current_size = base.storage().size() 2025-12-04T11:45:26.0641259Z Autotune Choices Stats: 2025-12-04T11:45:26.0641635Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0641686Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0641731Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0641833Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0642075Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0642306Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0642363Z _scaled_mm 0.0073 ms 82.4% 2025-12-04T11:45:26.0642492Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0878 seconds precompiling for 3 choices 2025-12-04T11:45:26.0642575Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0642616Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0642690Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0642790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0643302Z inductor [('triton_bundler_save_kernel', 
24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0643342Z graph_break [] 2025-12-04T11:45:26.0643405Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0643479Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0643519Z Autotune Choices Stats: 2025-12-04T11:45:26.0643889Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0643939Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0643985Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0644082Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0644345Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0644577Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0644623Z _scaled_mm 0.0070 ms 88.0% 2025-12-04T11:45:26.0644752Z SingleProcess AUTOTUNE benchmarking takes 0.0154 seconds and 0.0782 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0644810Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0644962Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0645011Z Traceback (most recent call last): 2025-12-04T11:45:26.0645168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0645209Z method(*args, **kwargs) 2025-12-04T11:45:26.0645363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0645402Z method(*args, **kwargs) 2025-12-04T11:45:26.0645555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0645591Z with policy(): 2025-12-04T11:45:26.0645745Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0645785Z raise RuntimeError(msg) 2025-12-04T11:45:26.0646190Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0646208Z 2025-12-04T11:45:26.0646288Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0646557Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0646573Z 2025-12-04T11:45:26.0646661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0646741Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0646782Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0646845Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0647331Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0647431Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0647469Z graph_break [] 2025-12-04T11:45:26.0647532Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0647606Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0648098Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0648161Z current_size = base.storage().size() 2025-12-04T11:45:26.0648201Z Autotune Choices Stats: 2025-12-04T11:45:26.0648586Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0648640Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0648684Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0648783Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0649020Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0649251Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0649294Z _scaled_mm 0.0073 ms 82.4% 2025-12-04T11:45:26.0649423Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0878 seconds precompiling for 3 choices 2025-12-04T11:45:26.0649499Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0649541Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0649597Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0649698Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0650184Z inductor [('triton_bundler_save_kernel', 
24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0650233Z graph_break [] 2025-12-04T11:45:26.0650297Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0650389Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0650430Z Autotune Choices Stats: 2025-12-04T11:45:26.0650798Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0650851Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0650896Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0650993Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0651228Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0651459Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0651501Z _scaled_mm 0.0070 ms 88.0% 2025-12-04T11:45:26.0651629Z SingleProcess AUTOTUNE benchmarking takes 0.0154 seconds and 0.0782 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0651717Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0651759Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0651816Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0651925Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0652412Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0652453Z graph_break [] 2025-12-04T11:45:26.0652513Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0652588Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0652629Z Autotune Choices Stats: 2025-12-04T11:45:26.0652991Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0653042Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0653086Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0653183Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0653442Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0653671Z triton_mm_5 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0653731Z _scaled_mm 0.0234 ms 25.7% 2025-12-04T11:45:26.0653860Z SingleProcess AUTOTUNE benchmarking takes 0.0194 seconds and 0.1770 seconds precompiling for 3 choices 2025-12-04T11:45:26.0654051Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-768e264f873a61de.xml - 2025-12-04T11:45:26.0654130Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0654747Z FAILED [0.5322s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0654751Z 2025-12-04T11:45:26.0654827Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0655095Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0655099Z 2025-12-04T11:45:26.0655187Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0655255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0655327Z ================== 1 failed, 115 deselected, 2 rerun in 2.80s ================== 2025-12-04T11:45:26.0655365Z Got exit code 1 2025-12-04T11:45:26.0655404Z Retrying single test... 2025-12-04T11:45:26.0655550Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c36b211ec0b530a.xml 2025-12-04T11:45:26.0655622Z ============================= test session starts ============================== 2025-12-04T11:45:26.0655751Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0655792Z cachedir: .pytest_cache 2025-12-04T11:45:26.0655952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0655997Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0656042Z configfile: pytest.ini 2025-12-04T11:45:26.0656207Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0656287Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0656554Z stepcurrent: skipping 115 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0656599Z Running 1 items in this shard 2025-12-04T11:45:26.0656601Z 2025-12-04T11:45:26.0656828Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9082s] [100%] 2025-12-04T11:45:26.0657053Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5581s] [100%] 2025-12-04T11:45:26.0657252Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.6435s] [100%] 2025-12-04T11:45:26.0657254Z 2025-12-04T11:45:26.0657309Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0657461Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0657518Z Traceback (most recent call last): 2025-12-04T11:45:26.0657679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0657721Z method(*args, **kwargs) 2025-12-04T11:45:26.0657875Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0657915Z method(*args, **kwargs) 2025-12-04T11:45:26.0658078Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0658117Z with policy(): 2025-12-04T11:45:26.0658272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0658312Z raise RuntimeError(msg) 2025-12-04T11:45:26.0658716Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0658719Z 2025-12-04T11:45:26.0658795Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0659064Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0659066Z 2025-12-04T11:45:26.0659156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0659229Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0659271Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0659342Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0659836Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0659936Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0659975Z graph_break [] 2025-12-04T11:45:26.0660037Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0660113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0660599Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0660651Z current_size = base.storage().size() 2025-12-04T11:45:26.0660691Z Autotune Choices Stats: 2025-12-04T11:45:26.0661064Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005878999829292297, "best_triton_pos": 0} 2025-12-04T11:45:26.0661116Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0661160Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0661266Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0661517Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0661751Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0661793Z _scaled_mm 0.0066 ms 89.1% 2025-12-04T11:45:26.0661932Z SingleProcess AUTOTUNE benchmarking takes 0.0181 seconds and 0.0865 seconds precompiling for 3 choices 2025-12-04T11:45:26.0662083Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.0662130Z Traceback (most recent call last): 2025-12-04T11:45:26.0662285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0662328Z method(*args, **kwargs) 2025-12-04T11:45:26.0662479Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0662521Z method(*args, **kwargs) 2025-12-04T11:45:26.0662672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0662709Z with policy(): 2025-12-04T11:45:26.0662862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0662904Z raise RuntimeError(msg) 2025-12-04T11:45:26.0663340Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0663362Z 2025-12-04T11:45:26.0663442Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0663725Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0663728Z 2025-12-04T11:45:26.0663815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0663894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0663936Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0663993Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0664476Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0664580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0664616Z graph_break [] 2025-12-04T11:45:26.0664677Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0664750Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0665241Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0665308Z current_size = base.storage().size() 2025-12-04T11:45:26.0665349Z Autotune Choices Stats: 2025-12-04T11:45:26.0665717Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005878999829292297, "best_triton_pos": 0} 2025-12-04T11:45:26.0665768Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0665824Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0665924Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0666158Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0666388Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0666432Z _scaled_mm 0.0066 ms 89.1% 2025-12-04T11:45:26.0666562Z SingleProcess AUTOTUNE benchmarking takes 0.0181 seconds and 0.0865 seconds precompiling for 3 choices 2025-12-04T11:45:26.0666639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0666681Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0666739Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0666840Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0667324Z inductor [('triton_bundler_save_kernel', 
24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0667371Z graph_break [] 2025-12-04T11:45:26.0667443Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0667516Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0667558Z Autotune Choices Stats: 2025-12-04T11:45:26.0668020Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0668072Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0668114Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0668214Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0668449Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0668678Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0668721Z _scaled_mm 0.0202 ms 30.3% 2025-12-04T11:45:26.0668848Z SingleProcess AUTOTUNE benchmarking takes 0.0165 seconds and 0.0794 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0668901Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0669052Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0669116Z Traceback (most recent call last): 2025-12-04T11:45:26.0669275Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0669317Z method(*args, **kwargs) 2025-12-04T11:45:26.0669473Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0669513Z method(*args, **kwargs) 2025-12-04T11:45:26.0669676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0669714Z with policy(): 2025-12-04T11:45:26.0669867Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0669910Z raise RuntimeError(msg) 2025-12-04T11:45:26.0670309Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0670315Z 2025-12-04T11:45:26.0670387Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0670655Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0670657Z 2025-12-04T11:45:26.0670744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0670818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0670859Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0670916Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0671423Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0671523Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0671561Z graph_break [] 2025-12-04T11:45:26.0671624Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0671697Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0672187Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0672235Z current_size = base.storage().size() 2025-12-04T11:45:26.0672276Z Autotune Choices Stats: 2025-12-04T11:45:26.0672646Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005878999829292297, "best_triton_pos": 0} 2025-12-04T11:45:26.0672695Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0672739Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0672836Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0673071Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0673342Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0673384Z _scaled_mm 0.0066 ms 89.1% 2025-12-04T11:45:26.0673533Z SingleProcess AUTOTUNE benchmarking takes 0.0181 seconds and 0.0865 seconds precompiling for 3 choices 2025-12-04T11:45:26.0673608Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0673649Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0673707Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0673805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0674298Z inductor [('triton_bundler_save_kernel', 
24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0674337Z graph_break [] 2025-12-04T11:45:26.0674398Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0674471Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0674513Z Autotune Choices Stats: 2025-12-04T11:45:26.0674874Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0674942Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0674986Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0675097Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0675333Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0675564Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0675606Z _scaled_mm 0.0202 ms 30.3% 2025-12-04T11:45:26.0675733Z SingleProcess AUTOTUNE benchmarking takes 0.0165 seconds and 0.0794 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0675808Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0675849Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0675905Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0676005Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0676491Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0676533Z graph_break [] 2025-12-04T11:45:26.0676592Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0676666Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0676722Z Autotune Choices Stats: 2025-12-04T11:45:26.0677084Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.0677131Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0677184Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0677283Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0677517Z triton_mm_4 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0677747Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0677791Z _scaled_mm 0.0225 ms 26.3% 2025-12-04T11:45:26.0677918Z SingleProcess AUTOTUNE benchmarking takes 0.0200 seconds and 0.1809 seconds precompiling for 3 choices 2025-12-04T11:45:26.0678110Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8c36b211ec0b530a.xml - 2025-12-04T11:45:26.0678170Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0678785Z FAILED [0.6435s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0678799Z 2025-12-04T11:45:26.0678889Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0679156Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0679158Z 2025-12-04T11:45:26.0679247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0679309Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0679376Z ================== 1 failed, 187 deselected, 2 rerun in 3.13s ================== 2025-12-04T11:45:26.0679413Z Got exit code 1 2025-12-04T11:45:26.0679455Z Retrying single test... 2025-12-04T11:45:26.0679604Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fa79ab9b69e4826.xml 2025-12-04T11:45:26.0679661Z ============================= test session starts ============================== 2025-12-04T11:45:26.0679773Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0679814Z cachedir: .pytest_cache 2025-12-04T11:45:26.0679973Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0680020Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0680062Z configfile: pytest.ini 2025-12-04T11:45:26.0680224Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0680299Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0680564Z stepcurrent: skipping 115 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0680625Z Running 1 items in this shard 2025-12-04T11:45:26.0680628Z 2025-12-04T11:45:26.0680854Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8155s] [100%] 2025-12-04T11:45:26.0681093Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4479s] [100%] 2025-12-04T11:45:26.0681294Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.5306s] [100%] 2025-12-04T11:45:26.0681297Z 2025-12-04T11:45:26.0681348Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0681498Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0681547Z Traceback (most recent call last): 2025-12-04T11:45:26.0681705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0681749Z method(*args, **kwargs) 2025-12-04T11:45:26.0681902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0681943Z method(*args, **kwargs) 2025-12-04T11:45:26.0682094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0682133Z with policy(): 2025-12-04T11:45:26.0682286Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0682343Z raise RuntimeError(msg) 2025-12-04T11:45:26.0682753Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0682756Z 2025-12-04T11:45:26.0682828Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0683095Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0683097Z 2025-12-04T11:45:26.0683184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0683294Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0683339Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0683396Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0683885Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0683987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0684028Z graph_break [] 2025-12-04T11:45:26.0684091Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0684163Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0684654Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0684718Z current_size = base.storage().size() 2025-12-04T11:45:26.0684759Z Autotune Choices Stats: 2025-12-04T11:45:26.0685147Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:26.0685196Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0685240Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0685338Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0685577Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0685807Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0685853Z _scaled_mm 0.0218 ms 27.3% 2025-12-04T11:45:26.0685981Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0903 seconds precompiling for 3 choices 2025-12-04T11:45:26.0686133Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.0686179Z Traceback (most recent call last): 2025-12-04T11:45:26.0686336Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0686391Z method(*args, **kwargs) 2025-12-04T11:45:26.0686557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0686601Z method(*args, **kwargs) 2025-12-04T11:45:26.0686752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0686792Z with policy(): 2025-12-04T11:45:26.0686949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0686992Z raise RuntimeError(msg) 2025-12-04T11:45:26.0687392Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0687396Z 2025-12-04T11:45:26.0687470Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0687736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0687738Z 2025-12-04T11:45:26.0687826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0687899Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0687942Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0687998Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0688483Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0688597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0688636Z graph_break [] 2025-12-04T11:45:26.0688696Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0688781Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0689269Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0689318Z current_size = base.storage().size() 2025-12-04T11:45:26.0689359Z Autotune Choices Stats: 2025-12-04T11:45:26.0689731Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:26.0689781Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0689826Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0689926Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0690160Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0690403Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0690456Z _scaled_mm 0.0218 ms 27.3% 2025-12-04T11:45:26.0690586Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0903 seconds precompiling for 3 choices 2025-12-04T11:45:26.0690659Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0690704Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0690760Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0690860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0691344Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0691385Z graph_break [] 2025-12-04T11:45:26.0691446Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0691518Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0691560Z Autotune Choices Stats: 2025-12-04T11:45:26.0691924Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.0691973Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0692028Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0692125Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0692359Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0692599Z triton_mm_2 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0692641Z _scaled_mm 0.0212 ms 28.1% 2025-12-04T11:45:26.0692770Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0828 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0692822Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0692972Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0693025Z Traceback (most recent call last): 2025-12-04T11:45:26.0693183Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0693226Z method(*args, **kwargs) 2025-12-04T11:45:26.0693407Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0693450Z method(*args, **kwargs) 2025-12-04T11:45:26.0693602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0693645Z with policy(): 2025-12-04T11:45:26.0693796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0693839Z raise RuntimeError(msg) 2025-12-04T11:45:26.0694280Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0694282Z 2025-12-04T11:45:26.0694357Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0694624Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0694626Z 2025-12-04T11:45:26.0694713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0694785Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0694829Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0694887Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0695376Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0695476Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0695515Z graph_break [] 2025-12-04T11:45:26.0695576Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0695650Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0696143Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0696205Z current_size = base.storage().size() 2025-12-04T11:45:26.0696247Z Autotune Choices Stats: 2025-12-04T11:45:26.0696626Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:26.0696676Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0696719Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0696818Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0697052Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0697283Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0697328Z _scaled_mm 0.0218 ms 27.3% 2025-12-04T11:45:26.0697456Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0903 seconds precompiling for 3 choices 2025-12-04T11:45:26.0697529Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0697571Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0697627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0697726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0698239Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0698279Z graph_break [] 2025-12-04T11:45:26.0698340Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0698413Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0698456Z Autotune Choices Stats: 2025-12-04T11:45:26.0698817Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.0698868Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0698911Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0699010Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0699242Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0699472Z triton_mm_2 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0699514Z _scaled_mm 0.0212 ms 28.1% 2025-12-04T11:45:26.0699642Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0828 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0699727Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0699768Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0699825Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0699924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0700420Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0700459Z graph_break [] 2025-12-04T11:45:26.0700521Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0700596Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0700639Z Autotune Choices Stats: 2025-12-04T11:45:26.0701004Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0701054Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0701097Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0701195Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0701429Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0701671Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0701725Z _scaled_mm 0.0232 ms 26.2% 2025-12-04T11:45:26.0701853Z SingleProcess AUTOTUNE benchmarking takes 0.0202 seconds and 0.1808 seconds precompiling for 3 choices 2025-12-04T11:45:26.0702047Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0fa79ab9b69e4826.xml - 2025-12-04T11:45:26.0702107Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0702722Z FAILED [0.5306s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0702727Z 2025-12-04T11:45:26.0702800Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0703071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0703073Z 2025-12-04T11:45:26.0703161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0703224Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0703648Z ================== 1 failed, 187 deselected, 2 rerun in 2.81s ================== 2025-12-04T11:45:26.0703686Z Got exit code 1 2025-12-04T11:45:26.0703924Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0704054Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0704201Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a6c7d3c69ccfa88.xml 2025-12-04T11:45:26.0704258Z ============================= test session starts ============================== 2025-12-04T11:45:26.0704381Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0704424Z cachedir: .pytest_cache 2025-12-04T11:45:26.0704585Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0704631Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0704674Z configfile: pytest.ini 2025-12-04T11:45:26.0704838Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0704915Z collecting ... collected 188 items / 116 deselected / 72 selected 2025-12-04T11:45:26.0704970Z stepcurrent: skipping 116 already run items. 
2025-12-04T11:45:26.0705016Z Running 72 items in this shard 2025-12-04T11:45:26.0705018Z 2025-12-04T11:45:26.0705247Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9277s] [ 1%] 2025-12-04T11:45:26.0705476Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5511s] [ 1%] 2025-12-04T11:45:26.0705675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.6620s] [ 1%] 2025-12-04T11:45:26.0705693Z 2025-12-04T11:45:26.0705744Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0705909Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0705958Z Traceback (most recent call last): 2025-12-04T11:45:26.0706116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0706158Z method(*args, **kwargs) 2025-12-04T11:45:26.0706311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0706353Z method(*args, **kwargs) 2025-12-04T11:45:26.0706506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0706544Z with policy(): 2025-12-04T11:45:26.0706701Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0706743Z raise RuntimeError(msg) 2025-12-04T11:45:26.0707142Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0707144Z 2025-12-04T11:45:26.0707219Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0707486Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0707488Z 2025-12-04T11:45:26.0707574Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0707661Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0707706Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0707764Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0708271Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0708370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0708410Z graph_break [] 2025-12-04T11:45:26.0708469Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0708544Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0709035Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0709083Z current_size = base.storage().size() 2025-12-04T11:45:26.0709122Z Autotune Choices Stats: 2025-12-04T11:45:26.0709490Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0709539Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0709595Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0709694Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0709941Z triton_mm_1 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0710172Z triton_mm_0 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0710215Z _scaled_mm 0.0224 ms 26.3% 2025-12-04T11:45:26.0710346Z SingleProcess AUTOTUNE benchmarking takes 0.0173 seconds and 0.0871 seconds precompiling for 3 choices 2025-12-04T11:45:26.0710496Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0710549Z Traceback (most recent call last): 2025-12-04T11:45:26.0710704Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0710749Z method(*args, **kwargs) 2025-12-04T11:45:26.0710903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0710946Z method(*args, **kwargs) 2025-12-04T11:45:26.0711099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0711136Z with policy(): 2025-12-04T11:45:26.0711289Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0711335Z raise RuntimeError(msg) 2025-12-04T11:45:26.0711732Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0711747Z 2025-12-04T11:45:26.0711823Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0712098Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0712101Z 2025-12-04T11:45:26.0712188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0712262Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0712309Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0712366Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0712859Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0712959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0712997Z graph_break [] 2025-12-04T11:45:26.0713057Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0713130Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0713649Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0713718Z current_size = base.storage().size() 2025-12-04T11:45:26.0713773Z Autotune Choices Stats: 2025-12-04T11:45:26.0714143Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0714193Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0714237Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0714334Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0714566Z triton_mm_1 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0714795Z triton_mm_0 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0714837Z _scaled_mm 0.0224 ms 26.3% 2025-12-04T11:45:26.0714965Z SingleProcess AUTOTUNE benchmarking takes 0.0173 seconds and 0.0871 seconds precompiling for 3 choices 2025-12-04T11:45:26.0715040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0715082Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0715139Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0715239Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0715739Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0715776Z graph_break [] 2025-12-04T11:45:26.0715837Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0715924Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0715968Z Autotune Choices Stats: 2025-12-04T11:45:26.0716330Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0716382Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0716425Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0716523Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0716752Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0716977Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0717021Z _scaled_mm 0.0082 ms 74.5% 2025-12-04T11:45:26.0717148Z SingleProcess AUTOTUNE benchmarking takes 0.0162 seconds and 0.0790 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0717214Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0717366Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0717423Z Traceback (most recent call last): 2025-12-04T11:45:26.0717580Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0717621Z method(*args, **kwargs) 2025-12-04T11:45:26.0717775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0717820Z method(*args, **kwargs) 2025-12-04T11:45:26.0717971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0718008Z with policy(): 2025-12-04T11:45:26.0718161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0718205Z raise RuntimeError(msg) 2025-12-04T11:45:26.0718606Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0718610Z 2025-12-04T11:45:26.0718684Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0718949Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0718951Z 2025-12-04T11:45:26.0719038Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0719127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0719172Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0719229Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0719726Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0719826Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0719865Z graph_break [] 2025-12-04T11:45:26.0719927Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0720000Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0720490Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0720538Z current_size = base.storage().size() 2025-12-04T11:45:26.0720577Z Autotune Choices Stats: 2025-12-04T11:45:26.0720943Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0720993Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0721062Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0721159Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0721402Z triton_mm_1 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0721630Z triton_mm_0 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0721675Z _scaled_mm 0.0224 ms 26.3% 2025-12-04T11:45:26.0721802Z SingleProcess AUTOTUNE benchmarking takes 0.0173 seconds and 0.0871 seconds precompiling for 3 choices 2025-12-04T11:45:26.0721876Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0721918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0721977Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0722077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0722570Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0722609Z graph_break [] 2025-12-04T11:45:26.0722670Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0722745Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0722786Z Autotune Choices Stats: 2025-12-04T11:45:26.0723150Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0723221Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0723294Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0723392Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0723647Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0723874Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0723919Z _scaled_mm 0.0082 ms 74.5% 2025-12-04T11:45:26.0724049Z SingleProcess AUTOTUNE benchmarking takes 0.0162 seconds and 0.0790 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0724125Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0724169Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0724228Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0724327Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0724813Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0724874Z graph_break [] 2025-12-04T11:45:26.0724933Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0725007Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0725065Z Autotune Choices Stats: 2025-12-04T11:45:26.0725428Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005878999829292297, "best_triton_pos": 0} 2025-12-04T11:45:26.0725478Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0725521Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0725619Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0725854Z triton_mm_4 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0726081Z triton_mm_5 0.0060 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0726124Z _scaled_mm 0.0238 ms 24.7% 2025-12-04T11:45:26.0726253Z SingleProcess AUTOTUNE benchmarking takes 0.0233 seconds and 0.1831 seconds precompiling for 3 choices 2025-12-04T11:45:26.0726446Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4a6c7d3c69ccfa88.xml - 2025-12-04T11:45:26.0726506Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0727119Z FAILED [0.6620s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0727145Z 2025-12-04T11:45:26.0727220Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0727499Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0727501Z 2025-12-04T11:45:26.0727589Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0727651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0727723Z ================== 1 failed, 116 deselected, 2 rerun in 3.16s ================== 2025-12-04T11:45:26.0727763Z Got exit code 1 2025-12-04T11:45:26.0727804Z Retrying single test... 2025-12-04T11:45:26.0727949Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-915a16894278e059.xml 2025-12-04T11:45:26.0728008Z ============================= test session starts ============================== 2025-12-04T11:45:26.0728118Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0728162Z cachedir: .pytest_cache 2025-12-04T11:45:26.0728322Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0728372Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0728413Z configfile: pytest.ini 2025-12-04T11:45:26.0728579Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0728668Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0728949Z stepcurrent: skipping 116 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0728993Z Running 1 items in this shard 2025-12-04T11:45:26.0728995Z 2025-12-04T11:45:26.0729221Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8274s] [100%] 2025-12-04T11:45:26.0729446Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4489s] [100%] 2025-12-04T11:45:26.0729644Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.5516s] [100%] 2025-12-04T11:45:26.0729648Z 2025-12-04T11:45:26.0729702Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0729854Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0729904Z Traceback (most recent call last): 2025-12-04T11:45:26.0730064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0730112Z method(*args, **kwargs) 2025-12-04T11:45:26.0730266Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0730307Z method(*args, **kwargs) 2025-12-04T11:45:26.0730459Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0730517Z with policy(): 2025-12-04T11:45:26.0730670Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0730713Z raise RuntimeError(msg) 2025-12-04T11:45:26.0731112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0731127Z 2025-12-04T11:45:26.0731201Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0731467Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0731469Z 2025-12-04T11:45:26.0731557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0731632Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0731678Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0731736Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0732222Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0732323Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0732360Z graph_break [] 2025-12-04T11:45:26.0732423Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0732507Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0733009Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0733057Z current_size = base.storage().size() 2025-12-04T11:45:26.0733098Z Autotune Choices Stats: 2025-12-04T11:45:26.0733493Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0733544Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0733591Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0733691Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0733925Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0734152Z triton_mm_0 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0734193Z _scaled_mm 0.0228 ms 26.5% 2025-12-04T11:45:26.0734320Z SingleProcess AUTOTUNE benchmarking takes 0.0170 seconds and 0.0911 seconds precompiling for 3 choices 2025-12-04T11:45:26.0734470Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.0734534Z Traceback (most recent call last): 2025-12-04T11:45:26.0734692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0734734Z method(*args, **kwargs) 2025-12-04T11:45:26.0734889Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0734930Z method(*args, **kwargs) 2025-12-04T11:45:26.0735099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0735138Z with policy(): 2025-12-04T11:45:26.0735291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0735337Z raise RuntimeError(msg) 2025-12-04T11:45:26.0735737Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0735741Z 2025-12-04T11:45:26.0735815Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0736081Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0736083Z 2025-12-04T11:45:26.0736173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0736246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0736294Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0736351Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0736871Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0736972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0737012Z graph_break [] 2025-12-04T11:45:26.0737073Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0737146Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0737634Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0737684Z current_size = base.storage().size() 2025-12-04T11:45:26.0737729Z Autotune Choices Stats: 2025-12-04T11:45:26.0738093Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0738144Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0738187Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0738288Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0738518Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0738764Z triton_mm_0 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0738808Z _scaled_mm 0.0228 ms 26.5% 2025-12-04T11:45:26.0738949Z SingleProcess AUTOTUNE benchmarking takes 0.0170 seconds and 0.0911 seconds precompiling for 3 choices 2025-12-04T11:45:26.0739024Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0739069Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0739125Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0739226Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0739716Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0739753Z graph_break [] 2025-12-04T11:45:26.0739816Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0739891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0739932Z Autotune Choices Stats: 2025-12-04T11:45:26.0740293Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0740356Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0740399Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0740510Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0740741Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0740970Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0741015Z _scaled_mm 0.0209 ms 29.3% 2025-12-04T11:45:26.0741142Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0778 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0741199Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0741351Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0741401Z Traceback (most recent call last): 2025-12-04T11:45:26.0741558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0741602Z method(*args, **kwargs) 2025-12-04T11:45:26.0741760Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0741800Z method(*args, **kwargs) 2025-12-04T11:45:26.0741953Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0741994Z with policy(): 2025-12-04T11:45:26.0742147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0742201Z raise RuntimeError(msg) 2025-12-04T11:45:26.0742600Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0742603Z 2025-12-04T11:45:26.0742689Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0742953Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0742955Z 2025-12-04T11:45:26.0743043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0743116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0743160Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0743217Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0743734Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0743835Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0743871Z graph_break [] 2025-12-04T11:45:26.0743934Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0744006Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0744530Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0744578Z current_size = base.storage().size() 2025-12-04T11:45:26.0744620Z Autotune Choices Stats: 2025-12-04T11:45:26.0744985Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0745036Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0745081Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0745179Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0745411Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0745637Z triton_mm_0 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0745679Z _scaled_mm 0.0228 ms 26.5% 2025-12-04T11:45:26.0745807Z SingleProcess AUTOTUNE benchmarking takes 0.0170 seconds and 0.0911 seconds precompiling for 3 choices 2025-12-04T11:45:26.0745881Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0745939Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0745996Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0746095Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0746599Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0746638Z graph_break [] 2025-12-04T11:45:26.0746700Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0746772Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0746813Z Autotune Choices Stats: 2025-12-04T11:45:26.0747174Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0747226Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0747268Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0747367Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0747597Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0747822Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0747879Z _scaled_mm 0.0209 ms 29.3% 2025-12-04T11:45:26.0748006Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0778 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0748090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0748132Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0748189Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0748288Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0748772Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0748812Z graph_break [] 2025-12-04T11:45:26.0748872Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0748944Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0748989Z Autotune Choices Stats: 2025-12-04T11:45:26.0749351Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0749400Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0749442Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0749540Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0749770Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0750008Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0750050Z _scaled_mm 0.0203 ms 30.4% 2025-12-04T11:45:26.0750189Z SingleProcess AUTOTUNE benchmarking takes 0.0203 seconds and 0.1839 seconds precompiling for 3 choices 2025-12-04T11:45:26.0750380Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-915a16894278e059.xml - 2025-12-04T11:45:26.0750439Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0751047Z FAILED [0.5516s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0751051Z 2025-12-04T11:45:26.0751124Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0751395Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0751397Z 2025-12-04T11:45:26.0751485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0751546Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0751632Z ================== 1 failed, 187 deselected, 2 rerun in 2.85s ================== 2025-12-04T11:45:26.0751668Z Got exit code 1 2025-12-04T11:45:26.0751710Z Retrying single test... 2025-12-04T11:45:26.0751867Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3194eacf1450a1a4.xml 2025-12-04T11:45:26.0751924Z ============================= test session starts ============================== 2025-12-04T11:45:26.0752034Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0752076Z cachedir: .pytest_cache 2025-12-04T11:45:26.0752236Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0752282Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0752325Z configfile: pytest.ini 2025-12-04T11:45:26.0752487Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0752564Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0752827Z stepcurrent: skipping 116 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0752870Z Running 1 items in this shard 2025-12-04T11:45:26.0752872Z 2025-12-04T11:45:26.0753100Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8012s] [100%] 2025-12-04T11:45:26.0753362Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4603s] [100%] 2025-12-04T11:45:26.0753560Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.5390s] [100%] 2025-12-04T11:45:26.0753579Z 2025-12-04T11:45:26.0753630Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0753780Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0753830Z Traceback (most recent call last): 2025-12-04T11:45:26.0753987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0754044Z method(*args, **kwargs) 2025-12-04T11:45:26.0754200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0754242Z method(*args, **kwargs) 2025-12-04T11:45:26.0754394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0754435Z with policy(): 2025-12-04T11:45:26.0754588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0754629Z raise RuntimeError(msg) 2025-12-04T11:45:26.0755032Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0755034Z 2025-12-04T11:45:26.0755109Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0755375Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0755392Z 2025-12-04T11:45:26.0755481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0755553Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0755600Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0755670Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0756159Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0756262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0756299Z graph_break [] 2025-12-04T11:45:26.0756360Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0756434Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0756924Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0756971Z current_size = base.storage().size() 2025-12-04T11:45:26.0757017Z Autotune Choices Stats: 2025-12-04T11:45:26.0757391Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0757454Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0757497Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0757598Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0757835Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0758073Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0758118Z _scaled_mm 0.0220 ms 27.3% 2025-12-04T11:45:26.0758247Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0879 seconds precompiling for 3 choices 2025-12-04T11:45:26.0758400Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.0758449Z Traceback (most recent call last): 2025-12-04T11:45:26.0758607Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0758649Z method(*args, **kwargs) 2025-12-04T11:45:26.0758803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0758843Z method(*args, **kwargs) 2025-12-04T11:45:26.0758997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0759035Z with policy(): 2025-12-04T11:45:26.0759191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0759233Z raise RuntimeError(msg) 2025-12-04T11:45:26.0759660Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0759662Z 2025-12-04T11:45:26.0759736Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0760002Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0760004Z 2025-12-04T11:45:26.0760092Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0760166Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0760212Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0760272Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0760761Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0760861Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0760900Z graph_break [] 2025-12-04T11:45:26.0760961Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0761035Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0761533Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0761593Z current_size = base.storage().size() 2025-12-04T11:45:26.0761636Z Autotune Choices Stats: 2025-12-04T11:45:26.0762014Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0762065Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0762109Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0762209Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0762445Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0762675Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0762716Z _scaled_mm 0.0220 ms 27.3% 2025-12-04T11:45:26.0762845Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0879 seconds precompiling for 3 choices 2025-12-04T11:45:26.0762918Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0762965Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0763021Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0763122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0763675Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0763718Z graph_break [] 2025-12-04T11:45:26.0763778Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0763852Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0763894Z Autotune Choices Stats: 2025-12-04T11:45:26.0764253Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:26.0764306Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0764348Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0764447Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0764680Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0764907Z triton_mm_3 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0764953Z _scaled_mm 0.0215 ms 29.0% 2025-12-04T11:45:26.0765082Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0839 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0765148Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0765299Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0765345Z Traceback (most recent call last): 2025-12-04T11:45:26.0765505Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0765548Z method(*args, **kwargs) 2025-12-04T11:45:26.0765726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0765770Z method(*args, **kwargs) 2025-12-04T11:45:26.0765924Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0765965Z with policy(): 2025-12-04T11:45:26.0766123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0766165Z raise RuntimeError(msg) 2025-12-04T11:45:26.0766566Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0766569Z 2025-12-04T11:45:26.0766645Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0766911Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0766913Z 2025-12-04T11:45:26.0767002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0767090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0767136Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0767192Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0767688Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0767787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0767825Z graph_break [] 2025-12-04T11:45:26.0767885Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0767959Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0768452Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0768500Z current_size = base.storage().size() 2025-12-04T11:45:26.0768542Z Autotune Choices Stats: 2025-12-04T11:45:26.0768907Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0768956Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0769012Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0769114Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0769347Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0769584Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0769627Z _scaled_mm 0.0220 ms 27.3% 2025-12-04T11:45:26.0769755Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0879 seconds precompiling for 3 choices 2025-12-04T11:45:26.0769829Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0769880Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0769935Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0770035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0770528Z inductor [('triton_bundler_save_kernel', 24), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0770569Z graph_break [] 2025-12-04T11:45:26.0770633Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0770705Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0770749Z Autotune Choices Stats: 2025-12-04T11:45:26.0771117Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:26.0771179Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0771221Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0771321Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0771552Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0771780Z triton_mm_3 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0771825Z _scaled_mm 0.0215 ms 29.0% 2025-12-04T11:45:26.0771952Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds and 0.0839 seconds precompiling for 3 choices 
2025-12-04T11:45:26.0772025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0772069Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0772125Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0772226Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0772705Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0772755Z graph_break [] 2025-12-04T11:45:26.0772816Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0772891Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0772933Z Autotune Choices Stats: 2025-12-04T11:45:26.0773329Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0773382Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32) 2025-12-04T11:45:26.0773426Z strides: [32, 1], [1, 32], [1, 1], [1, 1] 2025-12-04T11:45:26.0773524Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0773760Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.0773990Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0774033Z _scaled_mm 0.0235 ms 26.0% 2025-12-04T11:45:26.0774162Z SingleProcess AUTOTUNE benchmarking takes 0.0209 seconds and 0.1826 seconds precompiling for 3 choices 2025-12-04T11:45:26.0774354Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3194eacf1450a1a4.xml - 2025-12-04T11:45:26.0774416Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0775038Z FAILED [0.5390s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0775054Z 2025-12-04T11:45:26.0775127Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0775397Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0775399Z 2025-12-04T11:45:26.0775486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0775551Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0775620Z ================== 1 failed, 187 deselected, 2 rerun in 2.82s ================== 2025-12-04T11:45:26.0775662Z Got exit code 1 2025-12-04T11:45:26.0775876Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0776005Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0776153Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f915867615cae279.xml 2025-12-04T11:45:26.0776211Z ============================= test session starts ============================== 2025-12-04T11:45:26.0776321Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0776362Z cachedir: .pytest_cache 2025-12-04T11:45:26.0776523Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0776585Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0776629Z configfile: pytest.ini 2025-12-04T11:45:26.0776793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0776871Z collecting ... collected 188 items / 117 deselected / 71 selected 2025-12-04T11:45:26.0776926Z stepcurrent: skipping 117 already run items. 
2025-12-04T11:45:26.0776973Z Running 71 items in this shard 2025-12-04T11:45:26.0776986Z 2025-12-04T11:45:26.0777213Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9205s] [ 1%] 2025-12-04T11:45:26.0777439Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5568s] [ 1%] 2025-12-04T11:45:26.0777637Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.6636s] [ 1%] 2025-12-04T11:45:26.0777639Z 2025-12-04T11:45:26.0777695Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0777845Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0777896Z Traceback (most recent call last): 2025-12-04T11:45:26.0778055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0778097Z method(*args, **kwargs) 2025-12-04T11:45:26.0778252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0778296Z method(*args, **kwargs) 2025-12-04T11:45:26.0778461Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0778500Z with policy(): 2025-12-04T11:45:26.0778665Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0778709Z raise RuntimeError(msg) 2025-12-04T11:45:26.0779111Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0779113Z 2025-12-04T11:45:26.0779186Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0779453Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0779457Z 2025-12-04T11:45:26.0779545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0779622Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0779666Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0779726Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0780211Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0780310Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0780365Z graph_break [] 2025-12-04T11:45:26.0780424Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0780498Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0780999Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0781049Z current_size = base.storage().size() 2025-12-04T11:45:26.0781092Z Autotune Choices Stats: 2025-12-04T11:45:26.0781467Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0781525Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0781577Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0781700Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0781939Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0782173Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0782215Z _scaled_mm 0.0239 ms 24.6% 2025-12-04T11:45:26.0782356Z SingleProcess AUTOTUNE benchmarking takes 0.0174 seconds and 0.0862 seconds precompiling for 3 choices 2025-12-04T11:45:26.0782508Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0782567Z Traceback (most recent call last): 2025-12-04T11:45:26.0782725Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0782766Z method(*args, **kwargs) 2025-12-04T11:45:26.0782921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0782963Z method(*args, **kwargs) 2025-12-04T11:45:26.0783116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0783155Z with policy(): 2025-12-04T11:45:26.0783345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0783394Z raise RuntimeError(msg) 2025-12-04T11:45:26.0783795Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0783797Z 2025-12-04T11:45:26.0783874Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0784138Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0784140Z 2025-12-04T11:45:26.0784227Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0784319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0784364Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0784420Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0784922Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0785023Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0785059Z graph_break [] 2025-12-04T11:45:26.0785123Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0785197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0785692Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0785740Z current_size = base.storage().size() 2025-12-04T11:45:26.0785781Z Autotune Choices Stats: 2025-12-04T11:45:26.0786151Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0786207Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0786271Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0786394Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0786641Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0786872Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0786918Z _scaled_mm 0.0239 ms 24.6% 2025-12-04T11:45:26.0787046Z SingleProcess AUTOTUNE benchmarking takes 0.0174 seconds and 0.0862 seconds precompiling for 3 choices 2025-12-04T11:45:26.0787120Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0787167Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0787225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0787325Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0787809Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0787845Z graph_break [] 2025-12-04T11:45:26.0787908Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0787980Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0788022Z Autotune Choices Stats: 2025-12-04T11:45:26.0788389Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0788456Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0788504Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0788625Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0788869Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0789106Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0789150Z _scaled_mm 0.0230 ms 26.8% 2025-12-04T11:45:26.0789277Z SingleProcess AUTOTUNE benchmarking takes 0.0170 
seconds and 0.0782 seconds precompiling for 3 choices 2025-12-04T11:45:26.0789334Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0789483Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0789536Z Traceback (most recent call last): 2025-12-04T11:45:26.0789692Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0789736Z method(*args, **kwargs) 2025-12-04T11:45:26.0789891Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0789935Z method(*args, **kwargs) 2025-12-04T11:45:26.0790100Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0790140Z with policy(): 2025-12-04T11:45:26.0790305Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0790352Z raise RuntimeError(msg) 2025-12-04T11:45:26.0790750Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0790752Z 2025-12-04T11:45:26.0790827Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0791092Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0791097Z 2025-12-04T11:45:26.0791184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0791259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0791303Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0791362Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0791849Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0791949Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0791998Z graph_break [] 2025-12-04T11:45:26.0792060Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0792134Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0792638Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0792685Z current_size = base.storage().size() 2025-12-04T11:45:26.0792727Z Autotune Choices Stats: 2025-12-04T11:45:26.0793097Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0793152Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0793203Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0793359Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0793598Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0793830Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0793875Z _scaled_mm 0.0239 ms 24.6% 2025-12-04T11:45:26.0794019Z SingleProcess AUTOTUNE benchmarking takes 0.0174 seconds and 0.0862 seconds precompiling for 3 choices 2025-12-04T11:45:26.0794097Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0794162Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0794223Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0794323Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0794810Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0794847Z graph_break [] 2025-12-04T11:45:26.0794910Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0794983Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0795026Z Autotune Choices Stats: 2025-12-04T11:45:26.0795390Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0795443Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0795493Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0795614Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0795848Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0796095Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0796140Z _scaled_mm 0.0230 ms 26.8% 2025-12-04T11:45:26.0796268Z SingleProcess AUTOTUNE benchmarking takes 0.0170 
seconds and 0.0782 seconds precompiling for 3 choices 2025-12-04T11:45:26.0796357Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0796401Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0796458Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0796560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0797044Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0797083Z graph_break [] 2025-12-04T11:45:26.0797143Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0797215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0797257Z Autotune Choices Stats: 2025-12-04T11:45:26.0797620Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.0797686Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0797735Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0797854Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0798100Z triton_mm_4 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0798330Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0798375Z _scaled_mm 0.0082 ms 72.2% 2025-12-04T11:45:26.0798503Z SingleProcess AUTOTUNE benchmarking takes 0.0234 seconds and 0.1874 seconds precompiling for 3 choices 2025-12-04T11:45:26.0798701Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f915867615cae279.xml - 2025-12-04T11:45:26.0798762Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0799368Z FAILED [0.6636s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0799371Z 2025-12-04T11:45:26.0799447Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0799710Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0799724Z 2025-12-04T11:45:26.0799813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0799877Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0799947Z ================== 1 failed, 117 deselected, 2 rerun in 3.16s ================== 2025-12-04T11:45:26.0799987Z Got exit code 1 2025-12-04T11:45:26.0800029Z Retrying single test... 2025-12-04T11:45:26.0800186Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-76ce2a3a49aae256.xml 2025-12-04T11:45:26.0800245Z ============================= test session starts ============================== 2025-12-04T11:45:26.0800356Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0800402Z cachedir: .pytest_cache 2025-12-04T11:45:26.0800564Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0800612Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0800652Z configfile: pytest.ini 2025-12-04T11:45:26.0800817Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0800892Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0801161Z stepcurrent: skipping 117 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0801207Z Running 1 items in this shard 2025-12-04T11:45:26.0801209Z 2025-12-04T11:45:26.0801438Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8479s] [100%] 2025-12-04T11:45:26.0801679Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4678s] [100%] 2025-12-04T11:45:26.0801885Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.5478s] [100%] 2025-12-04T11:45:26.0801887Z 2025-12-04T11:45:26.0801940Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0802090Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0802140Z Traceback (most recent call last): 2025-12-04T11:45:26.0802297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0802341Z method(*args, **kwargs) 2025-12-04T11:45:26.0802498Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0802543Z method(*args, **kwargs) 2025-12-04T11:45:26.0802697Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0802736Z with policy(): 2025-12-04T11:45:26.0802890Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0802938Z raise RuntimeError(msg) 2025-12-04T11:45:26.0803366Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0803369Z 2025-12-04T11:45:26.0803460Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0803725Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0803727Z 2025-12-04T11:45:26.0803815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0803893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0803936Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0804008Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0804496Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0804598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0804636Z graph_break [] 2025-12-04T11:45:26.0804699Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0804773Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0805269Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0805317Z current_size = base.storage().size() 2025-12-04T11:45:26.0805361Z Autotune Choices Stats: 2025-12-04T11:45:26.0805761Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0805815Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0805866Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0805988Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0806228Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0806459Z triton_mm_0 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0806505Z _scaled_mm 0.0227 ms 26.6% 2025-12-04T11:45:26.0806634Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0966 seconds precompiling for 3 choices 2025-12-04T11:45:26.0806785Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0806832Z Traceback (most recent call last): 2025-12-04T11:45:26.0806990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0807033Z method(*args, **kwargs) 2025-12-04T11:45:26.0807188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0807231Z method(*args, **kwargs) 2025-12-04T11:45:26.0807386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0807435Z with policy(): 2025-12-04T11:45:26.0807588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0807630Z raise RuntimeError(msg) 2025-12-04T11:45:26.0808040Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0808043Z 2025-12-04T11:45:26.0808119Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0808383Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0808387Z 2025-12-04T11:45:26.0808475Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0808549Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0808595Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0808652Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0809142Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0809240Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0809302Z graph_break [] 2025-12-04T11:45:26.0809362Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0809439Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0809942Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0809991Z current_size = base.storage().size() 2025-12-04T11:45:26.0810035Z Autotune Choices Stats: 2025-12-04T11:45:26.0810406Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0810462Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0810511Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0810634Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0810869Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0811103Z triton_mm_0 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0811147Z _scaled_mm 0.0227 ms 26.6% 2025-12-04T11:45:26.0811280Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0966 seconds precompiling for 3 choices 2025-12-04T11:45:26.0811366Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0811410Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0811467Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0811569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0812062Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0812099Z graph_break [] 2025-12-04T11:45:26.0812159Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0812233Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0812274Z Autotune Choices Stats: 2025-12-04T11:45:26.0812638Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:26.0812694Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0812744Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0812865Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0813095Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0813366Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0813424Z _scaled_mm 0.0212 ms 29.5% 2025-12-04T11:45:26.0813555Z SingleProcess AUTOTUNE benchmarking takes 0.0155 
seconds and 0.0810 seconds precompiling for 3 choices 2025-12-04T11:45:26.0813608Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0813759Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0813805Z Traceback (most recent call last): 2025-12-04T11:45:26.0813961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0814004Z method(*args, **kwargs) 2025-12-04T11:45:26.0814158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0814206Z method(*args, **kwargs) 2025-12-04T11:45:26.0814357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0814394Z with policy(): 2025-12-04T11:45:26.0814546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0814593Z raise RuntimeError(msg) 2025-12-04T11:45:26.0814993Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0814995Z 2025-12-04T11:45:26.0815084Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0815349Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0815351Z 2025-12-04T11:45:26.0815443Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0815517Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0815576Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0815633Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0816120Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0816221Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0816257Z graph_break [] 2025-12-04T11:45:26.0816321Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0816394Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0816886Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0816932Z current_size = base.storage().size() 2025-12-04T11:45:26.0816973Z Autotune Choices Stats: 2025-12-04T11:45:26.0817370Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0817426Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0817475Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0817600Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0817833Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0818062Z triton_mm_0 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0818106Z _scaled_mm 0.0227 ms 26.6% 2025-12-04T11:45:26.0818234Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0966 seconds precompiling for 3 choices 2025-12-04T11:45:26.0818310Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0818351Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0818407Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0818508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0818994Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0819044Z graph_break [] 2025-12-04T11:45:26.0819105Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0819179Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0819221Z Autotune Choices Stats: 2025-12-04T11:45:26.0819595Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006238999776542187, "best_triton_pos": 0} 2025-12-04T11:45:26.0819649Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0819698Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0819820Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0820057Z triton_mm_2 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0820293Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0820338Z _scaled_mm 0.0212 ms 29.5% 2025-12-04T11:45:26.0822045Z SingleProcess AUTOTUNE benchmarking takes 0.0155 
seconds and 0.0810 seconds precompiling for 3 choices 2025-12-04T11:45:26.0822123Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0822168Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0822226Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0822340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0822832Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0822869Z graph_break [] 2025-12-04T11:45:26.0822931Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0823004Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0823045Z Autotune Choices Stats: 2025-12-04T11:45:26.0823422Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.0823478Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0823527Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0823649Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0823882Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0824115Z triton_mm_4 0.0062 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0824179Z _scaled_mm 0.0235 ms 25.0% 2025-12-04T11:45:26.0824305Z SingleProcess AUTOTUNE benchmarking takes 0.0202 seconds and 0.1842 seconds precompiling for 3 choices 2025-12-04T11:45:26.0824498Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-76ce2a3a49aae256.xml - 2025-12-04T11:45:26.0824559Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0825183Z FAILED [0.5478s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0825186Z 2025-12-04T11:45:26.0825260Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0825531Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0825533Z 2025-12-04T11:45:26.0825621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0825684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0825753Z ================== 1 failed, 187 deselected, 2 rerun in 2.88s ================== 2025-12-04T11:45:26.0825791Z Got exit code 1 2025-12-04T11:45:26.0825831Z Retrying single test... 2025-12-04T11:45:26.0825976Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b53dbae4619a0c51.xml 2025-12-04T11:45:26.0826033Z ============================= test session starts ============================== 2025-12-04T11:45:26.0826158Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0826199Z cachedir: .pytest_cache 2025-12-04T11:45:26.0826370Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0826418Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0826460Z configfile: pytest.ini 2025-12-04T11:45:26.0826625Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0826701Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0826964Z stepcurrent: skipping 117 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0827007Z Running 1 items in this shard 2025-12-04T11:45:26.0827011Z 2025-12-04T11:45:26.0827238Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8167s] [100%] 2025-12-04T11:45:26.0827461Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4544s] [100%] 2025-12-04T11:45:26.0827658Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda FAILED [0.5491s] [100%] 2025-12-04T11:45:26.0827661Z 2025-12-04T11:45:26.0827712Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0827861Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0827910Z Traceback (most recent call last): 2025-12-04T11:45:26.0828069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0828124Z method(*args, **kwargs) 2025-12-04T11:45:26.0828276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0828320Z method(*args, **kwargs) 2025-12-04T11:45:26.0828470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0828509Z with policy(): 2025-12-04T11:45:26.0828671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0828715Z raise RuntimeError(msg) 2025-12-04T11:45:26.0829112Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0829116Z 2025-12-04T11:45:26.0829190Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0829456Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0829458Z 2025-12-04T11:45:26.0829547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0829620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0829665Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0829721Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0830204Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0830326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0830364Z graph_break [] 2025-12-04T11:45:26.0830424Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0830498Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0830985Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0831034Z current_size = base.storage().size() 2025-12-04T11:45:26.0831075Z Autotune Choices Stats: 2025-12-04T11:45:26.0831442Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0831497Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0831545Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0831665Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0831898Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0832140Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0832184Z _scaled_mm 0.0225 ms 26.8% 2025-12-04T11:45:26.0832311Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0912 seconds precompiling for 3 choices 2025-12-04T11:45:26.0832469Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0832516Z Traceback (most recent call last): 2025-12-04T11:45:26.0832672Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0832714Z method(*args, **kwargs) 2025-12-04T11:45:26.0832866Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0832910Z method(*args, **kwargs) 2025-12-04T11:45:26.0833060Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0833097Z with policy(): 2025-12-04T11:45:26.0833282Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0833325Z raise RuntimeError(msg) 2025-12-04T11:45:26.0833722Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0833725Z 2025-12-04T11:45:26.0833798Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0834077Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0834079Z 2025-12-04T11:45:26.0834179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0834253Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0834298Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0834355Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0834841Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0834941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0834978Z graph_break [] 2025-12-04T11:45:26.0835037Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0835112Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0835595Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0835643Z current_size = base.storage().size() 2025-12-04T11:45:26.0835682Z Autotune Choices Stats: 2025-12-04T11:45:26.0836046Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0836115Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0836163Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0836285Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0836531Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0836761Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0836804Z _scaled_mm 0.0225 ms 26.8% 2025-12-04T11:45:26.0836930Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0912 seconds precompiling for 3 choices 2025-12-04T11:45:26.0837003Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0837047Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0837105Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0837206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0837688Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0837737Z graph_break [] 2025-12-04T11:45:26.0837795Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0837869Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0837919Z Autotune Choices Stats: 2025-12-04T11:45:26.0838282Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0838335Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0838383Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0838503Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0838734Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0838961Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0839003Z _scaled_mm 0.0241 ms 25.2% 2025-12-04T11:45:26.0839131Z SingleProcess AUTOTUNE benchmarking takes 0.0156 
seconds and 0.0820 seconds precompiling for 3 choices 2025-12-04T11:45:26.0839184Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0839333Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0839379Z Traceback (most recent call last): 2025-12-04T11:45:26.0839555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0839596Z method(*args, **kwargs) 2025-12-04T11:45:26.0839750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0839791Z method(*args, **kwargs) 2025-12-04T11:45:26.0839941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0839978Z with policy(): 2025-12-04T11:45:26.0840142Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0840185Z raise RuntimeError(msg) 2025-12-04T11:45:26.0840580Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0840584Z 2025-12-04T11:45:26.0840659Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0840921Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0840923Z 2025-12-04T11:45:26.0841012Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0841084Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0841131Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0841188Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0841681Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0841791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0841831Z graph_break [] 2025-12-04T11:45:26.0841890Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0841964Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0842451Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0842499Z current_size = base.storage().size() 2025-12-04T11:45:26.0842541Z Autotune Choices Stats: 2025-12-04T11:45:26.0842906Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0842960Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0843008Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0843126Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0843390Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0843638Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0843681Z _scaled_mm 0.0225 ms 26.8% 2025-12-04T11:45:26.0843808Z SingleProcess AUTOTUNE benchmarking takes 0.0164 seconds and 0.0912 seconds precompiling for 3 choices 2025-12-04T11:45:26.0843893Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0843936Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0843992Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0844091Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0844567Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0844607Z graph_break [] 2025-12-04T11:45:26.0844667Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0844740Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0844781Z Autotune Choices Stats: 2025-12-04T11:45:26.0845138Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0845209Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0845257Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0845377Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0845622Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0845851Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0845894Z _scaled_mm 0.0241 ms 25.2% 2025-12-04T11:45:26.0846021Z SingleProcess AUTOTUNE benchmarking takes 0.0156 
seconds and 0.0820 seconds precompiling for 3 choices 2025-12-04T11:45:26.0846093Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0846139Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0846194Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0846294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0846772Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0846809Z graph_break [] 2025-12-04T11:45:26.0846869Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0846941Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0846983Z Autotune Choices Stats: 2025-12-04T11:45:26.0847354Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0847407Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0847455Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0847588Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0847821Z triton_mm_5 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0848049Z triton_mm_4 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0848094Z _scaled_mm 0.0229 ms 26.4% 2025-12-04T11:45:26.0848223Z SingleProcess AUTOTUNE benchmarking takes 0.0199 seconds and 0.1862 seconds precompiling for 3 choices 2025-12-04T11:45:26.0848413Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b53dbae4619a0c51.xml - 2025-12-04T11:45:26.0848475Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0849077Z FAILED [0.5491s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0849090Z 2025-12-04T11:45:26.0849163Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0849437Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda 2025-12-04T11:45:26.0849439Z 2025-12-04T11:45:26.0849527Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0849591Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0849658Z ================== 1 failed, 187 deselected, 2 rerun in 2.84s ==================
2025-12-04T11:45:26.0849696Z Got exit code 1
2025-12-04T11:45:26.0849907Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda
2025-12-04T11:45:26.0850036Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:45:26.0850180Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7ac4165d830054b3.xml
2025-12-04T11:45:26.0850237Z ============================= test session starts ==============================
2025-12-04T11:45:26.0850348Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:26.0850390Z cachedir: .pytest_cache
2025-12-04T11:45:26.0850549Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:26.0850595Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:26.0850639Z configfile: pytest.ini
2025-12-04T11:45:26.0850800Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:26.0850892Z collecting ... collected 188 items / 118 deselected / 70 selected
2025-12-04T11:45:26.0850947Z stepcurrent: skipping 118 already run items.
2025-12-04T11:45:26.0850993Z Running 70 items in this shard 2025-12-04T11:45:26.0850996Z 2025-12-04T11:45:26.0851219Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8440s] [ 1%] 2025-12-04T11:45:26.0851450Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4649s] [ 1%] 2025-12-04T11:45:26.0851645Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.5491s] [ 1%] 2025-12-04T11:45:26.0851647Z 2025-12-04T11:45:26.0851698Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0851846Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0851896Z Traceback (most recent call last): 2025-12-04T11:45:26.0852053Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0852096Z method(*args, **kwargs) 2025-12-04T11:45:26.0852248Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0852292Z method(*args, **kwargs) 2025-12-04T11:45:26.0852442Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0852482Z with policy(): 2025-12-04T11:45:26.0852635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0852689Z raise RuntimeError(msg) 2025-12-04T11:45:26.0853099Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0853102Z 2025-12-04T11:45:26.0853175Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0853463Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0853465Z 2025-12-04T11:45:26.0853551Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0853625Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0853670Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0853726Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0854210Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0854311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0854349Z graph_break [] 2025-12-04T11:45:26.0854410Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0854482Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0854969Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0855032Z current_size = base.storage().size() 2025-12-04T11:45:26.0855072Z Autotune Choices Stats: 2025-12-04T11:45:26.0855451Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0855504Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0855553Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0855676Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0855910Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0856137Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0856180Z _scaled_mm 0.0231 ms 26.0% 2025-12-04T11:45:26.0856306Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0905 seconds precompiling for 3 choices 2025-12-04T11:45:26.0856455Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0856501Z Traceback (most recent call last): 2025-12-04T11:45:26.0856670Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0856712Z method(*args, **kwargs) 2025-12-04T11:45:26.0856880Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0856922Z method(*args, **kwargs) 2025-12-04T11:45:26.0857073Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0857111Z with policy(): 2025-12-04T11:45:26.0857263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0857306Z raise RuntimeError(msg) 2025-12-04T11:45:26.0857706Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0857710Z 2025-12-04T11:45:26.0857785Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0858049Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0858051Z 2025-12-04T11:45:26.0858139Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0858211Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0858256Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0858312Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0858791Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0858902Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0858939Z graph_break [] 2025-12-04T11:45:26.0858999Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0859082Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0859567Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0859615Z current_size = base.storage().size() 2025-12-04T11:45:26.0859656Z Autotune Choices Stats: 2025-12-04T11:45:26.0860024Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0860078Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0860126Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0860248Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0860478Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0860725Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0860769Z _scaled_mm 0.0231 ms 26.0% 2025-12-04T11:45:26.0860894Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0905 seconds precompiling for 3 choices 2025-12-04T11:45:26.0860969Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0861011Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0861067Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0861165Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0861646Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0861683Z graph_break [] 2025-12-04T11:45:26.0861743Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0861816Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0861857Z Autotune Choices Stats: 2025-12-04T11:45:26.0862218Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.0862285Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0862334Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0862453Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0862683Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0862919Z triton_mm_2 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0862963Z _scaled_mm 0.0224 ms 27.9% 2025-12-04T11:45:26.0863089Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds 
and 0.0727 seconds precompiling for 3 choices 2025-12-04T11:45:26.0863142Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0863319Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0863366Z Traceback (most recent call last): 2025-12-04T11:45:26.0863522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0863565Z method(*args, **kwargs) 2025-12-04T11:45:26.0863717Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0863759Z method(*args, **kwargs) 2025-12-04T11:45:26.0863909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0863947Z with policy(): 2025-12-04T11:45:26.0864099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0864157Z raise RuntimeError(msg) 2025-12-04T11:45:26.0864564Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0864567Z 2025-12-04T11:45:26.0864641Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0864903Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0864907Z 2025-12-04T11:45:26.0864993Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0865065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0865110Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0865167Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0865649Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0865749Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0865786Z graph_break [] 2025-12-04T11:45:26.0865846Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0865918Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0866404Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0866464Z current_size = base.storage().size() 2025-12-04T11:45:26.0866505Z Autotune Choices Stats: 2025-12-04T11:45:26.0866880Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0866934Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0866983Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0867105Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0867337Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0867560Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0867604Z _scaled_mm 0.0231 ms 26.0% 2025-12-04T11:45:26.0867731Z SingleProcess AUTOTUNE benchmarking takes 0.0163 seconds and 0.0905 seconds precompiling for 3 choices 2025-12-04T11:45:26.0867804Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0867846Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0867905Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0868020Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0868509Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0868548Z graph_break [] 2025-12-04T11:45:26.0868609Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0868682Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0868724Z Autotune Choices Stats: 2025-12-04T11:45:26.0869084Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.0869140Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0869190Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0869309Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0869539Z triton_mm_3 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0869765Z triton_mm_2 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0869819Z _scaled_mm 0.0224 ms 27.9% 2025-12-04T11:45:26.0869945Z SingleProcess AUTOTUNE benchmarking takes 0.0155 seconds 
and 0.0727 seconds precompiling for 3 choices 2025-12-04T11:45:26.0870019Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0870062Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0870119Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0870218Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0870706Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0870745Z graph_break [] 2025-12-04T11:45:26.0870808Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0870881Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0870921Z Autotune Choices Stats: 2025-12-04T11:45:26.0871281Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.0871334Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0871382Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0871499Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0871727Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0871974Z triton_mm_4 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0872018Z _scaled_mm 0.0234 ms 25.9% 2025-12-04T11:45:26.0872145Z SingleProcess AUTOTUNE benchmarking takes 0.0200 seconds and 0.1839 seconds precompiling for 3 choices 2025-12-04T11:45:26.0872337Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7ac4165d830054b3.xml - 2025-12-04T11:45:26.0872397Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0872999Z FAILED [0.5491s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0873003Z 2025-12-04T11:45:26.0873078Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0873374Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0873376Z 2025-12-04T11:45:26.0873464Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0873525Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0873611Z ================== 1 failed, 118 deselected, 2 rerun in 2.88s ================== 2025-12-04T11:45:26.0873648Z Got exit code 1 2025-12-04T11:45:26.0873690Z Retrying single test... 2025-12-04T11:45:26.0873835Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9bc2d6076735fcdd.xml 2025-12-04T11:45:26.0873892Z ============================= test session starts ============================== 2025-12-04T11:45:26.0874001Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0874057Z cachedir: .pytest_cache 2025-12-04T11:45:26.0874215Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0874263Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0874303Z configfile: pytest.ini 2025-12-04T11:45:26.0874466Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0874542Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0874802Z stepcurrent: skipping 118 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0874845Z Running 1 items in this shard 2025-12-04T11:45:26.0874847Z 2025-12-04T11:45:26.0875069Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8190s] [100%] 2025-12-04T11:45:26.0875288Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4543s] [100%] 2025-12-04T11:45:26.0875482Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.5379s] [100%] 2025-12-04T11:45:26.0875499Z 2025-12-04T11:45:26.0875552Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0875712Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0875759Z Traceback (most recent call last): 2025-12-04T11:45:26.0875916Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0875959Z method(*args, **kwargs) 2025-12-04T11:45:26.0876112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0876153Z method(*args, **kwargs) 2025-12-04T11:45:26.0876303Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0876341Z with policy(): 2025-12-04T11:45:26.0876493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0876536Z raise RuntimeError(msg) 2025-12-04T11:45:26.0876929Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0876932Z 2025-12-04T11:45:26.0877007Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0877272Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0877274Z 2025-12-04T11:45:26.0877372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0877446Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0877488Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0877548Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0878037Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0878137Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0878175Z graph_break [] 2025-12-04T11:45:26.0878237Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0878312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0878799Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0878845Z current_size = base.storage().size() 2025-12-04T11:45:26.0878887Z Autotune Choices Stats: 2025-12-04T11:45:26.0879249Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.0879316Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0879365Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0879486Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0879728Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0879954Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0879997Z _scaled_mm 0.0226 ms 26.8% 2025-12-04T11:45:26.0880123Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0904 seconds precompiling for 3 choices 2025-12-04T11:45:26.0880271Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0880319Z Traceback (most recent call last): 2025-12-04T11:45:26.0880475Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0880517Z method(*args, **kwargs) 2025-12-04T11:45:26.0880669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0880708Z method(*args, **kwargs) 2025-12-04T11:45:26.0880861Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0880897Z with policy(): 2025-12-04T11:45:26.0881049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0881091Z raise RuntimeError(msg) 2025-12-04T11:45:26.0881500Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0881502Z 2025-12-04T11:45:26.0881576Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0881856Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0881858Z 2025-12-04T11:45:26.0881948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0882020Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0882064Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0882121Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0882604Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0882703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0882740Z graph_break [] 2025-12-04T11:45:26.0882799Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0882873Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0883388Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0883464Z current_size = base.storage().size() 2025-12-04T11:45:26.0883506Z Autotune Choices Stats: 2025-12-04T11:45:26.0883868Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.0883921Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0883968Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0884089Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0884326Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0884556Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0884599Z _scaled_mm 0.0226 ms 26.8% 2025-12-04T11:45:26.0884726Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0904 seconds precompiling for 3 choices 2025-12-04T11:45:26.0884798Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0884841Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0884896Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0884995Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0885489Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0885525Z graph_break [] 2025-12-04T11:45:26.0885599Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0885672Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0885713Z Autotune Choices Stats: 2025-12-04T11:45:26.0886068Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0886124Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0886172Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0886292Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0886525Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0886749Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0886790Z _scaled_mm 0.0232 ms 26.4% 2025-12-04T11:45:26.0886930Z SingleProcess AUTOTUNE benchmarking takes 0.0165 seconds 
and 0.0807 seconds precompiling for 3 choices 2025-12-04T11:45:26.0886983Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0887140Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0887186Z Traceback (most recent call last): 2025-12-04T11:45:26.0887342Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0887385Z method(*args, **kwargs) 2025-12-04T11:45:26.0887537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0887578Z method(*args, **kwargs) 2025-12-04T11:45:26.0887729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0887767Z with policy(): 2025-12-04T11:45:26.0887918Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0887961Z raise RuntimeError(msg) 2025-12-04T11:45:26.0888354Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0888357Z 2025-12-04T11:45:26.0888431Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0888691Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0888693Z 2025-12-04T11:45:26.0888794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0888868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0888912Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0888968Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0889461Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0889560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0889596Z graph_break [] 2025-12-04T11:45:26.0889656Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0889730Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0890217Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0890263Z current_size = base.storage().size() 2025-12-04T11:45:26.0890305Z Autotune Choices Stats: 2025-12-04T11:45:26.0890663Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.0890727Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0890775Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0890895Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0891135Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0891362Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0891405Z _scaled_mm 0.0226 ms 26.8% 2025-12-04T11:45:26.0891531Z SingleProcess AUTOTUNE benchmarking takes 0.0168 seconds and 0.0904 seconds precompiling for 3 choices 2025-12-04T11:45:26.0891604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0891647Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0891704Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0891804Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0892284Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0892320Z graph_break [] 2025-12-04T11:45:26.0892380Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0892452Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0892504Z Autotune Choices Stats: 2025-12-04T11:45:26.0892857Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.0892910Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0892957Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0893088Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0893348Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0893574Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0893617Z _scaled_mm 0.0232 ms 26.4% 2025-12-04T11:45:26.0893743Z SingleProcess AUTOTUNE benchmarking takes 0.0165 seconds 
and 0.0807 seconds precompiling for 3 choices 2025-12-04T11:45:26.0893817Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0893859Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0893916Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0894015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0894489Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0894541Z graph_break [] 2025-12-04T11:45:26.0894600Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0894686Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0894727Z Autotune Choices Stats: 2025-12-04T11:45:26.0895085Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.0895138Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0895186Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0895307Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0895541Z triton_mm_4 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0895768Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0895814Z _scaled_mm 0.0258 ms 23.9% 2025-12-04T11:45:26.0895940Z SingleProcess AUTOTUNE benchmarking takes 0.0199 seconds and 0.1814 seconds precompiling for 3 choices 2025-12-04T11:45:26.0896130Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9bc2d6076735fcdd.xml - 2025-12-04T11:45:26.0896189Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0896809Z FAILED [0.5379s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0896812Z 2025-12-04T11:45:26.0896897Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0897162Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0897164Z 2025-12-04T11:45:26.0897252Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0897317Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0897387Z ================== 1 failed, 187 deselected, 2 rerun in 2.83s ================== 2025-12-04T11:45:26.0897426Z Got exit code 1 2025-12-04T11:45:26.0897469Z Retrying single test... 2025-12-04T11:45:26.0897612Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f01580084ca5c55a.xml 2025-12-04T11:45:26.0897670Z ============================= test session starts ============================== 2025-12-04T11:45:26.0897782Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0897822Z cachedir: .pytest_cache 2025-12-04T11:45:26.0897985Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0898033Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0898085Z configfile: pytest.ini 2025-12-04T11:45:26.0898246Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0898332Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.0898592Z stepcurrent: skipping 118 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0898637Z Running 1 items in this shard 2025-12-04T11:45:26.0898640Z 2025-12-04T11:45:26.0898862Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8018s] [100%] 2025-12-04T11:45:26.0899080Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4427s] [100%] 2025-12-04T11:45:26.0899280Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda FAILED [0.5606s] [100%] 2025-12-04T11:45:26.0899282Z 2025-12-04T11:45:26.0899335Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0899483Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0899533Z Traceback (most recent call last): 2025-12-04T11:45:26.0899691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0899735Z method(*args, **kwargs) 2025-12-04T11:45:26.0899888Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0899930Z method(*args, **kwargs) 2025-12-04T11:45:26.0900093Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0900130Z with policy(): 2025-12-04T11:45:26.0900283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0900326Z raise RuntimeError(msg) 2025-12-04T11:45:26.0900737Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.0900740Z 2025-12-04T11:45:26.0900816Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0901077Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0901081Z 2025-12-04T11:45:26.0901168Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0901241Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0901285Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0901341Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0901820Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0901918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0901967Z graph_break [] 2025-12-04T11:45:26.0902028Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0902103Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0902599Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0902647Z current_size = base.storage().size() 2025-12-04T11:45:26.0902687Z Autotune Choices Stats: 2025-12-04T11:45:26.0903052Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005998999811708927, "best_triton_pos": 0} 2025-12-04T11:45:26.0903109Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0903157Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0903316Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0903547Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0903774Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0903818Z _scaled_mm 0.0220 ms 27.2% 2025-12-04T11:45:26.0903962Z SingleProcess AUTOTUNE benchmarking takes 0.0161 seconds and 0.0900 seconds precompiling for 3 choices 2025-12-04T11:45:26.0904109Z _ 
TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0904155Z Traceback (most recent call last): 2025-12-04T11:45:26.0904310Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0904353Z method(*args, **kwargs) 2025-12-04T11:45:26.0904519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0904560Z method(*args, **kwargs) 2025-12-04T11:45:26.0904711Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0904747Z with policy(): 2025-12-04T11:45:26.0904899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0904943Z raise RuntimeError(msg) 2025-12-04T11:45:26.0905341Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1050673152. 
2025-12-04T11:45:26.0905344Z 2025-12-04T11:45:26.0905418Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0905683Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0905685Z 2025-12-04T11:45:26.0905773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0905860Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0905905Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0905961Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0906458Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0906558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0906598Z graph_break [] 2025-12-04T11:45:26.0906657Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0906731Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0907217Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0907266Z current_size = base.storage().size() 2025-12-04T11:45:26.0907305Z Autotune Choices Stats: 2025-12-04T11:45:26.0907670Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005998999811708927, "best_triton_pos": 0} 2025-12-04T11:45:26.0907724Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0907771Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0907906Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0908139Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0908379Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0908422Z _scaled_mm 0.0220 ms 27.2% 2025-12-04T11:45:26.0908548Z SingleProcess AUTOTUNE benchmarking takes 0.0161 seconds and 0.0900 seconds precompiling for 3 choices 2025-12-04T11:45:26.0908620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0908666Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0908724Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0908824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0909303Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0909343Z graph_break [] 2025-12-04T11:45:26.0909401Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0909475Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0909515Z Autotune Choices Stats: 2025-12-04T11:45:26.0909884Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0909955Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0910003Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0910123Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0910352Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0910579Z triton_mm_3 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0910623Z _scaled_mm 0.0206 ms 29.3% 2025-12-04T11:45:26.0910751Z SingleProcess AUTOTUNE benchmarking takes 0.0152 seconds 
and 0.0788 seconds precompiling for 3 choices 2025-12-04T11:45:26.0910804Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0910957Z _ TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0911004Z Traceback (most recent call last): 2025-12-04T11:45:26.0911161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0911204Z method(*args, **kwargs) 2025-12-04T11:45:26.0911361Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0911403Z method(*args, **kwargs) 2025-12-04T11:45:26.0911569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0911606Z with policy(): 2025-12-04T11:45:26.0911761Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0911802Z raise RuntimeError(msg) 2025-12-04T11:45:26.0912210Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 
2025-12-04T11:45:26.0912212Z 2025-12-04T11:45:26.0912287Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0912551Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0912555Z 2025-12-04T11:45:26.0912642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0912716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0912762Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0912817Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0913336Z inductor [('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0913434Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0913487Z graph_break [] 2025-12-04T11:45:26.0913545Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0913620Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0914124Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0914174Z current_size = base.storage().size() 2025-12-04T11:45:26.0914216Z Autotune Choices Stats: 2025-12-04T11:45:26.0914581Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005998999811708927, "best_triton_pos": 0} 2025-12-04T11:45:26.0914635Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0914683Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0914805Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0915036Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0915262Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0915302Z _scaled_mm 0.0220 ms 27.2% 2025-12-04T11:45:26.0915447Z SingleProcess AUTOTUNE benchmarking takes 0.0161 seconds and 0.0900 seconds precompiling for 3 choices 2025-12-04T11:45:26.0915521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0915569Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0915627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0915730Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0916226Z inductor 
[('triton_bundler_save_kernel', 24), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0916266Z graph_break [] 2025-12-04T11:45:26.0916326Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0916400Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0916441Z Autotune Choices Stats: 2025-12-04T11:45:26.0916801Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.0916855Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0916903Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0917024Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0917254Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0917505Z triton_mm_3 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0917547Z _scaled_mm 0.0206 ms 29.3% 2025-12-04T11:45:26.0917676Z SingleProcess AUTOTUNE benchmarking takes 0.0152 seconds 
and 0.0788 seconds precompiling for 3 choices 2025-12-04T11:45:26.0917750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0917796Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0917852Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0917952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0918430Z inductor [('triton_bundler_save_kernel', 24), ('async_compile_cache_miss', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 3), ('generated_module_cache_miss', 2), ('select_algorithm_num_precompiles', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0918468Z graph_break [] 2025-12-04T11:45:26.0918527Z aten_mm_info [('aten._scaled_mm.default_16_32_32', 1)] 2025-12-04T11:45:26.0918600Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0918641Z Autotune Choices Stats: 2025-12-04T11:45:26.0919002Z {"num_choices": 3, "num_triton_choices": 2, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.0919066Z AUTOTUNE scaled_mm(16x32, 32x32, 16x1, 1x32, 32) 2025-12-04T11:45:26.0919114Z strides: [32, 1], [1, 32], [1, 1], [1, 1], [1] 2025-12-04T11:45:26.0919235Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32, torch.bfloat16 2025-12-04T11:45:26.0919468Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0919708Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.0919749Z _scaled_mm 0.0236 ms 25.4% 2025-12-04T11:45:26.0919876Z SingleProcess AUTOTUNE benchmarking takes 0.0216 seconds and 0.1883 seconds precompiling for 3 choices 2025-12-04T11:45:26.0920065Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f01580084ca5c55a.xml - 2025-12-04T11:45:26.0920126Z =========================== short test summary info ============================ 2025-12-04T11:45:26.0920725Z FAILED [0.5606s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1082130432. 2025-12-04T11:45:26.0920728Z 2025-12-04T11:45:26.0920801Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0921066Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0921079Z 2025-12-04T11:45:26.0921165Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0921241Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.0921308Z ================== 1 failed, 187 deselected, 2 rerun in 2.82s ================== 2025-12-04T11:45:26.0921347Z Got exit code 1 2025-12-04T11:45:26.0921559Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda 2025-12-04T11:45:26.0921689Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.0921833Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-93399e0183cbfd4f.xml 2025-12-04T11:45:26.0921895Z ============================= test session starts ============================== 2025-12-04T11:45:26.0922009Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.0922049Z cachedir: .pytest_cache 2025-12-04T11:45:26.0922210Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.0922259Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.0922301Z configfile: pytest.ini 2025-12-04T11:45:26.0922464Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.0922541Z collecting ... collected 188 items / 119 deselected / 69 selected 2025-12-04T11:45:26.0922597Z stepcurrent: skipping 119 already run items. 
2025-12-04T11:45:26.0922644Z Running 69 items in this shard 2025-12-04T11:45:26.0922646Z 2025-12-04T11:45:26.0922883Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_False_cuda SKIPPED [0.0003s] (Need device-side TMA support in Triton) [ 1%] 2025-12-04T11:45:26.0923130Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_1024,1024,512_use_fast_accum_True_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 2%] 2025-12-04T11:45:26.0923383Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_False_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 4%] 2025-12-04T11:45:26.0923620Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_tma_template_shape_16,32,32_use_fast_accum_True_cuda SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 5%] 2025-12-04T11:45:26.0923749Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_scaled_mm_preserves_strides_cuda PASSED [1.9939s] [ 7%] 2025-12-04T11:45:26.0923951Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda PASSED [1.0929s] [ 8%] 2025-12-04T11:45:26.0924174Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3231s] [ 10%] 2025-12-04T11:45:26.0924394Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.2636s] [ 10%] 2025-12-04T11:45:26.0924591Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.2617s] [ 10%] 2025-12-04T11:45:26.0924594Z 2025-12-04T11:45:26.0924644Z ==================================== RERUNS 
==================================== 2025-12-04T11:45:26.0924796Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0924844Z Traceback (most recent call last): 2025-12-04T11:45:26.0925030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0925073Z method(*args, **kwargs) 2025-12-04T11:45:26.0925241Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0925284Z method(*args, **kwargs) 2025-12-04T11:45:26.0925435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0925473Z with policy(): 2025-12-04T11:45:26.0925627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0925670Z raise RuntimeError(msg) 2025-12-04T11:45:26.0926068Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1092616192 and is now 1207959552. 
2025-12-04T11:45:26.0926072Z 2025-12-04T11:45:26.0926146Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0926414Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0926416Z 2025-12-04T11:45:26.0926507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0926581Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0926628Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0926684Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0926785Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0927292Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0927331Z graph_break [] 2025-12-04T11:45:26.0927397Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0927483Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0927526Z Autotune Choices Stats: 2025-12-04T11:45:26.0927902Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012799999676644802, "best_triton_pos": 0} 
2025-12-04T11:45:26.0927954Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0928001Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0928101Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0928344Z triton_mm_54 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0928577Z triton_mm_32 0.0138 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0928810Z triton_mm_33 0.0142 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0929063Z triton_mm_34 0.0149 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0929293Z triton_mm_53 0.0154 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0929523Z triton_mm_51 0.0167 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0929750Z triton_mm_52 0.0168 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0929981Z triton_mm_50 0.0172 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0930215Z triton_mm_31 0.0178 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0930440Z triton_mm_35 0.0182 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0930572Z SingleProcess AUTOTUNE benchmarking takes 0.2881 seconds and 0.3653 seconds precompiling for 39 choices 2025-12-04T11:45:26.0930733Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0930782Z Traceback (most recent call last): 2025-12-04T11:45:26.0930946Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0930989Z method(*args, **kwargs) 2025-12-04T11:45:26.0931141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0931197Z method(*args, **kwargs) 2025-12-04T11:45:26.0931349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0931387Z with policy(): 2025-12-04T11:45:26.0931540Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0931584Z raise RuntimeError(msg) 
2025-12-04T11:45:26.0931982Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1207959552 and is now 1293942784. 2025-12-04T11:45:26.0931984Z 2025-12-04T11:45:26.0932057Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0932324Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0932326Z 2025-12-04T11:45:26.0932412Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0932486Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0932543Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0932602Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0932702Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0933206Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0933244Z graph_break [] 2025-12-04T11:45:26.0933359Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0933432Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0933474Z Autotune Choices Stats: 2025-12-04T11:45:26.0933846Z 
{"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012799999676644802, "best_triton_pos": 0} 2025-12-04T11:45:26.0933897Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0933942Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0934041Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0934280Z triton_mm_54 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0934509Z triton_mm_32 0.0138 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0934759Z triton_mm_33 0.0142 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0934988Z triton_mm_34 0.0149 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0935229Z triton_mm_53 0.0154 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0935456Z triton_mm_51 0.0167 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0935690Z triton_mm_52 0.0168 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0935917Z triton_mm_50 0.0172 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0936150Z triton_mm_31 0.0178 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0936377Z triton_mm_35 0.0182 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0936522Z SingleProcess AUTOTUNE benchmarking takes 0.2881 seconds and 0.3653 seconds precompiling for 39 choices 2025-12-04T11:45:26.0936609Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0936653Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0936711Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0936812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0937296Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 
1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0937336Z graph_break [] 2025-12-04T11:45:26.0937400Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0937473Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0937514Z Autotune Choices Stats: 2025-12-04T11:45:26.0937892Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_92", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.01295899972319603, "best_triton_pos": 0} 2025-12-04T11:45:26.0937941Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0937984Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0938081Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0938317Z triton_mm_92 0.0130 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0938557Z triton_mm_70 0.0138 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0938801Z triton_mm_71 0.0144 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0939031Z triton_mm_72 0.0147 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, 
USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0939260Z triton_mm_91 0.0152 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0939491Z triton_mm_89 0.0166 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0939719Z triton_mm_88 0.0171 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0939950Z triton_mm_90 0.0171 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0940180Z triton_mm_69 0.0176 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0940426Z triton_mm_73 0.0180 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0940558Z SingleProcess AUTOTUNE benchmarking takes 0.2680 seconds and 0.3496 seconds precompiling for 39 choices 2025-12-04T11:45:26.0940611Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0940763Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0940811Z Traceback (most recent call last): 2025-12-04T11:45:26.0940970Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0941015Z method(*args, **kwargs) 2025-12-04T11:45:26.0941169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0941213Z method(*args, **kwargs) 2025-12-04T11:45:26.0941366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0941403Z with policy(): 2025-12-04T11:45:26.0941559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0941602Z raise RuntimeError(msg) 2025-12-04T11:45:26.0942003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1293942784 and is now 1379926016. 
2025-12-04T11:45:26.0942016Z 2025-12-04T11:45:26.0942090Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0942359Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0942361Z 2025-12-04T11:45:26.0942449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0942531Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0942579Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0942635Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0942734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0943223Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0943281Z graph_break [] 2025-12-04T11:45:26.0943347Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0943421Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0943462Z Autotune Choices Stats: 2025-12-04T11:45:26.0943831Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012799999676644802, "best_triton_pos": 0} 
2025-12-04T11:45:26.0943899Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0943944Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0944041Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0944291Z triton_mm_54 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0944522Z triton_mm_32 0.0138 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0944750Z triton_mm_33 0.0142 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0944980Z triton_mm_34 0.0149 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0945206Z triton_mm_53 0.0154 ms 83.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0945432Z triton_mm_51 0.0167 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0945660Z triton_mm_52 0.0168 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0945899Z triton_mm_50 0.0172 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0946130Z triton_mm_31 0.0178 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0946369Z triton_mm_35 0.0182 ms 70.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0946499Z SingleProcess AUTOTUNE benchmarking takes 0.2881 seconds and 0.3653 seconds precompiling for 39 choices 2025-12-04T11:45:26.0946572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0946618Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0946674Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0946775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0947264Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0947302Z graph_break [] 2025-12-04T11:45:26.0947367Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0947439Z ----------------------------- Captured stderr call 
----------------------------- 2025-12-04T11:45:26.0947479Z Autotune Choices Stats: 2025-12-04T11:45:26.0947871Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_92", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.01295899972319603, "best_triton_pos": 0} 2025-12-04T11:45:26.0947921Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0947963Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0948063Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0948298Z triton_mm_92 0.0130 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0948531Z triton_mm_70 0.0138 ms 93.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0948765Z triton_mm_71 0.0144 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0948995Z triton_mm_72 0.0147 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0949225Z triton_mm_91 0.0152 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.0949451Z triton_mm_89 0.0166 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0949698Z triton_mm_88 0.0171 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0949936Z triton_mm_90 0.0171 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0950167Z triton_mm_69 0.0176 ms 73.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0950393Z triton_mm_73 0.0180 ms 72.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0950524Z SingleProcess AUTOTUNE benchmarking takes 0.2680 seconds and 0.3496 seconds precompiling for 39 choices 2025-12-04T11:45:26.0950598Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0950644Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0950702Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0950803Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0951293Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), 
('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0951342Z graph_break [] 2025-12-04T11:45:26.0951406Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0951490Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0951531Z Autotune Choices Stats: 2025-12-04T11:45:26.0951900Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_130", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012679999694228172, "best_triton_pos": 0} 2025-12-04T11:45:26.0951950Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0951992Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0952089Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0952330Z triton_mm_130 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0952375Z _scaled_mm 0.0131 ms 96.9% 2025-12-04T11:45:26.0952608Z triton_mm_108 0.0139 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0952840Z triton_mm_109 0.0142 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0953071Z triton_mm_110 0.0149 ms 85.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0953347Z triton_mm_129 0.0155 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0953576Z triton_mm_127 0.0168 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0953818Z triton_mm_128 0.0172 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0954046Z triton_mm_126 0.0172 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0954286Z triton_mm_107 0.0176 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0954416Z SingleProcess AUTOTUNE benchmarking takes 0.2718 seconds and 0.3660 seconds precompiling for 39 choices 2025-12-04T11:45:26.0954609Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-93399e0183cbfd4f.xml - 2025-12-04T11:45:26.0954671Z =========================== short test summary info ============================ 
2025-12-04T11:45:26.0955285Z FAILED [1.2617s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1293942784 and is now 1379926016.
2025-12-04T11:45:26.0955302Z
2025-12-04T11:45:26.0955389Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:26.0955659Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:26.0955662Z
2025-12-04T11:45:26.0955751Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:26.0955814Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:45:26.0955895Z ======= 1 failed, 2 passed, 4 skipped, 119 deselected, 2 rerun in 6.96s ========
2025-12-04T11:45:26.0955933Z Got exit code 1
2025-12-04T11:45:26.0955976Z Retrying single test...
2025-12-04T11:45:26.0956122Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1f994b35f0e977c5.xml
2025-12-04T11:45:26.0956179Z ============================= test session starts ==============================
2025-12-04T11:45:26.0956292Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:26.0956333Z cachedir: .pytest_cache
2025-12-04T11:45:26.0956495Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:26.0956545Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:26.0956586Z configfile: pytest.ini
2025-12-04T11:45:26.0956751Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:26.0956826Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T11:45:26.0957105Z stepcurrent: skipping 125 already run items. Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:26.0957148Z Running 1 items in this shard
2025-12-04T11:45:26.0957150Z
2025-12-04T11:45:26.0957377Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [3.0745s] [100%]
2025-12-04T11:45:26.0957608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3577s] [100%]
2025-12-04T11:45:26.0957806Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.1599s] [100%]
2025-12-04T11:45:26.0957808Z
2025-12-04T11:45:26.0957859Z ==================================== RERUNS ====================================
2025-12-04T11:45:26.0958009Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _
2025-12-04T11:45:26.0958058Z Traceback (most recent call last):
2025-12-04T11:45:26.0958218Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0958265Z method(*args, **kwargs)
2025-12-04T11:45:26.0958420Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0958465Z method(*args, **kwargs)
2025-12-04T11:45:26.0958616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:26.0958653Z with policy():
2025-12-04T11:45:26.0958805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:26.0958865Z raise RuntimeError(msg) 2025-12-04T11:45:26.0959272Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1115684864. 2025-12-04T11:45:26.0959275Z 2025-12-04T11:45:26.0959352Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0959617Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0959619Z 2025-12-04T11:45:26.0959707Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0959782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0959829Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0959885Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0960378Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0960478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0960514Z graph_break [] 2025-12-04T11:45:26.0960579Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0960652Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0961150Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0961208Z current_size = base.storage().size() 2025-12-04T11:45:26.0961250Z Autotune Choices Stats: 2025-12-04T11:45:26.0961637Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012760000303387642, "best_triton_pos": 0} 2025-12-04T11:45:26.0961689Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0961730Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0961831Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0962077Z triton_mm_35 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0962308Z triton_mm_13 0.0140 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0962544Z triton_mm_14 0.0142 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0962778Z triton_mm_15 0.0147 ms 86.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0963032Z triton_mm_34 0.0154 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0963294Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0963525Z triton_mm_32 0.0170 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0963754Z triton_mm_33 0.0172 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0963988Z triton_mm_12 0.0176 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0964214Z triton_mm_16 0.0181 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0964344Z SingleProcess AUTOTUNE benchmarking takes 0.1947 seconds and 0.7658 seconds precompiling for 39 choices 2025-12-04T11:45:26.0964495Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.0964545Z Traceback (most recent call last):
2025-12-04T11:45:26.0964705Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0964763Z method(*args, **kwargs)
2025-12-04T11:45:26.0964922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0964966Z method(*args, **kwargs)
2025-12-04T11:45:26.0965121Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:26.0965160Z with policy():
2025-12-04T11:45:26.0965327Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:26.0965374Z raise RuntimeError(msg)
2025-12-04T11:45:26.0965774Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1115684864 and is now 1212153856.
2025-12-04T11:45:26.0965778Z
2025-12-04T11:45:26.0965855Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:26.0966123Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:26.0966126Z
2025-12-04T11:45:26.0966217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:26.0966292Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:45:26.0966340Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:45:26.0966397Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T11:45:26.0966883Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:45:26.0967016Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:45:26.0967052Z graph_break []
2025-12-04T11:45:26.0967119Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)]
2025-12-04T11:45:26.0967192Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:45:26.0967686Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0967734Z current_size = base.storage().size() 2025-12-04T11:45:26.0967776Z Autotune Choices Stats: 2025-12-04T11:45:26.0968149Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012760000303387642, "best_triton_pos": 0} 2025-12-04T11:45:26.0968199Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0968242Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0968343Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0968580Z triton_mm_35 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0968824Z triton_mm_13 0.0140 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0969055Z triton_mm_14 0.0142 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0969293Z triton_mm_15 0.0147 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0969526Z triton_mm_34 0.0154 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0969754Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0969982Z triton_mm_32 0.0170 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0970210Z triton_mm_33 0.0172 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0970443Z triton_mm_12 0.0176 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0970681Z triton_mm_16 0.0181 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0970823Z SingleProcess AUTOTUNE benchmarking takes 0.1947 seconds and 0.7658 seconds precompiling for 39 choices 2025-12-04T11:45:26.0970897Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0970942Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0971000Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0971099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0971589Z inductor [('triton_bundler_save_kernel', 
312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0971628Z graph_break [] 2025-12-04T11:45:26.0971694Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0971766Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0971807Z Autotune Choices Stats: 2025-12-04T11:45:26.0972176Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012799999676644802, "best_triton_pos": 0} 2025-12-04T11:45:26.0972229Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0972273Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0972383Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0972623Z triton_mm_73 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0972667Z _scaled_mm 0.0137 ms 93.3% 2025-12-04T11:45:26.0972908Z triton_mm_51 0.0141 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0973135Z triton_mm_52 0.0146 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0973393Z triton_mm_53 0.0152 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0973624Z triton_mm_72 0.0154 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0973858Z triton_mm_70 0.0167 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0974087Z triton_mm_69 0.0171 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0974316Z triton_mm_71 0.0173 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0974582Z triton_mm_50 0.0176 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0974713Z SingleProcess AUTOTUNE benchmarking takes 0.2868 seconds and 0.5080 seconds precompiling for 39 choices 2025-12-04T11:45:26.0974769Z =================================== FAILURES =================================== 2025-12-04T11:45:26.0974921Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _
2025-12-04T11:45:26.0974973Z Traceback (most recent call last):
2025-12-04T11:45:26.0975132Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0975179Z method(*args, **kwargs)
2025-12-04T11:45:26.0975332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper
2025-12-04T11:45:26.0975376Z method(*args, **kwargs)
2025-12-04T11:45:26.0975528Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper
2025-12-04T11:45:26.0975567Z with policy():
2025-12-04T11:45:26.0975722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__
2025-12-04T11:45:26.0975769Z raise RuntimeError(msg)
2025-12-04T11:45:26.0976173Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848.
2025-12-04T11:45:26.0976191Z
2025-12-04T11:45:26.0976266Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:26.0976538Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:26.0976540Z
2025-12-04T11:45:26.0976628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:26.0976717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:45:26.0976765Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:45:26.0976824Z stats [('calls_captured', 1), ('unique_graphs', 1)]
2025-12-04T11:45:26.0977314Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:45:26.0977417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:45:26.0977454Z graph_break []
2025-12-04T11:45:26.0977522Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)]
2025-12-04T11:45:26.0977594Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:45:26.0978090Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0978137Z current_size = base.storage().size() 2025-12-04T11:45:26.0978192Z Autotune Choices Stats: 2025-12-04T11:45:26.0978581Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012760000303387642, "best_triton_pos": 0} 2025-12-04T11:45:26.0978631Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0978678Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0978778Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0979019Z triton_mm_35 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0979253Z triton_mm_13 0.0140 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0979485Z triton_mm_14 0.0142 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0979715Z triton_mm_15 0.0147 ms 86.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0979944Z triton_mm_34 0.0154 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0980171Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0980412Z triton_mm_32 0.0170 ms 74.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0980653Z triton_mm_33 0.0172 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0980886Z triton_mm_12 0.0176 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0981115Z triton_mm_16 0.0181 ms 70.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.0981249Z SingleProcess AUTOTUNE benchmarking takes 0.1947 seconds and 0.7658 seconds precompiling for 39 choices 2025-12-04T11:45:26.0981323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0981369Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0981425Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0981528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0982016Z inductor [('triton_bundler_save_kernel', 
312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0982074Z graph_break [] 2025-12-04T11:45:26.0982137Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0982224Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0982264Z Autotune Choices Stats: 2025-12-04T11:45:26.0982636Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012799999676644802, "best_triton_pos": 0} 2025-12-04T11:45:26.0982685Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0982729Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0982826Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0983069Z triton_mm_73 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0983117Z _scaled_mm 0.0137 ms 93.3% 2025-12-04T11:45:26.0983384Z triton_mm_51 0.0141 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0983614Z triton_mm_52 0.0146 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0983839Z triton_mm_53 0.0152 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0984085Z triton_mm_72 0.0154 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0984326Z triton_mm_70 0.0167 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0984555Z triton_mm_69 0.0171 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0984788Z triton_mm_71 0.0173 ms 73.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0985024Z triton_mm_50 0.0176 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0985157Z SingleProcess AUTOTUNE benchmarking takes 0.2868 seconds and 0.5080 seconds precompiling for 39 choices 2025-12-04T11:45:26.0985230Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0985276Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:26.0985333Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0985434Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0985938Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0985992Z graph_break [] 2025-12-04T11:45:26.0986058Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0986133Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0986175Z Autotune Choices Stats: 2025-12-04T11:45:26.0986547Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_111", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012839999981224537, "best_triton_pos": 0} 2025-12-04T11:45:26.0986601Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0986650Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0986749Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0986990Z triton_mm_111 0.0128 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0987037Z _scaled_mm 0.0131 ms 97.9% 2025-12-04T11:45:26.0987268Z triton_mm_89 0.0140 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0987500Z triton_mm_90 0.0143 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0987739Z triton_mm_91 0.0149 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0987968Z triton_mm_110 0.0154 ms 83.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0988206Z triton_mm_108 0.0168 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0988433Z triton_mm_107 0.0171 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0988664Z triton_mm_109 0.0172 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0988897Z triton_mm_88 0.0179 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0989031Z SingleProcess 
AUTOTUNE benchmarking takes 0.2807 seconds and 0.3639 seconds precompiling for 39 choices
2025-12-04T11:45:26.0989223Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1f994b35f0e977c5.xml -
2025-12-04T11:45:26.0989283Z =========================== short test summary info ============================
2025-12-04T11:45:26.0989909Z FAILED [1.1599s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848.
2025-12-04T11:45:26.0989923Z
2025-12-04T11:45:26.0989996Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:26.0990265Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda
2025-12-04T11:45:26.0990267Z
2025-12-04T11:45:26.0990354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:26.0990421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:45:26.0990489Z ================== 1 failed, 187 deselected, 2 rerun in 5.61s ==================
2025-12-04T11:45:26.0990532Z Got exit code 1
2025-12-04T11:45:26.0990572Z Retrying single test...
2025-12-04T11:45:26.0990721Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-28f92bec1f37331f.xml
2025-12-04T11:45:26.0990779Z ============================= test session starts ==============================
2025-12-04T11:45:26.0990893Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:26.0990935Z cachedir: .pytest_cache
2025-12-04T11:45:26.0991096Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:26.0991145Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:26.0991186Z configfile: pytest.ini
2025-12-04T11:45:26.0991349Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:26.0991438Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T11:45:26.0991703Z stepcurrent: skipping 125 already run items.
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0991751Z Running 1 items in this shard 2025-12-04T11:45:26.0991753Z 2025-12-04T11:45:26.0991991Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [3.2689s] [100%] 2025-12-04T11:45:26.0992212Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3594s] [100%] 2025-12-04T11:45:26.0992411Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.1676s] [100%] 2025-12-04T11:45:26.0992415Z 2025-12-04T11:45:26.0992466Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.0992619Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0992668Z Traceback (most recent call last): 2025-12-04T11:45:26.0992830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0992875Z method(*args, **kwargs) 2025-12-04T11:45:26.0993031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0993074Z method(*args, **kwargs) 2025-12-04T11:45:26.0993229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0993317Z with policy(): 2025-12-04T11:45:26.0993474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 
2025-12-04T11:45:26.0993519Z raise RuntimeError(msg) 2025-12-04T11:45:26.0993934Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1115684864. 2025-12-04T11:45:26.0993937Z 2025-12-04T11:45:26.0994012Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.0994279Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.0994283Z 2025-12-04T11:45:26.0994371Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.0994445Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.0994494Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.0994554Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.0995046Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.0995146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.0995185Z graph_break [] 2025-12-04T11:45:26.0995250Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.0995340Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.0995829Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.0995898Z current_size = base.storage().size() 2025-12-04T11:45:26.0995942Z Autotune Choices Stats: 2025-12-04T11:45:26.0996316Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.01271899975836277, "best_triton_pos": 0} 2025-12-04T11:45:26.0996368Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.0996414Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.0996518Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.0996760Z triton_mm_35 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0996806Z _scaled_mm 0.0131 ms 96.9% 2025-12-04T11:45:26.0997038Z triton_mm_13 0.0140 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0997266Z triton_mm_14 0.0143 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0997519Z 
triton_mm_15 0.0150 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0997749Z triton_mm_34 0.0154 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0997977Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0998203Z triton_mm_32 0.0170 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0998436Z triton_mm_33 0.0170 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0998666Z triton_mm_12 0.0177 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.0998799Z SingleProcess AUTOTUNE benchmarking takes 0.1852 seconds and 0.9456 seconds precompiling for 39 choices 2025-12-04T11:45:26.0998951Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.0999000Z Traceback (most recent call last): 2025-12-04T11:45:26.0999160Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T11:45:26.0999217Z method(*args, **kwargs) 2025-12-04T11:45:26.0999375Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.0999420Z method(*args, **kwargs) 2025-12-04T11:45:26.0999571Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.0999608Z with policy(): 2025-12-04T11:45:26.0999773Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.0999818Z raise RuntimeError(msg) 2025-12-04T11:45:26.1000219Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1115684864 and is now 1212153856. 
2025-12-04T11:45:26.1000223Z 2025-12-04T11:45:26.1000297Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1000568Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1000570Z 2025-12-04T11:45:26.1000659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1000735Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1000780Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1000837Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1001328Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1001450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1001491Z graph_break [] 2025-12-04T11:45:26.1001556Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.1001632Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1002119Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1002170Z current_size = base.storage().size() 2025-12-04T11:45:26.1002212Z Autotune Choices Stats: 2025-12-04T11:45:26.1002588Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.01271899975836277, "best_triton_pos": 0} 2025-12-04T11:45:26.1002638Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.1002685Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1002786Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1003025Z triton_mm_35 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1003080Z _scaled_mm 0.0131 ms 96.9% 2025-12-04T11:45:26.1003343Z triton_mm_13 0.0140 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1003589Z triton_mm_14 0.0143 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1003821Z triton_mm_15 0.0150 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1004053Z triton_mm_34 0.0154 ms 82.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1004284Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1004514Z triton_mm_32 0.0170 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1004744Z triton_mm_33 0.0170 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1004974Z triton_mm_12 0.0177 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1005124Z SingleProcess AUTOTUNE benchmarking takes 0.1852 seconds and 0.9456 seconds precompiling for 39 choices 2025-12-04T11:45:26.1005209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1005253Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1005310Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1005411Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1005903Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1005945Z graph_break [] 2025-12-04T11:45:26.1006010Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.1006084Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1006125Z Autotune Choices Stats: 2025-12-04T11:45:26.1006498Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012959999963641167, "best_triton_pos": 0} 2025-12-04T11:45:26.1006546Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.1006591Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1006689Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1006926Z triton_mm_73 0.0130 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1007172Z triton_mm_51 0.0141 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1007410Z triton_mm_52 0.0142 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1007639Z triton_mm_53 0.0146 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1007866Z triton_mm_72 0.0157 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1008102Z triton_mm_70 0.0166 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1008333Z triton_mm_69 0.0170 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1008563Z triton_mm_71 0.0172 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1008809Z triton_mm_50 0.0178 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1009049Z triton_mm_54 0.0181 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1009184Z SingleProcess AUTOTUNE benchmarking takes 0.2710 seconds and 0.5178 seconds precompiling for 39 choices 2025-12-04T11:45:26.1009238Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1009393Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1009444Z Traceback (most recent call last): 2025-12-04T11:45:26.1009602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1009647Z method(*args, **kwargs) 2025-12-04T11:45:26.1009801Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1009846Z method(*args, **kwargs) 2025-12-04T11:45:26.1010002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1010040Z with policy(): 2025-12-04T11:45:26.1010196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1010244Z raise RuntimeError(msg) 2025-12-04T11:45:26.1010647Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 
2025-12-04T11:45:26.1010666Z 2025-12-04T11:45:26.1010743Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1011011Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1011013Z 2025-12-04T11:45:26.1011103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1011186Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1011234Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1011290Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1011778Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1011879Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1011919Z graph_break [] 2025-12-04T11:45:26.1011984Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.1012061Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1012553Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1012612Z current_size = base.storage().size() 2025-12-04T11:45:26.1012654Z Autotune Choices Stats: 2025-12-04T11:45:26.1013043Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.01271899975836277, "best_triton_pos": 0} 2025-12-04T11:45:26.1013094Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.1013137Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1013238Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1013508Z triton_mm_35 0.0127 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1013554Z _scaled_mm 0.0131 ms 96.9% 2025-12-04T11:45:26.1013783Z triton_mm_13 0.0140 ms 90.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1014015Z triton_mm_14 0.0143 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1014250Z triton_mm_15 0.0150 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1014478Z triton_mm_34 0.0154 ms 82.4% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1014725Z triton_mm_31 0.0169 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1014953Z triton_mm_32 0.0170 ms 75.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1016873Z triton_mm_33 0.0170 ms 74.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1017108Z triton_mm_12 0.0177 ms 71.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1017242Z SingleProcess AUTOTUNE benchmarking takes 0.1852 seconds and 0.9456 seconds precompiling for 39 choices 2025-12-04T11:45:26.1017319Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1017366Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1017425Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1017524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1018017Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1018068Z graph_break [] 2025-12-04T11:45:26.1018134Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.1018208Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1018251Z Autotune Choices Stats: 2025-12-04T11:45:26.1018635Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_73", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012959999963641167, "best_triton_pos": 0} 2025-12-04T11:45:26.1018687Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.1018731Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1018831Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1019068Z triton_mm_73 0.0130 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1019304Z triton_mm_51 0.0141 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1019533Z triton_mm_52 0.0142 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1019761Z triton_mm_53 0.0146 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1019990Z triton_mm_72 0.0157 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1020232Z triton_mm_70 0.0166 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1020470Z triton_mm_69 0.0170 ms 76.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1020698Z triton_mm_71 0.0172 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1020933Z triton_mm_50 0.0178 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1021167Z triton_mm_54 0.0181 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1021297Z SingleProcess AUTOTUNE benchmarking takes 0.2710 seconds and 0.5178 seconds precompiling for 39 choices 2025-12-04T11:45:26.1021372Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1021413Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:26.1021471Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1021569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1022068Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1022116Z graph_break [] 2025-12-04T11:45:26.1022183Z aten_mm_info [('aten._scaled_mm.default_1024_2048_1024', 1)] 2025-12-04T11:45:26.1022256Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1022298Z Autotune Choices Stats: 2025-12-04T11:45:26.1022669Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_111", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.012480000033974648, "best_triton_pos": 0} 2025-12-04T11:45:26.1022720Z AUTOTUNE scaled_mm(1024x1024, 1024x2048, , ) 2025-12-04T11:45:26.1022763Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1022860Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1023100Z triton_mm_111 0.0125 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1023143Z _scaled_mm 0.0135 ms 92.6% 2025-12-04T11:45:26.1023395Z triton_mm_89 0.0141 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1023624Z triton_mm_90 0.0146 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1023870Z triton_mm_91 0.0150 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1024099Z triton_mm_110 0.0152 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1024343Z triton_mm_108 0.0166 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1024573Z triton_mm_107 0.0168 ms 74.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1024805Z triton_mm_109 0.0172 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1025037Z triton_mm_88 0.0178 ms 70.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1025166Z SingleProcess 
AUTOTUNE benchmarking takes 0.2899 seconds and 0.3554 seconds precompiling for 39 choices 2025-12-04T11:45:26.1025361Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-28f92bec1f37331f.xml - 2025-12-04T11:45:26.1025420Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1026051Z FAILED [1.1676s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 2025-12-04T11:45:26.1026054Z 2025-12-04T11:45:26.1026130Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1026396Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1026398Z 2025-12-04T11:45:26.1026487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1026551Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1026620Z ================== 1 failed, 187 deselected, 2 rerun in 5.81s ================== 2025-12-04T11:45:26.1026661Z Got exit code 1 2025-12-04T11:45:26.1026876Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1027005Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1027152Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-93c0dd25b812e4ea.xml 2025-12-04T11:45:26.1027212Z ============================= test session starts ============================== 2025-12-04T11:45:26.1027325Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1027366Z cachedir: .pytest_cache 2025-12-04T11:45:26.1027540Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1027588Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1027628Z configfile: pytest.ini 2025-12-04T11:45:26.1027793Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1027871Z collecting ... collected 188 items / 126 deselected / 62 selected 2025-12-04T11:45:26.1027925Z stepcurrent: skipping 126 already run items. 
2025-12-04T11:45:26.1027979Z Running 62 items in this shard 2025-12-04T11:45:26.1027981Z 2025-12-04T11:45:26.1028204Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6615s] [ 1%] 2025-12-04T11:45:26.1028420Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2626s] [ 1%] 2025-12-04T11:45:26.1028613Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2169s] [ 1%] 2025-12-04T11:45:26.1028616Z 2025-12-04T11:45:26.1028667Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1028816Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1028864Z Traceback (most recent call last): 2025-12-04T11:45:26.1029026Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1029071Z method(*args, **kwargs) 2025-12-04T11:45:26.1029226Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1029281Z method(*args, **kwargs) 2025-12-04T11:45:26.1029434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1029471Z with policy(): 2025-12-04T11:45:26.1029642Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1029686Z raise RuntimeError(msg) 2025-12-04T11:45:26.1030081Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1030083Z 2025-12-04T11:45:26.1030159Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1030426Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1030430Z 2025-12-04T11:45:26.1030519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1030592Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1030640Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1030696Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1030763Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1030865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1030902Z graph_break [] 2025-12-04T11:45:26.1030964Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1031112Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1031171Z Traceback (most recent call last): 2025-12-04T11:45:26.1031326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1031369Z method(*args, **kwargs) 2025-12-04T11:45:26.1031522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1031563Z method(*args, **kwargs) 
2025-12-04T11:45:26.1031723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1031761Z with policy(): 2025-12-04T11:45:26.1031915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1031957Z raise RuntimeError(msg) 2025-12-04T11:45:26.1032346Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1032350Z 2025-12-04T11:45:26.1032424Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1032683Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1032686Z 2025-12-04T11:45:26.1032773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1032846Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1032893Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1032948Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1033019Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1033130Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1033167Z graph_break [] 2025-12-04T11:45:26.1033227Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1033353Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1033397Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.1033454Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1033550Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1033615Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1033651Z graph_break [] 2025-12-04T11:45:26.1033712Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1033764Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1033911Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1033960Z Traceback (most recent call last): 2025-12-04T11:45:26.1034117Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1034157Z method(*args, **kwargs) 2025-12-04T11:45:26.1034412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1034453Z method(*args, **kwargs) 2025-12-04T11:45:26.1034608Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1034644Z with policy(): 2025-12-04T11:45:26.1034797Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1034839Z raise RuntimeError(msg) 2025-12-04T11:45:26.1035251Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1035254Z 2025-12-04T11:45:26.1035329Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1035605Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1035608Z 2025-12-04T11:45:26.1035695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1035768Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1035815Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1035874Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1035939Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1036036Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1036076Z graph_break [] 2025-12-04T11:45:26.1036136Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1036210Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1036253Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1036311Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1036407Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1036472Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1036508Z graph_break [] 2025-12-04T11:45:26.1036567Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1036639Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1036697Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1036752Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1036859Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1036922Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1036960Z graph_break [] 2025-12-04T11:45:26.1037017Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1037213Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-93c0dd25b812e4ea.xml - 2025-12-04T11:45:26.1037273Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1037870Z FAILED [0.2169s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1037875Z 2025-12-04T11:45:26.1037947Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1038208Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1038210Z 2025-12-04T11:45:26.1038299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1038361Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1038429Z ================== 1 failed, 126 deselected, 2 rerun in 2.16s ================== 2025-12-04T11:45:26.1038479Z Got exit code 1 2025-12-04T11:45:26.1038522Z Retrying single test... 
2025-12-04T11:45:26.1038669Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36197cdded6579c9.xml 2025-12-04T11:45:26.1038729Z ============================= test session starts ============================== 2025-12-04T11:45:26.1038839Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1038880Z cachedir: .pytest_cache 2025-12-04T11:45:26.1039049Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1039099Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1039140Z configfile: pytest.ini 2025-12-04T11:45:26.1039301Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1039374Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1039635Z stepcurrent: skipping 126 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1039681Z Running 1 items in this shard 2025-12-04T11:45:26.1039684Z 2025-12-04T11:45:26.1039905Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6610s] [100%] 2025-12-04T11:45:26.1040124Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2623s] [100%] 2025-12-04T11:45:26.1040317Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2225s] [100%] 2025-12-04T11:45:26.1040319Z 2025-12-04T11:45:26.1040383Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1040527Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1040574Z Traceback (most recent call last): 2025-12-04T11:45:26.1040743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1040789Z method(*args, **kwargs) 2025-12-04T11:45:26.1040945Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1040987Z method(*args, **kwargs) 2025-12-04T11:45:26.1041138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1041176Z with policy(): 2025-12-04T11:45:26.1041328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1041373Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1041766Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1041768Z 2025-12-04T11:45:26.1041844Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1042104Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1042106Z 2025-12-04T11:45:26.1042193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1042265Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1042322Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1042377Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1042447Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1042549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1042589Z graph_break [] 2025-12-04T11:45:26.1042650Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1042810Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1042859Z Traceback (most recent call last): 2025-12-04T11:45:26.1043014Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1043055Z method(*args, **kwargs) 2025-12-04T11:45:26.1043206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1043314Z method(*args, **kwargs) 2025-12-04T11:45:26.1043465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1043503Z with policy(): 2025-12-04T11:45:26.1043655Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1043700Z raise RuntimeError(msg) 2025-12-04T11:45:26.1044089Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1044091Z 2025-12-04T11:45:26.1044164Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1044451Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1044453Z 2025-12-04T11:45:26.1044553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1044626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1044671Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1044728Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1044795Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1044892Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1044929Z graph_break [] 2025-12-04T11:45:26.1044990Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1045064Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1045110Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1045167Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1045265Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1045332Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1045367Z graph_break [] 2025-12-04T11:45:26.1045428Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1045482Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1045629Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1045676Z Traceback (most recent call last): 2025-12-04T11:45:26.1045835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1045892Z method(*args, **kwargs) 2025-12-04T11:45:26.1046045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1046089Z method(*args, **kwargs) 2025-12-04T11:45:26.1046242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1046279Z with policy(): 2025-12-04T11:45:26.1046434Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1046490Z raise RuntimeError(msg) 2025-12-04T11:45:26.1046882Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1046886Z 2025-12-04T11:45:26.1046959Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1047220Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1047222Z 2025-12-04T11:45:26.1047310Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1047382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1047428Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1047484Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1047550Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1047650Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1047688Z graph_break [] 2025-12-04T11:45:26.1047759Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1047836Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1047880Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1047947Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1048043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1048110Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1048146Z graph_break [] 2025-12-04T11:45:26.1048206Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1048278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1048325Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1048379Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1048476Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1048541Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1048579Z graph_break [] 2025-12-04T11:45:26.1048638Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1048832Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-36197cdded6579c9.xml - 2025-12-04T11:45:26.1048891Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1049478Z FAILED [0.2225s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1049492Z 2025-12-04T11:45:26.1049565Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1049826Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1049828Z 2025-12-04T11:45:26.1049916Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1049979Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1050058Z ================== 1 failed, 187 deselected, 2 rerun in 2.16s ================== 2025-12-04T11:45:26.1050101Z Got exit code 1 2025-12-04T11:45:26.1050141Z Retrying single test... 
2025-12-04T11:45:26.1050287Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b9eb5166299d9a8f.xml 2025-12-04T11:45:26.1050345Z ============================= test session starts ============================== 2025-12-04T11:45:26.1050457Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1050500Z cachedir: .pytest_cache 2025-12-04T11:45:26.1050659Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1050704Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1050744Z configfile: pytest.ini 2025-12-04T11:45:26.1050908Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1050981Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1051240Z stepcurrent: skipping 126 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1051283Z Running 1 items in this shard 2025-12-04T11:45:26.1051297Z 2025-12-04T11:45:26.1051516Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6582s] [100%] 2025-12-04T11:45:26.1051743Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2757s] [100%] 2025-12-04T11:45:26.1051935Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2195s] [100%] 2025-12-04T11:45:26.1051937Z 2025-12-04T11:45:26.1051988Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1052134Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1052183Z Traceback (most recent call last): 2025-12-04T11:45:26.1052341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1052386Z method(*args, **kwargs) 2025-12-04T11:45:26.1052539Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1052583Z method(*args, **kwargs) 2025-12-04T11:45:26.1052735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1052772Z with policy(): 2025-12-04T11:45:26.1052928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1052972Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1053419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1053439Z 2025-12-04T11:45:26.1053512Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1053774Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1053776Z 2025-12-04T11:45:26.1053874Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1053949Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1053993Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1054049Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1054115Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1054213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1054251Z graph_break [] 2025-12-04T11:45:26.1054311Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1054457Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1054506Z Traceback (most recent call last): 2025-12-04T11:45:26.1054660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1054700Z method(*args, **kwargs) 2025-12-04T11:45:26.1054853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1054898Z method(*args, **kwargs) 2025-12-04T11:45:26.1055047Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1055086Z with policy(): 2025-12-04T11:45:26.1055252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1055297Z raise RuntimeError(msg) 2025-12-04T11:45:26.1055706Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1055708Z 2025-12-04T11:45:26.1055782Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1056040Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1056044Z 2025-12-04T11:45:26.1056130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1056205Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1056248Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1056306Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1056372Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1056472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1056508Z graph_break [] 2025-12-04T11:45:26.1056571Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1056643Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1056687Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1056743Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1056840Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1056918Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1056954Z graph_break [] 2025-12-04T11:45:26.1057013Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1057068Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1057215Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1057264Z Traceback (most recent call last): 2025-12-04T11:45:26.1057437Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1057482Z method(*args, **kwargs) 2025-12-04T11:45:26.1057634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1057676Z method(*args, **kwargs) 2025-12-04T11:45:26.1057827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1057867Z with policy(): 2025-12-04T11:45:26.1058022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1058068Z raise RuntimeError(msg) 2025-12-04T11:45:26.1058458Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1058461Z 2025-12-04T11:45:26.1058534Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1058793Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1058809Z 2025-12-04T11:45:26.1058896Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1058972Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1059025Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1059082Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1059147Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1059245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1059282Z graph_break [] 2025-12-04T11:45:26.1059342Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1059415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1059459Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1059513Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1059612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1059676Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1059713Z graph_break [] 2025-12-04T11:45:26.1059773Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1059848Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1059896Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1059952Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1060048Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1060112Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1060148Z graph_break [] 2025-12-04T11:45:26.1060207Z aten_mm_info [('aten._scaled_mm.default_1024_16_16', 1)] 2025-12-04T11:45:26.1060401Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b9eb5166299d9a8f.xml - 2025-12-04T11:45:26.1060474Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1061075Z FAILED [0.2195s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1061078Z 2025-12-04T11:45:26.1061152Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1061411Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1061415Z 2025-12-04T11:45:26.1061501Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1061563Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1061632Z ================== 1 failed, 187 deselected, 2 rerun in 2.17s ================== 2025-12-04T11:45:26.1061675Z Got exit code 1 2025-12-04T11:45:26.1061883Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1062012Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1062157Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d0eae4e8065d5a75.xml 2025-12-04T11:45:26.1062215Z ============================= test session starts ============================== 2025-12-04T11:45:26.1062324Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1062377Z cachedir: .pytest_cache 2025-12-04T11:45:26.1062538Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1062599Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1062641Z configfile: pytest.ini 2025-12-04T11:45:26.1062804Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1062882Z collecting ... collected 188 items / 127 deselected / 61 selected 2025-12-04T11:45:26.1062936Z stepcurrent: skipping 127 already run items. 
2025-12-04T11:45:26.1062983Z Running 61 items in this shard 2025-12-04T11:45:26.1062985Z 2025-12-04T11:45:26.1063211Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6727s] [ 1%] 2025-12-04T11:45:26.1063477Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2674s] [ 1%] 2025-12-04T11:45:26.1063675Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2203s] [ 1%] 2025-12-04T11:45:26.1063677Z 2025-12-04T11:45:26.1063727Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1063878Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1063925Z Traceback (most recent call last): 2025-12-04T11:45:26.1064084Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1064127Z method(*args, **kwargs) 2025-12-04T11:45:26.1064281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1064340Z method(*args, **kwargs) 2025-12-04T11:45:26.1064493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1064531Z with policy(): 2025-12-04T11:45:26.1064683Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1064729Z raise RuntimeError(msg) 2025-12-04T11:45:26.1065138Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1065140Z 2025-12-04T11:45:26.1065215Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1065482Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1065486Z 2025-12-04T11:45:26.1065572Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1065645Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1065689Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1065746Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1065813Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1065911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1065948Z graph_break [] 2025-12-04T11:45:26.1066010Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1066174Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1066221Z Traceback (most recent call last): 2025-12-04T11:45:26.1066390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1066430Z method(*args, **kwargs) 2025-12-04T11:45:26.1066582Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1066624Z method(*args, **kwargs) 
2025-12-04T11:45:26.1066776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1066812Z with policy(): 2025-12-04T11:45:26.1066966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1067010Z raise RuntimeError(msg) 2025-12-04T11:45:26.1067410Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1067412Z 2025-12-04T11:45:26.1067486Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1067749Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1067751Z 2025-12-04T11:45:26.1067839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1067914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1067959Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1068026Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1068092Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1068190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1068228Z graph_break [] 2025-12-04T11:45:26.1068291Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1068365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1068408Z frames [('total', 
1), ('ok', 1)] 2025-12-04T11:45:26.1068475Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1068571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1068638Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1068674Z graph_break [] 2025-12-04T11:45:26.1068736Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1068790Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1068939Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1068991Z Traceback (most recent call last): 2025-12-04T11:45:26.1069149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1069191Z method(*args, **kwargs) 2025-12-04T11:45:26.1069346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1069387Z method(*args, **kwargs) 2025-12-04T11:45:26.1069537Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1069573Z with policy(): 2025-12-04T11:45:26.1069726Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1069779Z raise RuntimeError(msg) 2025-12-04T11:45:26.1070183Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 
2025-12-04T11:45:26.1070185Z 2025-12-04T11:45:26.1070258Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1070524Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1070526Z 2025-12-04T11:45:26.1070613Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1070686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1070734Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1070789Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1070856Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1070952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1070989Z graph_break [] 2025-12-04T11:45:26.1071049Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1071127Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1071170Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1071225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1071321Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1071385Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1071438Z graph_break [] 2025-12-04T11:45:26.1071500Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1071572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1071618Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1071672Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1071769Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1071832Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1071878Z graph_break [] 2025-12-04T11:45:26.1071938Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1072136Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d0eae4e8065d5a75.xml - 2025-12-04T11:45:26.1072195Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1072802Z FAILED [0.2203s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1072804Z 2025-12-04T11:45:26.1072878Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1073141Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1073143Z 2025-12-04T11:45:26.1073229Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1073330Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1073413Z ================== 1 failed, 127 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.1073452Z Got exit code 1 2025-12-04T11:45:26.1073494Z Retrying single test... 
2025-12-04T11:45:26.1073654Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6a2249a633e17299.xml 2025-12-04T11:45:26.1073714Z ============================= test session starts ============================== 2025-12-04T11:45:26.1073825Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1073867Z cachedir: .pytest_cache 2025-12-04T11:45:26.1074024Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1074074Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1074114Z configfile: pytest.ini 2025-12-04T11:45:26.1074276Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1074353Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1074616Z stepcurrent: skipping 127 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1074660Z Running 1 items in this shard 2025-12-04T11:45:26.1074662Z 2025-12-04T11:45:26.1074885Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6450s] [100%] 2025-12-04T11:45:26.1075104Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2535s] [100%] 2025-12-04T11:45:26.1075299Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2106s] [100%] 2025-12-04T11:45:26.1075318Z 2025-12-04T11:45:26.1075369Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1075516Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1075565Z Traceback (most recent call last): 2025-12-04T11:45:26.1075735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1075779Z method(*args, **kwargs) 2025-12-04T11:45:26.1075933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1075976Z method(*args, **kwargs) 2025-12-04T11:45:26.1076126Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1076166Z with policy(): 2025-12-04T11:45:26.1076319Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1076363Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1076758Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1076760Z 2025-12-04T11:45:26.1076834Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1077100Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1077114Z 2025-12-04T11:45:26.1077200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1077274Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1077317Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1077386Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1077452Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1077550Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1077587Z graph_break [] 2025-12-04T11:45:26.1077650Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1077798Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1077846Z Traceback (most recent call last): 2025-12-04T11:45:26.1078000Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1078045Z method(*args, **kwargs) 2025-12-04T11:45:26.1078196Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1078244Z method(*args, **kwargs) 2025-12-04T11:45:26.1078394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1078431Z with policy(): 2025-12-04T11:45:26.1078583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1078627Z raise RuntimeError(msg) 2025-12-04T11:45:26.1079017Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1079032Z 2025-12-04T11:45:26.1079106Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1079367Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1079371Z 2025-12-04T11:45:26.1079458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1079542Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1079587Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1079643Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1079708Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1079806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1079844Z graph_break [] 2025-12-04T11:45:26.1079906Z aten_mm_info 
[('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1079979Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1080023Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1080078Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1080174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1080238Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1080275Z graph_break [] 2025-12-04T11:45:26.1080335Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1080387Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1080534Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1080596Z Traceback (most recent call last): 2025-12-04T11:45:26.1080750Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1080793Z method(*args, **kwargs) 2025-12-04T11:45:26.1080955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1080997Z method(*args, **kwargs) 2025-12-04T11:45:26.1081147Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1081185Z with policy(): 2025-12-04T11:45:26.1081337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1081381Z raise RuntimeError(msg) 2025-12-04T11:45:26.1081772Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1081777Z 2025-12-04T11:45:26.1081851Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1082112Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1082116Z 2025-12-04T11:45:26.1082203Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1082277Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1082320Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1082375Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1082441Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1082551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1082587Z graph_break [] 2025-12-04T11:45:26.1082649Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1082723Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1082767Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1082821Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1082928Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1082993Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1083029Z graph_break [] 2025-12-04T11:45:26.1083089Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1083163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1083205Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.1083308Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1083404Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1083469Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1083505Z graph_break [] 2025-12-04T11:45:26.1083565Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1083759Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6a2249a633e17299.xml - 2025-12-04T11:45:26.1083819Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1084414Z FAILED [0.2106s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1084435Z 2025-12-04T11:45:26.1084530Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1084795Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1084798Z 2025-12-04T11:45:26.1084885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1084947Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1085016Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:26.1085057Z Got exit code 1 2025-12-04T11:45:26.1085097Z Retrying single test... 2025-12-04T11:45:26.1085246Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ec126f2a4e09daf.xml 2025-12-04T11:45:26.1085304Z ============================= test session starts ============================== 2025-12-04T11:45:26.1085415Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1085457Z cachedir: .pytest_cache 2025-12-04T11:45:26.1085618Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1085668Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1085713Z configfile: pytest.ini 2025-12-04T11:45:26.1085874Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1085949Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1086210Z stepcurrent: skipping 127 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1086273Z Running 1 items in this shard 2025-12-04T11:45:26.1086275Z 2025-12-04T11:45:26.1086497Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6526s] [100%] 2025-12-04T11:45:26.1086731Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2676s] [100%] 2025-12-04T11:45:26.1086924Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2210s] [100%] 2025-12-04T11:45:26.1086927Z 2025-12-04T11:45:26.1086977Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1087125Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1087176Z Traceback (most recent call last): 2025-12-04T11:45:26.1087334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1087377Z method(*args, **kwargs) 2025-12-04T11:45:26.1087531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1087574Z method(*args, **kwargs) 2025-12-04T11:45:26.1087732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1087769Z with policy(): 2025-12-04T11:45:26.1087925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1087965Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1088384Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1088386Z 2025-12-04T11:45:26.1088460Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1088724Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1088726Z 2025-12-04T11:45:26.1088812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1088887Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1088933Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1088996Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1089061Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1089159Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1089196Z graph_break [] 2025-12-04T11:45:26.1089258Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1089408Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1089460Z Traceback (most recent call last): 2025-12-04T11:45:26.1089614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1089656Z method(*args, **kwargs) 2025-12-04T11:45:26.1089809Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1089864Z method(*args, **kwargs) 2025-12-04T11:45:26.1090016Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1090053Z with policy(): 2025-12-04T11:45:26.1090206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1090252Z raise RuntimeError(msg) 2025-12-04T11:45:26.1090661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1090664Z 2025-12-04T11:45:26.1090737Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1090999Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1091003Z 2025-12-04T11:45:26.1091088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1091163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1091208Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1091264Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1091329Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1091428Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1091464Z graph_break [] 2025-12-04T11:45:26.1091527Z aten_mm_info 
[('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1091599Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1091645Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1091711Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1091812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1091887Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1091926Z graph_break [] 2025-12-04T11:45:26.1091987Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1092040Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1092187Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1092239Z Traceback (most recent call last): 2025-12-04T11:45:26.1092394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1092438Z method(*args, **kwargs) 2025-12-04T11:45:26.1092591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1092636Z method(*args, **kwargs) 2025-12-04T11:45:26.1092789Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1092828Z with policy(): 2025-12-04T11:45:26.1092980Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1093024Z raise RuntimeError(msg) 2025-12-04T11:45:26.1093485Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! 
Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1093487Z 2025-12-04T11:45:26.1093560Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1093842Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1093844Z 2025-12-04T11:45:26.1093932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1094009Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1094053Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1094123Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1094190Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1094289Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1094327Z graph_break [] 2025-12-04T11:45:26.1094390Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1094465Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1094510Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1094565Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1094666Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1094732Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1094772Z graph_break [] 2025-12-04T11:45:26.1094831Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1094906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1094950Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.1095006Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1095102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1095167Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1095217Z graph_break [] 2025-12-04T11:45:26.1095279Z aten_mm_info [('aten._scaled_mm.default_1024_2048_16', 1)] 2025-12-04T11:45:26.1095491Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0ec126f2a4e09daf.xml - 2025-12-04T11:45:26.1095554Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1096148Z FAILED [0.2210s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1096153Z 2025-12-04T11:45:26.1096226Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1096491Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1096493Z 2025-12-04T11:45:26.1096579Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1096643Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1096710Z ================== 1 failed, 187 deselected, 2 rerun in 2.16s ================== 2025-12-04T11:45:26.1096750Z Got exit code 1 2025-12-04T11:45:26.1096960Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1097090Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1097248Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-39a38a3ddad9961e.xml 2025-12-04T11:45:26.1097305Z ============================= test session starts ============================== 2025-12-04T11:45:26.1097416Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1097459Z cachedir: .pytest_cache 2025-12-04T11:45:26.1097618Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1097682Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1097723Z configfile: pytest.ini 2025-12-04T11:45:26.1097886Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1097964Z collecting ... collected 188 items / 128 deselected / 60 selected 2025-12-04T11:45:26.1098021Z stepcurrent: skipping 128 already run items. 
2025-12-04T11:45:26.1098068Z Running 60 items in this shard 2025-12-04T11:45:26.1098070Z 2025-12-04T11:45:26.1098295Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0456s] [ 1%] 2025-12-04T11:45:26.1098514Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7204s] [ 1%] 2025-12-04T11:45:26.1098705Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5984s] [ 1%] 2025-12-04T11:45:26.1098707Z 2025-12-04T11:45:26.1098758Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1098905Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1098952Z Traceback (most recent call last): 2025-12-04T11:45:26.1099131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1099174Z method(*args, **kwargs) 2025-12-04T11:45:26.1099337Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1099381Z method(*args, **kwargs) 2025-12-04T11:45:26.1099532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1099572Z with policy(): 2025-12-04T11:45:26.1099725Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1099771Z raise RuntimeError(msg) 2025-12-04T11:45:26.1100168Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1100174Z 2025-12-04T11:45:26.1100247Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1100513Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1100515Z 2025-12-04T11:45:26.1100603Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1100677Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1100722Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1100778Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1101267Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1101377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1101413Z graph_break [] 2025-12-04T11:45:26.1101477Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1101561Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1102056Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1102106Z current_size = base.storage().size() 2025-12-04T11:45:26.1102150Z Autotune Choices Stats: 2025-12-04T11:45:26.1102530Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1102577Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1102622Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1102724Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1102967Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1103221Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1103479Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1103703Z triton_mm_3 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.1103930Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1104157Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1104383Z triton_mm_0 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1104611Z triton_mm_6 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1104656Z _scaled_mm 0.0235 ms 25.9% 2025-12-04T11:45:26.1104789Z SingleProcess AUTOTUNE benchmarking takes 0.0393 seconds and 0.1685 seconds precompiling for 9 choices 2025-12-04T11:45:26.1104954Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1105002Z Traceback (most recent call last): 2025-12-04T11:45:26.1105162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1105205Z method(*args, **kwargs) 2025-12-04T11:45:26.1105358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1105414Z method(*args, **kwargs) 2025-12-04T11:45:26.1105569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T11:45:26.1105607Z with policy(): 2025-12-04T11:45:26.1105762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1105805Z raise RuntimeError(msg) 2025-12-04T11:45:26.1106202Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1106206Z 2025-12-04T11:45:26.1106279Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1106543Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1106546Z 2025-12-04T11:45:26.1106632Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1106707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1106751Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1106825Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1107329Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1107432Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1107468Z graph_break [] 2025-12-04T11:45:26.1107531Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1107604Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1108097Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1108147Z current_size = base.storage().size() 2025-12-04T11:45:26.1108188Z Autotune Choices Stats: 2025-12-04T11:45:26.1108562Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1108606Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1108655Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1108756Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1109006Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1109235Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1109474Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1109701Z triton_mm_3 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1109933Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1110158Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1110385Z triton_mm_0 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1110610Z triton_mm_6 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1110666Z _scaled_mm 0.0235 ms 25.9% 2025-12-04T11:45:26.1110797Z SingleProcess AUTOTUNE benchmarking takes 0.0393 seconds and 0.1685 seconds precompiling for 9 choices 2025-12-04T11:45:26.1110883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1110930Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1110985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1111085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1111571Z inductor [('triton_bundler_save_kernel', 72), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1111610Z graph_break [] 2025-12-04T11:45:26.1111672Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1111746Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1111787Z Autotune Choices Stats: 2025-12-04T11:45:26.1112158Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.1112203Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1112247Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1112346Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1112581Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1112823Z triton_mm_9 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1113065Z triton_mm_8 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1113333Z triton_mm_11 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1113558Z triton_mm_14 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1113789Z triton_mm_12 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1114017Z triton_mm_13 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1114243Z triton_mm_15 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1114288Z _scaled_mm 0.0209 ms 29.8% 2025-12-04T11:45:26.1114417Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.0867 seconds precompiling for 9 choices 2025-12-04T11:45:26.1114488Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1114647Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1114694Z Traceback (most recent call last): 2025-12-04T11:45:26.1114852Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in 
wrapper 2025-12-04T11:45:26.1114898Z method(*args, **kwargs) 2025-12-04T11:45:26.1115052Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1115095Z method(*args, **kwargs) 2025-12-04T11:45:26.1115247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1115284Z with policy(): 2025-12-04T11:45:26.1115443Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1115488Z raise RuntimeError(msg) 2025-12-04T11:45:26.1115885Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1115887Z 2025-12-04T11:45:26.1115962Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1116229Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1116231Z 2025-12-04T11:45:26.1116320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1116411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1116456Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1116514Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1117012Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1117111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1117149Z graph_break [] 2025-12-04T11:45:26.1117210Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1117285Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1117781Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1117831Z current_size = base.storage().size() 2025-12-04T11:45:26.1117870Z Autotune Choices Stats: 2025-12-04T11:45:26.1118242Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1118285Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1118340Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1118439Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1118686Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1118919Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1119148Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1119376Z triton_mm_3 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1119605Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1119831Z triton_mm_7 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1120059Z triton_mm_0 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1120284Z triton_mm_6 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1120338Z _scaled_mm 0.0235 ms 25.9% 2025-12-04T11:45:26.1120469Z SingleProcess AUTOTUNE benchmarking takes 0.0393 seconds and 0.1685 seconds precompiling for 9 choices 2025-12-04T11:45:26.1120544Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1120590Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1120659Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1120759Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1121251Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1121290Z graph_break [] 2025-12-04T11:45:26.1121353Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1121427Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1121470Z Autotune Choices Stats: 2025-12-04T11:45:26.1121837Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.1121885Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1121929Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1122029Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1122288Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1122520Z triton_mm_9 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1122748Z triton_mm_8 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1122973Z triton_mm_11 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1123209Z triton_mm_14 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1123470Z triton_mm_12 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1123698Z triton_mm_13 0.0066 ms 95.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1123924Z triton_mm_15 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1123988Z _scaled_mm 0.0209 ms 29.8% 2025-12-04T11:45:26.1124118Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.0867 seconds precompiling for 9 choices 2025-12-04T11:45:26.1124191Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1124237Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1124293Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1124407Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1124893Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1124932Z graph_break [] 2025-12-04T11:45:26.1124994Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1125069Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1125111Z Autotune Choices Stats: 2025-12-04T11:45:26.1125489Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1125533Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1125578Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1125676Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1125912Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1126171Z triton_mm_20 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1126402Z triton_mm_17 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1126632Z triton_mm_18 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1126856Z triton_mm_23 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1127203Z triton_mm_19 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1127431Z triton_mm_21 0.0065 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1127659Z triton_mm_22 0.0067 ms 91.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1127703Z _scaled_mm 0.0071 ms 85.9% 2025-12-04T11:45:26.1127852Z SingleProcess AUTOTUNE benchmarking takes 0.0535 seconds and 0.1886 seconds precompiling for 9 choices 2025-12-04T11:45:26.1128049Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-39a38a3ddad9961e.xml - 2025-12-04T11:45:26.1128109Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1128723Z FAILED [0.5984s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1128725Z 2025-12-04T11:45:26.1128802Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1129069Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1129071Z 2025-12-04T11:45:26.1129159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1129223Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1129292Z ================== 1 failed, 128 deselected, 2 rerun in 3.38s ================== 2025-12-04T11:45:26.1129332Z Got exit code 1 2025-12-04T11:45:26.1129375Z Retrying single test... 2025-12-04T11:45:26.1129520Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1e2a9907a409a5c5.xml 2025-12-04T11:45:26.1129579Z ============================= test session starts ============================== 2025-12-04T11:45:26.1129692Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1129747Z cachedir: .pytest_cache 2025-12-04T11:45:26.1129908Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1129967Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1130008Z configfile: pytest.ini 2025-12-04T11:45:26.1130173Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1130248Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1130509Z stepcurrent: skipping 128 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1130555Z Running 1 items in this shard 2025-12-04T11:45:26.1130557Z 2025-12-04T11:45:26.1130778Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1407s] [100%] 2025-12-04T11:45:26.1130997Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7937s] [100%] 2025-12-04T11:45:26.1131189Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6908s] [100%] 2025-12-04T11:45:26.1131192Z 2025-12-04T11:45:26.1131244Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1131392Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1131438Z Traceback (most recent call last): 2025-12-04T11:45:26.1131599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1131656Z method(*args, **kwargs) 2025-12-04T11:45:26.1131810Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1131857Z method(*args, **kwargs) 2025-12-04T11:45:26.1132009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1132046Z with policy(): 2025-12-04T11:45:26.1132211Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1132260Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1132654Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1132658Z 2025-12-04T11:45:26.1132735Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1132998Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1133000Z 2025-12-04T11:45:26.1133088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1133163Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1133210Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1133301Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1133785Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1133913Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1133953Z graph_break [] 2025-12-04T11:45:26.1134017Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1134091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1134587Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1134633Z current_size = base.storage().size() 2025-12-04T11:45:26.1134677Z Autotune Choices Stats: 2025-12-04T11:45:26.1135054Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1135101Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1135142Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1135247Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1135485Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1135718Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1135962Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1136203Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1136431Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1136654Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1136883Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1137112Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1137156Z _scaled_mm 0.0218 ms 27.1% 2025-12-04T11:45:26.1137285Z SingleProcess AUTOTUNE benchmarking takes 0.0401 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.1137431Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1137489Z Traceback (most recent call last): 2025-12-04T11:45:26.1137649Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1137703Z method(*args, **kwargs) 2025-12-04T11:45:26.1137857Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1137900Z method(*args, **kwargs) 2025-12-04T11:45:26.1138056Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1138094Z with policy(): 2025-12-04T11:45:26.1138247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1138290Z raise RuntimeError(msg) 2025-12-04T11:45:26.1138685Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1138689Z 2025-12-04T11:45:26.1138769Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1139031Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1139034Z 2025-12-04T11:45:26.1139124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1139199Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1139245Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1139301Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1139799Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1139899Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1139938Z graph_break [] 2025-12-04T11:45:26.1140012Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1140086Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1140577Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1140625Z current_size = base.storage().size() 2025-12-04T11:45:26.1140670Z Autotune Choices Stats: 2025-12-04T11:45:26.1141043Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1141091Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1141132Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1141233Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1141472Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1141733Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1141963Z triton_mm_3 0.0059 ms 
100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1142190Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1142417Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1142644Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1142871Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1143098Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1143141Z _scaled_mm 0.0218 ms 27.1% 2025-12-04T11:45:26.1143322Z SingleProcess AUTOTUNE benchmarking takes 0.0401 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.1143412Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1143455Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1143513Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1143613Z aot_autograd [('total', 1), ('autograd_cache_miss', 
1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1144110Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1144150Z graph_break [] 2025-12-04T11:45:26.1144211Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1144288Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1144329Z Autotune Choices Stats: 2025-12-04T11:45:26.1144701Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.1144744Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1144788Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1144887Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1145122Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1145368Z triton_mm_8 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1145609Z triton_mm_9 0.0060 ms 99.3% ACC_TYPE='tl.float32', 
ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1145844Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1146069Z triton_mm_15 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1146299Z triton_mm_14 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1146527Z triton_mm_13 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1146754Z triton_mm_11 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1146798Z _scaled_mm 0.0227 ms 26.2% 2025-12-04T11:45:26.1146928Z SingleProcess AUTOTUNE benchmarking takes 0.0392 seconds and 0.1026 seconds precompiling for 9 choices 2025-12-04T11:45:26.1146984Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1147142Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1147189Z Traceback (most recent call last): 2025-12-04T11:45:26.1147348Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1147390Z method(*args, **kwargs) 2025-12-04T11:45:26.1147555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1147599Z method(*args, **kwargs) 2025-12-04T11:45:26.1147753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1147791Z with policy(): 2025-12-04T11:45:26.1147943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1147988Z raise RuntimeError(msg) 2025-12-04T11:45:26.1148383Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1148386Z 2025-12-04T11:45:26.1148462Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1148726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1148730Z 2025-12-04T11:45:26.1148817Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1148892Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1148945Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1149003Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1149499Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1149600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1149637Z graph_break [] 2025-12-04T11:45:26.1149697Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1149769Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1150264Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1150312Z current_size = base.storage().size() 2025-12-04T11:45:26.1150355Z Autotune Choices Stats: 2025-12-04T11:45:26.1150724Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1150771Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1150814Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1150912Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1151160Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1151390Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1151630Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1151858Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1152093Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1152323Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1152549Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1152773Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1152825Z _scaled_mm 0.0218 ms 27.1% 2025-12-04T11:45:26.1152954Z SingleProcess AUTOTUNE benchmarking takes 0.0401 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.1153042Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1153088Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1153145Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1153284Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1153770Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1153810Z graph_break [] 2025-12-04T11:45:26.1153871Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1153945Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1153986Z Autotune Choices Stats: 2025-12-04T11:45:26.1154355Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.1154400Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1154440Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1154542Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1154775Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1155020Z triton_mm_8 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1155261Z triton_mm_9 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1155492Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1155716Z triton_mm_15 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1155943Z triton_mm_14 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1156175Z triton_mm_13 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1156402Z triton_mm_11 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1156445Z _scaled_mm 0.0227 ms 26.2% 2025-12-04T11:45:26.1156596Z SingleProcess AUTOTUNE benchmarking takes 0.0392 seconds and 0.1026 seconds precompiling for 9 choices 2025-12-04T11:45:26.1156672Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1156727Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1156786Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1156885Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1157370Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1157407Z graph_break [] 2025-12-04T11:45:26.1157472Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1157544Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1157588Z Autotune Choices Stats: 2025-12-04T11:45:26.1157954Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1158000Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1158041Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1158139Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1158376Z triton_mm_19 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1158621Z triton_mm_16 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1158851Z triton_mm_18 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1159088Z triton_mm_17 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1159317Z triton_mm_20 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1159548Z triton_mm_21 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1159773Z triton_mm_23 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1160005Z triton_mm_22 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1160046Z _scaled_mm 0.0232 ms 25.5% 2025-12-04T11:45:26.1160177Z SingleProcess AUTOTUNE benchmarking takes 0.0556 seconds and 0.1874 seconds precompiling for 9 choices 2025-12-04T11:45:26.1160382Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-1e2a9907a409a5c5.xml - 2025-12-04T11:45:26.1160455Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1161049Z FAILED [0.6908s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1161054Z 2025-12-04T11:45:26.1161129Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1161393Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1161397Z 2025-12-04T11:45:26.1161486Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1161552Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1161619Z ================== 1 failed, 187 deselected, 2 rerun in 3.64s ================== 2025-12-04T11:45:26.1161660Z Got exit code 1 2025-12-04T11:45:26.1161700Z Retrying single test... 2025-12-04T11:45:26.1161849Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6255a66cf5392739.xml 2025-12-04T11:45:26.1161904Z ============================= test session starts ============================== 2025-12-04T11:45:26.1162017Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1162057Z cachedir: .pytest_cache 2025-12-04T11:45:26.1162236Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1162282Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1162324Z configfile: pytest.ini 2025-12-04T11:45:26.1162488Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1162564Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1162839Z stepcurrent: skipping 128 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1162884Z Running 1 items in this shard 2025-12-04T11:45:26.1162887Z 2025-12-04T11:45:26.1163103Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0136s] [100%] 2025-12-04T11:45:26.1163358Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7150s] [100%] 2025-12-04T11:45:26.1163552Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5819s] [100%] 2025-12-04T11:45:26.1163554Z 2025-12-04T11:45:26.1163604Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1163751Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1163797Z Traceback (most recent call last): 2025-12-04T11:45:26.1163959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1164001Z method(*args, **kwargs) 2025-12-04T11:45:26.1164172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1164213Z method(*args, **kwargs) 2025-12-04T11:45:26.1164380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1164418Z with policy(): 2025-12-04T11:45:26.1164574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1164615Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1165011Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1165014Z 2025-12-04T11:45:26.1165089Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1165355Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1165358Z 2025-12-04T11:45:26.1165447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1165521Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1165565Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1165622Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1166110Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1166223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1166263Z graph_break [] 2025-12-04T11:45:26.1166328Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1166403Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1166903Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1166952Z current_size = base.storage().size() 2025-12-04T11:45:26.1166994Z Autotune Choices Stats: 2025-12-04T11:45:26.1167372Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1167421Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1167461Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1167560Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1167797Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1168032Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1168284Z triton_mm_0 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1168510Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1168736Z triton_mm_7 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1168961Z triton_mm_5 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1169190Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1169418Z triton_mm_4 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1169464Z _scaled_mm 0.0178 ms 33.9% 2025-12-04T11:45:26.1169593Z SingleProcess AUTOTUNE benchmarking takes 0.0354 seconds and 0.1607 seconds precompiling for 9 choices 2025-12-04T11:45:26.1169743Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1169789Z Traceback (most recent call last): 2025-12-04T11:45:26.1169962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1170004Z method(*args, **kwargs) 2025-12-04T11:45:26.1170161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1170201Z method(*args, **kwargs) 2025-12-04T11:45:26.1170354Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1170390Z with policy(): 2025-12-04T11:45:26.1170564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1170607Z raise RuntimeError(msg) 2025-12-04T11:45:26.1171002Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1171006Z 2025-12-04T11:45:26.1171083Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1171346Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1171348Z 2025-12-04T11:45:26.1171437Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1171512Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1171557Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1171613Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1172104Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1172230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1172270Z graph_break [] 2025-12-04T11:45:26.1172331Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1172407Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1172897Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1172949Z current_size = base.storage().size() 2025-12-04T11:45:26.1172991Z Autotune Choices Stats: 2025-12-04T11:45:26.1173379Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1173427Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1173468Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1173570Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1173805Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1174052Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1174281Z triton_mm_0 0.0061 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1174522Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1174749Z triton_mm_7 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1174976Z triton_mm_5 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1175201Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1175430Z triton_mm_4 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1175474Z _scaled_mm 0.0178 ms 33.9% 2025-12-04T11:45:26.1175603Z SingleProcess AUTOTUNE benchmarking takes 0.0354 seconds and 0.1607 seconds precompiling for 9 choices 2025-12-04T11:45:26.1175678Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1175739Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1175801Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1175914Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1176401Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1176441Z graph_break [] 2025-12-04T11:45:26.1176503Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1176578Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1176620Z Autotune Choices Stats: 2025-12-04T11:45:26.1176987Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1177034Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1177077Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1177176Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1177412Z triton_mm_9 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1177641Z triton_mm_10 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1177886Z triton_mm_14 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1178124Z triton_mm_11 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1178351Z triton_mm_15 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1178580Z triton_mm_8 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1178810Z triton_mm_12 0.0062 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1179036Z triton_mm_13 0.0063 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1179076Z _scaled_mm 0.0208 ms 28.2% 2025-12-04T11:45:26.1179208Z SingleProcess AUTOTUNE benchmarking takes 0.0335 seconds and 0.1032 seconds precompiling for 9 choices 2025-12-04T11:45:26.1179262Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1179409Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1179466Z Traceback (most recent call last): 2025-12-04T11:45:26.1179638Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1179680Z method(*args, **kwargs) 2025-12-04T11:45:26.1179834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1179875Z method(*args, **kwargs) 2025-12-04T11:45:26.1180028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1180065Z with policy(): 2025-12-04T11:45:26.1180220Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1180264Z raise RuntimeError(msg) 2025-12-04T11:45:26.1180660Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1180663Z 2025-12-04T11:45:26.1180737Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1180998Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1181000Z 2025-12-04T11:45:26.1181090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1181162Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1181207Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1181264Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1181765Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1181863Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1181912Z graph_break [] 2025-12-04T11:45:26.1181974Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1182048Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1182545Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1182594Z current_size = base.storage().size() 2025-12-04T11:45:26.1182637Z Autotune Choices Stats: 2025-12-04T11:45:26.1183006Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1183050Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1183090Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1183191Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1183449Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1183708Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1183940Z triton_mm_0 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1184164Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1184393Z triton_mm_7 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1184621Z triton_mm_5 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1184850Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1185075Z triton_mm_4 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1185117Z _scaled_mm 0.0178 ms 33.9% 2025-12-04T11:45:26.1185267Z SingleProcess AUTOTUNE benchmarking takes 0.0354 seconds and 0.1607 seconds precompiling for 9 choices 2025-12-04T11:45:26.1185342Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1185385Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1185443Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1185543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1186033Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1186073Z graph_break [] 2025-12-04T11:45:26.1186136Z aten_mm_info 
[('aten._scaled_mm.default_1024_16_32', 1)] 2025-12-04T11:45:26.1186214Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1186256Z Autotune Choices Stats: 2025-12-04T11:45:26.1186626Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1186671Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1186713Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1186811Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1187043Z triton_mm_9 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1187292Z triton_mm_10 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1187520Z triton_mm_14 0.0061 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1187746Z triton_mm_11 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1187971Z triton_mm_15 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1188202Z triton_mm_8 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1188429Z triton_mm_12 0.0062 ms 94.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1188657Z triton_mm_13 0.0063 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1188697Z _scaled_mm 0.0208 ms 28.2% 2025-12-04T11:45:26.1188828Z SingleProcess AUTOTUNE benchmarking takes 0.0335 seconds and 0.1032 seconds precompiling for 9 choices 2025-12-04T11:45:26.1188915Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1188958Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1189016Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1189115Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1189612Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1191227Z graph_break [] 2025-12-04T11:45:26.1191292Z aten_mm_info [('aten._scaled_mm.default_1024_16_32', 
1)] 2025-12-04T11:45:26.1191367Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1191412Z Autotune Choices Stats: 2025-12-04T11:45:26.1191778Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_20", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.1191824Z AUTOTUNE scaled_mm(1024x32, 32x16, , ) 2025-12-04T11:45:26.1191865Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1191965Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1192201Z triton_mm_20 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1192430Z triton_mm_18 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1192687Z triton_mm_19 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1192917Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1193145Z triton_mm_16 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1193420Z triton_mm_23 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1193648Z triton_mm_21 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1193873Z triton_mm_22 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1193915Z _scaled_mm 0.0223 ms 27.1% 2025-12-04T11:45:26.1194044Z SingleProcess AUTOTUNE benchmarking takes 0.0485 seconds and 0.1892 seconds precompiling for 9 choices 2025-12-04T11:45:26.1194238Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6255a66cf5392739.xml - 2025-12-04T11:45:26.1194321Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1194927Z FAILED [0.5819s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1194931Z 2025-12-04T11:45:26.1195006Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1195271Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1195275Z 2025-12-04T11:45:26.1195365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1195427Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1195498Z ================== 1 failed, 187 deselected, 2 rerun in 3.33s ================== 2025-12-04T11:45:26.1195536Z Got exit code 1 2025-12-04T11:45:26.1195746Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1195875Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1196019Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-24d0d58f9b8b9c6f.xml 2025-12-04T11:45:26.1196077Z ============================= test session starts ============================== 2025-12-04T11:45:26.1196204Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1196246Z cachedir: .pytest_cache 2025-12-04T11:45:26.1196419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1196465Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1196507Z configfile: pytest.ini 2025-12-04T11:45:26.1196673Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1196752Z collecting ... 
collected 188 items / 129 deselected / 59 selected 2025-12-04T11:45:26.1196808Z stepcurrent: skipping 129 already run items. 2025-12-04T11:45:26.1196852Z Running 59 items in this shard 2025-12-04T11:45:26.1196854Z 2025-12-04T11:45:26.1197081Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3629s] [ 1%] 2025-12-04T11:45:26.1197303Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8756s] [ 1%] 2025-12-04T11:45:26.1197498Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.8610s] [ 1%] 2025-12-04T11:45:26.1197500Z 2025-12-04T11:45:26.1197551Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1197700Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1197745Z Traceback (most recent call last): 2025-12-04T11:45:26.1197906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1197947Z method(*args, **kwargs) 2025-12-04T11:45:26.1198114Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1198154Z method(*args, **kwargs) 2025-12-04T11:45:26.1198307Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1198345Z with policy(): 2025-12-04T11:45:26.1198499Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1198540Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1198949Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.1198951Z 2025-12-04T11:45:26.1199028Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1199292Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1199295Z 2025-12-04T11:45:26.1199383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1199456Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1199500Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1199557Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1200054Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1200167Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1200203Z graph_break [] 2025-12-04T11:45:26.1200286Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1200360Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1200853Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1200901Z current_size = base.storage().size() 2025-12-04T11:45:26.1200943Z Autotune Choices Stats: 2025-12-04T11:45:26.1201317Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.1201365Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1201405Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1201506Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1201740Z triton_mm_8 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1201972Z triton_mm_13 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1202214Z triton_mm_15 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1202441Z triton_mm_5 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1202681Z triton_mm_16 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1202905Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1203140Z triton_mm_12 0.0068 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1203402Z triton_mm_10 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1203627Z triton_mm_14 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1203852Z triton_mm_18 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1204001Z SingleProcess AUTOTUNE benchmarking takes 0.0835 seconds and 0.3606 seconds precompiling for 21 choices 2025-12-04T11:45:26.1204164Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1204210Z Traceback 
(most recent call last): 2025-12-04T11:45:26.1204368Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1204409Z method(*args, **kwargs) 2025-12-04T11:45:26.1204562Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1204602Z method(*args, **kwargs) 2025-12-04T11:45:26.1204754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1204792Z with policy(): 2025-12-04T11:45:26.1204949Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1204990Z raise RuntimeError(msg) 2025-12-04T11:45:26.1205392Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.1205395Z 2025-12-04T11:45:26.1205470Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1205734Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1205736Z 2025-12-04T11:45:26.1205838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1205911Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1205955Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1206012Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1206517Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1206616Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1206657Z graph_break [] 2025-12-04T11:45:26.1206720Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1206795Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1207288Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1207334Z current_size = base.storage().size() 2025-12-04T11:45:26.1207376Z Autotune Choices Stats: 2025-12-04T11:45:26.1207747Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.1207807Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1207847Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1207948Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1208192Z triton_mm_8 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1208422Z triton_mm_13 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1208650Z triton_mm_15 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1208882Z triton_mm_5 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1209110Z triton_mm_16 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1209338Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1209570Z triton_mm_12 0.0068 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1209808Z triton_mm_10 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1210035Z triton_mm_14 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1210271Z triton_mm_18 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1210403Z SingleProcess AUTOTUNE benchmarking takes 0.0835 seconds and 0.3606 seconds precompiling for 21 choices 2025-12-04T11:45:26.1210477Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1210522Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1210579Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1210678Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1211172Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1212917Z graph_break [] 2025-12-04T11:45:26.1212983Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1213057Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1213099Z Autotune Choices Stats: 2025-12-04T11:45:26.1213525Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1213575Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1213615Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1213714Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1213947Z triton_mm_33 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1214192Z triton_mm_25 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1214424Z triton_mm_26 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1214654Z triton_mm_32 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1214880Z triton_mm_31 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1215107Z triton_mm_37 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1215351Z triton_mm_29 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1215576Z triton_mm_30 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1215815Z triton_mm_36 0.0070 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1216045Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1216178Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.2867 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1216231Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1216382Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1216427Z Traceback (most recent call last): 2025-12-04T11:45:26.1216587Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1216677Z method(*args, **kwargs) 2025-12-04T11:45:26.1216832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1216872Z method(*args, **kwargs) 2025-12-04T11:45:26.1217023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1217060Z with policy(): 2025-12-04T11:45:26.1217225Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1217266Z raise RuntimeError(msg) 2025-12-04T11:45:26.1217664Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.1217667Z 2025-12-04T11:45:26.1217741Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1218007Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1218010Z 2025-12-04T11:45:26.1218099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1218171Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1218214Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1218272Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1218761Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1218860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1218910Z graph_break [] 2025-12-04T11:45:26.1218973Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1219047Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1219544Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1219594Z current_size = base.storage().size() 2025-12-04T11:45:26.1219636Z Autotune Choices Stats: 2025-12-04T11:45:26.1220005Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.1220053Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1220094Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1220195Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1220428Z triton_mm_8 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1220657Z triton_mm_13 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1220902Z triton_mm_15 0.0067 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1221144Z triton_mm_5 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1221368Z triton_mm_16 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1221591Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1221818Z triton_mm_12 0.0068 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1222043Z triton_mm_10 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1222266Z triton_mm_14 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1222487Z triton_mm_18 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1222616Z SingleProcess AUTOTUNE benchmarking takes 0.0835 seconds and 0.3606 seconds precompiling for 21 choices 2025-12-04T11:45:26.1222700Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1222746Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1222801Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1222903Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1223428Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1223466Z graph_break [] 2025-12-04T11:45:26.1223531Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1223604Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1223648Z Autotune Choices Stats: 2025-12-04T11:45:26.1224010Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1224056Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1224096Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1224196Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1224444Z triton_mm_33 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1224672Z triton_mm_25 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1224913Z triton_mm_26 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1225141Z triton_mm_32 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1225367Z triton_mm_31 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1225589Z triton_mm_37 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1225816Z triton_mm_29 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1226041Z triton_mm_30 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1226268Z triton_mm_36 0.0070 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1226495Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1226639Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.2867 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1226713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1226755Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1226814Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1226923Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1227413Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1227451Z graph_break [] 2025-12-04T11:45:26.1227514Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1227587Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1227630Z Autotune Choices Stats: 2025-12-04T11:45:26.1227994Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_45", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:26.1228055Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1228096Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1228193Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1228442Z triton_mm_45 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.1228669Z triton_mm_47 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1228894Z triton_mm_48 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1229120Z triton_mm_46 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1229345Z triton_mm_54 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1229570Z triton_mm_55 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1229796Z triton_mm_51 0.0067 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1230025Z triton_mm_52 0.0068 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1230262Z triton_mm_56 0.0068 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1230488Z triton_mm_57 0.0068 ms 96.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1230633Z SingleProcess AUTOTUNE benchmarking takes 0.1385 seconds and 0.2277 seconds precompiling for 21 choices 2025-12-04T11:45:26.1230826Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-24d0d58f9b8b9c6f.xml - 2025-12-04T11:45:26.1230887Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1231491Z FAILED [0.8610s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.1231494Z 2025-12-04T11:45:26.1231568Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1231831Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1231846Z 2025-12-04T11:45:26.1231935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1231997Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1232067Z ================== 1 failed, 129 deselected, 2 rerun in 4.12s ================== 2025-12-04T11:45:26.1232104Z Got exit code 1 2025-12-04T11:45:26.1232144Z Retrying single test... 2025-12-04T11:45:26.1232299Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-609e3f8923f09063.xml 2025-12-04T11:45:26.1232357Z ============================= test session starts ============================== 2025-12-04T11:45:26.1232471Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1232512Z cachedir: .pytest_cache 2025-12-04T11:45:26.1232669Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1232718Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1232759Z configfile: pytest.ini 2025-12-04T11:45:26.1232921Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1232996Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1233383Z stepcurrent: skipping 129 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1233427Z Running 1 items in this shard 2025-12-04T11:45:26.1233430Z 2025-12-04T11:45:26.1233652Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3803s] [100%] 2025-12-04T11:45:26.1233871Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8801s] [100%] 2025-12-04T11:45:26.1234065Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7585s] [100%] 2025-12-04T11:45:26.1234087Z 2025-12-04T11:45:26.1234139Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1234287Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1234335Z Traceback (most recent call last): 2025-12-04T11:45:26.1234493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1234547Z method(*args, **kwargs) 2025-12-04T11:45:26.1234702Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1234745Z method(*args, **kwargs) 2025-12-04T11:45:26.1234896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1234935Z with policy(): 2025-12-04T11:45:26.1235089Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1235132Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1235525Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.1235529Z 2025-12-04T11:45:26.1235601Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1235882Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1235884Z 2025-12-04T11:45:26.1235972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1236045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1236087Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1236157Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1236648Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1236748Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1236786Z graph_break [] 2025-12-04T11:45:26.1236850Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1236924Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1237417Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1237465Z current_size = base.storage().size() 2025-12-04T11:45:26.1237506Z Autotune Choices Stats: 2025-12-04T11:45:26.1237874Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1237931Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1237972Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1238073Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1238308Z triton_mm_17 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1238550Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1238778Z triton_mm_6 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1239008Z triton_mm_14 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1239235Z triton_mm_13 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1239463Z triton_mm_15 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1239698Z triton_mm_10 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1239925Z triton_mm_12 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1240161Z triton_mm_7 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1240390Z triton_mm_11 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1240521Z SingleProcess AUTOTUNE benchmarking takes 0.0842 seconds and 0.3793 seconds precompiling for 21 choices 2025-12-04T11:45:26.1240669Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1240716Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1240872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1240915Z method(*args, **kwargs) 2025-12-04T11:45:26.1241068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1241110Z method(*args, **kwargs) 2025-12-04T11:45:26.1241263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1241301Z with policy(): 2025-12-04T11:45:26.1241454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1241496Z raise RuntimeError(msg) 2025-12-04T11:45:26.1241890Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.1241904Z 2025-12-04T11:45:26.1241979Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1242250Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1242252Z 2025-12-04T11:45:26.1242340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1242415Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1242457Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1242514Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1243002Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1243102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1243139Z graph_break [] 2025-12-04T11:45:26.1243204Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1243313Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1243818Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1243867Z current_size = base.storage().size() 2025-12-04T11:45:26.1243924Z Autotune Choices Stats: 2025-12-04T11:45:26.1244290Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1244336Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1244381Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1244479Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1244712Z triton_mm_17 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1244941Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1245169Z triton_mm_6 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1245400Z triton_mm_14 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1245627Z triton_mm_13 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1245871Z triton_mm_15 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1246108Z triton_mm_10 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1246335Z triton_mm_12 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1246563Z triton_mm_7 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1246791Z triton_mm_11 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1246921Z SingleProcess AUTOTUNE benchmarking takes 0.0842 seconds and 0.3793 seconds precompiling for 21 choices 2025-12-04T11:45:26.1246995Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1247040Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1247109Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1247211Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1247708Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1247749Z graph_break [] 2025-12-04T11:45:26.1247813Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1247889Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1247930Z Autotune Choices Stats: 2025-12-04T11:45:26.1248295Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_38", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599000189453363, "best_triton_pos": 0} 2025-12-04T11:45:26.1248341Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1248382Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1248481Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1248712Z triton_mm_38 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1248941Z triton_mm_34 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1249167Z triton_mm_28 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1249404Z triton_mm_35 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1249636Z triton_mm_31 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1249872Z triton_mm_29 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1250098Z triton_mm_32 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1250325Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1250549Z triton_mm_33 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1250774Z triton_mm_36 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1250918Z SingleProcess AUTOTUNE benchmarking takes 0.1218 seconds and 0.2924 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1250971Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1251122Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1251169Z Traceback (most recent call last): 2025-12-04T11:45:26.1251335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1251377Z method(*args, **kwargs) 2025-12-04T11:45:26.1251531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1251572Z method(*args, **kwargs) 2025-12-04T11:45:26.1251728Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1251767Z with policy(): 2025-12-04T11:45:26.1251923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1251964Z raise RuntimeError(msg) 2025-12-04T11:45:26.1252364Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.1252367Z 2025-12-04T11:45:26.1252441Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1252703Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1252706Z 2025-12-04T11:45:26.1252795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1252868Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1252921Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1252977Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1253486Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1253602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1253640Z graph_break [] 2025-12-04T11:45:26.1253704Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1253780Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1254267Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1254316Z current_size = base.storage().size() 2025-12-04T11:45:26.1254357Z Autotune Choices Stats: 2025-12-04T11:45:26.1254720Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1254783Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1254824Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1254923Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1255154Z triton_mm_17 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1255397Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1255623Z triton_mm_6 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1255851Z triton_mm_14 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1256077Z triton_mm_13 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1256301Z triton_mm_15 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1256527Z triton_mm_10 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1256751Z triton_mm_12 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1256990Z triton_mm_7 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1257217Z triton_mm_11 0.0069 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1257355Z SingleProcess AUTOTUNE benchmarking takes 0.0842 seconds and 0.3793 seconds precompiling for 21 choices 2025-12-04T11:45:26.1257429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1257472Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1257533Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1257634Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1258127Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1258165Z graph_break [] 2025-12-04T11:45:26.1258228Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1258301Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1258354Z Autotune Choices Stats: 2025-12-04T11:45:26.1258711Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_38", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006599000189453363, "best_triton_pos": 0} 2025-12-04T11:45:26.1258758Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1258799Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1258913Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1259146Z triton_mm_38 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1259373Z triton_mm_34 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1259599Z triton_mm_28 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1259826Z triton_mm_35 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1260054Z triton_mm_31 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1260277Z triton_mm_29 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1260505Z triton_mm_32 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1260744Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1260976Z triton_mm_33 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1261202Z triton_mm_36 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1261331Z SingleProcess AUTOTUNE benchmarking takes 0.1218 seconds and 0.2924 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1261408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1261450Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1261506Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1261605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1262097Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1262147Z graph_break [] 2025-12-04T11:45:26.1262209Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1262282Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1262323Z Autotune Choices Stats: 2025-12-04T11:45:26.1262696Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_46", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-12-04T11:45:26.1262742Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1262784Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1262881Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1263112Z triton_mm_46 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.1263370Z triton_mm_53 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1263594Z triton_mm_58 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1263820Z triton_mm_54 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1264050Z triton_mm_47 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1264294Z triton_mm_52 0.0068 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1264516Z triton_mm_57 0.0069 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1264756Z triton_mm_45 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1264982Z triton_mm_51 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1265213Z triton_mm_44 0.0071 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1265343Z SingleProcess AUTOTUNE benchmarking takes 0.1409 seconds and 0.2294 seconds precompiling for 21 choices 2025-12-04T11:45:26.1265532Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-609e3f8923f09063.xml - 2025-12-04T11:45:26.1265595Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1266194Z FAILED [0.7585s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.1266213Z 2025-12-04T11:45:26.1266288Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1266563Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1266567Z 2025-12-04T11:45:26.1266654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1266717Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1266786Z ================== 1 failed, 187 deselected, 2 rerun in 4.04s ================== 2025-12-04T11:45:26.1266825Z Got exit code 1 2025-12-04T11:45:26.1266865Z Retrying single test... 2025-12-04T11:45:26.1267011Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d3c52e5d4082e736.xml 2025-12-04T11:45:26.1267069Z ============================= test session starts ============================== 2025-12-04T11:45:26.1267181Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1267222Z cachedir: .pytest_cache 2025-12-04T11:45:26.1267380Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1267424Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1267466Z configfile: pytest.ini 2025-12-04T11:45:26.1267627Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1267703Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1267961Z stepcurrent: skipping 129 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1268015Z Running 1 items in this shard 2025-12-04T11:45:26.1268017Z 2025-12-04T11:45:26.1268236Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3611s] [100%] 2025-12-04T11:45:26.1268457Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8752s] [100%] 2025-12-04T11:45:26.1268660Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7686s] [100%] 2025-12-04T11:45:26.1268663Z 2025-12-04T11:45:26.1268715Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1268862Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1268910Z Traceback (most recent call last): 2025-12-04T11:45:26.1269069Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1269111Z method(*args, **kwargs) 2025-12-04T11:45:26.1269265Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1269305Z method(*args, **kwargs) 2025-12-04T11:45:26.1269457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1269494Z with policy(): 2025-12-04T11:45:26.1269661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1269702Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1270096Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.1270111Z 2025-12-04T11:45:26.1270185Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1270449Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1270451Z 2025-12-04T11:45:26.1270537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1270614Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1270656Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1270715Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1271205Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1271304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1271342Z graph_break [] 2025-12-04T11:45:26.1271405Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1271480Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1271966Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1272025Z current_size = base.storage().size() 2025-12-04T11:45:26.1272066Z Autotune Choices Stats: 2025-12-04T11:45:26.1272445Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_14", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1272492Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1272533Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1272633Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1272870Z triton_mm_14 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1273106Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1273358Z triton_mm_13 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1273585Z triton_mm_7 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1273828Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1274069Z triton_mm_16 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1274293Z triton_mm_18 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1274520Z triton_mm_6 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1274741Z triton_mm_10 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1274970Z triton_mm_11 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1275100Z SingleProcess AUTOTUNE benchmarking takes 0.0831 seconds and 0.3627 seconds precompiling for 21 choices 2025-12-04T11:45:26.1275249Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1275297Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1275453Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1275494Z method(*args, **kwargs) 2025-12-04T11:45:26.1275661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1275703Z method(*args, **kwargs) 2025-12-04T11:45:26.1275855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1275893Z with policy(): 2025-12-04T11:45:26.1276046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1276089Z raise RuntimeError(msg) 2025-12-04T11:45:26.1276499Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.1276505Z 2025-12-04T11:45:26.1276580Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1276845Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1276847Z 2025-12-04T11:45:26.1276934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1277007Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1277049Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1277109Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1277597Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1277717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1277753Z graph_break [] 2025-12-04T11:45:26.1277827Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1277902Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1278390Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1278441Z current_size = base.storage().size() 2025-12-04T11:45:26.1278481Z Autotune Choices Stats: 2025-12-04T11:45:26.1278850Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_14", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1278897Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1278939Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1279038Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1279275Z triton_mm_14 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1279506Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1279742Z triton_mm_13 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1279966Z triton_mm_7 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1280198Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1280423Z triton_mm_16 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1280647Z triton_mm_18 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1280873Z triton_mm_6 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1281094Z triton_mm_10 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1281335Z triton_mm_11 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1281468Z SingleProcess AUTOTUNE benchmarking takes 0.0831 seconds and 0.3627 seconds precompiling for 21 choices 2025-12-04T11:45:26.1281552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1281595Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1281651Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1281752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1282238Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1282278Z graph_break [] 2025-12-04T11:45:26.1282341Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1282417Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1282458Z Autotune Choices Stats: 2025-12-04T11:45:26.1282816Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599000189453363, "best_triton_pos": 0} 2025-12-04T11:45:26.1282862Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1282903Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1283000Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1283231Z triton_mm_33 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1283521Z triton_mm_32 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1283762Z triton_mm_38 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1283988Z triton_mm_34 0.0068 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1284218Z triton_mm_35 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1284444Z triton_mm_37 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1284670Z triton_mm_31 0.0070 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1284916Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1285140Z triton_mm_36 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1285379Z triton_mm_30 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1285511Z SingleProcess AUTOTUNE benchmarking takes 0.1196 seconds and 0.2915 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1285564Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1285714Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1285761Z Traceback (most recent call last): 2025-12-04T11:45:26.1285923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1285964Z method(*args, **kwargs) 2025-12-04T11:45:26.1286118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1286158Z method(*args, **kwargs) 2025-12-04T11:45:26.1286313Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1286348Z with policy(): 2025-12-04T11:45:26.1286506Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1286546Z raise RuntimeError(msg) 2025-12-04T11:45:26.1286943Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.1286959Z 2025-12-04T11:45:26.1287037Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1287299Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1287301Z 2025-12-04T11:45:26.1287390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1287482Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1287525Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1287583Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1288071Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1288170Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1288209Z graph_break [] 2025-12-04T11:45:26.1288270Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1288343Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1288826Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1288884Z current_size = base.storage().size() 2025-12-04T11:45:26.1288926Z Autotune Choices Stats: 2025-12-04T11:45:26.1289298Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_14", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599999964237213, "best_triton_pos": 0} 2025-12-04T11:45:26.1289346Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1289386Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1289489Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1289719Z triton_mm_14 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1289951Z triton_mm_5 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1290177Z triton_mm_13 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1290407Z triton_mm_7 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1290634Z triton_mm_17 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1290859Z triton_mm_16 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1291098Z triton_mm_18 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1291332Z triton_mm_6 0.0068 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1291558Z triton_mm_10 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1291785Z triton_mm_11 0.0068 ms 97.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1291919Z SingleProcess AUTOTUNE benchmarking takes 0.0831 seconds and 0.3627 seconds precompiling for 21 choices 2025-12-04T11:45:26.1291993Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1292036Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1292095Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1292195Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1292688Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1292726Z graph_break [] 2025-12-04T11:45:26.1292789Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1292872Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1292913Z Autotune Choices Stats: 2025-12-04T11:45:26.1293301Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_33", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006599000189453363, "best_triton_pos": 0} 2025-12-04T11:45:26.1293349Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1293390Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1293492Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1293724Z triton_mm_33 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1293957Z triton_mm_32 0.0067 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1294184Z triton_mm_38 0.0067 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1294408Z triton_mm_34 0.0068 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1294650Z triton_mm_35 0.0068 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1294873Z triton_mm_37 0.0069 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1295109Z triton_mm_31 0.0070 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1295338Z triton_mm_27 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1295569Z triton_mm_36 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1295794Z triton_mm_30 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1295924Z SingleProcess AUTOTUNE benchmarking takes 0.1196 seconds and 0.2915 seconds precompiling for 21 choices 
2025-12-04T11:45:26.1296000Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1296057Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1296115Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1296214Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1296715Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1296752Z graph_break [] 2025-12-04T11:45:26.1296817Z aten_mm_info [('aten._scaled_mm.default_1024_2048_32', 1)] 2025-12-04T11:45:26.1296890Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1296933Z Autotune Choices Stats: 2025-12-04T11:45:26.1297296Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_45", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006560000125318766, "best_triton_pos": 0} 2025-12-04T11:45:26.1297344Z AUTOTUNE scaled_mm(1024x32, 32x2048, , ) 2025-12-04T11:45:26.1297388Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1297487Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1297722Z triton_mm_45 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.1297952Z triton_mm_58 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1298177Z triton_mm_57 0.0067 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1298411Z triton_mm_50 0.0068 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1298651Z triton_mm_52 0.0068 ms 97.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1298878Z triton_mm_51 0.0069 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1299103Z triton_mm_55 0.0069 ms 95.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1299331Z triton_mm_56 0.0069 ms 94.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1299555Z triton_mm_48 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1299782Z triton_mm_53 0.0070 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1299925Z SingleProcess AUTOTUNE benchmarking takes 0.1466 seconds and 0.2341 seconds precompiling for 21 choices 2025-12-04T11:45:26.1300117Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d3c52e5d4082e736.xml - 2025-12-04T11:45:26.1300190Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1300785Z FAILED [0.7686s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.1300789Z 2025-12-04T11:45:26.1300863Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1301127Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1301131Z 2025-12-04T11:45:26.1301220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1301283Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1301353Z ================== 1 failed, 187 deselected, 2 rerun in 4.02s ================== 2025-12-04T11:45:26.1301391Z Got exit code 1 2025-12-04T11:45:26.1301603Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1301735Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1301879Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-24a45fdc5dc47fc3.xml 2025-12-04T11:45:26.1301957Z ============================= test session starts ============================== 2025-12-04T11:45:26.1302067Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1302111Z cachedir: .pytest_cache 2025-12-04T11:45:26.1302270Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1302315Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1302355Z configfile: pytest.ini 2025-12-04T11:45:26.1302530Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1302607Z collecting ... collected 188 items / 130 deselected / 58 selected 2025-12-04T11:45:26.1302662Z stepcurrent: skipping 130 already run items. 
2025-12-04T11:45:26.1302707Z Running 58 items in this shard 2025-12-04T11:45:26.1302709Z 2025-12-04T11:45:26.1302931Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9657s] [ 1%] 2025-12-04T11:45:26.1303147Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5192s] [ 1%] 2025-12-04T11:45:26.1303365Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.5896s] [ 1%] 2025-12-04T11:45:26.1303368Z 2025-12-04T11:45:26.1303419Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1303578Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1303624Z Traceback (most recent call last): 2025-12-04T11:45:26.1303783Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1303826Z method(*args, **kwargs) 2025-12-04T11:45:26.1303979Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1304034Z method(*args, **kwargs) 2025-12-04T11:45:26.1304188Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1304227Z with policy(): 2025-12-04T11:45:26.1304381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1304424Z raise RuntimeError(msg) 2025-12-04T11:45:26.1304815Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.1304818Z 2025-12-04T11:45:26.1304892Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1305151Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1305154Z 2025-12-04T11:45:26.1305242Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1305316Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1305360Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1305418Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1305899Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1306012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1306049Z graph_break [] 2025-12-04T11:45:26.1306112Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1306185Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1306684Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1306733Z current_size = base.storage().size() 2025-12-04T11:45:26.1306776Z Autotune Choices Stats: 2025-12-04T11:45:26.1307148Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1307199Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1307242Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1307346Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1307592Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1307821Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1308057Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1308283Z triton_mm_0 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.1308326Z _scaled_mm 0.0074 ms 82.6% 2025-12-04T11:45:26.1308453Z SingleProcess AUTOTUNE benchmarking takes 0.0243 seconds and 0.1204 seconds precompiling for 5 choices 2025-12-04T11:45:26.1308598Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1308644Z Traceback (most recent call last): 2025-12-04T11:45:26.1308803Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1308842Z method(*args, **kwargs) 2025-12-04T11:45:26.1308998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1309038Z method(*args, **kwargs) 2025-12-04T11:45:26.1309191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1309228Z with policy(): 2025-12-04T11:45:26.1309386Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1309427Z raise RuntimeError(msg) 2025-12-04T11:45:26.1309820Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.1309834Z 2025-12-04T11:45:26.1309910Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1310171Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1310184Z 2025-12-04T11:45:26.1310273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1310347Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1310392Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1310449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1310933Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1311031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1311069Z graph_break [] 2025-12-04T11:45:26.1311131Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1311204Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1311700Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1311749Z current_size = base.storage().size() 2025-12-04T11:45:26.1311788Z Autotune Choices Stats: 2025-12-04T11:45:26.1312168Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1312214Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1312256Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1312358Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1312591Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1312822Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1313044Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1313302Z triton_mm_0 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1313346Z _scaled_mm 0.0074 ms 82.6% 2025-12-04T11:45:26.1313477Z SingleProcess AUTOTUNE benchmarking takes 0.0243 
seconds and 0.1204 seconds precompiling for 5 choices 2025-12-04T11:45:26.1313566Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1313610Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1313667Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1313767Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1314262Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1314301Z graph_break [] 2025-12-04T11:45:26.1314363Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1314437Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1314479Z Autotune Choices Stats: 2025-12-04T11:45:26.1314836Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1314883Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1314925Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1315024Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1315265Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1315495Z triton_mm_5 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1315741Z triton_mm_7 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1315784Z _scaled_mm 0.0072 ms 85.5% 2025-12-04T11:45:26.1316005Z triton_mm_4 0.0074 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1316134Z SingleProcess AUTOTUNE benchmarking takes 0.0231 seconds and 0.1201 seconds precompiling for 5 choices 2025-12-04T11:45:26.1316186Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1316332Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1316379Z Traceback (most recent call last): 2025-12-04T11:45:26.1316535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1316576Z method(*args, **kwargs) 2025-12-04T11:45:26.1316729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1316769Z method(*args, **kwargs) 2025-12-04T11:45:26.1316919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1316958Z with policy(): 2025-12-04T11:45:26.1317110Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1317164Z raise RuntimeError(msg) 2025-12-04T11:45:26.1317556Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1317558Z 2025-12-04T11:45:26.1317634Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1317905Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1317911Z 2025-12-04T11:45:26.1317999Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1318073Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1318117Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1318178Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1318654Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1318755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1318802Z graph_break [] 2025-12-04T11:45:26.1318863Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1318934Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1319428Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1319477Z current_size = base.storage().size() 2025-12-04T11:45:26.1319519Z Autotune Choices Stats: 2025-12-04T11:45:26.1319888Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.1319937Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1319981Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1320081Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1320316Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1320543Z triton_mm_1 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1320767Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.1320989Z triton_mm_0 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1321042Z _scaled_mm 0.0074 ms 82.6% 2025-12-04T11:45:26.1321167Z SingleProcess AUTOTUNE benchmarking takes 0.0243 seconds and 0.1204 seconds precompiling for 5 choices 2025-12-04T11:45:26.1321243Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1321285Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1321343Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1321456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1321940Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1321980Z graph_break [] 2025-12-04T11:45:26.1322039Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1322113Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1322154Z Autotune Choices Stats: 2025-12-04T11:45:26.1322513Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1322571Z AUTOTUNE scaled_mm(1x1024, 
1024x16, , ) 2025-12-04T11:45:26.1322613Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1322710Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1322937Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1323176Z triton_mm_5 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1323442Z triton_mm_7 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1323483Z _scaled_mm 0.0072 ms 85.5% 2025-12-04T11:45:26.1323705Z triton_mm_4 0.0074 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1323835Z SingleProcess AUTOTUNE benchmarking takes 0.0231 seconds and 0.1201 seconds precompiling for 5 choices 2025-12-04T11:45:26.1323907Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1323951Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1324008Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1324107Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1324586Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1324624Z graph_break [] 2025-12-04T11:45:26.1324684Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1324775Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1324815Z Autotune Choices Stats: 2025-12-04T11:45:26.1325178Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.1325237Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1325280Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1325377Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1325607Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1325835Z triton_mm_10 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1326064Z triton_mm_11 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1326288Z triton_mm_8 0.0077 ms 77.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1326344Z _scaled_mm 0.0204 ms 29.5% 2025-12-04T11:45:26.1326471Z SingleProcess AUTOTUNE benchmarking takes 0.0308 seconds and 0.2048 seconds precompiling for 5 choices 2025-12-04T11:45:26.1326663Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-24a45fdc5dc47fc3.xml - 2025-12-04T11:45:26.1326724Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1327410Z FAILED [0.5896s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1327415Z 2025-12-04T11:45:26.1327489Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1327749Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1327752Z 2025-12-04T11:45:26.1327839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1327903Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1327969Z ================== 1 failed, 130 deselected, 2 rerun in 3.09s ================== 2025-12-04T11:45:26.1328008Z Got exit code 1 2025-12-04T11:45:26.1328049Z Retrying single test... 
2025-12-04T11:45:26.1328196Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4d2135e403b7255.xml 2025-12-04T11:45:26.1328253Z ============================= test session starts ============================== 2025-12-04T11:45:26.1328365Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1328405Z cachedir: .pytest_cache 2025-12-04T11:45:26.1328577Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1328622Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1328664Z configfile: pytest.ini 2025-12-04T11:45:26.1328824Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1328900Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1329166Z stepcurrent: skipping 130 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1329211Z Running 1 items in this shard 2025-12-04T11:45:26.1329214Z 2025-12-04T11:45:26.1329427Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0277s] [100%] 2025-12-04T11:45:26.1329643Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.6471s] [100%] 2025-12-04T11:45:26.1329834Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6738s] [100%] 2025-12-04T11:45:26.1329836Z 2025-12-04T11:45:26.1329887Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1330031Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1330102Z Traceback (most recent call last): 2025-12-04T11:45:26.1330260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1330301Z method(*args, **kwargs) 2025-12-04T11:45:26.1330454Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1330495Z method(*args, **kwargs) 2025-12-04T11:45:26.1330658Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1330695Z with policy(): 2025-12-04T11:45:26.1330848Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1330889Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1331277Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.1331280Z 2025-12-04T11:45:26.1331353Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1331614Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1331616Z 2025-12-04T11:45:26.1331704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1331776Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1331819Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1331876Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1332353Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1332463Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1332501Z graph_break [] 2025-12-04T11:45:26.1332562Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1332637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1333134Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1333184Z current_size = base.storage().size() 2025-12-04T11:45:26.1333225Z Autotune Choices Stats: 2025-12-04T11:45:26.1333632Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:26.1333679Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1333721Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1333821Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1334058Z triton_mm_1 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1334300Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1334541Z triton_mm_3 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1334765Z triton_mm_0 0.0074 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1334807Z _scaled_mm 0.0211 ms 27.7% 2025-12-04T11:45:26.1334937Z SingleProcess AUTOTUNE benchmarking takes 0.0252 seconds and 0.1182 seconds precompiling for 5 choices 2025-12-04T11:45:26.1335082Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1335130Z Traceback (most recent call last): 2025-12-04T11:45:26.1335284Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1335327Z method(*args, **kwargs) 2025-12-04T11:45:26.1335480Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1335525Z method(*args, **kwargs) 2025-12-04T11:45:26.1335675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1335715Z with policy(): 2025-12-04T11:45:26.1335870Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1335912Z raise RuntimeError(msg) 2025-12-04T11:45:26.1336302Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.1336318Z 2025-12-04T11:45:26.1336391Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1336650Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1336652Z 2025-12-04T11:45:26.1336738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1336827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1336871Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1336932Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1337413Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1337514Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1337552Z graph_break [] 2025-12-04T11:45:26.1337612Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1337688Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1338175Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1338237Z current_size = base.storage().size() 2025-12-04T11:45:26.1338279Z Autotune Choices Stats: 2025-12-04T11:45:26.1338654Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:26.1338701Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1338743Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1338844Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1339082Z triton_mm_1 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1339309Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1339538Z triton_mm_3 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1339765Z triton_mm_0 0.0074 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1339808Z _scaled_mm 0.0211 ms 27.7% 2025-12-04T11:45:26.1339937Z SingleProcess AUTOTUNE benchmarking takes 0.0252 
seconds and 0.1182 seconds precompiling for 5 choices 2025-12-04T11:45:26.1340010Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1340065Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1340121Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1340222Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1340709Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1340750Z graph_break [] 2025-12-04T11:45:26.1340810Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1340887Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1340928Z Autotune Choices Stats: 2025-12-04T11:45:26.1341291Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1341337Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1341378Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1341477Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1341713Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1341949Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1342187Z triton_mm_7 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1342413Z triton_mm_4 0.0074 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1342455Z _scaled_mm 0.0214 ms 28.4% 2025-12-04T11:45:26.1342582Z SingleProcess AUTOTUNE benchmarking takes 0.0269 seconds and 0.1139 seconds precompiling for 5 choices 2025-12-04T11:45:26.1342637Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1342782Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1342830Z Traceback (most recent call last): 2025-12-04T11:45:26.1342988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1343030Z method(*args, **kwargs) 2025-12-04T11:45:26.1343184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1343223Z method(*args, **kwargs) 2025-12-04T11:45:26.1343425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1343461Z with policy(): 2025-12-04T11:45:26.1343616Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1343658Z raise RuntimeError(msg) 2025-12-04T11:45:26.1344046Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1344064Z 2025-12-04T11:45:26.1344138Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1344414Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1344416Z 2025-12-04T11:45:26.1344507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1344582Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1344625Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1344683Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1345163Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1345262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1345303Z graph_break [] 2025-12-04T11:45:26.1345365Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1345438Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1345938Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1345987Z current_size = base.storage().size() 2025-12-04T11:45:26.1346044Z Autotune Choices Stats: 2025-12-04T11:45:26.1346412Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005840000230818987, "best_triton_pos": 0} 2025-12-04T11:45:26.1346461Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1346504Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1346605Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1346850Z triton_mm_1 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1347076Z triton_mm_2 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1347304Z triton_mm_3 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.1347532Z triton_mm_0 0.0074 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1347577Z _scaled_mm 0.0211 ms 27.7% 2025-12-04T11:45:26.1347716Z SingleProcess AUTOTUNE benchmarking takes 0.0252 seconds and 0.1182 seconds precompiling for 5 choices 2025-12-04T11:45:26.1347795Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1347838Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1347897Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1347997Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1348484Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1348523Z graph_break [] 2025-12-04T11:45:26.1348584Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1348659Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1348700Z Autotune Choices Stats: 2025-12-04T11:45:26.1349061Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1349109Z AUTOTUNE scaled_mm(1x1024, 
1024x16, , ) 2025-12-04T11:45:26.1349150Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1349249Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1349492Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1349729Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1349957Z triton_mm_7 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1350180Z triton_mm_4 0.0074 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1350223Z _scaled_mm 0.0214 ms 28.4% 2025-12-04T11:45:26.1350350Z SingleProcess AUTOTUNE benchmarking takes 0.0269 seconds and 0.1139 seconds precompiling for 5 choices 2025-12-04T11:45:26.1350425Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1350467Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1350525Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1350624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1351105Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1351142Z graph_break [] 2025-12-04T11:45:26.1351203Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1351276Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1351328Z Autotune Choices Stats: 2025-12-04T11:45:26.1351691Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.1351736Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1351779Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1351887Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1352117Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1352344Z triton_mm_11 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1352572Z triton_mm_9 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1352795Z triton_mm_8 0.0077 ms 77.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1352836Z _scaled_mm 0.0211 ms 28.3% 2025-12-04T11:45:26.1352976Z SingleProcess AUTOTUNE benchmarking takes 0.0315 seconds and 0.2080 seconds precompiling for 5 choices 2025-12-04T11:45:26.1353167Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4d2135e403b7255.xml - 2025-12-04T11:45:26.1353232Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1353863Z FAILED [0.6738s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1353866Z 2025-12-04T11:45:26.1353944Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1354207Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1354210Z 2025-12-04T11:45:26.1354299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1354362Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1354432Z ================== 1 failed, 187 deselected, 2 rerun in 3.37s ================== 2025-12-04T11:45:26.1354469Z Got exit code 1 2025-12-04T11:45:26.1354510Z Retrying single test... 
2025-12-04T11:45:26.1354654Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-64d045e33610e028.xml 2025-12-04T11:45:26.1354713Z ============================= test session starts ============================== 2025-12-04T11:45:26.1354824Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1354868Z cachedir: .pytest_cache 2025-12-04T11:45:26.1355026Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1355087Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1355128Z configfile: pytest.ini 2025-12-04T11:45:26.1355289Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1355366Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1355622Z stepcurrent: skipping 130 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1355680Z Running 1 items in this shard 2025-12-04T11:45:26.1355682Z 2025-12-04T11:45:26.1355899Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0525s] [100%] 2025-12-04T11:45:26.1356116Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.6062s] [100%] 2025-12-04T11:45:26.1356307Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7157s] [100%] 2025-12-04T11:45:26.1356310Z 2025-12-04T11:45:26.1356364Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1356507Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1356557Z Traceback (most recent call last): 2025-12-04T11:45:26.1356716Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1356776Z method(*args, **kwargs) 2025-12-04T11:45:26.1356929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1356972Z method(*args, **kwargs) 2025-12-04T11:45:26.1357127Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1357167Z with policy(): 2025-12-04T11:45:26.1357331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1357373Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1357763Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.1357766Z 2025-12-04T11:45:26.1357840Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1358101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1358104Z 2025-12-04T11:45:26.1358191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1358271Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1358314Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1358371Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1358853Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1358953Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1359007Z graph_break [] 2025-12-04T11:45:26.1359069Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1359142Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1359642Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1359692Z current_size = base.storage().size() 2025-12-04T11:45:26.1359734Z Autotune Choices Stats: 2025-12-04T11:45:26.1360099Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1360145Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1360186Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1360286Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1360527Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1360757Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1360999Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1361235Z triton_mm_0 0.0074 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1361277Z _scaled_mm 0.0190 ms 31.0% 2025-12-04T11:45:26.1361404Z SingleProcess AUTOTUNE benchmarking takes 0.0234 seconds and 0.1245 seconds precompiling for 5 choices 2025-12-04T11:45:26.1361549Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1361597Z Traceback (most recent call last): 2025-12-04T11:45:26.1361753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1361795Z method(*args, **kwargs) 2025-12-04T11:45:26.1361948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1361990Z method(*args, **kwargs) 2025-12-04T11:45:26.1362141Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1362179Z with policy(): 2025-12-04T11:45:26.1362331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1362373Z raise RuntimeError(msg) 2025-12-04T11:45:26.1362769Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.1362772Z 2025-12-04T11:45:26.1362862Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1363124Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1363129Z 2025-12-04T11:45:26.1363216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1363345Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1363388Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1363460Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1363943Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1364048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1364086Z graph_break [] 2025-12-04T11:45:26.1364149Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1364222Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1364709Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1364772Z current_size = base.storage().size() 2025-12-04T11:45:26.1364815Z Autotune Choices Stats: 2025-12-04T11:45:26.1365194Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1365243Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1365285Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1365466Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1365701Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1365927Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1366158Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1366381Z triton_mm_0 0.0074 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1366424Z _scaled_mm 0.0190 ms 31.0% 2025-12-04T11:45:26.1366552Z SingleProcess AUTOTUNE benchmarking takes 0.0234 seconds 
and 0.1245 seconds precompiling for 5 choices 2025-12-04T11:45:26.1366629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1366670Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1366728Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1366842Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1367332Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1367385Z graph_break [] 2025-12-04T11:45:26.1367445Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1367521Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1367563Z Autotune Choices Stats: 2025-12-04T11:45:26.1367920Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1367968Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1368009Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1368106Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1368336Z triton_mm_6 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1368575Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1368804Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1369039Z triton_mm_4 0.0074 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1369083Z _scaled_mm 0.0158 ms 37.1% 2025-12-04T11:45:26.1369214Z SingleProcess AUTOTUNE benchmarking takes 0.0220 seconds and 0.1265 seconds precompiling for 5 choices 2025-12-04T11:45:26.1369267Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1369413Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1369460Z Traceback (most recent call last): 2025-12-04T11:45:26.1369619Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1369659Z method(*args, **kwargs) 2025-12-04T11:45:26.1369813Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1369854Z method(*args, **kwargs) 2025-12-04T11:45:26.1370006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1370043Z with policy(): 2025-12-04T11:45:26.1370198Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1370241Z raise RuntimeError(msg) 2025-12-04T11:45:26.1370632Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1370645Z 2025-12-04T11:45:26.1370719Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1370982Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1370984Z 2025-12-04T11:45:26.1371082Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1371158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1371203Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1371262Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1371744Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1371845Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1371883Z graph_break [] 2025-12-04T11:45:26.1371942Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1372019Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1372502Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1372564Z current_size = base.storage().size() 2025-12-04T11:45:26.1372606Z Autotune Choices Stats: 2025-12-04T11:45:26.1372985Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1373030Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1373075Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1373174Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1373437Z triton_mm_2 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1373670Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1373898Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.1374122Z triton_mm_0 0.0074 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1374166Z _scaled_mm 0.0190 ms 31.0% 2025-12-04T11:45:26.1374297Z SingleProcess AUTOTUNE benchmarking takes 0.0234 seconds and 0.1245 seconds precompiling for 5 choices 2025-12-04T11:45:26.1374368Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1374431Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1374488Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1374591Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1375085Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1375125Z graph_break [] 2025-12-04T11:45:26.1375185Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1375260Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1375301Z Autotune Choices Stats: 2025-12-04T11:45:26.1375661Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1375707Z AUTOTUNE scaled_mm(1x1024, 
1024x16, , ) 2025-12-04T11:45:26.1375749Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1375848Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1376078Z triton_mm_6 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1376317Z triton_mm_5 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1376557Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1376784Z triton_mm_4 0.0074 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1376826Z _scaled_mm 0.0158 ms 37.1% 2025-12-04T11:45:26.1376955Z SingleProcess AUTOTUNE benchmarking takes 0.0220 seconds and 0.1265 seconds precompiling for 5 choices 2025-12-04T11:45:26.1377028Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1377076Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1377135Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1377235Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1377715Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1377752Z graph_break [] 2025-12-04T11:45:26.1377815Z aten_mm_info [('aten._scaled_mm.default_1_16_1024', 1)] 2025-12-04T11:45:26.1377889Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1377929Z Autotune Choices Stats: 2025-12-04T11:45:26.1378292Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005760000087320805, "best_triton_pos": 0} 2025-12-04T11:45:26.1378351Z AUTOTUNE scaled_mm(1x1024, 1024x16, , ) 2025-12-04T11:45:26.1378391Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1378492Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1378736Z triton_mm_11 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1378966Z triton_mm_9 0.0058 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1379193Z triton_mm_10 0.0059 ms 97.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1379416Z triton_mm_8 0.0073 ms 78.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1379457Z _scaled_mm 0.0204 ms 28.2% 2025-12-04T11:45:26.1379585Z SingleProcess AUTOTUNE benchmarking takes 0.0316 seconds and 0.2092 seconds precompiling for 5 choices 2025-12-04T11:45:26.1379787Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-64d045e33610e028.xml - 2025-12-04T11:45:26.1379847Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1380448Z FAILED [0.7157s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.1380451Z 2025-12-04T11:45:26.1380525Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1380786Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1380789Z 2025-12-04T11:45:26.1380879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1380941Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1381012Z ================== 1 failed, 187 deselected, 2 rerun in 3.39s ================== 2025-12-04T11:45:26.1381050Z Got exit code 1 2025-12-04T11:45:26.1381261Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1381388Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1381535Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-95635abeb11bf230.xml 2025-12-04T11:45:26.1381592Z ============================= test session starts ============================== 2025-12-04T11:45:26.1381705Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1381746Z cachedir: .pytest_cache 2025-12-04T11:45:26.1381904Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1381962Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1382004Z configfile: pytest.ini 2025-12-04T11:45:26.1382167Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1382245Z collecting ... collected 188 items / 131 deselected / 57 selected 2025-12-04T11:45:26.1382300Z stepcurrent: skipping 131 already run items. 
2025-12-04T11:45:26.1382357Z Running 57 items in this shard 2025-12-04T11:45:26.1382359Z 2025-12-04T11:45:26.1382581Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4647s] [ 1%] 2025-12-04T11:45:26.1382800Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9438s] [ 1%] 2025-12-04T11:45:26.1382995Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.8728s] [ 1%] 2025-12-04T11:45:26.1382998Z 2025-12-04T11:45:26.1383050Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1383200Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1383271Z Traceback (most recent call last): 2025-12-04T11:45:26.1383432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1383488Z method(*args, **kwargs) 2025-12-04T11:45:26.1383643Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1383683Z method(*args, **kwargs) 2025-12-04T11:45:26.1383835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1383872Z with policy(): 2025-12-04T11:45:26.1384040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1384081Z raise RuntimeError(msg) 2025-12-04T11:45:26.1384480Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1384483Z 2025-12-04T11:45:26.1384557Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1384820Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1384824Z 2025-12-04T11:45:26.1384910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1384987Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1385031Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1385090Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1385584Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1385684Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1385739Z graph_break [] 2025-12-04T11:45:26.1385804Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1385878Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1386380Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1386430Z current_size = base.storage().size() 2025-12-04T11:45:26.1386473Z Autotune Choices Stats: 2025-12-04T11:45:26.1386846Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1386896Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1386941Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1387042Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1387284Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1387519Z triton_mm_17 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1387762Z triton_mm_7 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1388000Z triton_mm_12 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.1388231Z triton_mm_6 0.0069 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1388463Z triton_mm_9 0.0071 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1388689Z triton_mm_10 0.0074 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1388914Z triton_mm_14 0.0076 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1389139Z triton_mm_5 0.0083 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1389369Z triton_mm_18 0.0087 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1389501Z SingleProcess AUTOTUNE benchmarking takes 0.0852 seconds and 0.3441 seconds precompiling for 20 choices 2025-12-04T11:45:26.1389662Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1389710Z Traceback (most recent call last): 2025-12-04T11:45:26.1389866Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1389912Z method(*args, **kwargs) 2025-12-04T11:45:26.1390064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1390116Z method(*args, **kwargs) 2025-12-04T11:45:26.1390268Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1390311Z with policy(): 2025-12-04T11:45:26.1390464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1390507Z raise RuntimeError(msg) 2025-12-04T11:45:26.1390905Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1390907Z 2025-12-04T11:45:26.1390982Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1391244Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1391258Z 2025-12-04T11:45:26.1391346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1391420Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1391466Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1391524Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1392026Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1393908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1394030Z graph_break [] 2025-12-04T11:45:26.1394097Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1394171Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1394662Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1394711Z current_size = base.storage().size() 2025-12-04T11:45:26.1394753Z Autotune Choices Stats: 2025-12-04T11:45:26.1395131Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1395179Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1395223Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1395322Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1395594Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1395827Z triton_mm_17 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1396074Z triton_mm_7 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1396301Z triton_mm_12 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1396533Z triton_mm_6 0.0069 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1396762Z triton_mm_9 0.0071 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1396989Z triton_mm_10 0.0074 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1397233Z triton_mm_14 0.0076 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1397470Z triton_mm_5 0.0083 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1397702Z triton_mm_18 0.0087 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1397837Z SingleProcess AUTOTUNE benchmarking takes 0.0852 seconds and 0.3441 seconds precompiling for 20 choices 2025-12-04T11:45:26.1397912Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1397955Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1398012Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1398113Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1398607Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1398645Z graph_break [] 2025-12-04T11:45:26.1398709Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1398783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1398824Z Autotune Choices Stats: 2025-12-04T11:45:26.1399196Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:26.1399254Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1399296Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1399395Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1399641Z triton_mm_36 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1399871Z triton_mm_26 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1400098Z triton_mm_31 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1400330Z triton_mm_35 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1400560Z triton_mm_25 0.0070 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1400787Z triton_mm_33 0.0073 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1401031Z triton_mm_28 0.0074 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1401276Z triton_mm_29 0.0077 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1401503Z triton_mm_24 0.0079 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1401732Z triton_mm_37 0.0091 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1401864Z SingleProcess AUTOTUNE benchmarking takes 0.1198 seconds and 0.2497 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1401918Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1402068Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1402114Z Traceback (most recent call last): 2025-12-04T11:45:26.1402272Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1402312Z method(*args, **kwargs) 2025-12-04T11:45:26.1402467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1402508Z method(*args, **kwargs) 2025-12-04T11:45:26.1402660Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1402697Z with policy(): 2025-12-04T11:45:26.1402862Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1402903Z raise RuntimeError(msg) 2025-12-04T11:45:26.1403341Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1403343Z 2025-12-04T11:45:26.1403441Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1403708Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1403712Z 2025-12-04T11:45:26.1403801Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1403874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1403918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1403975Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1404469Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1404567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1404621Z graph_break [] 2025-12-04T11:45:26.1404684Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1404757Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1405261Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1405308Z current_size = base.storage().size() 2025-12-04T11:45:26.1405349Z Autotune Choices Stats: 2025-12-04T11:45:26.1405722Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1405771Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1405813Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1405913Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1406149Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1406381Z triton_mm_17 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1406610Z triton_mm_7 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1406838Z triton_mm_12 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1407085Z triton_mm_6 0.0069 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1407323Z triton_mm_9 0.0071 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1407551Z triton_mm_10 0.0074 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1407779Z triton_mm_14 0.0076 ms 77.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1408009Z triton_mm_5 0.0083 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1408242Z triton_mm_18 0.0087 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1408383Z SingleProcess AUTOTUNE benchmarking takes 0.0852 seconds and 0.3441 seconds precompiling for 20 choices 2025-12-04T11:45:26.1408458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1408500Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1408559Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1408660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1409161Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1409198Z graph_break [] 2025-12-04T11:45:26.1409262Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1409335Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1409375Z Autotune Choices Stats: 2025-12-04T11:45:26.1409742Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:26.1409791Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1409833Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1409931Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1410165Z triton_mm_36 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1410397Z triton_mm_26 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1410633Z triton_mm_31 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1410863Z triton_mm_35 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1411108Z triton_mm_25 0.0070 ms 92.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1411335Z triton_mm_33 0.0073 ms 89.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1411565Z triton_mm_28 0.0074 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1411792Z triton_mm_29 0.0077 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1412018Z triton_mm_24 0.0079 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1412258Z triton_mm_37 0.0091 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1412389Z SingleProcess AUTOTUNE benchmarking takes 0.1198 seconds and 0.2497 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1412462Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1412514Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1412573Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1412670Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1413163Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1413203Z graph_break [] 2025-12-04T11:45:26.1413377Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1413450Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1413491Z Autotune Choices Stats: 2025-12-04T11:45:26.1413861Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.1413907Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1413951Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1414048Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1414284Z triton_mm_55 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.1414528Z triton_mm_54 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1414768Z triton_mm_50 0.0064 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1415000Z triton_mm_44 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1415231Z triton_mm_45 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1415462Z triton_mm_47 0.0073 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1415690Z triton_mm_48 0.0076 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1415920Z triton_mm_52 0.0078 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1416160Z triton_mm_43 0.0083 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1416403Z triton_mm_56 0.0089 ms 67.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1416534Z SingleProcess AUTOTUNE benchmarking takes 0.1310 seconds and 0.2437 seconds precompiling for 20 choices 2025-12-04T11:45:26.1416728Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-95635abeb11bf230.xml - 2025-12-04T11:45:26.1416792Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1417392Z FAILED [0.8728s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1417396Z 2025-12-04T11:45:26.1417471Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1417735Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1417738Z 2025-12-04T11:45:26.1417826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1417889Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1417960Z ================== 1 failed, 131 deselected, 2 rerun in 4.30s ================== 2025-12-04T11:45:26.1418019Z Got exit code 1 2025-12-04T11:45:26.1418059Z Retrying single test... 2025-12-04T11:45:26.1418206Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f982b81d9939abb.xml 2025-12-04T11:45:26.1418264Z ============================= test session starts ============================== 2025-12-04T11:45:26.1418379Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1418420Z cachedir: .pytest_cache 2025-12-04T11:45:26.1418591Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1418638Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1418683Z configfile: pytest.ini 2025-12-04T11:45:26.1418847Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1418924Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1419182Z stepcurrent: skipping 131 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1419227Z Running 1 items in this shard 2025-12-04T11:45:26.1419229Z 2025-12-04T11:45:26.1419447Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3847s] [100%] 2025-12-04T11:45:26.1419664Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8477s] [100%] 2025-12-04T11:45:26.1419870Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.7647s] [100%] 2025-12-04T11:45:26.1419872Z 2025-12-04T11:45:26.1419925Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1420073Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1420120Z Traceback (most recent call last): 2025-12-04T11:45:26.1420292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1420334Z method(*args, **kwargs) 2025-12-04T11:45:26.1420490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1420530Z method(*args, **kwargs) 2025-12-04T11:45:26.1420685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1420722Z with policy(): 2025-12-04T11:45:26.1420879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1420921Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1421317Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1421319Z 2025-12-04T11:45:26.1421392Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1421655Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1421658Z 2025-12-04T11:45:26.1421744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1421818Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1421874Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1421931Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1422424Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1422534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1422573Z graph_break [] 2025-12-04T11:45:26.1422635Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1422709Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1423201Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1423271Z current_size = base.storage().size() 2025-12-04T11:45:26.1423313Z Autotune Choices Stats: 2025-12-04T11:45:26.1423688Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006118999794125557, "best_triton_pos": 0} 2025-12-04T11:45:26.1423753Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1423795Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1423897Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1424150Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1424381Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1424613Z triton_mm_7 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1424847Z triton_mm_6 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1425081Z triton_mm_9 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1425306Z triton_mm_12 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1425530Z triton_mm_5 0.0080 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1425757Z triton_mm_14 0.0080 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1425997Z triton_mm_10 0.0081 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1426241Z triton_mm_18 0.0093 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1426372Z SingleProcess AUTOTUNE benchmarking takes 0.0809 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:26.1426520Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1426567Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1426722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1426766Z method(*args, **kwargs) 2025-12-04T11:45:26.1426921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1426965Z method(*args, **kwargs) 2025-12-04T11:45:26.1427116Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1427154Z with policy(): 2025-12-04T11:45:26.1427309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1427361Z raise RuntimeError(msg) 2025-12-04T11:45:26.1427755Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1427760Z 2025-12-04T11:45:26.1427832Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1428103Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1428106Z 2025-12-04T11:45:26.1428193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1428267Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1428310Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1428367Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1428854Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1428954Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1428991Z graph_break [] 2025-12-04T11:45:26.1429054Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1429129Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1429619Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1429678Z current_size = base.storage().size() 2025-12-04T11:45:26.1429719Z Autotune Choices Stats: 2025-12-04T11:45:26.1430090Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006118999794125557, "best_triton_pos": 0} 2025-12-04T11:45:26.1430135Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1430190Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1430290Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1430529Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1430763Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1430992Z triton_mm_7 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1431222Z triton_mm_6 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1431467Z triton_mm_9 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1431694Z triton_mm_12 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1431934Z triton_mm_5 0.0080 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1432163Z triton_mm_14 0.0080 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1432389Z triton_mm_10 0.0081 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1432621Z triton_mm_18 0.0093 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1432754Z SingleProcess AUTOTUNE benchmarking takes 0.0809 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:26.1432827Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1432869Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1432927Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1433027Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1433542Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1433596Z graph_break [] 2025-12-04T11:45:26.1433658Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1433732Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1433772Z Autotune Choices Stats: 2025-12-04T11:45:26.1434151Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-12-04T11:45:26.1434198Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1434240Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1434342Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1434578Z triton_mm_35 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1434806Z triton_mm_36 0.0065 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1435032Z triton_mm_31 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1435275Z triton_mm_26 0.0069 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1435524Z triton_mm_25 0.0072 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1435755Z triton_mm_28 0.0075 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1435979Z triton_mm_33 0.0081 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1436206Z triton_mm_29 0.0082 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1436432Z triton_mm_24 0.0085 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1436658Z triton_mm_30 0.0094 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1436791Z SingleProcess AUTOTUNE benchmarking takes 0.1152 seconds and 0.2600 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1436844Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1436992Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1437049Z Traceback (most recent call last): 2025-12-04T11:45:26.1437207Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1437247Z method(*args, **kwargs) 2025-12-04T11:45:26.1437402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1437441Z method(*args, **kwargs) 2025-12-04T11:45:26.1437604Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1437641Z with policy(): 2025-12-04T11:45:26.1437796Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1437839Z raise RuntimeError(msg) 2025-12-04T11:45:26.1438232Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1438236Z 2025-12-04T11:45:26.1438311Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1438575Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1438578Z 2025-12-04T11:45:26.1438667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1438750Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1438793Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1438849Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1439351Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1439450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1439487Z graph_break [] 2025-12-04T11:45:26.1439550Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1439625Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1440113Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1440161Z current_size = base.storage().size() 2025-12-04T11:45:26.1440202Z Autotune Choices Stats: 2025-12-04T11:45:26.1440573Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006118999794125557, "best_triton_pos": 0} 2025-12-04T11:45:26.1440621Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1440664Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1440764Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1441000Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1441246Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1441488Z triton_mm_7 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1441717Z triton_mm_6 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1441952Z triton_mm_9 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1442179Z triton_mm_12 0.0068 ms 90.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1442405Z triton_mm_5 0.0080 ms 76.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1442629Z triton_mm_14 0.0080 ms 76.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1442867Z triton_mm_10 0.0081 ms 75.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1443108Z triton_mm_18 0.0093 ms 65.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1443238Z SingleProcess AUTOTUNE benchmarking takes 0.0809 seconds and 0.3624 seconds precompiling for 20 choices 2025-12-04T11:45:26.1443339Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1443381Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1443439Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1443539Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1444031Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1444068Z graph_break [] 2025-12-04T11:45:26.1444131Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1444204Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1444245Z Autotune Choices Stats: 2025-12-04T11:45:26.1444611Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006399999838322401, "best_triton_pos": 0} 2025-12-04T11:45:26.1444678Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1444719Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1444819Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1445053Z triton_mm_35 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1445294Z triton_mm_36 0.0065 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1445521Z triton_mm_31 0.0067 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1445752Z triton_mm_26 0.0069 ms 93.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1445986Z triton_mm_25 0.0072 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1446216Z triton_mm_28 0.0075 ms 85.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1446466Z triton_mm_33 0.0081 ms 78.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1446693Z triton_mm_29 0.0082 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1446930Z triton_mm_24 0.0085 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1447159Z triton_mm_30 0.0094 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1447290Z SingleProcess AUTOTUNE benchmarking takes 0.1152 seconds and 0.2600 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1447363Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1447407Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1447466Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1447566Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1448060Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1448096Z graph_break [] 2025-12-04T11:45:26.1448161Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1448233Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1448274Z Autotune Choices Stats: 2025-12-04T11:45:26.1448646Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1448703Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1448744Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1448841Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1449087Z triton_mm_54 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.1449314Z triton_mm_50 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1449546Z triton_mm_55 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1449775Z triton_mm_45 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1450007Z triton_mm_47 0.0069 ms 89.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1450251Z triton_mm_44 0.0071 ms 87.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1450487Z triton_mm_48 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1450713Z triton_mm_52 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1450938Z triton_mm_43 0.0086 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1451167Z triton_mm_49 0.0090 ms 68.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1451298Z SingleProcess AUTOTUNE benchmarking takes 0.1365 seconds and 0.2391 seconds precompiling for 20 choices 2025-12-04T11:45:26.1451494Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-8f982b81d9939abb.xml - 2025-12-04T11:45:26.1451554Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1452158Z FAILED [0.7647s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1452173Z 2025-12-04T11:45:26.1452247Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1452512Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1452514Z 2025-12-04T11:45:26.1452602Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1452664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1452743Z ================== 1 failed, 187 deselected, 2 rerun in 4.02s ================== 2025-12-04T11:45:26.1452782Z Got exit code 1 2025-12-04T11:45:26.1452823Z Retrying single test... 2025-12-04T11:45:26.1452970Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fcbe45d8b8d1830f.xml 2025-12-04T11:45:26.1453027Z ============================= test session starts ============================== 2025-12-04T11:45:26.1453141Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1453182Z cachedir: .pytest_cache 2025-12-04T11:45:26.1453377Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1453425Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1453466Z configfile: pytest.ini 2025-12-04T11:45:26.1453631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1453706Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1453978Z stepcurrent: skipping 131 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1454022Z Running 1 items in this shard 2025-12-04T11:45:26.1454025Z 2025-12-04T11:45:26.1454245Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3571s] [100%] 2025-12-04T11:45:26.1454477Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8492s] [100%] 2025-12-04T11:45:26.1454673Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.7719s] [100%] 2025-12-04T11:45:26.1454675Z 2025-12-04T11:45:26.1454727Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1454874Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1454921Z Traceback (most recent call last): 2025-12-04T11:45:26.1455081Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1455123Z method(*args, **kwargs) 2025-12-04T11:45:26.1455278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1455319Z method(*args, **kwargs) 2025-12-04T11:45:26.1455470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1455508Z with policy(): 2025-12-04T11:45:26.1455662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1455704Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1456099Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1456116Z 2025-12-04T11:45:26.1456190Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1456455Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1456457Z 2025-12-04T11:45:26.1456556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1456631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1456675Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1456732Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1457221Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1457322Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1457360Z graph_break [] 2025-12-04T11:45:26.1457424Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1457499Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1457989Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1458048Z current_size = base.storage().size() 2025-12-04T11:45:26.1458088Z Autotune Choices Stats: 2025-12-04T11:45:26.1458469Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1458516Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1458559Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1458657Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1458898Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1459132Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1459363Z triton_mm_7 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1459591Z triton_mm_12 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1459822Z triton_mm_6 0.0069 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1459877Z _scaled_mm 0.0070 ms 87.9% 2025-12-04T11:45:26.1460107Z triton_mm_9 0.0070 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1460350Z triton_mm_10 0.0079 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1460575Z triton_mm_14 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1460802Z triton_mm_5 0.0087 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1460935Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3523 seconds precompiling for 20 choices 2025-12-04T11:45:26.1461084Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1461131Z Traceback (most recent call last): 2025-12-04T11:45:26.1461287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.1461330Z method(*args, **kwargs) 2025-12-04T11:45:26.1461482Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1461541Z method(*args, **kwargs) 2025-12-04T11:45:26.1461691Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1461730Z with policy(): 2025-12-04T11:45:26.1461882Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1461923Z raise RuntimeError(msg) 2025-12-04T11:45:26.1462329Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1462331Z 2025-12-04T11:45:26.1462406Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1462669Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1462670Z 2025-12-04T11:45:26.1462761Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1462835Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1462879Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1462937Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1463466Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1463566Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1463603Z graph_break [] 2025-12-04T11:45:26.1463667Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1463757Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1464250Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1464296Z current_size = base.storage().size() 2025-12-04T11:45:26.1464351Z Autotune Choices Stats: 2025-12-04T11:45:26.1464724Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1464773Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1464815Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1464916Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1465154Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1465389Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1465634Z triton_mm_7 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1465862Z triton_mm_12 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1466107Z triton_mm_6 0.0069 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1466150Z _scaled_mm 0.0070 ms 87.9% 2025-12-04T11:45:26.1466379Z triton_mm_9 0.0070 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1466607Z triton_mm_10 0.0079 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1466835Z triton_mm_14 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1467061Z triton_mm_5 0.0087 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1467191Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3523 seconds precompiling for 20 choices 2025-12-04T11:45:26.1467265Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1467307Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1467365Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1467463Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1467966Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1468003Z graph_break [] 2025-12-04T11:45:26.1468074Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1468148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1468190Z Autotune Choices Stats: 2025-12-04T11:45:26.1468554Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:26.1468602Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1468646Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1468743Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1468981Z triton_mm_36 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1469210Z triton_mm_26 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1469452Z triton_mm_35 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1469690Z triton_mm_31 0.0068 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1469922Z triton_mm_28 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1470154Z triton_mm_25 0.0070 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1470379Z triton_mm_33 0.0078 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1470606Z triton_mm_29 0.0081 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1470832Z triton_mm_24 0.0086 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1471059Z triton_mm_30 0.0095 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1471189Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.2605 seconds precompiling for 20 choices 2025-12-04T11:45:26.1471258Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1471406Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1471453Z Traceback (most recent call last): 2025-12-04T11:45:26.1471614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1471655Z method(*args, **kwargs) 2025-12-04T11:45:26.1471819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1471860Z method(*args, **kwargs) 2025-12-04T11:45:26.1472012Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1472048Z with policy(): 2025-12-04T11:45:26.1472203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1472243Z raise RuntimeError(msg) 2025-12-04T11:45:26.1472638Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1472641Z 2025-12-04T11:45:26.1472714Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1472978Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1472991Z 2025-12-04T11:45:26.1473079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1473154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1473198Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1473284Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1473794Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1473895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1473933Z graph_break [] 2025-12-04T11:45:26.1473994Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1474069Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1474560Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1474608Z current_size = base.storage().size() 2025-12-04T11:45:26.1474648Z Autotune Choices Stats: 2025-12-04T11:45:26.1475019Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1475065Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1475128Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1475227Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1475464Z triton_mm_16 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1475706Z triton_mm_17 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1475936Z triton_mm_7 0.0066 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1476167Z triton_mm_12 0.0067 ms 91.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1476398Z triton_mm_6 0.0069 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1476441Z _scaled_mm 0.0070 ms 87.9% 2025-12-04T11:45:26.1476670Z triton_mm_9 0.0070 ms 87.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1476910Z triton_mm_10 0.0079 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1477137Z triton_mm_14 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1477370Z triton_mm_5 0.0087 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1477502Z SingleProcess AUTOTUNE benchmarking takes 0.0813 seconds and 0.3523 seconds precompiling for 20 choices 2025-12-04T11:45:26.1477575Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1477618Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1477675Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1477776Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1478271Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), 
('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1478309Z graph_break [] 2025-12-04T11:45:26.1478370Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1478445Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1478485Z Autotune Choices Stats: 2025-12-04T11:45:26.1478853Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006519999820739031, "best_triton_pos": 0} 2025-12-04T11:45:26.1478910Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1478953Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1479051Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1479287Z triton_mm_36 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1479533Z triton_mm_26 0.0066 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1479762Z triton_mm_35 0.0066 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1479990Z triton_mm_31 0.0068 ms 95.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1480223Z triton_mm_28 0.0070 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1480456Z triton_mm_25 0.0070 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1480694Z triton_mm_33 0.0078 ms 83.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1480930Z triton_mm_29 0.0081 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1481156Z triton_mm_24 0.0086 ms 75.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1481384Z triton_mm_30 0.0095 ms 68.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1481516Z SingleProcess AUTOTUNE benchmarking takes 0.1206 seconds and 0.2605 seconds precompiling for 20 choices 2025-12-04T11:45:26.1481590Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1481633Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1481689Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1481789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1482280Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1482318Z graph_break [] 2025-12-04T11:45:26.1482379Z aten_mm_info [('aten._scaled_mm.default_1_2048_1024', 1)] 2025-12-04T11:45:26.1482452Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1482505Z Autotune Choices Stats: 2025-12-04T11:45:26.1482870Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006318999920040369, "best_triton_pos": 0} 2025-12-04T11:45:26.1482917Z AUTOTUNE scaled_mm(1x1024, 1024x2048, , ) 2025-12-04T11:45:26.1482958Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1483066Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1483323Z triton_mm_55 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1483367Z _scaled_mm 0.0064 ms 98.7% 2025-12-04T11:45:26.1483595Z triton_mm_54 0.0066 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1483825Z triton_mm_45 0.0070 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1484057Z triton_mm_47 0.0070 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1484302Z triton_mm_50 0.0070 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1484528Z triton_mm_52 0.0074 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1484768Z triton_mm_44 0.0074 ms 85.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1484995Z triton_mm_48 0.0075 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1485219Z triton_mm_43 0.0082 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1485350Z SingleProcess AUTOTUNE benchmarking takes 0.1376 seconds and 0.2387 
seconds precompiling for 20 choices 2025-12-04T11:45:26.1485544Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-fcbe45d8b8d1830f.xml - 2025-12-04T11:45:26.1485604Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1486197Z FAILED [0.7719s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1486200Z 2025-12-04T11:45:26.1486274Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1486552Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1486554Z 2025-12-04T11:45:26.1486642Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1486706Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1486786Z ================== 1 failed, 187 deselected, 2 rerun in 4.00s ================== 2025-12-04T11:45:26.1486825Z Got exit code 1 2025-12-04T11:45:26.1487035Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1487164Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1487311Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-baa9e33bf331f6bb.xml 2025-12-04T11:45:26.1487369Z ============================= test session starts ============================== 2025-12-04T11:45:26.1487483Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1487524Z cachedir: .pytest_cache 2025-12-04T11:45:26.1487683Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1487728Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1487770Z configfile: pytest.ini 2025-12-04T11:45:26.1487932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1488020Z collecting ... collected 188 items / 132 deselected / 56 selected 2025-12-04T11:45:26.1488074Z stepcurrent: skipping 132 already run items. 
2025-12-04T11:45:26.1488121Z Running 56 items in this shard 2025-12-04T11:45:26.1488123Z 2025-12-04T11:45:26.1488340Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6967s] [ 1%] 2025-12-04T11:45:26.1488573Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2741s] [ 1%] 2025-12-04T11:45:26.1488764Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2292s] [ 1%] 2025-12-04T11:45:26.1488766Z 2025-12-04T11:45:26.1488817Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1488961Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1489009Z Traceback (most recent call last): 2025-12-04T11:45:26.1489169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1489211Z method(*args, **kwargs) 2025-12-04T11:45:26.1489364Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1489406Z method(*args, **kwargs) 2025-12-04T11:45:26.1489557Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1489596Z with policy(): 2025-12-04T11:45:26.1489749Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1489791Z raise RuntimeError(msg) 2025-12-04T11:45:26.1490181Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1490194Z 2025-12-04T11:45:26.1490269Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1490530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1490532Z 2025-12-04T11:45:26.1490630Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1490705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1490748Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1490804Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1490871Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1490973Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1491010Z graph_break [] 2025-12-04T11:45:26.1491072Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1491214Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1491261Z Traceback (most recent call last): 2025-12-04T11:45:26.1491415Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1491459Z method(*args, **kwargs) 2025-12-04T11:45:26.1491610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1491662Z method(*args, **kwargs) 2025-12-04T11:45:26.1491813Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1491853Z with policy(): 2025-12-04T11:45:26.1492005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1492047Z raise RuntimeError(msg) 2025-12-04T11:45:26.1492443Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1492447Z 2025-12-04T11:45:26.1492521Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1492779Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1492782Z 2025-12-04T11:45:26.1492869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1492944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1492986Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1493044Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1493113Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1493213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1493274Z graph_break [] 2025-12-04T11:45:26.1493335Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1493408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1493451Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1493507Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1493606Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1493690Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1493727Z graph_break [] 2025-12-04T11:45:26.1493785Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1493839Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1493981Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1494028Z Traceback (most recent call last): 2025-12-04T11:45:26.1494196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1494238Z method(*args, **kwargs) 2025-12-04T11:45:26.1494390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1494430Z method(*args, **kwargs) 2025-12-04T11:45:26.1494579Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1494618Z with policy(): 2025-12-04T11:45:26.1494769Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1494811Z raise RuntimeError(msg) 2025-12-04T11:45:26.1495200Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1495205Z 2025-12-04T11:45:26.1495291Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1495549Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1495553Z 2025-12-04T11:45:26.1495639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1495713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1495767Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1495827Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1495891Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1495989Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1496026Z graph_break [] 2025-12-04T11:45:26.1496088Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1496162Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1496204Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1496258Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1496357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1496420Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1496457Z graph_break [] 2025-12-04T11:45:26.1496516Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1496589Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1496629Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1496685Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1496781Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1496847Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1496885Z graph_break [] 2025-12-04T11:45:26.1496943Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1497139Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-baa9e33bf331f6bb.xml - 2025-12-04T11:45:26.1497210Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1497804Z FAILED [0.2292s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1497811Z 2025-12-04T11:45:26.1497883Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1498140Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1498144Z 2025-12-04T11:45:26.1498230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1498293Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1498361Z ================== 1 failed, 132 deselected, 2 rerun in 2.22s ================== 2025-12-04T11:45:26.1498400Z Got exit code 1 2025-12-04T11:45:26.1498439Z Retrying single test... 
2025-12-04T11:45:26.1498588Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9f4a6024ffbcced0.xml 2025-12-04T11:45:26.1498647Z ============================= test session starts ============================== 2025-12-04T11:45:26.1498757Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1498809Z cachedir: .pytest_cache 2025-12-04T11:45:26.1498967Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1499013Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1499054Z configfile: pytest.ini 2025-12-04T11:45:26.1499214Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1499301Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1499558Z stepcurrent: skipping 132 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1499605Z Running 1 items in this shard 2025-12-04T11:45:26.1499607Z 2025-12-04T11:45:26.1499822Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6693s] [100%] 2025-12-04T11:45:26.1500035Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2664s] [100%] 2025-12-04T11:45:26.1500225Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2258s] [100%] 2025-12-04T11:45:26.1500227Z 2025-12-04T11:45:26.1500280Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1500423Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1500469Z Traceback (most recent call last): 2025-12-04T11:45:26.1500627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1500668Z method(*args, **kwargs) 2025-12-04T11:45:26.1500821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1500860Z method(*args, **kwargs) 2025-12-04T11:45:26.1501025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1501061Z with policy(): 2025-12-04T11:45:26.1501215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1501255Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1501652Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1501655Z 2025-12-04T11:45:26.1501729Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1501990Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1501993Z 2025-12-04T11:45:26.1502079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1502154Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1502197Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1502251Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1502319Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1502418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1502473Z graph_break [] 2025-12-04T11:45:26.1502532Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1502673Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1502718Z Traceback (most recent call last): 2025-12-04T11:45:26.1502874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1502915Z method(*args, **kwargs) 2025-12-04T11:45:26.1503076Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.1503116Z method(*args, **kwargs) 2025-12-04T11:45:26.1503296Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1503334Z with policy(): 2025-12-04T11:45:26.1503486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1503530Z raise RuntimeError(msg) 2025-12-04T11:45:26.1503916Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1503920Z 2025-12-04T11:45:26.1503992Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1504251Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1504254Z 2025-12-04T11:45:26.1504342Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1504417Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1504461Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1504520Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1504586Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1504705Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1504741Z graph_break [] 2025-12-04T11:45:26.1504800Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1504873Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.1504915Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1504969Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1505066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1505143Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1505181Z graph_break [] 2025-12-04T11:45:26.1505240Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1505292Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1505434Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1505482Z Traceback (most recent call last): 2025-12-04T11:45:26.1505637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1505678Z method(*args, **kwargs) 2025-12-04T11:45:26.1505831Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1505870Z method(*args, **kwargs) 2025-12-04T11:45:26.1506022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1506058Z with policy(): 2025-12-04T11:45:26.1506230Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1506271Z raise RuntimeError(msg) 2025-12-04T11:45:26.1506660Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1506676Z 2025-12-04T11:45:26.1506749Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1507007Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1507009Z 2025-12-04T11:45:26.1507095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1507170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1507211Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1507267Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1507333Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1507430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1507465Z graph_break [] 2025-12-04T11:45:26.1507528Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1507600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1507641Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1507696Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1507792Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1507857Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1507895Z graph_break [] 2025-12-04T11:45:26.1507953Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1508026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1508077Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1508134Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1508229Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1508296Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1508331Z graph_break [] 2025-12-04T11:45:26.1508389Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1508592Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9f4a6024ffbcced0.xml - 2025-12-04T11:45:26.1508654Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1509235Z FAILED [0.2258s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1509239Z 2025-12-04T11:45:26.1509313Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1509569Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1509572Z 2025-12-04T11:45:26.1509657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1509731Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1509799Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.1509836Z Got exit code 1 2025-12-04T11:45:26.1509877Z Retrying single test... 
2025-12-04T11:45:26.1510022Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e993d88b4691ebfe.xml 2025-12-04T11:45:26.1510078Z ============================= test session starts ============================== 2025-12-04T11:45:26.1510198Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1510239Z cachedir: .pytest_cache 2025-12-04T11:45:26.1510400Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1510445Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1510486Z configfile: pytest.ini 2025-12-04T11:45:26.1510649Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1510724Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1510978Z stepcurrent: skipping 132 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1511022Z Running 1 items in this shard 2025-12-04T11:45:26.1511024Z 2025-12-04T11:45:26.1511240Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7604s] [100%] 2025-12-04T11:45:26.1511452Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3682s] [100%] 2025-12-04T11:45:26.1511641Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3349s] [100%] 2025-12-04T11:45:26.1511643Z 2025-12-04T11:45:26.1511693Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1511845Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1511891Z Traceback (most recent call last): 2025-12-04T11:45:26.1512049Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1512090Z method(*args, **kwargs) 2025-12-04T11:45:26.1512244Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1512294Z method(*args, **kwargs) 2025-12-04T11:45:26.1512447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1512484Z with policy(): 2025-12-04T11:45:26.1512640Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1512681Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1513072Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1513075Z 2025-12-04T11:45:26.1513148Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1513437Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1513455Z 2025-12-04T11:45:26.1513543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1513615Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1513658Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1513715Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1513780Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1513891Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1513929Z graph_break [] 2025-12-04T11:45:26.1513988Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1514129Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1514175Z Traceback (most recent call last): 2025-12-04T11:45:26.1514328Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1514369Z method(*args, **kwargs) 2025-12-04T11:45:26.1514519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.1514560Z method(*args, **kwargs) 2025-12-04T11:45:26.1514710Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1514747Z with policy(): 2025-12-04T11:45:26.1514903Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1514944Z raise RuntimeError(msg) 2025-12-04T11:45:26.1515336Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1515339Z 2025-12-04T11:45:26.1515411Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1515669Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1515685Z 2025-12-04T11:45:26.1515771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1515845Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1515887Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1515942Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1516009Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1516125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1516164Z graph_break [] 2025-12-04T11:45:26.1516223Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1516296Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.1516337Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1516393Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1516488Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1516553Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1516590Z graph_break [] 2025-12-04T11:45:26.1516648Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1516702Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1516846Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1516891Z Traceback (most recent call last): 2025-12-04T11:45:26.1517063Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1517103Z method(*args, **kwargs) 2025-12-04T11:45:26.1517254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1517295Z method(*args, **kwargs) 2025-12-04T11:45:26.1517446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1517493Z with policy(): 2025-12-04T11:45:26.1517648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1517690Z raise RuntimeError(msg) 2025-12-04T11:45:26.1518081Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1518084Z 2025-12-04T11:45:26.1518156Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1518415Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1518417Z 2025-12-04T11:45:26.1518506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1518578Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1518624Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1518681Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1518749Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1518847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1518885Z graph_break [] 2025-12-04T11:45:26.1518943Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1519016Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1519068Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1519125Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1519223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1519290Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1519326Z graph_break [] 2025-12-04T11:45:26.1519384Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1519466Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1519509Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1519564Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1519661Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1519724Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1519763Z graph_break [] 2025-12-04T11:45:26.1519819Z aten_mm_info [('aten._scaled_mm.default_1_16_16', 1)] 2025-12-04T11:45:26.1520012Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e993d88b4691ebfe.xml - 2025-12-04T11:45:26.1520072Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1520656Z FAILED [0.3349s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1520671Z 2025-12-04T11:45:26.1520743Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1521002Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1521004Z 2025-12-04T11:45:26.1521102Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1521164Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1521232Z ================== 1 failed, 187 deselected, 2 rerun in 2.48s ================== 2025-12-04T11:45:26.1521269Z Got exit code 1 2025-12-04T11:45:26.1521479Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1521608Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1521753Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c129f70651612fea.xml 2025-12-04T11:45:26.1521811Z ============================= test session starts ============================== 2025-12-04T11:45:26.1521922Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1521963Z cachedir: .pytest_cache 2025-12-04T11:45:26.1522122Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1522167Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1522208Z configfile: pytest.ini 2025-12-04T11:45:26.1522368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1522445Z collecting ... collected 188 items / 133 deselected / 55 selected 2025-12-04T11:45:26.1522499Z stepcurrent: skipping 133 already run items. 
2025-12-04T11:45:26.1522545Z Running 55 items in this shard 2025-12-04T11:45:26.1522557Z 2025-12-04T11:45:26.1522783Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7715s] [ 1%] 2025-12-04T11:45:26.1523002Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3411s] [ 1%] 2025-12-04T11:45:26.1523191Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3053s] [ 1%] 2025-12-04T11:45:26.1523204Z 2025-12-04T11:45:26.1523280Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1523425Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1523470Z Traceback (most recent call last): 2025-12-04T11:45:26.1523631Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1523673Z method(*args, **kwargs) 2025-12-04T11:45:26.1523829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1523869Z method(*args, **kwargs) 2025-12-04T11:45:26.1524023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1524059Z with policy(): 2025-12-04T11:45:26.1524213Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1524274Z raise RuntimeError(msg) 2025-12-04T11:45:26.1524662Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1524666Z 2025-12-04T11:45:26.1524737Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1525012Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1525015Z 2025-12-04T11:45:26.1525101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1525176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1525217Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1525274Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1525338Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1525436Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1525472Z graph_break [] 2025-12-04T11:45:26.1525533Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1525676Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1525724Z Traceback (most recent call last): 2025-12-04T11:45:26.1525879Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1525920Z method(*args, **kwargs) 2025-12-04T11:45:26.1526071Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1526112Z method(*args, **kwargs) 2025-12-04T11:45:26.1526265Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1526300Z with policy(): 2025-12-04T11:45:26.1526467Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1526509Z raise RuntimeError(msg) 2025-12-04T11:45:26.1526900Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1526902Z 2025-12-04T11:45:26.1526989Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1527251Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1527254Z 2025-12-04T11:45:26.1527339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1527414Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1527457Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1527516Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1527583Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1527682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1527718Z graph_break [] 2025-12-04T11:45:26.1527778Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1527852Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1527905Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:26.1527960Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1528057Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1528122Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1528161Z graph_break [] 2025-12-04T11:45:26.1528219Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1528272Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1528423Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1528470Z Traceback (most recent call last): 2025-12-04T11:45:26.1528625Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1528666Z method(*args, **kwargs) 2025-12-04T11:45:26.1528817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1528858Z method(*args, **kwargs) 2025-12-04T11:45:26.1529008Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1529046Z with policy(): 2025-12-04T11:45:26.1529203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1529245Z raise RuntimeError(msg) 2025-12-04T11:45:26.1529638Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1529640Z 2025-12-04T11:45:26.1529714Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1529975Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1529997Z 2025-12-04T11:45:26.1530084Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1530158Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1530199Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1530257Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1530321Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1530422Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1530469Z graph_break [] 2025-12-04T11:45:26.1530529Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1530604Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1530647Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1530701Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1530798Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1530862Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1530899Z graph_break [] 2025-12-04T11:45:26.1530956Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1531032Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1531074Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1531129Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1531224Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1531290Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1531337Z graph_break [] 2025-12-04T11:45:26.1531396Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1531588Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c129f70651612fea.xml - 2025-12-04T11:45:26.1531650Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1532249Z FAILED [0.3053s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1532252Z 2025-12-04T11:45:26.1532324Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1532587Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1532589Z 2025-12-04T11:45:26.1532675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1532740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1532808Z ================== 1 failed, 133 deselected, 2 rerun in 2.44s ================== 2025-12-04T11:45:26.1532847Z Got exit code 1 2025-12-04T11:45:26.1532889Z Retrying single test... 
2025-12-04T11:45:26.1533036Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f69c48c2a53da992.xml 2025-12-04T11:45:26.1533093Z ============================= test session starts ============================== 2025-12-04T11:45:26.1533202Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1533244Z cachedir: .pytest_cache 2025-12-04T11:45:26.1533427Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1533487Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1533528Z configfile: pytest.ini 2025-12-04T11:45:26.1533688Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1533763Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1534019Z stepcurrent: skipping 133 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1534080Z Running 1 items in this shard 2025-12-04T11:45:26.1534083Z 2025-12-04T11:45:26.1534299Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6664s] [100%] 2025-12-04T11:45:26.1534516Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2710s] [100%] 2025-12-04T11:45:26.1534711Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2258s] [100%] 2025-12-04T11:45:26.1534713Z 2025-12-04T11:45:26.1534765Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1534909Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1534954Z Traceback (most recent call last): 2025-12-04T11:45:26.1535112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1535167Z method(*args, **kwargs) 2025-12-04T11:45:26.1535320Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1535360Z method(*args, **kwargs) 2025-12-04T11:45:26.1535513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1535551Z with policy(): 2025-12-04T11:45:26.1535719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1535762Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1536152Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1536155Z 2025-12-04T11:45:26.1536228Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1536487Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1536490Z 2025-12-04T11:45:26.1536577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1536651Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1536696Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1536751Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1536818Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1536918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1536954Z graph_break [] 2025-12-04T11:45:26.1537016Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1537160Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1537206Z Traceback (most recent call last): 2025-12-04T11:45:26.1537371Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1537412Z method(*args, **kwargs) 2025-12-04T11:45:26.1537564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1537604Z method(*args, **kwargs) 2025-12-04T11:45:26.1537753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1537801Z with policy(): 2025-12-04T11:45:26.1537956Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1537998Z raise RuntimeError(msg) 2025-12-04T11:45:26.1538391Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1538394Z 2025-12-04T11:45:26.1538466Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1538726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1538728Z 2025-12-04T11:45:26.1538815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1538888Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1538942Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1538998Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1539064Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1539164Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1539201Z graph_break [] 2025-12-04T11:45:26.1539259Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1539342Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1539385Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1539441Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1539536Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1539602Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1539638Z graph_break [] 2025-12-04T11:45:26.1539697Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1539750Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1539892Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1539939Z Traceback (most recent call last): 2025-12-04T11:45:26.1540098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1540140Z method(*args, **kwargs) 2025-12-04T11:45:26.1540293Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1540333Z method(*args, **kwargs) 2025-12-04T11:45:26.1540485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1540522Z with policy(): 2025-12-04T11:45:26.1540676Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1540717Z raise RuntimeError(msg) 2025-12-04T11:45:26.1541106Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1541121Z 2025-12-04T11:45:26.1541193Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1541464Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1541467Z 2025-12-04T11:45:26.1541554Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1541628Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1541670Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1541727Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1541793Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1541892Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1541928Z graph_break [] 2025-12-04T11:45:26.1541988Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1542063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1542107Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1542162Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1542259Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1542334Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1542370Z graph_break [] 2025-12-04T11:45:26.1542428Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1542500Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1542543Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1542597Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1542692Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1542772Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1542811Z graph_break [] 2025-12-04T11:45:26.1542869Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1543062Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f69c48c2a53da992.xml - 2025-12-04T11:45:26.1543121Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1543747Z FAILED [0.2258s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1543750Z 2025-12-04T11:45:26.1543824Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1544084Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1544086Z 2025-12-04T11:45:26.1544171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1544235Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1544302Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.1544341Z Got exit code 1 2025-12-04T11:45:26.1544397Z Retrying single test... 
2025-12-04T11:45:26.1544545Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-99dd922f3c02f464.xml 2025-12-04T11:45:26.1544601Z ============================= test session starts ============================== 2025-12-04T11:45:26.1544712Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1544754Z cachedir: .pytest_cache 2025-12-04T11:45:26.1544925Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1544972Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1545013Z configfile: pytest.ini 2025-12-04T11:45:26.1545177Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1545252Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1545513Z stepcurrent: skipping 133 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1545558Z Running 1 items in this shard 2025-12-04T11:45:26.1545561Z 2025-12-04T11:45:26.1545778Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7916s] [100%] 2025-12-04T11:45:26.1545991Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3344s] [100%] 2025-12-04T11:45:26.1546197Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3012s] [100%] 2025-12-04T11:45:26.1546199Z 2025-12-04T11:45:26.1546249Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1546393Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1546439Z Traceback (most recent call last): 2025-12-04T11:45:26.1546610Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1546652Z method(*args, **kwargs) 2025-12-04T11:45:26.1546804Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1546845Z method(*args, **kwargs) 2025-12-04T11:45:26.1546996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1547035Z with policy(): 2025-12-04T11:45:26.1547187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1547230Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1547616Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1547618Z 2025-12-04T11:45:26.1547692Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1547954Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1547957Z 2025-12-04T11:45:26.1548043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1548116Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1548169Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1548225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1548291Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1548388Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1548425Z graph_break [] 2025-12-04T11:45:26.1548483Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1548626Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1548682Z Traceback (most recent call last): 2025-12-04T11:45:26.1548837Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1548878Z method(*args, **kwargs) 2025-12-04T11:45:26.1549031Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1549071Z method(*args, **kwargs) 2025-12-04T11:45:26.1549222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1549258Z with policy(): 2025-12-04T11:45:26.1549412Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1549452Z raise RuntimeError(msg) 2025-12-04T11:45:26.1549839Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1549852Z 2025-12-04T11:45:26.1549927Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1550187Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1550191Z 2025-12-04T11:45:26.1550288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1550362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1550406Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1550462Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1550529Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1550626Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1550665Z graph_break [] 2025-12-04T11:45:26.1550722Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1550797Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1550841Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1550898Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1550993Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1551059Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1551095Z graph_break [] 2025-12-04T11:45:26.1551153Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1551205Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1551349Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1551396Z Traceback (most recent call last): 2025-12-04T11:45:26.1551550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1551590Z method(*args, **kwargs) 2025-12-04T11:45:26.1551754Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1551793Z method(*args, **kwargs) 2025-12-04T11:45:26.1551947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1551983Z with policy(): 2025-12-04T11:45:26.1552137Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1552179Z raise RuntimeError(msg) 2025-12-04T11:45:26.1552577Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1552581Z 2025-12-04T11:45:26.1552654Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1552912Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1552915Z 2025-12-04T11:45:26.1553002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1553077Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1553121Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1553177Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1553244Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1553398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1553437Z graph_break [] 2025-12-04T11:45:26.1553495Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1553571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1553612Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1553667Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1553775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1553843Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1553879Z graph_break [] 2025-12-04T11:45:26.1553938Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1554012Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1554055Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1554112Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1554212Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1554275Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1554313Z graph_break [] 2025-12-04T11:45:26.1554370Z aten_mm_info [('aten._scaled_mm.default_1_2048_16', 1)] 2025-12-04T11:45:26.1554561Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-99dd922f3c02f464.xml - 2025-12-04T11:45:26.1554621Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1555207Z FAILED [0.3012s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1555211Z 2025-12-04T11:45:26.1555285Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1555556Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1555559Z 2025-12-04T11:45:26.1555647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1555709Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1555789Z ================== 1 failed, 187 deselected, 2 rerun in 2.45s ================== 2025-12-04T11:45:26.1555826Z Got exit code 1 2025-12-04T11:45:26.1556035Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1556164Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1556310Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-287630e4a4ba94f0.xml 2025-12-04T11:45:26.1556366Z ============================= test session starts ============================== 2025-12-04T11:45:26.1556477Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1556518Z cachedir: .pytest_cache 2025-12-04T11:45:26.1556675Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1556721Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1556762Z configfile: pytest.ini 2025-12-04T11:45:26.1556922Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1557017Z collecting ... collected 188 items / 134 deselected / 54 selected 2025-12-04T11:45:26.1557070Z stepcurrent: skipping 134 already run items. 
2025-12-04T11:45:26.1557115Z Running 54 items in this shard 2025-12-04T11:45:26.1557117Z 2025-12-04T11:45:26.1557333Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8354s] [ 1%] 2025-12-04T11:45:26.1557558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3876s] [ 1%] 2025-12-04T11:45:26.1557748Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.3451s] [ 1%] 2025-12-04T11:45:26.1557751Z 2025-12-04T11:45:26.1557803Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1557946Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1557991Z Traceback (most recent call last): 2025-12-04T11:45:26.1558150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1558190Z method(*args, **kwargs) 2025-12-04T11:45:26.1558343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1558383Z method(*args, **kwargs) 2025-12-04T11:45:26.1558533Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1558571Z with policy(): 2025-12-04T11:45:26.1558727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1558769Z raise RuntimeError(msg) 2025-12-04T11:45:26.1559153Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.1559166Z 2025-12-04T11:45:26.1559238Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1559498Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1559500Z 2025-12-04T11:45:26.1559598Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1559670Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1559715Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1559770Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1560262Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1560360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1560398Z graph_break [] 2025-12-04T11:45:26.1560458Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1560532Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1561030Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1561092Z current_size = base.storage().size() 2025-12-04T11:45:26.1561132Z Autotune Choices Stats: 2025-12-04T11:45:26.1561519Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1561567Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1561609Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1561710Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1561946Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1561991Z _scaled_mm 0.0183 ms 33.3% 2025-12-04T11:45:26.1562119Z SingleProcess AUTOTUNE benchmarking takes 0.0124 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.1562261Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1562307Z Traceback (most recent call last): 2025-12-04T11:45:26.1562464Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1562505Z method(*args, **kwargs) 2025-12-04T11:45:26.1562662Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.1562703Z method(*args, **kwargs) 2025-12-04T11:45:26.1562855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1562903Z with policy(): 2025-12-04T11:45:26.1563057Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1563101Z raise RuntimeError(msg) 2025-12-04T11:45:26.1563526Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.1563544Z 2025-12-04T11:45:26.1563618Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1563877Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1563879Z 2025-12-04T11:45:26.1563966Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1564040Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1564083Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1564139Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1564633Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 
2025-12-04T11:45:26.1564748Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1564786Z graph_break [] 2025-12-04T11:45:26.1564848Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1564923Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1565423Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1565471Z current_size = base.storage().size() 2025-12-04T11:45:26.1565511Z Autotune Choices Stats: 2025-12-04T11:45:26.1565884Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1565930Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1565972Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1566071Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1566307Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1566350Z _scaled_mm 0.0183 ms 33.3% 2025-12-04T11:45:26.1566480Z SingleProcess AUTOTUNE benchmarking takes 0.0124 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.1566555Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1566599Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1566656Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1566755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1567256Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1567294Z graph_break [] 2025-12-04T11:45:26.1567355Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1567438Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1567479Z Autotune Choices Stats: 2025-12-04T11:45:26.1569289Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.1569339Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1569380Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1569482Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1569718Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1569761Z _scaled_mm 0.0191 ms 
32.6% 2025-12-04T11:45:26.1569888Z SingleProcess AUTOTUNE benchmarking takes 0.0110 seconds and 0.0495 seconds precompiling for 2 choices 2025-12-04T11:45:26.1569959Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1570102Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1570149Z Traceback (most recent call last): 2025-12-04T11:45:26.1570309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1570349Z method(*args, **kwargs) 2025-12-04T11:45:26.1570513Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1570553Z method(*args, **kwargs) 2025-12-04T11:45:26.1570706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1570742Z with policy(): 2025-12-04T11:45:26.1570896Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1570937Z raise RuntimeError(msg) 2025-12-04T11:45:26.1571326Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.1571330Z 2025-12-04T11:45:26.1571404Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1571665Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1571668Z 2025-12-04T11:45:26.1571754Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1571830Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1571872Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1571930Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1572413Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1572525Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1572563Z graph_break [] 2025-12-04T11:45:26.1572639Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1572714Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1573198Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1573278Z current_size = base.storage().size() 2025-12-04T11:45:26.1573319Z Autotune Choices Stats: 2025-12-04T11:45:26.1573686Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1573730Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1573773Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1573891Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1574125Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1574167Z _scaled_mm 0.0183 ms 33.3% 2025-12-04T11:45:26.1574295Z SingleProcess AUTOTUNE benchmarking takes 0.0124 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.1574383Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1574426Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1574482Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1574582Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1575059Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1575097Z graph_break [] 2025-12-04T11:45:26.1575157Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1575231Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1575273Z Autotune Choices Stats: 2025-12-04T11:45:26.1575632Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.1575678Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1575718Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1575817Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1576062Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1576104Z _scaled_mm 0.0191 ms 32.6% 2025-12-04T11:45:26.1576230Z SingleProcess AUTOTUNE benchmarking takes 0.0110 seconds and 0.0495 seconds precompiling for 2 choices 2025-12-04T11:45:26.1576305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1576346Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1576417Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1576515Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1576999Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), 
('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1577038Z graph_break [] 2025-12-04T11:45:26.1577098Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1577171Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1577211Z Autotune Choices Stats: 2025-12-04T11:45:26.1577572Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1577627Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1577667Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1577766Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1578013Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1578055Z _scaled_mm 0.0198 ms 30.8% 2025-12-04T11:45:26.1578182Z SingleProcess AUTOTUNE benchmarking takes 0.0112 seconds and 0.0486 seconds precompiling for 2 choices 2025-12-04T11:45:26.1578374Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-287630e4a4ba94f0.xml - 2025-12-04T11:45:26.1578435Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1579021Z FAILED [0.3451s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.1579028Z 2025-12-04T11:45:26.1579100Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1579361Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1579363Z 2025-12-04T11:45:26.1579450Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1579514Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1579581Z ================== 1 failed, 134 deselected, 2 rerun in 2.59s ================== 2025-12-04T11:45:26.1579634Z Got exit code 1 2025-12-04T11:45:26.1579673Z Retrying single test... 
2025-12-04T11:45:26.1579818Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e7e4c3683c9ffae7.xml 2025-12-04T11:45:26.1579874Z ============================= test session starts ============================== 2025-12-04T11:45:26.1579994Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1580034Z cachedir: .pytest_cache 2025-12-04T11:45:26.1580205Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1580250Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1580293Z configfile: pytest.ini 2025-12-04T11:45:26.1580457Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1580532Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1580788Z stepcurrent: skipping 134 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1580833Z Running 1 items in this shard 2025-12-04T11:45:26.1580834Z 2025-12-04T11:45:26.1581049Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9160s] [100%] 2025-12-04T11:45:26.1581260Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4726s] [100%] 2025-12-04T11:45:26.1581461Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5811s] [100%] 2025-12-04T11:45:26.1581463Z 2025-12-04T11:45:26.1581513Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1581656Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1581701Z Traceback (most recent call last): 2025-12-04T11:45:26.1581872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1581914Z method(*args, **kwargs) 2025-12-04T11:45:26.1582068Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1582108Z method(*args, **kwargs) 2025-12-04T11:45:26.1582260Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1582298Z with policy(): 2025-12-04T11:45:26.1582451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1582493Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1582879Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.1582881Z 2025-12-04T11:45:26.1582954Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1583212Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1583215Z 2025-12-04T11:45:26.1583335Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1583408Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1583472Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1583528Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1584013Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1584125Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1584162Z graph_break [] 2025-12-04T11:45:26.1584223Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1584296Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1584786Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1584835Z current_size = base.storage().size() 2025-12-04T11:45:26.1584875Z Autotune Choices Stats: 2025-12-04T11:45:26.1585242Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1585301Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1585342Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1585442Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1585673Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1585728Z _scaled_mm 0.0206 ms 28.6% 2025-12-04T11:45:26.1585855Z SingleProcess AUTOTUNE benchmarking takes 0.0128 seconds and 0.0579 seconds precompiling for 2 choices 2025-12-04T11:45:26.1586000Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1586044Z Traceback (most recent call last): 2025-12-04T11:45:26.1586199Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1586239Z method(*args, **kwargs) 
2025-12-04T11:45:26.1586390Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1586431Z method(*args, **kwargs) 2025-12-04T11:45:26.1586583Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1586620Z with policy(): 2025-12-04T11:45:26.1586775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1586816Z raise RuntimeError(msg) 2025-12-04T11:45:26.1587203Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.1587206Z 2025-12-04T11:45:26.1587279Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1587536Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1587554Z 2025-12-04T11:45:26.1587643Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1587716Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1587759Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1587814Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1588307Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1588407Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1588444Z graph_break [] 2025-12-04T11:45:26.1588503Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1588577Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1589067Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1589125Z current_size = base.storage().size() 2025-12-04T11:45:26.1589165Z Autotune Choices Stats: 2025-12-04T11:45:26.1589531Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1589576Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1589626Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1589726Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1589959Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1590001Z _scaled_mm 0.0206 ms 28.6% 2025-12-04T11:45:26.1590127Z SingleProcess 
AUTOTUNE benchmarking takes 0.0128 seconds and 0.0579 seconds precompiling for 2 choices 2025-12-04T11:45:26.1590200Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1590242Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1590299Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1590397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1590880Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1590916Z graph_break [] 2025-12-04T11:45:26.1590977Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1591050Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1591090Z Autotune Choices Stats: 2025-12-04T11:45:26.1591457Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1591513Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1591553Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1591650Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1591890Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1591932Z _scaled_mm 0.0194 ms 31.8% 2025-12-04T11:45:26.1592059Z SingleProcess AUTOTUNE benchmarking takes 0.0111 seconds and 0.0448 seconds precompiling for 2 choices 2025-12-04T11:45:26.1592112Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1592254Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1592300Z Traceback (most recent call last): 2025-12-04T11:45:26.1592456Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1592495Z method(*args, **kwargs) 2025-12-04T11:45:26.1592648Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1592688Z method(*args, **kwargs) 2025-12-04T11:45:26.1592851Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1592887Z with policy(): 2025-12-04T11:45:26.1593040Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1593082Z raise RuntimeError(msg) 2025-12-04T11:45:26.1593526Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.1593529Z 2025-12-04T11:45:26.1593602Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1593861Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1593864Z 2025-12-04T11:45:26.1593950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1594023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1594067Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1594123Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1594602Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1594699Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1594737Z graph_break [] 2025-12-04T11:45:26.1594796Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1594869Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1595366Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1595414Z current_size = base.storage().size() 2025-12-04T11:45:26.1595454Z Autotune Choices Stats: 2025-12-04T11:45:26.1595832Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1595878Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1595918Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1596017Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1596249Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1596293Z _scaled_mm 0.0206 ms 28.6% 2025-12-04T11:45:26.1596419Z SingleProcess AUTOTUNE benchmarking takes 0.0128 seconds and 0.0579 seconds precompiling for 2 choices 2025-12-04T11:45:26.1596493Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1596534Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1596605Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1596703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1597194Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1597231Z graph_break [] 2025-12-04T11:45:26.1597291Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1597363Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1597406Z Autotune Choices Stats: 2025-12-04T11:45:26.1597769Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1597815Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1597854Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1597952Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1598183Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1598226Z _scaled_mm 0.0194 ms 31.8% 2025-12-04T11:45:26.1598353Z SingleProcess AUTOTUNE benchmarking takes 0.0111 seconds and 0.0448 seconds precompiling for 2 choices 2025-12-04T11:45:26.1598426Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1598470Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1598526Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1598624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1599112Z inductor [('triton_bundler_save_kernel', 16), ('async_compile_cache_miss', 3), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1599149Z graph_break [] 2025-12-04T11:45:26.1599207Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1599290Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1599330Z Autotune Choices Stats: 2025-12-04T11:45:26.1599688Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.00595899997279048, "best_triton_pos": 0} 2025-12-04T11:45:26.1599732Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1599771Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1599869Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1600101Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1600142Z _scaled_mm 0.0188 ms 31.6% 2025-12-04T11:45:26.1600268Z SingleProcess AUTOTUNE benchmarking takes 0.0145 seconds and 0.1654 seconds precompiling for 2 choices 2025-12-04T11:45:26.1600471Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e7e4c3683c9ffae7.xml - 2025-12-04T11:45:26.1600532Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1601135Z FAILED [0.5811s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.1601138Z 2025-12-04T11:45:26.1601211Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1601470Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1601473Z 2025-12-04T11:45:26.1601559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1601623Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1601689Z ================== 1 failed, 187 deselected, 2 rerun in 2.99s ================== 2025-12-04T11:45:26.1601727Z Got exit code 1 2025-12-04T11:45:26.1601767Z Retrying single test... 
2025-12-04T11:45:26.1601911Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b803f6c817c97e02.xml 2025-12-04T11:45:26.1601967Z ============================= test session starts ============================== 2025-12-04T11:45:26.1602080Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1602121Z cachedir: .pytest_cache 2025-12-04T11:45:26.1602280Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1602324Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1602365Z configfile: pytest.ini 2025-12-04T11:45:26.1602539Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1602613Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1602869Z stepcurrent: skipping 134 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1602914Z Running 1 items in this shard 2025-12-04T11:45:26.1602916Z 2025-12-04T11:45:26.1603140Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8389s] [100%] 2025-12-04T11:45:26.1603387Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3909s] [100%] 2025-12-04T11:45:26.1603576Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda FAILED [0.3471s] [100%] 2025-12-04T11:45:26.1603578Z 2025-12-04T11:45:26.1603628Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1603770Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1603816Z Traceback (most recent call last): 2025-12-04T11:45:26.1603974Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1604014Z method(*args, **kwargs) 2025-12-04T11:45:26.1604182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1604222Z method(*args, **kwargs) 2025-12-04T11:45:26.1604374Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1604412Z with policy(): 2025-12-04T11:45:26.1604567Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1604622Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1605012Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.1605014Z 2025-12-04T11:45:26.1605088Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1605345Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1605348Z 2025-12-04T11:45:26.1605435Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1605508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1605554Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1605610Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1606093Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1606191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1606228Z graph_break [] 2025-12-04T11:45:26.1606287Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1606377Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1606871Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1606933Z current_size = base.storage().size() 2025-12-04T11:45:26.1606975Z Autotune Choices Stats: 2025-12-04T11:45:26.1607337Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1607385Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1607425Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1607525Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1607759Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1607802Z _scaled_mm 0.0197 ms 30.9% 2025-12-04T11:45:26.1607928Z SingleProcess AUTOTUNE benchmarking takes 0.0126 seconds and 0.0591 seconds precompiling for 2 choices 2025-12-04T11:45:26.1608082Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1608127Z Traceback (most recent call last): 2025-12-04T11:45:26.1608285Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1608325Z method(*args, **kwargs) 
2025-12-04T11:45:26.1608478Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1608530Z method(*args, **kwargs) 2025-12-04T11:45:26.1608682Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1608718Z with policy(): 2025-12-04T11:45:26.1608873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1608914Z raise RuntimeError(msg) 2025-12-04T11:45:26.1609302Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.1609305Z 2025-12-04T11:45:26.1609378Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1609636Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1609638Z 2025-12-04T11:45:26.1609726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1609800Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1609842Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1609899Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1610380Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1610491Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1610528Z graph_break [] 2025-12-04T11:45:26.1610587Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1610661Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1611161Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1611210Z current_size = base.storage().size() 2025-12-04T11:45:26.1611251Z Autotune Choices Stats: 2025-12-04T11:45:26.1611614Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1611659Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1611698Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1611798Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1612039Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1612080Z _scaled_mm 0.0197 ms 30.9% 2025-12-04T11:45:26.1612207Z SingleProcess 
AUTOTUNE benchmarking takes 0.0126 seconds and 0.0591 seconds precompiling for 2 choices 2025-12-04T11:45:26.1612280Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1612332Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1612395Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1612493Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1612974Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1613013Z graph_break [] 2025-12-04T11:45:26.1613074Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1613148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1613191Z Autotune Choices Stats: 2025-12-04T11:45:26.1613589Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1613633Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1613673Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1613772Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1614004Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1614061Z _scaled_mm 0.0186 ms 33.8% 2025-12-04T11:45:26.1614188Z SingleProcess AUTOTUNE benchmarking takes 0.0110 seconds and 0.0468 seconds precompiling for 2 choices 2025-12-04T11:45:26.1614240Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1614382Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1614427Z Traceback (most recent call last): 2025-12-04T11:45:26.1614596Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1614637Z method(*args, **kwargs) 2025-12-04T11:45:26.1614790Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1614829Z method(*args, **kwargs) 2025-12-04T11:45:26.1614982Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1615018Z with policy(): 2025-12-04T11:45:26.1615172Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1615213Z raise RuntimeError(msg) 2025-12-04T11:45:26.1615607Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.1615631Z 2025-12-04T11:45:26.1615707Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1615962Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1615966Z 2025-12-04T11:45:26.1616054Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1616141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1616184Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1616244Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1616731Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1616829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1616869Z graph_break [] 2025-12-04T11:45:26.1616928Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1617003Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1617491Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1617538Z current_size = base.storage().size() 2025-12-04T11:45:26.1617580Z Autotune Choices Stats: 2025-12-04T11:45:26.1617948Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1618004Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1618044Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1618143Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1618374Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1618427Z _scaled_mm 0.0197 ms 30.9% 2025-12-04T11:45:26.1618556Z SingleProcess AUTOTUNE benchmarking takes 0.0126 seconds and 0.0591 seconds precompiling for 2 choices 2025-12-04T11:45:26.1618632Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1618673Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1618730Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1618830Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1619311Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1619348Z graph_break [] 2025-12-04T11:45:26.1619408Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1619493Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1619533Z Autotune Choices Stats: 2025-12-04T11:45:26.1619895Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1619940Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1619991Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1620089Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1620320Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1620361Z _scaled_mm 0.0186 ms 33.8% 2025-12-04T11:45:26.1620490Z SingleProcess AUTOTUNE benchmarking takes 0.0110 seconds and 0.0468 seconds precompiling for 2 choices 2025-12-04T11:45:26.1620562Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1620605Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1620660Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1620758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1621238Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), 
('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1621278Z graph_break [] 2025-12-04T11:45:26.1621336Z aten_mm_info [('aten._scaled_mm.default_1_16_32', 1)] 2025-12-04T11:45:26.1621410Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1621450Z Autotune Choices Stats: 2025-12-04T11:45:26.1621821Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1621865Z AUTOTUNE scaled_mm(1x32, 32x16, , ) 2025-12-04T11:45:26.1621905Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1622003Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1622247Z triton_mm_2 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1622291Z _scaled_mm 0.0175 ms 35.8% 2025-12-04T11:45:26.1622417Z SingleProcess AUTOTUNE benchmarking takes 0.0112 seconds and 0.0464 seconds precompiling for 2 choices 2025-12-04T11:45:26.1622608Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b803f6c817c97e02.xml - 2025-12-04T11:45:26.1622668Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1623287Z FAILED [0.3471s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.1623305Z 2025-12-04T11:45:26.1623378Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1623635Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1623639Z 2025-12-04T11:45:26.1623724Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1623803Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1623871Z ================== 1 failed, 187 deselected, 2 rerun in 2.60s ================== 2025-12-04T11:45:26.1623907Z Got exit code 1 2025-12-04T11:45:26.1624115Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1624244Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1624388Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-56132e91b7a0b8f0.xml 2025-12-04T11:45:26.1624445Z ============================= test session starts ============================== 2025-12-04T11:45:26.1624556Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1624596Z cachedir: .pytest_cache 2025-12-04T11:45:26.1624757Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1624801Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1624841Z configfile: pytest.ini 2025-12-04T11:45:26.1625003Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1625080Z collecting ... collected 188 items / 135 deselected / 53 selected 2025-12-04T11:45:26.1625135Z stepcurrent: skipping 135 already run items. 
2025-12-04T11:45:26.1625180Z Running 53 items in this shard 2025-12-04T11:45:26.1625182Z 2025-12-04T11:45:26.1625400Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0510s] [ 1%] 2025-12-04T11:45:26.1625633Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.6837s] [ 1%] 2025-12-04T11:45:26.1625822Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.5932s] [ 1%] 2025-12-04T11:45:26.1625824Z 2025-12-04T11:45:26.1625889Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1626032Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1626080Z Traceback (most recent call last): 2025-12-04T11:45:26.1626238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1626280Z method(*args, **kwargs) 2025-12-04T11:45:26.1626435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1626475Z method(*args, **kwargs) 2025-12-04T11:45:26.1626627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1626664Z with policy(): 2025-12-04T11:45:26.1626817Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1626858Z raise RuntimeError(msg) 2025-12-04T11:45:26.1627245Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1627259Z 2025-12-04T11:45:26.1627333Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1627602Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1627605Z 2025-12-04T11:45:26.1627691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1627766Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1627807Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1627864Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1628342Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1628442Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1628478Z graph_break [] 2025-12-04T11:45:26.1628540Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1628614Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1629098Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1629147Z current_size = base.storage().size() 2025-12-04T11:45:26.1629199Z Autotune Choices Stats: 2025-12-04T11:45:26.1629568Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1629613Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1629654Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1629771Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1630008Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1630236Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1630465Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1630695Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.1630920Z triton_mm_6 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1631156Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1631392Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1631619Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1631660Z _scaled_mm 0.0196 ms 31.1% 2025-12-04T11:45:26.1631788Z SingleProcess AUTOTUNE benchmarking takes 0.0348 seconds and 0.1699 seconds precompiling for 9 choices 2025-12-04T11:45:26.1631932Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1631978Z Traceback (most recent call last): 2025-12-04T11:45:26.1632136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1632176Z method(*args, **kwargs) 2025-12-04T11:45:26.1632332Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1632371Z method(*args, **kwargs) 2025-12-04T11:45:26.1632523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T11:45:26.1632559Z with policy(): 2025-12-04T11:45:26.1632713Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1632754Z raise RuntimeError(msg) 2025-12-04T11:45:26.1633149Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1633163Z 2025-12-04T11:45:26.1633237Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1633538Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1633540Z 2025-12-04T11:45:26.1633644Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1633719Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1633761Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1633818Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1634299Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1634398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1634435Z graph_break [] 2025-12-04T11:45:26.1634496Z aten_mm_info 
[('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1634570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1635069Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1635117Z current_size = base.storage().size() 2025-12-04T11:45:26.1635158Z Autotune Choices Stats: 2025-12-04T11:45:26.1635538Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1635584Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1635624Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1635727Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1635957Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1636183Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1636407Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, 
kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1636632Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1636855Z triton_mm_6 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1637094Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1637322Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1637556Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1637599Z _scaled_mm 0.0196 ms 31.1% 2025-12-04T11:45:26.1637726Z SingleProcess AUTOTUNE benchmarking takes 0.0348 seconds and 0.1699 seconds precompiling for 9 choices 2025-12-04T11:45:26.1637801Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1637843Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1637900Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1638000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1638479Z inductor [('triton_bundler_save_kernel', 72), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1638525Z graph_break [] 2025-12-04T11:45:26.1638587Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1638660Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1638702Z Autotune Choices Stats: 2025-12-04T11:45:26.1639068Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1639114Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1639154Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1639255Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1639488Z triton_mm_10 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1639714Z triton_mm_15 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1639940Z triton_mm_13 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1640166Z triton_mm_11 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1640392Z triton_mm_8 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1640627Z triton_mm_12 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1640852Z triton_mm_14 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1641087Z triton_mm_9 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1641128Z _scaled_mm 0.0193 ms 30.6% 2025-12-04T11:45:26.1641257Z SingleProcess AUTOTUNE benchmarking takes 0.0332 seconds and 0.0916 seconds precompiling for 9 choices 2025-12-04T11:45:26.1641310Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1641455Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1641500Z Traceback (most recent call last): 2025-12-04T11:45:26.1641657Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.1641697Z method(*args, **kwargs) 2025-12-04T11:45:26.1641850Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1641890Z method(*args, **kwargs) 2025-12-04T11:45:26.1642043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1642092Z with policy(): 2025-12-04T11:45:26.1642245Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1642287Z raise RuntimeError(msg) 2025-12-04T11:45:26.1642708Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1642711Z 2025-12-04T11:45:26.1642784Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1643050Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1643053Z 2025-12-04T11:45:26.1643142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1643214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1643291Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1643347Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1643835Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1643934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1643971Z graph_break [] 2025-12-04T11:45:26.1644031Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1644106Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1644588Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1644662Z current_size = base.storage().size() 2025-12-04T11:45:26.1644702Z Autotune Choices Stats: 2025-12-04T11:45:26.1645080Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1645126Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1645166Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1645264Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1645497Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1645725Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1645951Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1646188Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1646413Z triton_mm_6 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1646656Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1646881Z triton_mm_0 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1647105Z triton_mm_4 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1647146Z _scaled_mm 0.0196 ms 31.1% 2025-12-04T11:45:26.1647273Z SingleProcess AUTOTUNE benchmarking takes 0.0348 seconds and 0.1699 seconds precompiling for 9 choices 2025-12-04T11:45:26.1647348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1647389Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1647446Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1647545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1648027Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1648079Z graph_break [] 2025-12-04T11:45:26.1648140Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 
1)] 2025-12-04T11:45:26.1648213Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1648254Z Autotune Choices Stats: 2025-12-04T11:45:26.1648623Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.1648667Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1648711Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1648808Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1649039Z triton_mm_10 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1649268Z triton_mm_15 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1649494Z triton_mm_13 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1649722Z triton_mm_11 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1649958Z triton_mm_8 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1650195Z triton_mm_12 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1650418Z triton_mm_14 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1650642Z triton_mm_9 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1650683Z _scaled_mm 0.0193 ms 30.6% 2025-12-04T11:45:26.1650811Z SingleProcess AUTOTUNE benchmarking takes 0.0332 seconds and 0.0916 seconds precompiling for 9 choices 2025-12-04T11:45:26.1650885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1650928Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1650985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1651085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1651566Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1651605Z graph_break [] 2025-12-04T11:45:26.1651665Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1651749Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1651789Z Autotune Choices Stats: 2025-12-04T11:45:26.1652150Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_19", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1652212Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1652254Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1652353Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1652588Z triton_mm_19 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1652817Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1653042Z triton_mm_23 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1653297Z triton_mm_20 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1653538Z triton_mm_21 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1653781Z triton_mm_17 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1654006Z triton_mm_18 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1654229Z triton_mm_22 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1654271Z _scaled_mm 0.0206 ms 29.3% 2025-12-04T11:45:26.1654397Z SingleProcess AUTOTUNE benchmarking takes 0.0485 seconds and 0.1795 seconds precompiling for 9 choices 2025-12-04T11:45:26.1654587Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-56132e91b7a0b8f0.xml - 2025-12-04T11:45:26.1654647Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1655239Z FAILED [0.5932s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1655243Z 2025-12-04T11:45:26.1655317Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1655577Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1655592Z 2025-12-04T11:45:26.1655680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1655743Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1655811Z ================== 1 failed, 135 deselected, 2 rerun in 3.35s ================== 2025-12-04T11:45:26.1655849Z Got exit code 1 2025-12-04T11:45:26.1655891Z Retrying single test... 2025-12-04T11:45:26.1656050Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c37fded33a39a6a4.xml 2025-12-04T11:45:26.1656109Z ============================= test session starts ============================== 2025-12-04T11:45:26.1656220Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1656261Z cachedir: .pytest_cache 2025-12-04T11:45:26.1656419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1656465Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1656505Z configfile: pytest.ini 2025-12-04T11:45:26.1656670Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1656744Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1657001Z stepcurrent: skipping 135 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1657054Z Running 1 items in this shard 2025-12-04T11:45:26.1657056Z 2025-12-04T11:45:26.1657270Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0296s] [100%] 2025-12-04T11:45:26.1657485Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7022s] [100%] 2025-12-04T11:45:26.1657691Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.5892s] [100%] 2025-12-04T11:45:26.1657694Z 2025-12-04T11:45:26.1657745Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1657889Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1657936Z Traceback (most recent call last): 2025-12-04T11:45:26.1658095Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1658138Z method(*args, **kwargs) 2025-12-04T11:45:26.1658292Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1658335Z method(*args, **kwargs) 2025-12-04T11:45:26.1658486Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1658525Z with policy(): 2025-12-04T11:45:26.1658678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1658721Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1659108Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1659112Z 2025-12-04T11:45:26.1659186Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1659458Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1659461Z 2025-12-04T11:45:26.1659549Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1659623Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1659668Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1659735Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1660218Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1660319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1660355Z graph_break [] 2025-12-04T11:45:26.1660417Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1660490Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1660980Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1661039Z current_size = base.storage().size() 2025-12-04T11:45:26.1661082Z Autotune Choices Stats: 2025-12-04T11:45:26.1661446Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1661503Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1661544Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1661643Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1661878Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1662107Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1662334Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1662560Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1662788Z triton_mm_1 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1663011Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1663246Z triton_mm_0 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1663511Z triton_mm_4 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1663552Z _scaled_mm 0.0224 ms 27.2% 2025-12-04T11:45:26.1663698Z SingleProcess AUTOTUNE benchmarking takes 0.0423 seconds and 0.1618 seconds precompiling for 9 choices 2025-12-04T11:45:26.1663843Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1663889Z Traceback (most recent call last): 2025-12-04T11:45:26.1664043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1664085Z method(*args, **kwargs) 2025-12-04T11:45:26.1664236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1664280Z method(*args, **kwargs) 2025-12-04T11:45:26.1664430Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1664468Z with policy(): 2025-12-04T11:45:26.1664620Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1664662Z raise RuntimeError(msg) 2025-12-04T11:45:26.1665069Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1665074Z 2025-12-04T11:45:26.1665147Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1665420Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1665422Z 2025-12-04T11:45:26.1665509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1665587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1665629Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1665689Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1666171Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1666272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1666308Z graph_break [] 2025-12-04T11:45:26.1666369Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1666442Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1666928Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1666990Z current_size = base.storage().size() 2025-12-04T11:45:26.1667031Z Autotune Choices Stats: 2025-12-04T11:45:26.1667398Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1667442Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1667483Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1667591Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1667824Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1668052Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1668280Z triton_mm_7 0.0062 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1668503Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1668740Z triton_mm_1 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1668963Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1669199Z triton_mm_0 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1669424Z triton_mm_4 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1669465Z _scaled_mm 0.0224 ms 27.2% 2025-12-04T11:45:26.1669593Z SingleProcess AUTOTUNE benchmarking takes 0.0423 seconds and 0.1618 seconds precompiling for 9 choices 2025-12-04T11:45:26.1669666Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1669710Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1669767Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1669868Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1670345Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1670384Z graph_break [] 2025-12-04T11:45:26.1670445Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1670517Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1670557Z Autotune Choices Stats: 2025-12-04T11:45:26.1670927Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006440000142902136, "best_triton_pos": 0} 2025-12-04T11:45:26.1670972Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1671011Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1671111Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1671352Z triton_mm_10 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1671579Z triton_mm_8 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1671804Z triton_mm_9 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1672029Z triton_mm_13 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1672258Z triton_mm_11 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1672507Z triton_mm_12 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1672744Z triton_mm_14 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1672967Z triton_mm_15 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1673009Z _scaled_mm 0.0199 ms 32.4% 2025-12-04T11:45:26.1673138Z SingleProcess AUTOTUNE benchmarking takes 0.0378 seconds and 0.0875 seconds precompiling for 9 choices 2025-12-04T11:45:26.1673194Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1673365Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1673412Z Traceback (most recent call last): 2025-12-04T11:45:26.1673567Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1673609Z method(*args, **kwargs) 2025-12-04T11:45:26.1673763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1673803Z method(*args, **kwargs) 2025-12-04T11:45:26.1673955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1673991Z with policy(): 2025-12-04T11:45:26.1674146Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1674187Z raise RuntimeError(msg) 2025-12-04T11:45:26.1674577Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1674595Z 2025-12-04T11:45:26.1674670Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1674930Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1674945Z 2025-12-04T11:45:26.1675032Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1675108Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1675150Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1675209Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1675695Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1675794Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1675831Z graph_break [] 2025-12-04T11:45:26.1675892Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1675965Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1676462Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1676511Z current_size = base.storage().size() 2025-12-04T11:45:26.1676552Z Autotune Choices Stats: 2025-12-04T11:45:26.1676933Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1676977Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1677019Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1677117Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1677349Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1677580Z triton_mm_3 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1677807Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1678030Z triton_mm_2 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1678253Z triton_mm_1 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1678486Z triton_mm_6 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1678719Z triton_mm_0 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1678942Z triton_mm_4 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1678983Z _scaled_mm 0.0224 ms 27.2% 2025-12-04T11:45:26.1679113Z SingleProcess AUTOTUNE benchmarking takes 0.0423 seconds and 0.1618 seconds precompiling for 9 choices 2025-12-04T11:45:26.1679188Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1679230Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1679287Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1679387Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1679871Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1679920Z graph_break [] 2025-12-04T11:45:26.1679981Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 
2025-12-04T11:45:26.1680055Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1680096Z Autotune Choices Stats: 2025-12-04T11:45:26.1680465Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006440000142902136, "best_triton_pos": 0} 2025-12-04T11:45:26.1680510Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1680551Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1680650Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1680879Z triton_mm_10 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1681104Z triton_mm_8 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1681327Z triton_mm_9 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1681554Z triton_mm_13 0.0065 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1681784Z triton_mm_11 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1682023Z triton_mm_12 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1682246Z triton_mm_14 0.0066 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1682480Z triton_mm_15 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1682526Z _scaled_mm 0.0199 ms 32.4% 2025-12-04T11:45:26.1682653Z SingleProcess AUTOTUNE benchmarking takes 0.0378 seconds and 0.0875 seconds precompiling for 9 choices 2025-12-04T11:45:26.1682726Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1682770Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1682826Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1682927Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1683436Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1683486Z graph_break [] 2025-12-04T11:45:26.1683546Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1683620Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1683660Z Autotune Choices Stats: 2025-12-04T11:45:26.1684031Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.1684075Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1684115Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1684214Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1684445Z triton_mm_18 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1684671Z triton_mm_17 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1684899Z triton_mm_20 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1685126Z triton_mm_16 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1685350Z triton_mm_19 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1685576Z triton_mm_21 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1685818Z triton_mm_22 0.0064 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1686060Z triton_mm_23 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1686102Z _scaled_mm 0.0195 ms 30.8% 2025-12-04T11:45:26.1686229Z SingleProcess AUTOTUNE benchmarking takes 0.0550 seconds and 0.1828 seconds precompiling for 9 choices 2025-12-04T11:45:26.1686420Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c37fded33a39a6a4.xml - 2025-12-04T11:45:26.1686481Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1687069Z FAILED [0.5892s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1687073Z 2025-12-04T11:45:26.1687147Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1687420Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1687422Z 2025-12-04T11:45:26.1687509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1687572Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1687650Z ================== 1 failed, 187 deselected, 2 rerun in 3.34s ================== 2025-12-04T11:45:26.1687687Z Got exit code 1 2025-12-04T11:45:26.1687727Z Retrying single test... 2025-12-04T11:45:26.1687874Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-239afe9ea16dc942.xml 2025-12-04T11:45:26.1687932Z ============================= test session starts ============================== 2025-12-04T11:45:26.1688043Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1688085Z cachedir: .pytest_cache 2025-12-04T11:45:26.1688243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1688291Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1688331Z configfile: pytest.ini 2025-12-04T11:45:26.1688492Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1688567Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1688824Z stepcurrent: skipping 135 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1688867Z Running 1 items in this shard 2025-12-04T11:45:26.1688870Z 2025-12-04T11:45:26.1689085Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0939s] [100%] 2025-12-04T11:45:26.1689298Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7859s] [100%] 2025-12-04T11:45:26.1689500Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6794s] [100%] 2025-12-04T11:45:26.1689502Z 2025-12-04T11:45:26.1689553Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1689697Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1689743Z Traceback (most recent call last): 2025-12-04T11:45:26.1689911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1689953Z method(*args, **kwargs) 2025-12-04T11:45:26.1690107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1690147Z method(*args, **kwargs) 2025-12-04T11:45:26.1690299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1690336Z with policy(): 2025-12-04T11:45:26.1690490Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1690531Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1690920Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.1690934Z 2025-12-04T11:45:26.1691008Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1691266Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1691269Z 2025-12-04T11:45:26.1691357Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1691439Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1691484Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1691540Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1692021Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1692121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1692159Z graph_break [] 2025-12-04T11:45:26.1692219Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1692293Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1692782Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1692829Z current_size = base.storage().size() 2025-12-04T11:45:26.1692871Z Autotune Choices Stats: 2025-12-04T11:45:26.1693234Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1693322Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1693361Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1693462Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1693712Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1693940Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1694167Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1694394Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1694620Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1694844Z triton_mm_0 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1695080Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1695320Z triton_mm_4 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1695363Z _scaled_mm 0.0223 ms 26.3% 2025-12-04T11:45:26.1695490Z SingleProcess AUTOTUNE benchmarking takes 0.0439 seconds and 0.1537 seconds precompiling for 9 choices 2025-12-04T11:45:26.1695634Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1695681Z Traceback (most recent call last): 2025-12-04T11:45:26.1695835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1695876Z method(*args, **kwargs) 2025-12-04T11:45:26.1696028Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1696069Z method(*args, **kwargs) 2025-12-04T11:45:26.1696220Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1696257Z with policy(): 2025-12-04T11:45:26.1696410Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1696452Z raise RuntimeError(msg) 2025-12-04T11:45:26.1696848Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.1696851Z 2025-12-04T11:45:26.1696928Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1697200Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1697203Z 2025-12-04T11:45:26.1697292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1697365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1697409Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1697475Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1697953Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1698054Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1698090Z graph_break [] 2025-12-04T11:45:26.1698152Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1698225Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1698707Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1698765Z current_size = base.storage().size() 2025-12-04T11:45:26.1698806Z Autotune Choices Stats: 2025-12-04T11:45:26.1699178Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1699225Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1699265Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1699364Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1699593Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1699822Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1700048Z triton_mm_3 0.0059 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1700271Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1700495Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1700722Z triton_mm_0 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1700965Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1701191Z triton_mm_4 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1701242Z _scaled_mm 0.0223 ms 26.3% 2025-12-04T11:45:26.1701371Z SingleProcess AUTOTUNE benchmarking takes 0.0439 seconds and 0.1537 seconds precompiling for 9 choices 2025-12-04T11:45:26.1701446Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1701488Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1701545Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1701647Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1702126Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1702163Z graph_break [] 2025-12-04T11:45:26.1702225Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1702299Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1702357Z Autotune Choices Stats: 2025-12-04T11:45:26.1702715Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1702760Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1702811Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1702909Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1703144Z triton_mm_13 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1703396Z triton_mm_8 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1703620Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1703849Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1704074Z triton_mm_15 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1704300Z triton_mm_9 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1704526Z triton_mm_12 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1704766Z triton_mm_14 0.0080 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1704807Z _scaled_mm 0.0227 ms 26.8% 2025-12-04T11:45:26.1704949Z SingleProcess AUTOTUNE benchmarking takes 0.0393 seconds and 0.0932 seconds precompiling for 9 choices 2025-12-04T11:45:26.1705004Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1705148Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1705194Z Traceback (most recent call last): 2025-12-04T11:45:26.1705350Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1705391Z method(*args, **kwargs) 2025-12-04T11:45:26.1705544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1705584Z method(*args, **kwargs) 2025-12-04T11:45:26.1705737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1705776Z with policy(): 2025-12-04T11:45:26.1705929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1705984Z raise RuntimeError(msg) 2025-12-04T11:45:26.1706371Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1706374Z 2025-12-04T11:45:26.1706449Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1706721Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1706725Z 2025-12-04T11:45:26.1706813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1706887Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1706930Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1706989Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1707467Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1707567Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1707603Z graph_break [] 2025-12-04T11:45:26.1707664Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1707736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1708224Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1708282Z current_size = base.storage().size() 2025-12-04T11:45:26.1708323Z Autotune Choices Stats: 2025-12-04T11:45:26.1708682Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.1708727Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1708779Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1708878Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1709108Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1709336Z triton_mm_1 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1709561Z triton_mm_3 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1709785Z triton_mm_6 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1710021Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1710249Z triton_mm_0 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1710481Z triton_mm_2 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1710709Z triton_mm_4 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1710752Z _scaled_mm 0.0223 ms 26.3% 2025-12-04T11:45:26.1710882Z SingleProcess AUTOTUNE benchmarking takes 0.0439 seconds and 0.1537 seconds precompiling for 9 choices 2025-12-04T11:45:26.1710955Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1711002Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1711057Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1711158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1711634Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1711675Z graph_break [] 2025-12-04T11:45:26.1711735Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 
2025-12-04T11:45:26.1711808Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1711849Z Autotune Choices Stats: 2025-12-04T11:45:26.1712223Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_13", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.1712267Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1712307Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1712415Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1712645Z triton_mm_13 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1712877Z triton_mm_8 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1713106Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1713358Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1713581Z triton_mm_15 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1713823Z triton_mm_9 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1714072Z triton_mm_12 0.0071 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1714296Z triton_mm_14 0.0080 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1714338Z _scaled_mm 0.0227 ms 26.8% 2025-12-04T11:45:26.1714470Z SingleProcess AUTOTUNE benchmarking takes 0.0393 seconds and 0.0932 seconds precompiling for 9 choices 2025-12-04T11:45:26.1714546Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1714587Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1714646Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1714745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1715224Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1715262Z graph_break [] 2025-12-04T11:45:26.1715323Z aten_mm_info [('aten._scaled_mm.default_1_2048_32', 1)] 2025-12-04T11:45:26.1715397Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1715439Z Autotune Choices Stats: 2025-12-04T11:45:26.1715795Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1715853Z AUTOTUNE scaled_mm(1x32, 32x2048, , ) 2025-12-04T11:45:26.1715895Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.1715991Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1716233Z triton_mm_22 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1716462Z triton_mm_19 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1716687Z triton_mm_23 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1716912Z triton_mm_16 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1717143Z triton_mm_21 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1717379Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1717604Z triton_mm_18 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1717839Z triton_mm_20 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1717880Z _scaled_mm 0.0212 ms 28.5% 2025-12-04T11:45:26.1718009Z SingleProcess AUTOTUNE benchmarking takes 0.0557 seconds and 0.1804 seconds precompiling for 9 choices 2025-12-04T11:45:26.1718199Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-239afe9ea16dc942.xml - 2025-12-04T11:45:26.1718261Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1718847Z FAILED [0.6794s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.1718852Z 2025-12-04T11:45:26.1718925Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1719187Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1719190Z 2025-12-04T11:45:26.1719277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1719340Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1719419Z ================== 1 failed, 187 deselected, 2 rerun in 3.58s ================== 2025-12-04T11:45:26.1719459Z Got exit code 1 2025-12-04T11:45:26.1719669Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1719797Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1719951Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-501c713aee3e290a.xml 2025-12-04T11:45:26.1720010Z ============================= test session starts ============================== 2025-12-04T11:45:26.1720121Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1720163Z cachedir: .pytest_cache 2025-12-04T11:45:26.1720320Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1720368Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1720407Z configfile: pytest.ini 2025-12-04T11:45:26.1720570Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1720646Z collecting ... 
collected 188 items / 136 deselected / 52 selected 2025-12-04T11:45:26.1720703Z stepcurrent: skipping 136 already run items. 2025-12-04T11:45:26.1720746Z Running 52 items in this shard 2025-12-04T11:45:26.1720749Z 2025-12-04T11:45:26.1720979Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3155s] [ 1%] 2025-12-04T11:45:26.1721208Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8428s] [ 1%] 2025-12-04T11:45:26.1721402Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.8701s] [ 1%] 2025-12-04T11:45:26.1721404Z 2025-12-04T11:45:26.1721466Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1721612Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1721659Z Traceback (most recent call last): 2025-12-04T11:45:26.1721825Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1721867Z method(*args, **kwargs) 2025-12-04T11:45:26.1722019Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1722061Z method(*args, **kwargs) 2025-12-04T11:45:26.1722212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1722251Z with policy(): 2025-12-04T11:45:26.1722403Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1722446Z raise RuntimeError(msg) 
2025-12-04T11:45:26.1722838Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1722841Z 2025-12-04T11:45:26.1722916Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1723178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1723192Z 2025-12-04T11:45:26.1723313Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1723387Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1723429Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1723485Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1723987Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1724087Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1724126Z graph_break [] 2025-12-04T11:45:26.1724190Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1724262Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1724751Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1724798Z current_size = base.storage().size() 2025-12-04T11:45:26.1724853Z Autotune Choices Stats: 2025-12-04T11:45:26.1725220Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1725269Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1725311Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1725423Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1725658Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1725889Z triton_mm_17 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1726116Z triton_mm_14 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1726348Z triton_mm_8 0.0065 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1726579Z triton_mm_18 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1726807Z triton_mm_9 0.0069 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1727031Z triton_mm_12 0.0078 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1727271Z triton_mm_11 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1727505Z triton_mm_15 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1727728Z triton_mm_13 0.0081 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1727858Z SingleProcess AUTOTUNE benchmarking takes 0.0739 seconds and 0.3504 seconds precompiling for 20 choices 2025-12-04T11:45:26.1728007Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1728053Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1728212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1728253Z method(*args, **kwargs) 2025-12-04T11:45:26.1728406Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1728448Z method(*args, **kwargs) 2025-12-04T11:45:26.1728599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1728654Z with policy(): 2025-12-04T11:45:26.1728806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1728848Z raise RuntimeError(msg) 2025-12-04T11:45:26.1729251Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1729253Z 2025-12-04T11:45:26.1729329Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1729592Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1729595Z 2025-12-04T11:45:26.1729683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1729755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1729798Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1729855Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1730343Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1730445Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1730482Z graph_break [] 2025-12-04T11:45:26.1730548Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1730621Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1731107Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1731168Z current_size = base.storage().size() 2025-12-04T11:45:26.1731210Z Autotune Choices Stats: 2025-12-04T11:45:26.1731584Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1731635Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1731676Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1731777Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1732009Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1732241Z triton_mm_17 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1732468Z triton_mm_14 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1732709Z triton_mm_8 0.0065 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1732941Z triton_mm_18 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1733178Z triton_mm_9 0.0069 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1733430Z triton_mm_12 0.0078 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1733655Z triton_mm_11 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1733878Z triton_mm_15 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1734102Z triton_mm_13 0.0081 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1734233Z SingleProcess AUTOTUNE benchmarking takes 0.0739 seconds and 0.3504 seconds precompiling for 20 choices 2025-12-04T11:45:26.1734309Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1734352Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1734409Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1734508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1735015Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1735052Z graph_break [] 2025-12-04T11:45:26.1735115Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1735201Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1735244Z Autotune Choices Stats: 2025-12-04T11:45:26.1735604Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.1735652Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1735695Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1735792Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1736028Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1736254Z triton_mm_27 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1736492Z triton_mm_33 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1736732Z triton_mm_35 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1736965Z triton_mm_37 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1737193Z triton_mm_28 0.0069 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1737417Z triton_mm_34 0.0077 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1737642Z triton_mm_31 0.0078 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1737865Z triton_mm_30 0.0082 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1738089Z triton_mm_32 0.0082 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1738218Z SingleProcess AUTOTUNE benchmarking takes 0.1082 seconds and 0.2576 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1738284Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1738430Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1738480Z Traceback (most recent call last): 2025-12-04T11:45:26.1738638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1738678Z method(*args, **kwargs) 2025-12-04T11:45:26.1738842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1738882Z method(*args, **kwargs) 2025-12-04T11:45:26.1739037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1739074Z with policy(): 2025-12-04T11:45:26.1739227Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1739269Z raise RuntimeError(msg) 2025-12-04T11:45:26.1739661Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1739664Z 2025-12-04T11:45:26.1739738Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1740001Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1740014Z 2025-12-04T11:45:26.1740101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1740176Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1740218Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1740275Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1740770Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1740869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1740910Z graph_break [] 2025-12-04T11:45:26.1740973Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1741049Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1741539Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1741589Z current_size = base.storage().size() 2025-12-04T11:45:26.1741628Z Autotune Choices Stats: 2025-12-04T11:45:26.1741996Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.1742042Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1742086Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1742196Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1742429Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1742675Z triton_mm_17 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1742899Z triton_mm_14 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1743130Z triton_mm_8 0.0065 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1743384Z triton_mm_18 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1743612Z triton_mm_9 0.0069 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1743835Z triton_mm_12 0.0078 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1744074Z triton_mm_11 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1744310Z triton_mm_15 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1744534Z triton_mm_13 0.0081 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1744662Z SingleProcess AUTOTUNE benchmarking takes 0.0739 seconds and 0.3504 seconds precompiling for 20 choices 2025-12-04T11:45:26.1744736Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1744778Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1744834Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1744934Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1745422Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1745461Z graph_break [] 2025-12-04T11:45:26.1745523Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1745597Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1745637Z Autotune Choices Stats: 2025-12-04T11:45:26.1746000Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.1746064Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1746105Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1746203Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1746448Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1746678Z triton_mm_27 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1746902Z triton_mm_33 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1747129Z triton_mm_35 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1747360Z triton_mm_37 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1747599Z triton_mm_28 0.0069 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1747824Z triton_mm_34 0.0077 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1748057Z triton_mm_31 0.0078 ms 79.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1748281Z triton_mm_30 0.0082 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1748502Z triton_mm_32 0.0082 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1748633Z SingleProcess AUTOTUNE benchmarking takes 0.1082 seconds and 0.2576 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1748705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1748749Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1748806Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1748904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1749393Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1749430Z graph_break [] 2025-12-04T11:45:26.1749504Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1749576Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1749617Z Autotune Choices Stats: 2025-12-04T11:45:26.1749979Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_46", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1750036Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1750078Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1750176Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1750407Z triton_mm_46 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.1750638Z triton_mm_55 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1750864Z triton_mm_54 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1751087Z triton_mm_52 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1751329Z triton_mm_56 0.0068 ms 89.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1751568Z triton_mm_47 0.0070 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1751793Z triton_mm_49 0.0075 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1752016Z triton_mm_50 0.0075 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1752239Z triton_mm_53 0.0076 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1752463Z triton_mm_51 0.0079 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1752591Z SingleProcess AUTOTUNE benchmarking takes 0.1419 seconds and 0.2418 seconds precompiling for 20 choices 2025-12-04T11:45:26.1752785Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-501c713aee3e290a.xml - 2025-12-04T11:45:26.1752845Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1753472Z FAILED [0.8701s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1753491Z 2025-12-04T11:45:26.1753566Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1753830Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1753846Z 2025-12-04T11:45:26.1753934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1753997Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1754065Z ================== 1 failed, 136 deselected, 2 rerun in 4.05s ================== 2025-12-04T11:45:26.1754102Z Got exit code 1 2025-12-04T11:45:26.1754144Z Retrying single test... 2025-12-04T11:45:26.1754288Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5af4c1c0b63c6daf.xml 2025-12-04T11:45:26.1754345Z ============================= test session starts ============================== 2025-12-04T11:45:26.1754456Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1754497Z cachedir: .pytest_cache 2025-12-04T11:45:26.1754654Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1754700Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1754741Z configfile: pytest.ini 2025-12-04T11:45:26.1754915Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1754989Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1755249Z stepcurrent: skipping 136 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1755293Z Running 1 items in this shard 2025-12-04T11:45:26.1755310Z 2025-12-04T11:45:26.1755531Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3654s] [100%] 2025-12-04T11:45:26.1755749Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8458s] [100%] 2025-12-04T11:45:26.1755944Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7586s] [100%] 2025-12-04T11:45:26.1755946Z 2025-12-04T11:45:26.1755998Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1756145Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1756192Z Traceback (most recent call last): 2025-12-04T11:45:26.1756351Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1756392Z method(*args, **kwargs) 2025-12-04T11:45:26.1756544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1756588Z method(*args, **kwargs) 2025-12-04T11:45:26.1756738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1756776Z with policy(): 2025-12-04T11:45:26.1756928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1756987Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1757375Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1757377Z 2025-12-04T11:45:26.1757452Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1757726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1757730Z 2025-12-04T11:45:26.1757819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1757894Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1757937Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1757994Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1758480Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1758579Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1758615Z graph_break [] 2025-12-04T11:45:26.1758691Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1758764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1759259Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1759307Z current_size = base.storage().size() 2025-12-04T11:45:26.1759348Z Autotune Choices Stats: 2025-12-04T11:45:26.1759719Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1759768Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1759809Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1759910Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1760153Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1760383Z triton_mm_8 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1760614Z triton_mm_16 0.0067 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1760839Z triton_mm_14 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1762628Z triton_mm_18 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1762861Z triton_mm_9 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1763104Z triton_mm_12 0.0078 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1763358Z triton_mm_15 0.0078 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1763587Z triton_mm_11 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1763814Z triton_mm_13 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1763945Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3637 seconds precompiling for 20 choices 2025-12-04T11:45:26.1764115Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1764161Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1764321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1764362Z method(*args, **kwargs) 2025-12-04T11:45:26.1764515Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1764568Z method(*args, **kwargs) 2025-12-04T11:45:26.1764722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1764758Z with policy(): 2025-12-04T11:45:26.1764912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1764953Z raise RuntimeError(msg) 2025-12-04T11:45:26.1765354Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1765357Z 2025-12-04T11:45:26.1765433Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1765698Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1765700Z 2025-12-04T11:45:26.1765789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1765866Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1765910Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1765968Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1766457Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1766570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1766608Z graph_break [] 2025-12-04T11:45:26.1766671Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1766747Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1767250Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1767299Z current_size = base.storage().size() 2025-12-04T11:45:26.1767340Z Autotune Choices Stats: 2025-12-04T11:45:26.1767712Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1767760Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1767801Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1767902Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1768151Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1768381Z triton_mm_8 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1768619Z triton_mm_16 0.0067 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1768846Z triton_mm_14 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1769076Z triton_mm_18 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1769304Z triton_mm_9 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1769532Z triton_mm_12 0.0078 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1769757Z triton_mm_15 0.0078 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1769982Z triton_mm_11 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1770223Z triton_mm_13 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1770352Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3637 seconds precompiling for 20 choices 2025-12-04T11:45:26.1770427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1770470Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1770528Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1770639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1771126Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1771164Z graph_break [] 2025-12-04T11:45:26.1771227Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1771302Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1771343Z Autotune Choices Stats: 2025-12-04T11:45:26.1771707Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1771764Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1771805Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1771904Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1772159Z triton_mm_36 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1772388Z triton_mm_35 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1772615Z triton_mm_27 0.0066 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1772840Z triton_mm_33 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1773071Z triton_mm_37 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1773324Z triton_mm_28 0.0069 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1773550Z triton_mm_31 0.0078 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1773774Z triton_mm_34 0.0079 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1774012Z triton_mm_30 0.0080 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1774240Z triton_mm_32 0.0083 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1774382Z SingleProcess AUTOTUNE benchmarking takes 0.1167 seconds and 0.2583 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1774439Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1774585Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1774632Z Traceback (most recent call last): 2025-12-04T11:45:26.1774792Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1774833Z method(*args, **kwargs) 2025-12-04T11:45:26.1774988Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1775029Z method(*args, **kwargs) 2025-12-04T11:45:26.1775180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1775219Z with policy(): 2025-12-04T11:45:26.1775373Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1775429Z raise RuntimeError(msg) 2025-12-04T11:45:26.1775822Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1775826Z 2025-12-04T11:45:26.1775899Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1776178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1776180Z 2025-12-04T11:45:26.1776270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1776345Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1776389Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1776448Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1776939Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1777040Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1777076Z graph_break [] 2025-12-04T11:45:26.1777140Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1777213Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1777704Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1777762Z current_size = base.storage().size() 2025-12-04T11:45:26.1777804Z Autotune Choices Stats: 2025-12-04T11:45:26.1778173Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.1778231Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1778273Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1778373Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1778612Z triton_mm_17 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1778846Z triton_mm_8 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1779071Z triton_mm_16 0.0067 ms 92.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1779297Z triton_mm_14 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1779539Z triton_mm_18 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1779779Z triton_mm_9 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1780004Z triton_mm_12 0.0078 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1780230Z triton_mm_15 0.0078 ms 79.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1780456Z triton_mm_11 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1780686Z triton_mm_13 0.0080 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1780816Z SingleProcess AUTOTUNE benchmarking takes 0.0820 seconds and 0.3637 seconds precompiling for 20 choices 2025-12-04T11:45:26.1780890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1780934Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1780991Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1781093Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1781579Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1781629Z graph_break [] 2025-12-04T11:45:26.1781690Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1781764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1781805Z Autotune Choices Stats: 2025-12-04T11:45:26.1782181Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1782229Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1782272Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1782370Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1782606Z triton_mm_36 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1782835Z triton_mm_35 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1783063Z triton_mm_27 0.0066 ms 95.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1783320Z triton_mm_33 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1783564Z triton_mm_37 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1783795Z triton_mm_28 0.0069 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1784022Z triton_mm_31 0.0078 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1784243Z triton_mm_34 0.0079 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1784470Z triton_mm_30 0.0080 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1784698Z triton_mm_32 0.0083 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1784829Z SingleProcess AUTOTUNE benchmarking takes 0.1167 seconds and 0.2583 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1784902Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1784945Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1785018Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1785117Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1785617Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1785655Z graph_break [] 2025-12-04T11:45:26.1785719Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1785793Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1785834Z Autotune Choices Stats: 2025-12-04T11:45:26.1786196Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_54", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.1786245Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1786286Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1786384Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1786616Z triton_mm_54 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.1786869Z triton_mm_55 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1787099Z triton_mm_46 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1787337Z triton_mm_52 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1787568Z triton_mm_56 0.0068 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1787796Z triton_mm_47 0.0072 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1788023Z triton_mm_53 0.0077 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1788249Z triton_mm_49 0.0077 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1788475Z triton_mm_50 0.0078 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1788706Z triton_mm_51 0.0080 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1788850Z SingleProcess AUTOTUNE benchmarking takes 0.1337 seconds and 0.2505 seconds precompiling for 20 choices 2025-12-04T11:45:26.1789046Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5af4c1c0b63c6daf.xml - 2025-12-04T11:45:26.1789106Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1789714Z FAILED [0.7586s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1789717Z 2025-12-04T11:45:26.1789790Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1790055Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1790058Z 2025-12-04T11:45:26.1790146Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1790209Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1790279Z ================== 1 failed, 187 deselected, 2 rerun in 3.99s ================== 2025-12-04T11:45:26.1790317Z Got exit code 1 2025-12-04T11:45:26.1790359Z Retrying single test... 2025-12-04T11:45:26.1790518Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bffb3cac270b5490.xml 2025-12-04T11:45:26.1790576Z ============================= test session starts ============================== 2025-12-04T11:45:26.1790688Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1790729Z cachedir: .pytest_cache 2025-12-04T11:45:26.1790900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1790946Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1790987Z configfile: pytest.ini 2025-12-04T11:45:26.1791151Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1791226Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1791485Z stepcurrent: skipping 136 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1791530Z Running 1 items in this shard 2025-12-04T11:45:26.1791532Z 2025-12-04T11:45:26.1791754Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3516s] [100%] 2025-12-04T11:45:26.1791970Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8221s] [100%] 2025-12-04T11:45:26.1792165Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7730s] [100%] 2025-12-04T11:45:26.1792169Z 2025-12-04T11:45:26.1792220Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1792370Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1792416Z Traceback (most recent call last): 2025-12-04T11:45:26.1792575Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1792629Z method(*args, **kwargs) 2025-12-04T11:45:26.1792782Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1792824Z method(*args, **kwargs) 2025-12-04T11:45:26.1792977Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1793015Z with policy(): 2025-12-04T11:45:26.1793180Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1793222Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1793680Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.1793683Z 2025-12-04T11:45:26.1793758Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1794023Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1794025Z 2025-12-04T11:45:26.1794113Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1794187Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1794230Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1794303Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1794796Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1794911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1794948Z graph_break [] 2025-12-04T11:45:26.1795015Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1795088Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1795578Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1795626Z current_size = base.storage().size() 2025-12-04T11:45:26.1795666Z Autotune Choices Stats: 2025-12-04T11:45:26.1796039Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1796086Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1796128Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1796229Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1796465Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1796710Z triton_mm_8 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1796936Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1797181Z triton_mm_18 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1797408Z triton_mm_16 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1797640Z triton_mm_9 0.0070 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1797868Z triton_mm_12 0.0075 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1798093Z triton_mm_13 0.0076 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1798327Z triton_mm_11 0.0078 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1798553Z triton_mm_15 0.0080 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1798693Z SingleProcess AUTOTUNE benchmarking takes 0.0876 seconds and 0.3455 seconds precompiling for 20 choices 2025-12-04T11:45:26.1798842Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1798888Z 
Traceback (most recent call last): 2025-12-04T11:45:26.1799048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1799088Z method(*args, **kwargs) 2025-12-04T11:45:26.1799242Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1799282Z method(*args, **kwargs) 2025-12-04T11:45:26.1799435Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1799471Z with policy(): 2025-12-04T11:45:26.1799626Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1799668Z raise RuntimeError(msg) 2025-12-04T11:45:26.1800068Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.1800071Z 2025-12-04T11:45:26.1800146Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1800407Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1800420Z 2025-12-04T11:45:26.1800509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1800587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1800630Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1800686Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1801189Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1801290Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1801328Z graph_break [] 2025-12-04T11:45:26.1801390Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1801464Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1801952Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1801999Z current_size = base.storage().size() 2025-12-04T11:45:26.1802051Z Autotune Choices Stats: 2025-12-04T11:45:26.1802421Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1802469Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1802510Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1802620Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1802857Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1803087Z triton_mm_8 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1803337Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1803570Z triton_mm_18 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1803797Z triton_mm_16 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1804025Z triton_mm_9 0.0070 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1804254Z triton_mm_12 0.0075 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1804494Z triton_mm_13 0.0076 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1804734Z triton_mm_11 0.0078 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1804959Z triton_mm_15 0.0080 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1805090Z SingleProcess AUTOTUNE benchmarking takes 0.0876 seconds and 0.3455 seconds precompiling for 20 choices 2025-12-04T11:45:26.1805165Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1805207Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1805264Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1805364Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1805852Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1805904Z graph_break [] 2025-12-04T11:45:26.1805968Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1806042Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1806085Z Autotune Choices Stats: 2025-12-04T11:45:26.1806460Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.1806507Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1806550Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1806648Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1806880Z triton_mm_35 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1807110Z triton_mm_36 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1807336Z triton_mm_33 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1807567Z triton_mm_27 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1807798Z triton_mm_37 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1808038Z triton_mm_28 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1808268Z triton_mm_31 0.0078 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1808506Z triton_mm_30 0.0079 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1808731Z triton_mm_34 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1808963Z triton_mm_25 0.0084 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1809093Z SingleProcess AUTOTUNE benchmarking takes 0.1201 seconds and 0.2583 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1809147Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1809294Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1809342Z Traceback (most recent call last): 2025-12-04T11:45:26.1809512Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1809553Z method(*args, **kwargs) 2025-12-04T11:45:26.1809706Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1809747Z method(*args, **kwargs) 2025-12-04T11:45:26.1809899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1809948Z with policy(): 2025-12-04T11:45:26.1810102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1810145Z raise RuntimeError(msg) 2025-12-04T11:45:26.1810543Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.1810548Z 2025-12-04T11:45:26.1810622Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1810889Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1810891Z 2025-12-04T11:45:26.1810980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1811054Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1811096Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1811153Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1811640Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1811752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1811788Z graph_break [] 2025-12-04T11:45:26.1811851Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1811925Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1812425Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1812474Z current_size = base.storage().size() 2025-12-04T11:45:26.1812514Z Autotune Choices Stats: 2025-12-04T11:45:26.1812883Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.1812931Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1812974Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1813072Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1813441Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1813687Z triton_mm_8 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1813914Z triton_mm_14 0.0066 ms 94.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1814159Z triton_mm_18 0.0067 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1814387Z triton_mm_16 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1814618Z triton_mm_9 0.0070 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1814848Z triton_mm_12 0.0075 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1815075Z triton_mm_13 0.0076 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1815300Z triton_mm_11 0.0078 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1815527Z triton_mm_15 0.0080 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1815677Z SingleProcess AUTOTUNE benchmarking takes 0.0876 seconds and 0.3455 seconds precompiling for 20 choices 2025-12-04T11:45:26.1815751Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1815795Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1815854Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1815957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1816459Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1816499Z graph_break [] 2025-12-04T11:45:26.1816562Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1816636Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1816676Z Autotune Choices Stats: 2025-12-04T11:45:26.1817042Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.1817088Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1817130Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1817241Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1817473Z triton_mm_35 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1817717Z triton_mm_36 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1817942Z triton_mm_33 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1818171Z triton_mm_27 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1818402Z triton_mm_37 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1818632Z triton_mm_28 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1818860Z triton_mm_31 0.0078 ms 79.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1819087Z triton_mm_30 0.0079 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1819312Z triton_mm_34 0.0080 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1819553Z triton_mm_25 0.0084 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1819684Z SingleProcess AUTOTUNE benchmarking takes 0.1201 seconds and 0.2583 seconds precompiling for 20 choices 
2025-12-04T11:45:26.1819770Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1819814Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1819871Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1819973Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1820460Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1820501Z graph_break [] 2025-12-04T11:45:26.1820563Z aten_mm_info [('aten._scaled_mm.default_257_16_1024', 1)] 2025-12-04T11:45:26.1820637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1820677Z Autotune Choices Stats: 2025-12-04T11:45:26.1821043Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.1821104Z AUTOTUNE scaled_mm(257x1024, 1024x16, , ) 2025-12-04T11:45:26.1821145Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1821245Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1821489Z triton_mm_55 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.1821717Z triton_mm_52 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1821946Z triton_mm_46 0.0064 ms 93.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1822171Z triton_mm_54 0.0069 ms 87.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1822402Z triton_mm_56 0.0071 ms 85.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.1822632Z triton_mm_47 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1822863Z triton_mm_50 0.0075 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1823105Z triton_mm_49 0.0077 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1823378Z triton_mm_53 0.0080 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1823616Z triton_mm_51 0.0083 ms 72.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.1823748Z SingleProcess AUTOTUNE benchmarking takes 0.1395 seconds and 0.2444 seconds precompiling for 20 choices 2025-12-04T11:45:26.1823942Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bffb3cac270b5490.xml - 2025-12-04T11:45:26.1824004Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1824604Z FAILED [0.7730s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.1824607Z 2025-12-04T11:45:26.1824680Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1824960Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1824963Z 2025-12-04T11:45:26.1825050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1825115Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1825197Z ================== 1 failed, 187 deselected, 2 rerun in 3.97s ================== 2025-12-04T11:45:26.1825236Z Got exit code 1 2025-12-04T11:45:26.1825445Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1825574Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1825718Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-541486264211f8f8.xml 2025-12-04T11:45:26.1825777Z ============================= test session starts ============================== 2025-12-04T11:45:26.1825889Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1825931Z cachedir: .pytest_cache 2025-12-04T11:45:26.1826091Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1826137Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1826178Z configfile: pytest.ini 2025-12-04T11:45:26.1826340Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1826420Z collecting ... collected 188 items / 137 deselected / 51 selected 2025-12-04T11:45:26.1826476Z stepcurrent: skipping 137 already run items. 
2025-12-04T11:45:26.1826522Z Running 51 items in this shard 2025-12-04T11:45:26.1826524Z 2025-12-04T11:45:26.1826752Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.9953s] [ 1%] 2025-12-04T11:45:26.1826987Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3988s] [ 1%] 2025-12-04T11:45:26.1827184Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.2118s] [ 1%] 2025-12-04T11:45:26.1827186Z 2025-12-04T11:45:26.1827237Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1827403Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1827451Z Traceback (most recent call last): 2025-12-04T11:45:26.1827612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1827655Z method(*args, **kwargs) 2025-12-04T11:45:26.1827808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1827850Z method(*args, **kwargs) 2025-12-04T11:45:26.1828005Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1828043Z with policy(): 2025-12-04T11:45:26.1828197Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1828240Z raise RuntimeError(msg) 2025-12-04T11:45:26.1828641Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1115684864. 2025-12-04T11:45:26.1828655Z 2025-12-04T11:45:26.1828729Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1828996Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1828998Z 2025-12-04T11:45:26.1829101Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1829177Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1829220Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1829280Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1829772Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1829873Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1829911Z graph_break [] 2025-12-04T11:45:26.1829977Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1830051Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1830537Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1830586Z current_size = base.storage().size() 2025-12-04T11:45:26.1830626Z Autotune Choices Stats: 2025-12-04T11:45:26.1830997Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.009359999559819698, "best_triton_pos": 0} 2025-12-04T11:45:26.1831058Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1831102Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1831201Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1831452Z triton_mm_34 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1831682Z triton_mm_29 0.0094 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1831917Z triton_mm_33 0.0099 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1832142Z triton_mm_21 0.0100 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.1832367Z triton_mm_22 0.0103 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1832603Z triton_mm_16 0.0104 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1832837Z triton_mm_30 0.0107 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1833064Z triton_mm_23 0.0112 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1833332Z triton_mm_35 0.0113 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1833560Z triton_mm_15 0.0116 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1833691Z SingleProcess AUTOTUNE benchmarking takes 0.1635 seconds and 0.7348 seconds precompiling for 39 choices 2025-12-04T11:45:26.1833841Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1833888Z Traceback (most recent call last): 2025-12-04T11:45:26.1834048Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1834090Z method(*args, **kwargs) 2025-12-04T11:45:26.1834243Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1834285Z method(*args, **kwargs) 2025-12-04T11:45:26.1834436Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1834490Z with policy(): 2025-12-04T11:45:26.1834644Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1834685Z raise RuntimeError(msg) 2025-12-04T11:45:26.1835087Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1115684864 and is now 1212153856. 
2025-12-04T11:45:26.1835091Z 2025-12-04T11:45:26.1835178Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1835448Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1835451Z 2025-12-04T11:45:26.1835537Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1835612Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1835655Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1835714Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1836200Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1836314Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1836350Z graph_break [] 2025-12-04T11:45:26.1836415Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1836488Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1836987Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1837034Z current_size = base.storage().size() 2025-12-04T11:45:26.1837077Z Autotune Choices Stats: 2025-12-04T11:45:26.1837448Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.009359999559819698, "best_triton_pos": 0} 2025-12-04T11:45:26.1837497Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1837540Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1837639Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1837877Z triton_mm_34 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1838104Z triton_mm_29 0.0094 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1838336Z triton_mm_33 0.0099 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1838575Z triton_mm_21 0.0100 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1838801Z triton_mm_22 0.0103 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1839034Z triton_mm_16 0.0104 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1839259Z triton_mm_30 0.0107 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1839488Z triton_mm_23 0.0112 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1839719Z triton_mm_35 0.0113 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1839947Z triton_mm_15 0.0116 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1840091Z SingleProcess AUTOTUNE benchmarking takes 0.1635 seconds and 0.7348 seconds precompiling for 39 choices 2025-12-04T11:45:26.1840165Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1840210Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1840268Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1840369Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1840869Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1840906Z graph_break [] 2025-12-04T11:45:26.1840971Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1841045Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1841085Z Autotune Choices Stats: 2025-12-04T11:45:26.1841448Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-12-04T11:45:26.1841496Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1841539Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1841637Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1841868Z triton_mm_67 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1842099Z triton_mm_72 0.0091 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1842343Z triton_mm_71 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1842570Z triton_mm_59 0.0099 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1842806Z triton_mm_68 0.0101 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1843032Z triton_mm_60 0.0102 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1843289Z triton_mm_54 0.0102 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1843518Z triton_mm_61 0.0109 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1843746Z triton_mm_69 0.0113 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1843994Z triton_mm_53 0.0114 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1844126Z SingleProcess AUTOTUNE benchmarking takes 0.2589 seconds and 0.5170 seconds precompiling for 39 choices 
2025-12-04T11:45:26.1844191Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1844343Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1844389Z Traceback (most recent call last): 2025-12-04T11:45:26.1844550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1844591Z method(*args, **kwargs) 2025-12-04T11:45:26.1844744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1844785Z method(*args, **kwargs) 2025-12-04T11:45:26.1844937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1844975Z with policy(): 2025-12-04T11:45:26.1845129Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1845170Z raise RuntimeError(msg) 2025-12-04T11:45:26.1845569Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 
2025-12-04T11:45:26.1845572Z 2025-12-04T11:45:26.1845647Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1845912Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1845929Z 2025-12-04T11:45:26.1846017Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1846090Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1846134Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1846190Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1846689Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1846789Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1846828Z graph_break [] 2025-12-04T11:45:26.1846891Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1846965Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1847453Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1847500Z current_size = base.storage().size() 2025-12-04T11:45:26.1847541Z Autotune Choices Stats: 2025-12-04T11:45:26.1847923Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.009359999559819698, "best_triton_pos": 0} 2025-12-04T11:45:26.1847972Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1848013Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1848123Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1848360Z triton_mm_34 0.0094 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1848588Z triton_mm_29 0.0094 ms 99.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1848819Z triton_mm_33 0.0099 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1849048Z triton_mm_21 0.0100 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1849274Z triton_mm_22 0.0103 ms 91.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1849496Z triton_mm_16 0.0104 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1849719Z triton_mm_30 0.0107 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1849957Z triton_mm_23 0.0112 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1850197Z triton_mm_35 0.0113 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1850423Z triton_mm_15 0.0116 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1850554Z SingleProcess AUTOTUNE benchmarking takes 0.1635 seconds and 0.7348 seconds precompiling for 39 choices 2025-12-04T11:45:26.1850629Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1850671Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1850729Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1850830Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1851325Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1851372Z graph_break [] 2025-12-04T11:45:26.1851436Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1851509Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1851551Z Autotune Choices Stats: 2025-12-04T11:45:26.1851925Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008960000239312649, "best_triton_pos": 0} 2025-12-04T11:45:26.1851974Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1852016Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1852115Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1852347Z triton_mm_67 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1852576Z triton_mm_72 0.0091 ms 98.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1852807Z triton_mm_71 0.0093 ms 96.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1853034Z triton_mm_59 0.0099 ms 90.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1853298Z triton_mm_68 0.0101 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1853538Z triton_mm_60 0.0102 ms 88.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1853765Z triton_mm_54 0.0102 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1854006Z triton_mm_61 0.0109 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1854235Z triton_mm_69 0.0113 ms 79.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1854462Z triton_mm_53 0.0114 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1854591Z SingleProcess AUTOTUNE benchmarking takes 0.2589 seconds and 0.5170 seconds precompiling for 39 choices 
2025-12-04T11:45:26.1854665Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1854707Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1854766Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1854865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1855366Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1855404Z graph_break [] 2025-12-04T11:45:26.1855468Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1855553Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1855595Z Autotune Choices Stats: 2025-12-04T11:45:26.1855962Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:26.1856011Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1856053Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1856152Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1856390Z triton_mm_105 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.1856618Z triton_mm_110 0.0091 ms 96.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1856847Z triton_mm_109 0.0098 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1857075Z triton_mm_97 0.0099 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1857315Z triton_mm_106 0.0100 ms 87.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1857541Z triton_mm_92 0.0103 ms 84.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1857783Z triton_mm_98 0.0104 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1858011Z triton_mm_99 0.0113 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1858238Z triton_mm_91 0.0113 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1858468Z triton_mm_107 0.0114 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1858599Z SingleProcess AUTOTUNE benchmarking takes 0.2615 seconds and 0.3592 seconds precompiling for 39 choices 2025-12-04T11:45:26.1858804Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-541486264211f8f8.xml - 2025-12-04T11:45:26.1858867Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1859483Z FAILED [1.2118s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 2025-12-04T11:45:26.1859486Z 2025-12-04T11:45:26.1859658Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1859924Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1859927Z 2025-12-04T11:45:26.1860015Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1860079Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1860147Z ================== 1 failed, 137 deselected, 2 rerun in 5.63s ================== 2025-12-04T11:45:26.1860184Z Got exit code 1 2025-12-04T11:45:26.1860224Z Retrying single test... 2025-12-04T11:45:26.1860369Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b103e2902ac9e48.xml 2025-12-04T11:45:26.1860430Z ============================= test session starts ============================== 2025-12-04T11:45:26.1860542Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1860584Z cachedir: .pytest_cache 2025-12-04T11:45:26.1860746Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1860792Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1860831Z configfile: pytest.ini 2025-12-04T11:45:26.1861007Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1861081Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1861348Z stepcurrent: skipping 137 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1861394Z Running 1 items in this shard 2025-12-04T11:45:26.1861396Z 2025-12-04T11:45:26.1861631Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [3.0944s] [100%] 2025-12-04T11:45:26.1861853Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3784s] [100%] 2025-12-04T11:45:26.1862049Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.2040s] [100%] 2025-12-04T11:45:26.1862052Z 2025-12-04T11:45:26.1862106Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1862257Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1862305Z Traceback (most recent call last): 2025-12-04T11:45:26.1862465Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1862508Z method(*args, **kwargs) 2025-12-04T11:45:26.1862675Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1862717Z method(*args, **kwargs) 2025-12-04T11:45:26.1862868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1862906Z with policy(): 2025-12-04T11:45:26.1863059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1863113Z 
raise RuntimeError(msg) 2025-12-04T11:45:26.1863547Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1115684864. 2025-12-04T11:45:26.1863549Z 2025-12-04T11:45:26.1863623Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1863896Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1863899Z 2025-12-04T11:45:26.1863988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1864063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1864106Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1864163Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1864648Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1864749Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1864787Z graph_break [] 2025-12-04T11:45:26.1864870Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1864943Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1865429Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1865491Z current_size = base.storage().size() 2025-12-04T11:45:26.1865533Z Autotune Choices Stats: 2025-12-04T11:45:26.1865904Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008958999998867512, "best_triton_pos": 0} 2025-12-04T11:45:26.1865953Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1865995Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1866097Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1866336Z triton_mm_34 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1866564Z triton_mm_29 0.0095 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1866807Z triton_mm_33 0.0096 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1867046Z triton_mm_21 0.0100 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1867270Z triton_mm_16 0.0104 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1867495Z triton_mm_22 0.0104 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1867719Z triton_mm_30 0.0107 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1867950Z triton_mm_23 0.0111 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1868179Z triton_mm_35 0.0114 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1868406Z triton_mm_15 0.0114 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1868536Z SingleProcess AUTOTUNE benchmarking takes 0.1743 seconds and 0.7398 seconds precompiling for 39 choices 2025-12-04T11:45:26.1868687Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.1868745Z Traceback (most recent call last): 2025-12-04T11:45:26.1868902Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1868944Z method(*args, **kwargs) 2025-12-04T11:45:26.1869098Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1869139Z method(*args, **kwargs) 2025-12-04T11:45:26.1869300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1869339Z with policy(): 2025-12-04T11:45:26.1869493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1869535Z raise RuntimeError(msg) 2025-12-04T11:45:26.1869935Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1115684864 and is now 1212153856. 
2025-12-04T11:45:26.1869938Z 2025-12-04T11:45:26.1870012Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1870280Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1870283Z 2025-12-04T11:45:26.1870382Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1870458Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1870500Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1870560Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1871050Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1871152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1871192Z graph_break [] 2025-12-04T11:45:26.1871258Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1871334Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1871821Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1871870Z current_size = base.storage().size() 2025-12-04T11:45:26.1871912Z Autotune Choices Stats: 2025-12-04T11:45:26.1872281Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008958999998867512, "best_triton_pos": 0} 2025-12-04T11:45:26.1872329Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1872371Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1872472Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1872729Z triton_mm_34 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1872955Z triton_mm_29 0.0095 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1873195Z triton_mm_33 0.0096 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1873445Z triton_mm_21 0.0100 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1873669Z triton_mm_16 0.0104 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1873892Z triton_mm_22 0.0104 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1874115Z triton_mm_30 0.0107 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1874365Z triton_mm_23 0.0111 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1874597Z triton_mm_35 0.0114 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1874840Z triton_mm_15 0.0114 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1874971Z SingleProcess AUTOTUNE benchmarking takes 0.1743 seconds and 0.7398 seconds precompiling for 39 choices 2025-12-04T11:45:26.1875045Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1875088Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1875146Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1875245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1875739Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1875777Z graph_break [] 2025-12-04T11:45:26.1875840Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1875915Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1875955Z Autotune Choices Stats: 2025-12-04T11:45:26.1876314Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:26.1876376Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1876419Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1876517Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1876749Z triton_mm_67 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1876990Z triton_mm_72 0.0093 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1877219Z triton_mm_71 0.0095 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1877447Z triton_mm_54 0.0099 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1877670Z triton_mm_68 0.0102 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1877896Z triton_mm_59 0.0103 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1878130Z triton_mm_60 0.0104 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1878374Z triton_mm_53 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1878603Z triton_mm_61 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1878828Z triton_mm_69 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1878959Z SingleProcess AUTOTUNE benchmarking takes 0.2586 seconds and 0.5192 seconds precompiling for 39 choices 
2025-12-04T11:45:26.1879012Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1879162Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1879208Z Traceback (most recent call last): 2025-12-04T11:45:26.1879366Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1879406Z method(*args, **kwargs) 2025-12-04T11:45:26.1879560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1879600Z method(*args, **kwargs) 2025-12-04T11:45:26.1879752Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1879789Z with policy(): 2025-12-04T11:45:26.1879943Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1879994Z raise RuntimeError(msg) 2025-12-04T11:45:26.1880391Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 
2025-12-04T11:45:26.1880393Z 2025-12-04T11:45:26.1880467Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1880745Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1880749Z 2025-12-04T11:45:26.1880839Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1880914Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1880960Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1881016Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1881502Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1881600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1881649Z graph_break [] 2025-12-04T11:45:26.1881714Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1881787Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1882281Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1882330Z current_size = base.storage().size() 2025-12-04T11:45:26.1882370Z Autotune Choices Stats: 2025-12-04T11:45:26.1882740Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_34", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.008958999998867512, "best_triton_pos": 0} 2025-12-04T11:45:26.1882789Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1882829Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1882931Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1883168Z triton_mm_34 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1883431Z triton_mm_29 0.0095 ms 94.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1883659Z triton_mm_33 0.0096 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1883885Z triton_mm_21 0.0100 ms 89.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1884134Z triton_mm_16 0.0104 ms 86.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1884371Z triton_mm_22 0.0104 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1884598Z triton_mm_30 0.0107 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1884830Z triton_mm_23 0.0111 ms 80.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1885064Z triton_mm_35 0.0114 ms 78.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1885290Z triton_mm_15 0.0114 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1885420Z SingleProcess AUTOTUNE benchmarking takes 0.1743 seconds and 0.7398 seconds precompiling for 39 choices 2025-12-04T11:45:26.1885508Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1885553Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1885610Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1885712Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1886225Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1886262Z graph_break [] 2025-12-04T11:45:26.1886328Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1886401Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1886443Z Autotune Choices Stats: 2025-12-04T11:45:26.1886804Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_67", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008679999969899654, "best_triton_pos": 0} 2025-12-04T11:45:26.1886854Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1886896Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1886997Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1887229Z triton_mm_67 0.0087 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1887460Z triton_mm_72 0.0093 ms 93.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1887685Z triton_mm_71 0.0095 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1887925Z triton_mm_54 0.0099 ms 87.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1888164Z triton_mm_68 0.0102 ms 84.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1888389Z triton_mm_59 0.0103 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1888613Z triton_mm_60 0.0104 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1888842Z triton_mm_53 0.0111 ms 78.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1889072Z triton_mm_61 0.0112 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1889310Z triton_mm_69 0.0113 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1889439Z SingleProcess AUTOTUNE benchmarking takes 0.2586 seconds and 0.5192 seconds precompiling for 39 choices 
2025-12-04T11:45:26.1889513Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1889555Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1889623Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1889723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1890210Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1890247Z graph_break [] 2025-12-04T11:45:26.1890312Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1890386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1890427Z Autotune Choices Stats: 2025-12-04T11:45:26.1890792Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008799999952316284, "best_triton_pos": 0} 2025-12-04T11:45:26.1890841Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1890882Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1890981Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1891216Z triton_mm_105 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.1891457Z triton_mm_109 0.0098 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1891686Z triton_mm_110 0.0098 ms 90.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1891739Z _scaled_mm 0.0099 ms 88.7% 2025-12-04T11:45:26.1891966Z triton_mm_97 0.0101 ms 87.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1892191Z triton_mm_92 0.0102 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1892420Z triton_mm_106 0.0102 ms 85.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1892645Z triton_mm_98 0.0106 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1892871Z triton_mm_99 0.0112 ms 78.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1893115Z triton_mm_107 0.0113 ms 78.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1893246Z SingleProcess AUTOTUNE benchmarking takes 0.2574 seconds and 0.3712 seconds precompiling for 39 choices 2025-12-04T11:45:26.1893490Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6b103e2902ac9e48.xml - 2025-12-04T11:45:26.1893551Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1894149Z FAILED [1.2040s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 2025-12-04T11:45:26.1894154Z 2025-12-04T11:45:26.1894229Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1894493Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1894496Z 2025-12-04T11:45:26.1894583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1894646Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1894714Z ================== 1 failed, 187 deselected, 2 rerun in 5.70s ================== 2025-12-04T11:45:26.1894752Z Got exit code 1 2025-12-04T11:45:26.1894793Z Retrying single test... 
2025-12-04T11:45:26.1894938Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b0aac443c2e77ec9.xml 2025-12-04T11:45:26.1895010Z ============================= test session starts ============================== 2025-12-04T11:45:26.1895122Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1895164Z cachedir: .pytest_cache 2025-12-04T11:45:26.1895324Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1895370Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1895410Z configfile: pytest.ini 2025-12-04T11:45:26.1895587Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1895663Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1895923Z stepcurrent: skipping 137 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1895968Z Running 1 items in this shard 2025-12-04T11:45:26.1895970Z 2025-12-04T11:45:26.1896193Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.9846s] [100%] 2025-12-04T11:45:26.1896413Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.3175s] [100%] 2025-12-04T11:45:26.1896608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.2084s] [100%] 2025-12-04T11:45:26.1896624Z 2025-12-04T11:45:26.1896675Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1896823Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1896872Z Traceback (most recent call last): 2025-12-04T11:45:26.1897030Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1897071Z method(*args, **kwargs) 2025-12-04T11:45:26.1897236Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1897279Z method(*args, **kwargs) 2025-12-04T11:45:26.1897430Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1897468Z with policy(): 2025-12-04T11:45:26.1897624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1897666Z 
raise RuntimeError(msg) 2025-12-04T11:45:26.1898060Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1115684864. 2025-12-04T11:45:26.1898063Z 2025-12-04T11:45:26.1898140Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1898406Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1898408Z 2025-12-04T11:45:26.1898496Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1898571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1898614Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1898672Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1899159Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1899270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1899306Z graph_break [] 2025-12-04T11:45:26.1899382Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1899455Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1899949Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1899996Z current_size = base.storage().size() 2025-12-04T11:45:26.1900038Z Autotune Choices Stats: 2025-12-04T11:45:26.1900412Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008999999612569809, "best_triton_pos": 0} 2025-12-04T11:45:26.1900459Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1900501Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1900622Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1900856Z triton_mm_29 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1901101Z triton_mm_34 0.0092 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1901332Z triton_mm_33 0.0096 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1901556Z triton_mm_21 0.0102 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1901784Z triton_mm_16 0.0103 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1902013Z triton_mm_22 0.0104 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1902237Z triton_mm_30 0.0105 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1902467Z triton_mm_35 0.0112 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1902694Z triton_mm_23 0.0113 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1902934Z triton_mm_15 0.0114 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1903064Z SingleProcess AUTOTUNE benchmarking takes 0.1642 seconds and 0.7647 seconds precompiling for 39 choices 2025-12-04T11:45:26.1903226Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 
2025-12-04T11:45:26.1903302Z Traceback (most recent call last): 2025-12-04T11:45:26.1903460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1903501Z method(*args, **kwargs) 2025-12-04T11:45:26.1903653Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1903696Z method(*args, **kwargs) 2025-12-04T11:45:26.1903846Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1903888Z with policy(): 2025-12-04T11:45:26.1904043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1904084Z raise RuntimeError(msg) 2025-12-04T11:45:26.1904481Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1115684864 and is now 1212153856. 
2025-12-04T11:45:26.1904499Z 2025-12-04T11:45:26.1904576Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1904839Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1904841Z 2025-12-04T11:45:26.1904943Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1905017Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1905062Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1905121Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1905610Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1905713Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1905749Z graph_break [] 2025-12-04T11:45:26.1905815Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1905888Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1906379Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1906426Z current_size = base.storage().size() 2025-12-04T11:45:26.1906468Z Autotune Choices Stats: 2025-12-04T11:45:26.1906834Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008999999612569809, "best_triton_pos": 0} 2025-12-04T11:45:26.1906898Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1906940Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1907041Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1907287Z triton_mm_29 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1907517Z triton_mm_34 0.0092 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1907747Z triton_mm_33 0.0096 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1907973Z triton_mm_21 0.0102 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1908199Z triton_mm_16 0.0103 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1908437Z triton_mm_22 0.0104 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1908663Z triton_mm_30 0.0105 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1908904Z triton_mm_35 0.0112 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1909131Z triton_mm_23 0.0113 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1909363Z triton_mm_15 0.0114 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1909493Z SingleProcess AUTOTUNE benchmarking takes 0.1642 seconds and 0.7647 seconds precompiling for 39 choices 2025-12-04T11:45:26.1909567Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1909610Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1909667Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1909766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1910255Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1910304Z graph_break [] 2025-12-04T11:45:26.1910368Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1910441Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1910484Z Autotune Choices Stats: 2025-12-04T11:45:26.1910861Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:26.1910909Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1910953Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1911050Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1911285Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1911509Z triton_mm_67 0.0093 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1911733Z triton_mm_59 0.0096 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1911959Z triton_mm_71 0.0097 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1912014Z _scaled_mm 0.0097 ms 90.1% 2025-12-04T11:45:26.1912238Z triton_mm_68 0.0101 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1912479Z triton_mm_54 0.0103 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1912705Z triton_mm_60 0.0104 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1912932Z triton_mm_53 0.0112 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1913158Z triton_mm_61 0.0114 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1913331Z SingleProcess AUTOTUNE benchmarking takes 0.2606 seconds and 0.5157 seconds precompiling for 39 choices 2025-12-04T11:45:26.1913386Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1913534Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1913585Z Traceback (most recent call last): 2025-12-04T11:45:26.1913742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1913784Z method(*args, **kwargs) 2025-12-04T11:45:26.1913937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1913993Z method(*args, **kwargs) 2025-12-04T11:45:26.1914145Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1914184Z with policy(): 2025-12-04T11:45:26.1914340Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1914383Z raise RuntimeError(msg) 2025-12-04T11:45:26.1914803Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 
2025-12-04T11:45:26.1914807Z 2025-12-04T11:45:26.1914880Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1915144Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1915147Z 2025-12-04T11:45:26.1915234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1915308Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1915352Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1915411Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1915894Z inductor [('triton_bundler_save_kernel', 312), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1916008Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1916046Z graph_break [] 2025-12-04T11:45:26.1916110Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1916196Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1916680Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.1916732Z current_size = base.storage().size() 2025-12-04T11:45:26.1916772Z Autotune Choices Stats: 2025-12-04T11:45:26.1917138Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.008999999612569809, "best_triton_pos": 0} 2025-12-04T11:45:26.1917187Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1917230Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1917329Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1917564Z triton_mm_29 0.0090 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1917794Z triton_mm_34 0.0092 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1918022Z triton_mm_33 0.0096 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1918259Z triton_mm_21 0.0102 ms 87.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1918492Z triton_mm_16 0.0103 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, 
BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1918716Z triton_mm_22 0.0104 ms 86.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1918941Z triton_mm_30 0.0105 ms 85.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1919173Z triton_mm_35 0.0112 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1919400Z triton_mm_23 0.0113 ms 79.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1919636Z triton_mm_15 0.0114 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1919770Z SingleProcess AUTOTUNE benchmarking takes 0.1642 seconds and 0.7647 seconds precompiling for 39 choices 2025-12-04T11:45:26.1919844Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1919887Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1919954Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1920055Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1920542Z inductor [('triton_bundler_save_kernel', 312), 
('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1920583Z graph_break [] 2025-12-04T11:45:26.1920646Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1920722Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1920762Z Autotune Choices Stats: 2025-12-04T11:45:26.1921131Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_72", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.00875999964773655, "best_triton_pos": 0} 2025-12-04T11:45:26.1921181Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1921223Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1921322Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1921555Z triton_mm_72 0.0088 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1921796Z triton_mm_67 0.0093 ms 94.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1922020Z triton_mm_59 0.0096 ms 91.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1922256Z triton_mm_71 0.0097 ms 90.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1922299Z _scaled_mm 0.0097 ms 90.1% 2025-12-04T11:45:26.1922525Z triton_mm_68 0.0101 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1922749Z triton_mm_54 0.0103 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1922980Z triton_mm_60 0.0104 ms 83.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1923208Z triton_mm_53 0.0112 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1923468Z triton_mm_61 0.0114 ms 76.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1923599Z SingleProcess AUTOTUNE benchmarking takes 0.2606 seconds and 0.5157 seconds precompiling for 39 choices 2025-12-04T11:45:26.1923687Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1923730Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1923786Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1923887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1924375Z inductor [('triton_bundler_save_kernel', 312), ('async_compile_cache_miss', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 39), ('generated_module_cache_miss', 38), ('select_algorithm_num_precompiles', 38), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.1924417Z graph_break [] 2025-12-04T11:45:26.1924481Z aten_mm_info [('aten._scaled_mm.default_257_2048_1024', 1)] 2025-12-04T11:45:26.1924558Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.1924598Z Autotune Choices Stats: 2025-12-04T11:45:26.1924961Z {"num_choices": 39, "num_triton_choices": 38, "best_kernel": "triton_mm_105", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.00887999963015318, "best_triton_pos": 0} 2025-12-04T11:45:26.1925010Z AUTOTUNE scaled_mm(257x1024, 1024x2048, , ) 2025-12-04T11:45:26.1925051Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.1925151Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.1925401Z triton_mm_105 0.0089 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1925634Z triton_mm_110 0.0090 ms 99.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1925877Z triton_mm_109 0.0095 ms 93.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1926107Z triton_mm_98 0.0100 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1926331Z triton_mm_97 0.0100 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1926556Z triton_mm_92 0.0103 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1926785Z triton_mm_106 0.0103 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.1927027Z triton_mm_99 0.0110 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1927256Z triton_mm_91 0.0112 ms 79.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1927499Z triton_mm_111 0.0113 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.1927632Z SingleProcess AUTOTUNE benchmarking takes 0.2608 seconds and 0.3687 seconds precompiling for 39 choices 2025-12-04T11:45:26.1927826Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b0aac443c2e77ec9.xml - 2025-12-04T11:45:26.1927887Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1928486Z FAILED [1.2084s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1212153856 and is now 1308622848. 2025-12-04T11:45:26.1928490Z 2025-12-04T11:45:26.1928563Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1928832Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1928835Z 2025-12-04T11:45:26.1928922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1928988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1929072Z ================== 1 failed, 187 deselected, 2 rerun in 5.53s ================== 2025-12-04T11:45:26.1929110Z Got exit code 1 2025-12-04T11:45:26.1929324Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1929452Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1929599Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b4c2eb64a10c17d.xml 2025-12-04T11:45:26.1929666Z ============================= test session starts ============================== 2025-12-04T11:45:26.1929781Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1929821Z cachedir: .pytest_cache 2025-12-04T11:45:26.1929981Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1930028Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1930069Z configfile: pytest.ini 2025-12-04T11:45:26.1930232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1930312Z collecting ... collected 188 items / 138 deselected / 50 selected 2025-12-04T11:45:26.1930366Z stepcurrent: skipping 138 already run items. 
2025-12-04T11:45:26.1930412Z Running 50 items in this shard 2025-12-04T11:45:26.1930414Z 2025-12-04T11:45:26.1930633Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6557s] [ 2%] 2025-12-04T11:45:26.1930858Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2628s] [ 2%] 2025-12-04T11:45:26.1931049Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2183s] [ 2%] 2025-12-04T11:45:26.1931051Z 2025-12-04T11:45:26.1931102Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1931256Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1931303Z Traceback (most recent call last): 2025-12-04T11:45:26.1931463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1931504Z method(*args, **kwargs) 2025-12-04T11:45:26.1931659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1931701Z method(*args, **kwargs) 2025-12-04T11:45:26.1931853Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1931890Z with policy(): 2025-12-04T11:45:26.1932045Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1932086Z raise RuntimeError(msg) 2025-12-04T11:45:26.1932477Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1932479Z 2025-12-04T11:45:26.1932553Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1932815Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1932829Z 2025-12-04T11:45:26.1932917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1932992Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1933037Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1933095Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1933161Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1933298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1933355Z graph_break [] 2025-12-04T11:45:26.1933419Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1933565Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1933612Z Traceback (most recent call last): 2025-12-04T11:45:26.1933768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1933811Z method(*args, **kwargs) 2025-12-04T11:45:26.1933964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1934007Z method(*args, **kwargs) 2025-12-04T11:45:26.1934157Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1934194Z with policy(): 2025-12-04T11:45:26.1934349Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1934393Z raise RuntimeError(msg) 2025-12-04T11:45:26.1934795Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1934798Z 2025-12-04T11:45:26.1934871Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1935144Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1935147Z 2025-12-04T11:45:26.1935234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1935311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1935353Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1935411Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1935478Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1935578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1935616Z graph_break [] 2025-12-04T11:45:26.1935679Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1935753Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1935799Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:26.1935855Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1935954Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1936019Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1936056Z graph_break [] 2025-12-04T11:45:26.1936115Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1936169Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1936312Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1936360Z Traceback (most recent call last): 2025-12-04T11:45:26.1936529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1936572Z method(*args, **kwargs) 2025-12-04T11:45:26.1936724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1936767Z method(*args, **kwargs) 2025-12-04T11:45:26.1936917Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1936965Z with policy(): 2025-12-04T11:45:26.1937120Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1937164Z raise RuntimeError(msg) 2025-12-04T11:45:26.1937550Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1937556Z 2025-12-04T11:45:26.1937630Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1937890Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1937892Z 2025-12-04T11:45:26.1937979Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1938056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1938108Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1938165Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1938231Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1938332Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1938368Z graph_break [] 2025-12-04T11:45:26.1938428Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1938511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1938555Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1938610Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1938710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1938775Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1938812Z graph_break [] 2025-12-04T11:45:26.1938869Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1938944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1938986Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1939042Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1939137Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1939202Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1939239Z graph_break [] 2025-12-04T11:45:26.1939299Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1939492Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7b4c2eb64a10c17d.xml - 2025-12-04T11:45:26.1939555Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1940135Z FAILED [0.2183s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1940151Z 2025-12-04T11:45:26.1940225Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1940487Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1940489Z 2025-12-04T11:45:26.1940586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1940651Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1940720Z ================== 1 failed, 138 deselected, 2 rerun in 2.15s ================== 2025-12-04T11:45:26.1940758Z Got exit code 1 2025-12-04T11:45:26.1940797Z Retrying single test... 
2025-12-04T11:45:26.1940944Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8e201afc4b89b25.xml 2025-12-04T11:45:26.1941002Z ============================= test session starts ============================== 2025-12-04T11:45:26.1941114Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1941155Z cachedir: .pytest_cache 2025-12-04T11:45:26.1941313Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1941358Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1941399Z configfile: pytest.ini 2025-12-04T11:45:26.1941559Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1941648Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1941904Z stepcurrent: skipping 138 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1941950Z Running 1 items in this shard 2025-12-04T11:45:26.1941952Z 2025-12-04T11:45:26.1942185Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6391s] [100%] 2025-12-04T11:45:26.1942403Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2669s] [100%] 2025-12-04T11:45:26.1942596Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2190s] [100%] 2025-12-04T11:45:26.1942600Z 2025-12-04T11:45:26.1942651Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1942794Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1942841Z Traceback (most recent call last): 2025-12-04T11:45:26.1943001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1943041Z method(*args, **kwargs) 2025-12-04T11:45:26.1943196Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1943236Z method(*args, **kwargs) 2025-12-04T11:45:26.1943427Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1943465Z with policy(): 2025-12-04T11:45:26.1943624Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1943664Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1944053Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1944073Z 2025-12-04T11:45:26.1944148Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1944419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1944422Z 2025-12-04T11:45:26.1944510Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1944586Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1944631Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1944687Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1944755Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1944854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1944890Z graph_break [] 2025-12-04T11:45:26.1944950Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1945095Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1945140Z Traceback (most recent call last): 2025-12-04T11:45:26.1945297Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1945357Z method(*args, **kwargs) 2025-12-04T11:45:26.1945511Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1945550Z method(*args, **kwargs) 2025-12-04T11:45:26.1945704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1945741Z with policy(): 2025-12-04T11:45:26.1945908Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1945951Z raise RuntimeError(msg) 2025-12-04T11:45:26.1946336Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1946339Z 2025-12-04T11:45:26.1946412Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1946670Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1946673Z 2025-12-04T11:45:26.1946759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1946836Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1946879Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1946936Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1947003Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1947104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1947139Z graph_break [] 2025-12-04T11:45:26.1947205Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1947279Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1947322Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1947376Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1947486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1947554Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1947590Z graph_break [] 2025-12-04T11:45:26.1947648Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1947704Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1947860Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1947907Z Traceback (most recent call last): 2025-12-04T11:45:26.1948062Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1948103Z method(*args, **kwargs) 2025-12-04T11:45:26.1948255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1948295Z method(*args, **kwargs) 2025-12-04T11:45:26.1948447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1948483Z with policy(): 2025-12-04T11:45:26.1948637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1948678Z raise RuntimeError(msg) 2025-12-04T11:45:26.1949068Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1949081Z 2025-12-04T11:45:26.1949156Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1949415Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1949419Z 2025-12-04T11:45:26.1949516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1949594Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1949637Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1949692Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1949760Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1949859Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1949897Z graph_break [] 2025-12-04T11:45:26.1949958Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1950033Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1950076Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1950131Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1950229Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1950295Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1950333Z graph_break [] 2025-12-04T11:45:26.1950391Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1950467Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1950511Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1950568Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1950664Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1950728Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1950764Z graph_break [] 2025-12-04T11:45:26.1950834Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1951024Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c8e201afc4b89b25.xml - 2025-12-04T11:45:26.1951086Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1951681Z FAILED [0.2190s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1951685Z 2025-12-04T11:45:26.1951757Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1952016Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1952020Z 2025-12-04T11:45:26.1952107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1952172Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1952239Z ================== 1 failed, 187 deselected, 2 rerun in 2.14s ================== 2025-12-04T11:45:26.1952278Z Got exit code 1 2025-12-04T11:45:26.1952319Z Retrying single test... 
2025-12-04T11:45:26.1952467Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e9bd9d4dcd07f4f0.xml 2025-12-04T11:45:26.1952536Z ============================= test session starts ============================== 2025-12-04T11:45:26.1952646Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1952688Z cachedir: .pytest_cache 2025-12-04T11:45:26.1952846Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1952890Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1952941Z configfile: pytest.ini 2025-12-04T11:45:26.1953102Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1953177Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1953455Z stepcurrent: skipping 138 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1953501Z Running 1 items in this shard 2025-12-04T11:45:26.1953503Z 2025-12-04T11:45:26.1953719Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6720s] [100%] 2025-12-04T11:45:26.1953933Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2594s] [100%] 2025-12-04T11:45:26.1954124Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2384s] [100%] 2025-12-04T11:45:26.1954126Z 2025-12-04T11:45:26.1954176Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1954323Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1954369Z Traceback (most recent call last): 2025-12-04T11:45:26.1954527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1954566Z method(*args, **kwargs) 2025-12-04T11:45:26.1954737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1954776Z method(*args, **kwargs) 2025-12-04T11:45:26.1954930Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1954967Z with policy(): 2025-12-04T11:45:26.1955123Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1955180Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1955568Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.1955571Z 2025-12-04T11:45:26.1955648Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1955910Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1955912Z 2025-12-04T11:45:26.1956002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1956075Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1956117Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1956175Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1956243Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1956361Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1956397Z graph_break [] 2025-12-04T11:45:26.1956455Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1956599Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1956646Z Traceback (most recent call last): 2025-12-04T11:45:26.1956815Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1956857Z method(*args, **kwargs) 2025-12-04T11:45:26.1957013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.1957054Z method(*args, **kwargs) 2025-12-04T11:45:26.1957205Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1957242Z with policy(): 2025-12-04T11:45:26.1957397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1957439Z raise RuntimeError(msg) 2025-12-04T11:45:26.1957824Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.1957826Z 2025-12-04T11:45:26.1957899Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1958159Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1958162Z 2025-12-04T11:45:26.1958251Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1958323Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1958367Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1958436Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1958503Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1958602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1958640Z graph_break [] 2025-12-04T11:45:26.1958700Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1958775Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1958827Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1958883Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1958980Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1960491Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1960528Z graph_break [] 2025-12-04T11:45:26.1960589Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1960645Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1960788Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1960835Z Traceback (most recent call last): 2025-12-04T11:45:26.1960993Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1961034Z method(*args, **kwargs) 2025-12-04T11:45:26.1961187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1961226Z method(*args, **kwargs) 2025-12-04T11:45:26.1961394Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1961431Z with policy(): 2025-12-04T11:45:26.1961586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1961627Z raise RuntimeError(msg) 2025-12-04T11:45:26.1962027Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.1962029Z 2025-12-04T11:45:26.1962104Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1962364Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1962367Z 2025-12-04T11:45:26.1962454Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1962528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1962572Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1962627Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1962694Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1962791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1962828Z graph_break [] 2025-12-04T11:45:26.1962887Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1962961Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1963003Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1963060Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1963155Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1963220Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1963301Z graph_break [] 2025-12-04T11:45:26.1963360Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1963433Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1963475Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1963531Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1963629Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1963692Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1963748Z graph_break [] 2025-12-04T11:45:26.1963806Z aten_mm_info [('aten._scaled_mm.default_257_16_16', 1)] 2025-12-04T11:45:26.1964001Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e9bd9d4dcd07f4f0.xml - 2025-12-04T11:45:26.1964063Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1964644Z FAILED [0.2384s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.1964646Z 2025-12-04T11:45:26.1964718Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1964976Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1964992Z 2025-12-04T11:45:26.1965079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1965141Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1965212Z ================== 1 failed, 187 deselected, 2 rerun in 2.19s ================== 2025-12-04T11:45:26.1965250Z Got exit code 1 2025-12-04T11:45:26.1965474Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.1965602Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.1965748Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4af67f81a6ee14f3.xml 2025-12-04T11:45:26.1965805Z ============================= test session starts ============================== 2025-12-04T11:45:26.1965920Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1965961Z cachedir: .pytest_cache 2025-12-04T11:45:26.1966119Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1966165Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1966206Z configfile: pytest.ini 2025-12-04T11:45:26.1966368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1966445Z collecting ... collected 188 items / 139 deselected / 49 selected 2025-12-04T11:45:26.1966500Z stepcurrent: skipping 139 already run items. 
2025-12-04T11:45:26.1966545Z Running 49 items in this shard 2025-12-04T11:45:26.1966547Z 2025-12-04T11:45:26.1966769Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6588s] [ 2%] 2025-12-04T11:45:26.1966987Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2534s] [ 2%] 2025-12-04T11:45:26.1967194Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2094s] [ 2%] 2025-12-04T11:45:26.1967197Z 2025-12-04T11:45:26.1967249Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1967396Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1967441Z Traceback (most recent call last): 2025-12-04T11:45:26.1967613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1967656Z method(*args, **kwargs) 2025-12-04T11:45:26.1967809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1967848Z method(*args, **kwargs) 2025-12-04T11:45:26.1968003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1968039Z with policy(): 2025-12-04T11:45:26.1968299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1968341Z raise RuntimeError(msg) 2025-12-04T11:45:26.1968735Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1968749Z 2025-12-04T11:45:26.1968822Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1969085Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1969088Z 2025-12-04T11:45:26.1969175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1969260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1969302Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1969360Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1969425Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1969524Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1969561Z graph_break [] 2025-12-04T11:45:26.1969626Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1969771Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1969818Z Traceback (most recent call last): 2025-12-04T11:45:26.1969971Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1970012Z method(*args, **kwargs) 2025-12-04T11:45:26.1970164Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1970204Z method(*args, **kwargs) 
2025-12-04T11:45:26.1970359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1970398Z with policy(): 2025-12-04T11:45:26.1970551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1970592Z raise RuntimeError(msg) 2025-12-04T11:45:26.1970981Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1970994Z 2025-12-04T11:45:26.1971068Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1971329Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1971332Z 2025-12-04T11:45:26.1971435Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1971511Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1971553Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1971610Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1971675Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1971775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1971811Z graph_break [] 2025-12-04T11:45:26.1971872Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1971946Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1971987Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.1972042Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1972139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1972203Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1972242Z graph_break [] 2025-12-04T11:45:26.1972313Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1972367Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1972513Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1972560Z Traceback (most recent call last): 2025-12-04T11:45:26.1972715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1972767Z method(*args, **kwargs) 2025-12-04T11:45:26.1972919Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1972960Z method(*args, **kwargs) 2025-12-04T11:45:26.1973112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1973149Z with policy(): 2025-12-04T11:45:26.1973331Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1973372Z raise RuntimeError(msg) 2025-12-04T11:45:26.1973761Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 
2025-12-04T11:45:26.1973765Z 2025-12-04T11:45:26.1973838Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1974100Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1974102Z 2025-12-04T11:45:26.1974188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1974264Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1974306Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1974361Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1974442Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1974541Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1974577Z graph_break [] 2025-12-04T11:45:26.1974638Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1974712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1974755Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1974809Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1974920Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1974986Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1975022Z graph_break [] 2025-12-04T11:45:26.1975081Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1975156Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1975198Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1975253Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1975349Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1975412Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1975448Z graph_break [] 2025-12-04T11:45:26.1975507Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1975699Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4af67f81a6ee14f3.xml - 2025-12-04T11:45:26.1975761Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1976360Z FAILED [0.2094s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1976378Z 2025-12-04T11:45:26.1976451Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1976711Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1976713Z 2025-12-04T11:45:26.1976799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1976863Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1976931Z ================== 1 failed, 139 deselected, 2 rerun in 2.14s ================== 2025-12-04T11:45:26.1976969Z Got exit code 1 2025-12-04T11:45:26.1977009Z Retrying single test... 
2025-12-04T11:45:26.1977154Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-25308541ade5ca4c.xml 2025-12-04T11:45:26.1977212Z ============================= test session starts ============================== 2025-12-04T11:45:26.1977322Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1977363Z cachedir: .pytest_cache 2025-12-04T11:45:26.1977522Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1977567Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1977609Z configfile: pytest.ini 2025-12-04T11:45:26.1977771Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1977845Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1978113Z stepcurrent: skipping 139 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1978158Z Running 1 items in this shard 2025-12-04T11:45:26.1978160Z 2025-12-04T11:45:26.1978380Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6520s] [100%] 2025-12-04T11:45:26.1978608Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2723s] [100%] 2025-12-04T11:45:26.1978802Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2259s] [100%] 2025-12-04T11:45:26.1978804Z 2025-12-04T11:45:26.1978855Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1979004Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1979051Z Traceback (most recent call last): 2025-12-04T11:45:26.1979210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1979250Z method(*args, **kwargs) 2025-12-04T11:45:26.1979404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1979443Z method(*args, **kwargs) 2025-12-04T11:45:26.1979597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1979645Z with policy(): 2025-12-04T11:45:26.1979800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1979841Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1980241Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1980243Z 2025-12-04T11:45:26.1980318Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1980580Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1980583Z 2025-12-04T11:45:26.1980671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1980745Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1980791Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1980847Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1980913Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1981013Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1981050Z graph_break [] 2025-12-04T11:45:26.1981112Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1981261Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1981306Z Traceback (most recent call last): 2025-12-04T11:45:26.1981463Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1981502Z method(*args, **kwargs) 2025-12-04T11:45:26.1981654Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T11:45:26.1981704Z method(*args, **kwargs) 2025-12-04T11:45:26.1981855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1981892Z with policy(): 2025-12-04T11:45:26.1982044Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1982085Z raise RuntimeError(msg) 2025-12-04T11:45:26.1982482Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1982486Z 2025-12-04T11:45:26.1982559Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1982821Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1982823Z 2025-12-04T11:45:26.1982911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1982985Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1983028Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1983083Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1983150Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1983247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1983345Z graph_break [] 2025-12-04T11:45:26.1983405Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1983481Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1983523Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1983579Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1983676Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1983756Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1983793Z graph_break [] 2025-12-04T11:45:26.1983852Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1983905Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1984053Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1984100Z Traceback (most recent call last): 2025-12-04T11:45:26.1984253Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1984294Z method(*args, **kwargs) 2025-12-04T11:45:26.1984447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1984487Z method(*args, **kwargs) 2025-12-04T11:45:26.1984638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1984674Z with policy(): 2025-12-04T11:45:26.1984827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1984868Z raise RuntimeError(msg) 2025-12-04T11:45:26.1985257Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 
2025-12-04T11:45:26.1985282Z 2025-12-04T11:45:26.1985355Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1985616Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1985618Z 2025-12-04T11:45:26.1985704Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1985777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1985833Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1985890Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1985956Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1986052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1986088Z graph_break [] 2025-12-04T11:45:26.1986149Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1986224Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1986264Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1986320Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1986417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1986481Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1986516Z graph_break [] 2025-12-04T11:45:26.1986577Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1986649Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1986702Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1986756Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1986851Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1986915Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1986951Z graph_break [] 2025-12-04T11:45:26.1987009Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1987211Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-25308541ade5ca4c.xml - 2025-12-04T11:45:26.1987271Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1987862Z FAILED [0.2259s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1987866Z 2025-12-04T11:45:26.1987939Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1988202Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1988204Z 2025-12-04T11:45:26.1988292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1988353Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.1988423Z ================== 1 failed, 187 deselected, 2 rerun in 2.17s ================== 2025-12-04T11:45:26.1988460Z Got exit code 1 2025-12-04T11:45:26.1988501Z Retrying single test... 
2025-12-04T11:45:26.1988645Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-525a5b82e2f91e29.xml 2025-12-04T11:45:26.1988702Z ============================= test session starts ============================== 2025-12-04T11:45:26.1988824Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.1988865Z cachedir: .pytest_cache 2025-12-04T11:45:26.1989023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.1989068Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.1989108Z configfile: pytest.ini 2025-12-04T11:45:26.1989279Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.1989353Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.1989611Z stepcurrent: skipping 139 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1989654Z Running 1 items in this shard 2025-12-04T11:45:26.1989657Z 2025-12-04T11:45:26.1989877Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6514s] [100%] 2025-12-04T11:45:26.1990095Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2891s] [100%] 2025-12-04T11:45:26.1990290Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2268s] [100%] 2025-12-04T11:45:26.1990292Z 2025-12-04T11:45:26.1990343Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.1990501Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1990547Z Traceback (most recent call last): 2025-12-04T11:45:26.1990704Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1990746Z method(*args, **kwargs) 2025-12-04T11:45:26.1990909Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1990950Z method(*args, **kwargs) 2025-12-04T11:45:26.1991101Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1991138Z with policy(): 2025-12-04T11:45:26.1991291Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1991334Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.1991722Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1113587712. 2025-12-04T11:45:26.1991725Z 2025-12-04T11:45:26.1991799Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1992059Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1992062Z 2025-12-04T11:45:26.1992150Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1992226Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1992271Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1992326Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1992394Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1992494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1992541Z graph_break [] 2025-12-04T11:45:26.1992602Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1992749Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1992794Z Traceback (most recent call last): 2025-12-04T11:45:26.1992948Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1992988Z method(*args, **kwargs) 2025-12-04T11:45:26.1993150Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", 
line 3329, in wrapper 2025-12-04T11:45:26.1993191Z method(*args, **kwargs) 2025-12-04T11:45:26.1993380Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1993418Z with policy(): 2025-12-04T11:45:26.1993572Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1993614Z raise RuntimeError(msg) 2025-12-04T11:45:26.1994000Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1128267776. 2025-12-04T11:45:26.1994003Z 2025-12-04T11:45:26.1994076Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1994335Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1994351Z 2025-12-04T11:45:26.1994439Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1994514Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1994557Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1994612Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1994691Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1994788Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1994826Z graph_break [] 2025-12-04T11:45:26.1994887Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1994962Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.1995004Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1995059Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1995154Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1995219Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1995255Z graph_break [] 2025-12-04T11:45:26.1995315Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1995367Z =================================== FAILURES =================================== 2025-12-04T11:45:26.1995516Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.1995562Z Traceback (most recent call last): 2025-12-04T11:45:26.1995719Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1995759Z method(*args, **kwargs) 2025-12-04T11:45:26.1995911Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.1995950Z method(*args, **kwargs) 2025-12-04T11:45:26.1996104Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.1996153Z with policy(): 2025-12-04T11:45:26.1996308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.1996349Z raise RuntimeError(msg) 2025-12-04T11:45:26.1996759Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 
2025-12-04T11:45:26.1996761Z 2025-12-04T11:45:26.1996838Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1997096Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1997099Z 2025-12-04T11:45:26.1997187Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1997260Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1997303Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1997358Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1997423Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1997521Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1997559Z graph_break [] 2025-12-04T11:45:26.1997618Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1997707Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1997747Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1997805Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1997901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1997964Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1998001Z graph_break [] 2025-12-04T11:45:26.1998077Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1998150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.1998191Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.1998246Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.1998342Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.1998408Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.1998445Z graph_break [] 2025-12-04T11:45:26.1998504Z aten_mm_info [('aten._scaled_mm.default_257_2048_16', 1)] 2025-12-04T11:45:26.1998695Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-525a5b82e2f91e29.xml - 2025-12-04T11:45:26.1998757Z =========================== short test summary info ============================ 2025-12-04T11:45:26.1999349Z FAILED [0.2268s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1128267776 and is now 1142947840. 2025-12-04T11:45:26.1999351Z 2025-12-04T11:45:26.1999425Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.1999685Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.1999698Z 2025-12-04T11:45:26.1999786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.1999847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.1999916Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.1999952Z Got exit code 1 2025-12-04T11:45:26.2000162Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2000299Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2000446Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6535897e955f223f.xml 2025-12-04T11:45:26.2000505Z ============================= test session starts ============================== 2025-12-04T11:45:26.2000615Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2000657Z cachedir: .pytest_cache 2025-12-04T11:45:26.2000819Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2000864Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2000905Z configfile: pytest.ini 2025-12-04T11:45:26.2001066Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2001144Z collecting ... collected 188 items / 140 deselected / 48 selected 2025-12-04T11:45:26.2001198Z stepcurrent: skipping 140 already run items. 
2025-12-04T11:45:26.2001258Z Running 48 items in this shard 2025-12-04T11:45:26.2001260Z 2025-12-04T11:45:26.2001479Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0337s] [ 2%] 2025-12-04T11:45:26.2001694Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7121s] [ 2%] 2025-12-04T11:45:26.2001896Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5780s] [ 2%] 2025-12-04T11:45:26.2001899Z 2025-12-04T11:45:26.2001951Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2002096Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2002142Z Traceback (most recent call last): 2025-12-04T11:45:26.2002300Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2002340Z method(*args, **kwargs) 2025-12-04T11:45:26.2002492Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2002532Z method(*args, **kwargs) 2025-12-04T11:45:26.2002684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2002720Z with policy(): 2025-12-04T11:45:26.2002874Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2002915Z raise RuntimeError(msg) 2025-12-04T11:45:26.2003344Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.2003347Z 2025-12-04T11:45:26.2003422Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2003696Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2003699Z 2025-12-04T11:45:26.2003787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2003859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2003902Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2003970Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2004455Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2004555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2004592Z graph_break [] 2025-12-04T11:45:26.2004654Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2004730Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2005219Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2005283Z current_size = base.storage().size() 2025-12-04T11:45:26.2005325Z Autotune Choices Stats: 2025-12-04T11:45:26.2005710Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.2005758Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2005797Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2005899Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2006135Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2006367Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2006595Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2006824Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=8 2025-12-04T11:45:26.2007051Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2007275Z triton_mm_0 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2007589Z triton_mm_6 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2007812Z triton_mm_7 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2007868Z _scaled_mm 0.0213 ms 28.3% 2025-12-04T11:45:26.2007999Z SingleProcess AUTOTUNE benchmarking takes 0.0390 seconds and 0.1682 seconds precompiling for 9 choices 2025-12-04T11:45:26.2008144Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2008189Z Traceback (most recent call last): 2025-12-04T11:45:26.2008347Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2008388Z method(*args, **kwargs) 2025-12-04T11:45:26.2008541Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2008581Z method(*args, **kwargs) 2025-12-04T11:45:26.2008731Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T11:45:26.2008767Z with policy(): 2025-12-04T11:45:26.2008923Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2008977Z raise RuntimeError(msg) 2025-12-04T11:45:26.2009367Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.2009370Z 2025-12-04T11:45:26.2009445Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2009717Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2009719Z 2025-12-04T11:45:26.2009809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2009883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2009929Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2009985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2010466Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2010568Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2010606Z graph_break [] 2025-12-04T11:45:26.2010667Z aten_mm_info 
[('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2010742Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2011231Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2011291Z current_size = base.storage().size() 2025-12-04T11:45:26.2011332Z Autotune Choices Stats: 2025-12-04T11:45:26.2011697Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.2011744Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2011794Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2011896Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2012132Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2012363Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2012587Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2012815Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2013065Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2013316Z triton_mm_0 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2013553Z triton_mm_6 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2013777Z triton_mm_7 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2013821Z _scaled_mm 0.0213 ms 28.3% 2025-12-04T11:45:26.2013950Z SingleProcess AUTOTUNE benchmarking takes 0.0390 seconds and 0.1682 seconds precompiling for 9 choices 2025-12-04T11:45:26.2014026Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2014069Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2014126Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2014225Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2014706Z inductor [('triton_bundler_save_kernel', 72), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2014744Z graph_break [] 2025-12-04T11:45:26.2014805Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2014879Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2014919Z Autotune Choices Stats: 2025-12-04T11:45:26.2015298Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2015342Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2015382Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2015493Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2015725Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2015956Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2016189Z triton_mm_12 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2016417Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2016642Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2016880Z triton_mm_14 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2017117Z triton_mm_13 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2017342Z triton_mm_15 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2017383Z _scaled_mm 0.0216 ms 28.0% 2025-12-04T11:45:26.2017512Z SingleProcess AUTOTUNE benchmarking takes 0.0339 seconds and 0.0933 seconds precompiling for 9 choices 2025-12-04T11:45:26.2017566Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2017711Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2017757Z Traceback (most recent call last): 2025-12-04T11:45:26.2017915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2017957Z method(*args, **kwargs) 2025-12-04T11:45:26.2018110Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2018150Z method(*args, **kwargs) 2025-12-04T11:45:26.2018302Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2018340Z with policy(): 2025-12-04T11:45:26.2018493Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2018534Z raise RuntimeError(msg) 2025-12-04T11:45:26.2018922Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2018936Z 2025-12-04T11:45:26.2019011Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2019282Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2019284Z 2025-12-04T11:45:26.2019373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2019447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2019490Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2019547Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2020030Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2020129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2020165Z graph_break [] 2025-12-04T11:45:26.2020227Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2020301Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2020801Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2020849Z current_size = base.storage().size() 2025-12-04T11:45:26.2020903Z Autotune Choices Stats: 2025-12-04T11:45:26.2021269Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.2021314Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2021355Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2021454Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2021685Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2021917Z triton_mm_4 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2022144Z triton_mm_5 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2022370Z triton_mm_1 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2022597Z triton_mm_2 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2022832Z triton_mm_0 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2023068Z triton_mm_6 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2023317Z triton_mm_7 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2023360Z _scaled_mm 0.0213 ms 28.3% 2025-12-04T11:45:26.2023489Z SingleProcess AUTOTUNE benchmarking takes 0.0390 seconds and 0.1682 seconds precompiling for 9 choices 2025-12-04T11:45:26.2023563Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2023607Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2023664Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2023764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2024241Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2024294Z graph_break [] 2025-12-04T11:45:26.2024354Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 
2025-12-04T11:45:26.2024431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2024470Z Autotune Choices Stats: 2025-12-04T11:45:26.2024846Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2024892Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2024934Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2025032Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2025265Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2025498Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2025725Z triton_mm_12 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2025950Z triton_mm_9 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2026175Z triton_mm_11 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2026417Z triton_mm_14 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2026641Z triton_mm_13 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2026884Z triton_mm_15 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2026926Z _scaled_mm 0.0216 ms 28.0% 2025-12-04T11:45:26.2027053Z SingleProcess AUTOTUNE benchmarking takes 0.0339 seconds and 0.0933 seconds precompiling for 9 choices 2025-12-04T11:45:26.2027129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2027170Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2027227Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2027328Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2027811Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2027858Z graph_break [] 2025-12-04T11:45:26.2027919Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2027993Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2028035Z Autotune Choices Stats: 2025-12-04T11:45:26.2028405Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2028451Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2028491Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2028591Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2028826Z triton_mm_17 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2029054Z triton_mm_19 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2029286Z triton_mm_20 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2029512Z triton_mm_21 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2029739Z triton_mm_16 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2029963Z triton_mm_18 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2030198Z triton_mm_23 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2030431Z triton_mm_22 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2030473Z _scaled_mm 0.0173 ms 34.7% 2025-12-04T11:45:26.2030601Z SingleProcess AUTOTUNE benchmarking takes 0.0479 seconds and 0.1847 seconds precompiling for 9 choices 2025-12-04T11:45:26.2030791Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-6535897e955f223f.xml - 2025-12-04T11:45:26.2030854Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2031443Z FAILED [0.5780s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2031445Z 2025-12-04T11:45:26.2031520Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2031791Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2031795Z 2025-12-04T11:45:26.2031882Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2031945Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2032022Z ================== 1 failed, 140 deselected, 2 rerun in 3.34s ================== 2025-12-04T11:45:26.2032060Z Got exit code 1 2025-12-04T11:45:26.2032100Z Retrying single test... 2025-12-04T11:45:26.2032246Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0071a6eb7046d58.xml 2025-12-04T11:45:26.2032305Z ============================= test session starts ============================== 2025-12-04T11:45:26.2032421Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2032462Z cachedir: .pytest_cache 2025-12-04T11:45:26.2032621Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2032667Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2032708Z configfile: pytest.ini 2025-12-04T11:45:26.2032870Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2032948Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2033205Z stepcurrent: skipping 140 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2033278Z Running 1 items in this shard 2025-12-04T11:45:26.2033280Z 2025-12-04T11:45:26.2033496Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0159s] [100%] 2025-12-04T11:45:26.2033713Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.6965s] [100%] 2025-12-04T11:45:26.2033918Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5816s] [100%] 2025-12-04T11:45:26.2033921Z 2025-12-04T11:45:26.2033972Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2034115Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2034174Z Traceback (most recent call last): 2025-12-04T11:45:26.2034335Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2034377Z method(*args, **kwargs) 2025-12-04T11:45:26.2034531Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2034571Z method(*args, **kwargs) 2025-12-04T11:45:26.2034723Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2034759Z with policy(): 2025-12-04T11:45:26.2034914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2034955Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2035348Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.2035363Z 2025-12-04T11:45:26.2035437Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2035699Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2035702Z 2025-12-04T11:45:26.2035789Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2035878Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2035920Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2035978Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2036456Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2036558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2036597Z graph_break [] 2025-12-04T11:45:26.2036656Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2036729Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2037216Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2037268Z current_size = base.storage().size() 2025-12-04T11:45:26.2037310Z Autotune Choices Stats: 2025-12-04T11:45:26.2037678Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2037736Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2037776Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2037877Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2038127Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2038354Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2038582Z triton_mm_0 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2038810Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2039036Z triton_mm_4 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2039259Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2039495Z triton_mm_2 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2039732Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2039775Z _scaled_mm 0.0191 ms 31.0% 2025-12-04T11:45:26.2039902Z SingleProcess AUTOTUNE benchmarking takes 0.0419 seconds and 0.1673 seconds precompiling for 9 choices 2025-12-04T11:45:26.2040048Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2040094Z Traceback (most recent call last): 2025-12-04T11:45:26.2040252Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2040291Z method(*args, **kwargs) 2025-12-04T11:45:26.2040446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2040485Z method(*args, **kwargs) 2025-12-04T11:45:26.2040638Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2040674Z with policy(): 2025-12-04T11:45:26.2040829Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2040871Z raise RuntimeError(msg) 2025-12-04T11:45:26.2041261Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.2041265Z 2025-12-04T11:45:26.2041358Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2041621Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2041623Z 2025-12-04T11:45:26.2041710Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2041787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2041842Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2041901Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2042383Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2042483Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2042521Z graph_break [] 2025-12-04T11:45:26.2042583Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2042660Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2043144Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2043203Z current_size = base.storage().size() 2025-12-04T11:45:26.2043243Z Autotune Choices Stats: 2025-12-04T11:45:26.2043645Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2043690Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2043731Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2043831Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2044064Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2044290Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2044516Z triton_mm_0 0.0060 ms 98.0% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2044742Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2044966Z triton_mm_4 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2045188Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2045428Z triton_mm_2 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2045650Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2045706Z _scaled_mm 0.0191 ms 31.0% 2025-12-04T11:45:26.2045835Z SingleProcess AUTOTUNE benchmarking takes 0.0419 seconds and 0.1673 seconds precompiling for 9 choices 2025-12-04T11:45:26.2045911Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2045953Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2046011Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2046111Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2046587Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2046624Z graph_break [] 2025-12-04T11:45:26.2046686Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2046772Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2046814Z Autotune Choices Stats: 2025-12-04T11:45:26.2047174Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2047222Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2047271Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2047372Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2047605Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2047834Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2048062Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2048290Z triton_mm_8 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2048518Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2048742Z triton_mm_14 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2048976Z triton_mm_11 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2049200Z triton_mm_13 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2049241Z _scaled_mm 0.0198 ms 31.2% 2025-12-04T11:45:26.2052586Z SingleProcess AUTOTUNE benchmarking takes 0.0372 seconds and 0.0926 seconds precompiling for 9 choices 2025-12-04T11:45:26.2052646Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2052793Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2052838Z Traceback (most recent call last): 2025-12-04T11:45:26.2052997Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2053037Z method(*args, **kwargs) 2025-12-04T11:45:26.2053193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2053232Z method(*args, **kwargs) 2025-12-04T11:45:26.2053408Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2053444Z with policy(): 2025-12-04T11:45:26.2053600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2053658Z raise RuntimeError(msg) 2025-12-04T11:45:26.2054047Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2054051Z 2025-12-04T11:45:26.2054125Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2054399Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2054402Z 2025-12-04T11:45:26.2054491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2054564Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2054609Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2054665Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2055143Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2055242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2055279Z graph_break [] 2025-12-04T11:45:26.2055339Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2055415Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2055902Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2055964Z current_size = base.storage().size() 2025-12-04T11:45:26.2056005Z Autotune Choices Stats: 2025-12-04T11:45:26.2056369Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2056415Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2056467Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2056568Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2056799Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2057026Z triton_mm_5 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2057253Z triton_mm_0 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2057483Z triton_mm_1 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2057720Z triton_mm_4 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2057945Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2058184Z triton_mm_2 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2058409Z triton_mm_7 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2058452Z _scaled_mm 0.0191 ms 31.0% 2025-12-04T11:45:26.2058579Z SingleProcess AUTOTUNE benchmarking takes 0.0419 seconds and 0.1673 seconds precompiling for 9 choices 2025-12-04T11:45:26.2058655Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2058699Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2058756Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2058856Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2059335Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2059372Z graph_break [] 2025-12-04T11:45:26.2059434Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 
2025-12-04T11:45:26.2059508Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2059559Z Autotune Choices Stats: 2025-12-04T11:45:26.2059921Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2059966Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2060006Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2060113Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2060345Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2060573Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2060800Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2061027Z triton_mm_8 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2061255Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2061490Z triton_mm_14 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2061726Z triton_mm_11 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2061953Z triton_mm_13 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2061993Z _scaled_mm 0.0198 ms 31.2% 2025-12-04T11:45:26.2062123Z SingleProcess AUTOTUNE benchmarking takes 0.0372 seconds and 0.0926 seconds precompiling for 9 choices 2025-12-04T11:45:26.2062196Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2062238Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2062295Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2062396Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2062876Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2062914Z graph_break [] 2025-12-04T11:45:26.2062974Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2063049Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2063089Z Autotune Choices Stats: 2025-12-04T11:45:26.2063487Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2063548Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2063588Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2063686Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2063933Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2064164Z triton_mm_20 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2064391Z triton_mm_18 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2064615Z triton_mm_19 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2064840Z triton_mm_23 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2065079Z triton_mm_17 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2065303Z triton_mm_21 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2065539Z triton_mm_22 0.0065 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2065581Z _scaled_mm 0.0212 ms 27.9% 2025-12-04T11:45:26.2065709Z SingleProcess AUTOTUNE benchmarking takes 0.0528 seconds and 0.1907 seconds precompiling for 9 choices 2025-12-04T11:45:26.2065902Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0071a6eb7046d58.xml - 2025-12-04T11:45:26.2065962Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2066553Z FAILED [0.5816s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2066555Z 2025-12-04T11:45:26.2066631Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2066891Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2066894Z 2025-12-04T11:45:26.2066983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2067045Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2067123Z ================== 1 failed, 187 deselected, 2 rerun in 3.31s ================== 2025-12-04T11:45:26.2067160Z Got exit code 1 2025-12-04T11:45:26.2067200Z Retrying single test... 2025-12-04T11:45:26.2067347Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03edd9eeca2e5a01.xml 2025-12-04T11:45:26.2067406Z ============================= test session starts ============================== 2025-12-04T11:45:26.2067526Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2067568Z cachedir: .pytest_cache 2025-12-04T11:45:26.2067729Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2067777Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2067817Z configfile: pytest.ini 2025-12-04T11:45:26.2067981Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2068056Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2068312Z stepcurrent: skipping 140 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2068354Z Running 1 items in this shard 2025-12-04T11:45:26.2068356Z 2025-12-04T11:45:26.2068572Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0057s] [100%] 2025-12-04T11:45:26.2068797Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7012s] [100%] 2025-12-04T11:45:26.2068987Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5925s] [100%] 2025-12-04T11:45:26.2068990Z 2025-12-04T11:45:26.2069043Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2069196Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2069242Z Traceback (most recent call last): 2025-12-04T11:45:26.2069402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2069445Z method(*args, **kwargs) 2025-12-04T11:45:26.2069599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2069640Z method(*args, **kwargs) 2025-12-04T11:45:26.2069791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2069829Z with policy(): 2025-12-04T11:45:26.2069981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2070023Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2070417Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.2070421Z 2025-12-04T11:45:26.2070495Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2070758Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2070769Z 2025-12-04T11:45:26.2070856Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2070930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2070973Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2071030Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2071527Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2071628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2071665Z graph_break [] 2025-12-04T11:45:26.2071726Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2071800Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2072284Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2072332Z current_size = base.storage().size() 2025-12-04T11:45:26.2072374Z Autotune Choices Stats: 2025-12-04T11:45:26.2072747Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2072809Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2072850Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2072950Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2073196Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2073452Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2073680Z triton_mm_4 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2073903Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2074127Z triton_mm_5 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2074355Z triton_mm_2 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2074580Z triton_mm_3 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2074817Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2074859Z _scaled_mm 0.0064 ms 93.1% 2025-12-04T11:45:26.2074987Z SingleProcess AUTOTUNE benchmarking takes 0.0420 seconds and 0.1584 seconds precompiling for 9 choices 2025-12-04T11:45:26.2075131Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2075190Z Traceback (most recent call last): 2025-12-04T11:45:26.2075348Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2075390Z method(*args, **kwargs) 2025-12-04T11:45:26.2075543Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2075585Z method(*args, **kwargs) 2025-12-04T11:45:26.2075736Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2075774Z with policy(): 2025-12-04T11:45:26.2075927Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2075969Z raise RuntimeError(msg) 2025-12-04T11:45:26.2076356Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.2076374Z 2025-12-04T11:45:26.2076448Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2076708Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2076711Z 2025-12-04T11:45:26.2076798Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2076884Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2076927Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2076984Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2077462Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2077563Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2077600Z graph_break [] 2025-12-04T11:45:26.2077663Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2077737Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2078225Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2078274Z current_size = base.storage().size() 2025-12-04T11:45:26.2078314Z Autotune Choices Stats: 2025-12-04T11:45:26.2078678Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2078733Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2078775Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2078873Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2079119Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2079345Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2079570Z triton_mm_4 0.0060 ms 98.7% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2079795Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2080020Z triton_mm_5 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2080255Z triton_mm_2 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2080477Z triton_mm_3 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2080709Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2080750Z _scaled_mm 0.0064 ms 93.1% 2025-12-04T11:45:26.2080880Z SingleProcess AUTOTUNE benchmarking takes 0.0420 seconds and 0.1584 seconds precompiling for 9 choices 2025-12-04T11:45:26.2080954Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2080997Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2081053Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2081154Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2081631Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2081667Z graph_break [] 2025-12-04T11:45:26.2081729Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2081802Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2081843Z Autotune Choices Stats: 2025-12-04T11:45:26.2082203Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2082257Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2082296Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2082395Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2082626Z triton_mm_8 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2082866Z triton_mm_15 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2083094Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2083346Z triton_mm_11 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2083570Z triton_mm_13 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2083797Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2084036Z triton_mm_10 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2084272Z triton_mm_14 0.0065 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2084314Z _scaled_mm 0.0216 ms 28.5% 2025-12-04T11:45:26.2084441Z SingleProcess AUTOTUNE benchmarking takes 0.0374 seconds and 0.1011 seconds precompiling for 9 choices 2025-12-04T11:45:26.2084496Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2084639Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2084686Z Traceback (most recent call last): 2025-12-04T11:45:26.2084844Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2084887Z method(*args, **kwargs) 2025-12-04T11:45:26.2085039Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2085079Z method(*args, **kwargs) 2025-12-04T11:45:26.2085232Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2085269Z with policy(): 2025-12-04T11:45:26.2085424Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2085465Z raise RuntimeError(msg) 2025-12-04T11:45:26.2085858Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2085874Z 2025-12-04T11:45:26.2085949Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2086211Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2086214Z 2025-12-04T11:45:26.2086301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2086374Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2086434Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2086493Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2086975Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2087074Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2087112Z graph_break [] 2025-12-04T11:45:26.2087172Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2087247Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2087735Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2087797Z current_size = base.storage().size() 2025-12-04T11:45:26.2087839Z Autotune Choices Stats: 2025-12-04T11:45:26.2088214Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2088260Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2088300Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2088399Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2088635Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2088862Z triton_mm_0 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2089092Z triton_mm_4 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2089323Z triton_mm_7 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2089548Z triton_mm_5 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2089773Z triton_mm_2 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2090005Z triton_mm_3 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2090236Z triton_mm_6 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2090278Z _scaled_mm 0.0064 ms 93.1% 2025-12-04T11:45:26.2090405Z SingleProcess AUTOTUNE benchmarking takes 0.0420 seconds and 0.1584 seconds precompiling for 9 choices 2025-12-04T11:45:26.2090481Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2090524Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2090582Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2090682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2091163Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2091200Z graph_break [] 2025-12-04T11:45:26.2091261Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 
2025-12-04T11:45:26.2091345Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2091387Z Autotune Choices Stats: 2025-12-04T11:45:26.2091748Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2091806Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2091848Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2091950Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2092181Z triton_mm_8 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2092411Z triton_mm_15 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2092638Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2092861Z triton_mm_11 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2093089Z triton_mm_13 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2093343Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2093587Z triton_mm_10 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2093811Z triton_mm_14 0.0065 ms 94.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2093852Z _scaled_mm 0.0216 ms 28.5% 2025-12-04T11:45:26.2093994Z SingleProcess AUTOTUNE benchmarking takes 0.0374 seconds and 0.1011 seconds precompiling for 9 choices 2025-12-04T11:45:26.2094069Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2094112Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2094169Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2094268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2094745Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2094783Z graph_break [] 2025-12-04T11:45:26.2094844Z aten_mm_info [('aten._scaled_mm.default_257_16_32', 1)] 2025-12-04T11:45:26.2094918Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2094970Z Autotune Choices Stats: 2025-12-04T11:45:26.2095331Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_20", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2095376Z AUTOTUNE scaled_mm(257x32, 32x16, , ) 2025-12-04T11:45:26.2095417Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2095527Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2095765Z triton_mm_20 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2095994Z triton_mm_18 0.0061 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2096224Z triton_mm_19 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2096454Z triton_mm_16 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2096682Z triton_mm_17 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=256, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2096909Z triton_mm_21 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2097134Z triton_mm_23 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2097370Z triton_mm_22 0.0065 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2097412Z _scaled_mm 0.0070 ms 85.2% 2025-12-04T11:45:26.2097538Z SingleProcess AUTOTUNE benchmarking takes 0.0532 seconds and 0.1898 seconds precompiling for 9 choices 2025-12-04T11:45:26.2097740Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-03edd9eeca2e5a01.xml - 2025-12-04T11:45:26.2097802Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2098385Z FAILED [0.5925s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.2098388Z 2025-12-04T11:45:26.2098462Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2098723Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2098734Z 2025-12-04T11:45:26.2098823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2098888Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2098958Z ================== 1 failed, 187 deselected, 2 rerun in 3.32s ================== 2025-12-04T11:45:26.2098996Z Got exit code 1 2025-12-04T11:45:26.2099207Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2099342Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2099493Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f86ff1aebd5bdfe2.xml 2025-12-04T11:45:26.2099555Z ============================= test session starts ============================== 2025-12-04T11:45:26.2099669Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2099710Z cachedir: .pytest_cache 2025-12-04T11:45:26.2099874Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2099919Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2099961Z configfile: pytest.ini 2025-12-04T11:45:26.2100120Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2100200Z collecting ... 
collected 188 items / 141 deselected / 47 selected 2025-12-04T11:45:26.2100255Z stepcurrent: skipping 141 already run items. 2025-12-04T11:45:26.2100299Z Running 47 items in this shard 2025-12-04T11:45:26.2100301Z 2025-12-04T11:45:26.2100523Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3423s] [ 2%] 2025-12-04T11:45:26.2100742Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8828s] [ 2%] 2025-12-04T11:45:26.2100934Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7176s] [ 2%] 2025-12-04T11:45:26.2100952Z 2025-12-04T11:45:26.2101004Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2101151Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2101199Z Traceback (most recent call last): 2025-12-04T11:45:26.2101358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2101409Z method(*args, **kwargs) 2025-12-04T11:45:26.2101564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2101605Z method(*args, **kwargs) 2025-12-04T11:45:26.2101758Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2101796Z with policy(): 2025-12-04T11:45:26.2101951Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2101991Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2102384Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.2102387Z 2025-12-04T11:45:26.2102460Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2102736Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2102739Z 2025-12-04T11:45:26.2102827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2102901Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2102943Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2103011Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2103529Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2103630Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2103668Z graph_break [] 2025-12-04T11:45:26.2103732Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2103808Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2104294Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2104343Z current_size = base.storage().size() 2025-12-04T11:45:26.2104384Z Autotune Choices Stats: 2025-12-04T11:45:26.2104757Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2104818Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2104861Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2104961Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2105197Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2105436Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2105666Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2105891Z triton_mm_13 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2106116Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2106346Z triton_mm_18 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2106588Z triton_mm_5 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2106815Z triton_mm_11 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2107050Z triton_mm_14 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2107277Z triton_mm_7 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2107410Z SingleProcess AUTOTUNE benchmarking takes 0.0743 seconds and 0.3681 seconds precompiling for 21 choices 2025-12-04T11:45:26.2107558Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2107605Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2107763Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2107805Z method(*args, **kwargs) 2025-12-04T11:45:26.2107957Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2107999Z method(*args, **kwargs) 2025-12-04T11:45:26.2108151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2108190Z with policy(): 2025-12-04T11:45:26.2108343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2108386Z raise RuntimeError(msg) 2025-12-04T11:45:26.2108784Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.2108797Z 2025-12-04T11:45:26.2108871Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2109145Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2109147Z 2025-12-04T11:45:26.2109236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2109311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2109354Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2109414Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2109903Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2110003Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2110040Z graph_break [] 2025-12-04T11:45:26.2110106Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2110180Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2110676Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2110727Z current_size = base.storage().size() 2025-12-04T11:45:26.2110767Z Autotune Choices Stats: 2025-12-04T11:45:26.2111147Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2111194Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2111237Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2111336Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2111568Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2111797Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2112024Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2112248Z triton_mm_13 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2112475Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2112712Z triton_mm_18 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2112950Z triton_mm_5 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2113179Z triton_mm_11 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2113436Z triton_mm_14 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2113666Z triton_mm_7 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2113796Z SingleProcess AUTOTUNE benchmarking takes 0.0743 seconds and 0.3681 seconds precompiling for 21 choices 2025-12-04T11:45:26.2113871Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2113915Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2113985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2114084Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2114586Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2114625Z graph_break [] 2025-12-04T11:45:26.2114689Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2114764Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2114806Z Autotune Choices Stats: 2025-12-04T11:45:26.2115174Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2115221Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2115263Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2115361Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2115597Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2115825Z triton_mm_30 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2116056Z triton_mm_25 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2116296Z triton_mm_26 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2116525Z triton_mm_27 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2116762Z triton_mm_29 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2116991Z triton_mm_31 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2117217Z triton_mm_33 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2117443Z triton_mm_34 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2117672Z triton_mm_35 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2117815Z SingleProcess AUTOTUNE benchmarking takes 0.1137 seconds and 0.2893 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2117869Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2118019Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2118065Z Traceback (most recent call last): 2025-12-04T11:45:26.2118235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2118275Z method(*args, **kwargs) 2025-12-04T11:45:26.2118432Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2118474Z method(*args, **kwargs) 2025-12-04T11:45:26.2118627Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2118666Z with policy(): 2025-12-04T11:45:26.2118823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2118864Z raise RuntimeError(msg) 2025-12-04T11:45:26.2119269Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.2119272Z 2025-12-04T11:45:26.2119348Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2119611Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2119615Z 2025-12-04T11:45:26.2119705Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2119777Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2119820Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2119887Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2120377Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2120486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2120525Z graph_break [] 2025-12-04T11:45:26.2120587Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2120663Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2121149Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2121200Z current_size = base.storage().size() 2025-12-04T11:45:26.2121242Z Autotune Choices Stats: 2025-12-04T11:45:26.2121613Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2121673Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2121713Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2121814Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2122049Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2122285Z triton_mm_9 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2122512Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2122738Z triton_mm_13 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2122965Z triton_mm_15 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2123188Z triton_mm_18 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2123431Z triton_mm_5 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2123657Z triton_mm_11 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2123902Z triton_mm_14 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2124128Z triton_mm_7 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2124271Z SingleProcess AUTOTUNE benchmarking takes 0.0743 seconds and 0.3681 seconds precompiling for 21 choices 2025-12-04T11:45:26.2124346Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2124389Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2124449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2124549Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2125039Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2125078Z graph_break [] 2025-12-04T11:45:26.2125142Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2125215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2125260Z Autotune Choices Stats: 2025-12-04T11:45:26.2125639Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2125688Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2125727Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2125837Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2126070Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2126297Z triton_mm_30 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2126529Z triton_mm_25 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2126758Z triton_mm_26 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2126986Z triton_mm_27 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2127210Z triton_mm_29 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2127440Z triton_mm_31 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2127677Z triton_mm_33 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2127910Z triton_mm_34 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2128137Z triton_mm_35 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2128267Z SingleProcess AUTOTUNE benchmarking takes 0.1137 seconds and 0.2893 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2128347Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2128388Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2128449Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2128550Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2129034Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2129092Z graph_break [] 2025-12-04T11:45:26.2129155Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2129228Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2129271Z Autotune Choices Stats: 2025-12-04T11:45:26.2129648Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_48", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2129695Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2129737Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2129837Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2130070Z triton_mm_48 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2130295Z triton_mm_51 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2130523Z triton_mm_52 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2130750Z triton_mm_55 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2130975Z triton_mm_56 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2131201Z triton_mm_45 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2131440Z triton_mm_46 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2131684Z triton_mm_47 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2131909Z triton_mm_53 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2132134Z triton_mm_58 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2132264Z SingleProcess AUTOTUNE benchmarking takes 0.1200 seconds and 0.2250 seconds precompiling for 21 choices 2025-12-04T11:45:26.2132460Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f86ff1aebd5bdfe2.xml - 2025-12-04T11:45:26.2132522Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2133116Z FAILED [0.7176s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.2133129Z 2025-12-04T11:45:26.2133204Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2133514Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2133516Z 2025-12-04T11:45:26.2133607Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2133671Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2133741Z ================== 1 failed, 141 deselected, 2 rerun in 3.96s ================== 2025-12-04T11:45:26.2133779Z Got exit code 1 2025-12-04T11:45:26.2133820Z Retrying single test... 2025-12-04T11:45:26.2133966Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3dc26e954a006b4a.xml 2025-12-04T11:45:26.2134027Z ============================= test session starts ============================== 2025-12-04T11:45:26.2134139Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2134183Z cachedir: .pytest_cache 2025-12-04T11:45:26.2134342Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2134387Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2134427Z configfile: pytest.ini 2025-12-04T11:45:26.2134589Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2134665Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2134923Z stepcurrent: skipping 141 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2134981Z Running 1 items in this shard 2025-12-04T11:45:26.2134983Z 2025-12-04T11:45:26.2135203Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3621s] [100%] 2025-12-04T11:45:26.2135420Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8829s] [100%] 2025-12-04T11:45:26.2135625Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7402s] [100%] 2025-12-04T11:45:26.2135628Z 2025-12-04T11:45:26.2135681Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2135828Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2135877Z Traceback (most recent call last): 2025-12-04T11:45:26.2136037Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2136079Z method(*args, **kwargs) 2025-12-04T11:45:26.2136235Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2136276Z method(*args, **kwargs) 2025-12-04T11:45:26.2136428Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2136467Z with policy(): 2025-12-04T11:45:26.2136634Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2136677Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2137070Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.2137074Z 2025-12-04T11:45:26.2137157Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2137423Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2137425Z 2025-12-04T11:45:26.2137512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2137588Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2137632Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2137690Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2138183Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2138283Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2138321Z graph_break [] 2025-12-04T11:45:26.2138388Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2138462Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2138952Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2139014Z current_size = base.storage().size() 2025-12-04T11:45:26.2139056Z Autotune Choices Stats: 2025-12-04T11:45:26.2139437Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.2139483Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2139526Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2139626Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2139863Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2140098Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2140332Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2140559Z triton_mm_10 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2140795Z triton_mm_13 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2141033Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2141257Z triton_mm_9 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2141481Z triton_mm_18 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2141706Z triton_mm_14 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2141934Z triton_mm_11 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2142065Z SingleProcess AUTOTUNE benchmarking takes 0.0824 seconds and 0.3582 seconds precompiling for 21 choices 2025-12-04T11:45:26.2142213Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2142259Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2142417Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2142459Z method(*args, **kwargs) 2025-12-04T11:45:26.2142612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2142670Z method(*args, **kwargs) 2025-12-04T11:45:26.2142821Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2142859Z with policy(): 2025-12-04T11:45:26.2143013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2143057Z raise RuntimeError(msg) 2025-12-04T11:45:26.2143511Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.2143515Z 2025-12-04T11:45:26.2143591Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2143855Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2143859Z 2025-12-04T11:45:26.2143948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2144023Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2144065Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2144124Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2144614Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2144731Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2144768Z graph_break [] 2025-12-04T11:45:26.2144843Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2144917Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2145407Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2145455Z current_size = base.storage().size() 2025-12-04T11:45:26.2145497Z Autotune Choices Stats: 2025-12-04T11:45:26.2145864Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.2145913Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2145954Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2146052Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2146289Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2146519Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2146765Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2146993Z triton_mm_10 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2147227Z triton_mm_13 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2147452Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2147677Z triton_mm_9 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2147901Z triton_mm_18 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2149493Z triton_mm_14 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2149733Z triton_mm_11 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2149862Z SingleProcess AUTOTUNE benchmarking takes 0.0824 seconds and 0.3582 seconds precompiling for 21 choices 2025-12-04T11:45:26.2149939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2149994Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2150054Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2151152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2151645Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2151685Z graph_break [] 2025-12-04T11:45:26.2151749Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2151823Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2151865Z Autotune Choices Stats: 2025-12-04T11:45:26.2152240Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.2152289Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2152330Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2152430Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2152661Z triton_mm_35 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2152901Z triton_mm_32 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2153126Z triton_mm_38 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2153388Z triton_mm_27 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2153616Z triton_mm_34 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2153845Z triton_mm_31 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2154071Z triton_mm_30 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2154296Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2154535Z triton_mm_28 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2154771Z triton_mm_29 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2154901Z SingleProcess AUTOTUNE benchmarking takes 0.1182 seconds and 0.2898 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2154998Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2155146Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2155194Z Traceback (most recent call last): 2025-12-04T11:45:26.2155354Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2155397Z method(*args, **kwargs) 2025-12-04T11:45:26.2155550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2155590Z method(*args, **kwargs) 2025-12-04T11:45:26.2155741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2155779Z with policy(): 2025-12-04T11:45:26.2155933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2155974Z raise RuntimeError(msg) 2025-12-04T11:45:26.2156369Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.2156389Z 2025-12-04T11:45:26.2156463Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2156726Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2156728Z 2025-12-04T11:45:26.2156815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2156890Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2156933Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2156990Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2157479Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2157580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2157617Z graph_break [] 2025-12-04T11:45:26.2157681Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2157754Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2158242Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2158301Z current_size = base.storage().size() 2025-12-04T11:45:26.2158343Z Autotune Choices Stats: 2025-12-04T11:45:26.2158720Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.2158765Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2158807Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2158931Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2159166Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2159396Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2159627Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2159857Z triton_mm_10 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2160084Z triton_mm_13 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2160311Z triton_mm_6 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2160543Z triton_mm_9 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2160770Z triton_mm_18 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2160994Z triton_mm_14 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2161218Z triton_mm_11 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2161347Z SingleProcess AUTOTUNE benchmarking takes 0.0824 seconds and 0.3582 seconds precompiling for 21 choices 2025-12-04T11:45:26.2161421Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2161464Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2161520Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2161620Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2162109Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2162157Z graph_break [] 2025-12-04T11:45:26.2162218Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2162300Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2162341Z Autotune Choices Stats: 2025-12-04T11:45:26.2162713Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006240000016987324, "best_triton_pos": 0} 2025-12-04T11:45:26.2162761Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2162804Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2162901Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2163135Z triton_mm_35 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2163393Z triton_mm_32 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2163617Z triton_mm_38 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2163846Z triton_mm_27 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2164086Z triton_mm_34 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2164315Z triton_mm_31 0.0064 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2164539Z triton_mm_30 0.0064 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2164762Z triton_mm_33 0.0064 ms 96.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2164987Z triton_mm_28 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2165209Z triton_mm_29 0.0065 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2165340Z SingleProcess AUTOTUNE benchmarking takes 0.1182 seconds and 0.2898 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2165413Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2165468Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2165525Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2165625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2166125Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2166163Z graph_break [] 2025-12-04T11:45:26.2166226Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2166312Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2166354Z Autotune Choices Stats: 2025-12-04T11:45:26.2166711Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_58", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2166758Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2166798Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2166897Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2167128Z triton_mm_58 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.2167353Z triton_mm_49 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2167576Z triton_mm_50 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2167816Z triton_mm_55 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2168045Z triton_mm_46 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2168268Z triton_mm_52 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2168495Z triton_mm_53 0.0063 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2168719Z triton_mm_48 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2168942Z triton_mm_54 0.0064 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2169167Z triton_mm_56 0.0065 ms 93.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2169305Z SingleProcess AUTOTUNE benchmarking takes 0.1373 seconds and 0.2221 seconds precompiling for 21 choices 2025-12-04T11:45:26.2169499Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3dc26e954a006b4a.xml - 2025-12-04T11:45:26.2169560Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2170179Z FAILED [0.7402s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.2170183Z 2025-12-04T11:45:26.2170256Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2170519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2170522Z 2025-12-04T11:45:26.2170610Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2170674Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2170741Z ================== 1 failed, 187 deselected, 2 rerun in 4.00s ================== 2025-12-04T11:45:26.2170780Z Got exit code 1 2025-12-04T11:45:26.2170821Z Retrying single test... 2025-12-04T11:45:26.2170965Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ab02943352906c83.xml 2025-12-04T11:45:26.2171025Z ============================= test session starts ============================== 2025-12-04T11:45:26.2171138Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2171180Z cachedir: .pytest_cache 2025-12-04T11:45:26.2171338Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2171397Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2171438Z configfile: pytest.ini 2025-12-04T11:45:26.2171602Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2171676Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2171938Z stepcurrent: skipping 141 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2171986Z Running 1 items in this shard 2025-12-04T11:45:26.2171988Z 2025-12-04T11:45:26.2172210Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.3504s] [100%] 2025-12-04T11:45:26.2172425Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8883s] [100%] 2025-12-04T11:45:26.2172621Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.8550s] [100%] 2025-12-04T11:45:26.2172623Z 2025-12-04T11:45:26.2172676Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2172826Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2172873Z Traceback (most recent call last): 2025-12-04T11:45:26.2173046Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2173088Z method(*args, **kwargs) 2025-12-04T11:45:26.2173240Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2173307Z method(*args, **kwargs) 2025-12-04T11:45:26.2173458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2173518Z with policy(): 2025-12-04T11:45:26.2173671Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2173713Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2174117Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1077936128. 2025-12-04T11:45:26.2174120Z 2025-12-04T11:45:26.2174195Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2174457Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2174459Z 2025-12-04T11:45:26.2174548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2174621Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2174664Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2174722Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2175208Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2175321Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2175357Z graph_break [] 2025-12-04T11:45:26.2175420Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2175494Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2175982Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2176029Z current_size = base.storage().size() 2025-12-04T11:45:26.2176072Z Autotune Choices Stats: 2025-12-04T11:45:26.2176439Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2176488Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2176529Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2176631Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2176869Z triton_mm_5 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2177110Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2177339Z triton_mm_16 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2177573Z triton_mm_10 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2177809Z triton_mm_6 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2178033Z triton_mm_13 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2178258Z triton_mm_7 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2178480Z triton_mm_14 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2178705Z triton_mm_18 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2178929Z triton_mm_8 0.0064 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2179070Z SingleProcess AUTOTUNE benchmarking takes 0.0828 seconds and 0.3690 seconds precompiling for 21 choices 2025-12-04T11:45:26.2179220Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2179265Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2179422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2179463Z method(*args, **kwargs) 2025-12-04T11:45:26.2179615Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2179657Z method(*args, **kwargs) 2025-12-04T11:45:26.2179809Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2179846Z with policy(): 2025-12-04T11:45:26.2180003Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2180043Z raise RuntimeError(msg) 2025-12-04T11:45:26.2180440Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1077936128 and is now 1136656384. 
2025-12-04T11:45:26.2180443Z 2025-12-04T11:45:26.2180517Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2180776Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2180789Z 2025-12-04T11:45:26.2180878Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2180953Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2180995Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2181051Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2181560Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2181660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2181697Z graph_break [] 2025-12-04T11:45:26.2181759Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2181836Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2182324Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2182372Z current_size = base.storage().size() 2025-12-04T11:45:26.2182414Z Autotune Choices Stats: 2025-12-04T11:45:26.2182780Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2182827Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2182876Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2182976Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2183213Z triton_mm_5 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2183479Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2183707Z triton_mm_16 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2183932Z triton_mm_10 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2184158Z triton_mm_6 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2184383Z triton_mm_13 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2184609Z triton_mm_7 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2184846Z triton_mm_14 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2185085Z triton_mm_18 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2185323Z triton_mm_8 0.0064 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2185454Z SingleProcess AUTOTUNE benchmarking takes 0.0828 seconds and 0.3690 seconds precompiling for 21 choices 2025-12-04T11:45:26.2185527Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2185569Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2185629Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2185727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2186214Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2186251Z graph_break [] 2025-12-04T11:45:26.2186313Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2186386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2186429Z Autotune Choices Stats: 2025-12-04T11:45:26.2186789Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2186850Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2186893Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2186991Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2187228Z triton_mm_35 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2187461Z triton_mm_26 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2187685Z triton_mm_33 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2187909Z triton_mm_30 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2188136Z triton_mm_34 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2188378Z triton_mm_27 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2188604Z triton_mm_31 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2188839Z triton_mm_32 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2189072Z triton_mm_38 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2189299Z triton_mm_36 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2189428Z SingleProcess AUTOTUNE benchmarking takes 0.1195 seconds and 0.2924 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2189481Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2189627Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2189673Z Traceback (most recent call last): 2025-12-04T11:45:26.2189830Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2189871Z method(*args, **kwargs) 2025-12-04T11:45:26.2190023Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2190065Z method(*args, **kwargs) 2025-12-04T11:45:26.2190215Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2190264Z with policy(): 2025-12-04T11:45:26.2190416Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2190458Z raise RuntimeError(msg) 2025-12-04T11:45:26.2190855Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 
2025-12-04T11:45:26.2190858Z 2025-12-04T11:45:26.2190931Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2191194Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2191197Z 2025-12-04T11:45:26.2191283Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2191358Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2191401Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2191458Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2191947Z inductor [('triton_bundler_save_kernel', 168), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2192058Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2192094Z graph_break [] 2025-12-04T11:45:26.2192157Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2192231Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2192725Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2192787Z current_size = base.storage().size() 2025-12-04T11:45:26.2192828Z Autotune Choices Stats: 2025-12-04T11:45:26.2193194Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2193242Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2193321Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2193420Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2193655Z triton_mm_5 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2193882Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2194109Z triton_mm_16 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2194347Z triton_mm_10 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2194572Z triton_mm_6 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2194800Z triton_mm_13 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2195025Z triton_mm_7 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2195249Z triton_mm_14 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2195472Z triton_mm_18 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2195700Z triton_mm_8 0.0064 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2195841Z SingleProcess AUTOTUNE benchmarking takes 0.0828 seconds and 0.3690 seconds precompiling for 21 choices 2025-12-04T11:45:26.2195913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2195957Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2196014Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2196114Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2196622Z inductor [('triton_bundler_save_kernel', 168), 
('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2196661Z graph_break [] 2025-12-04T11:45:26.2196722Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2196796Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2196836Z Autotune Choices Stats: 2025-12-04T11:45:26.2197200Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_35", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2197247Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2197290Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2197389Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2197623Z triton_mm_35 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2197852Z triton_mm_26 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2198087Z triton_mm_33 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2198311Z triton_mm_30 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2198535Z triton_mm_34 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2198761Z triton_mm_27 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2198987Z triton_mm_31 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2199212Z triton_mm_32 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2199436Z triton_mm_38 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2199673Z triton_mm_36 0.0066 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2199802Z SingleProcess AUTOTUNE benchmarking takes 0.1195 seconds and 0.2924 seconds precompiling for 21 choices 
2025-12-04T11:45:26.2199883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2199926Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2199982Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2200094Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2200577Z inductor [('triton_bundler_save_kernel', 168), ('async_compile_cache_miss', 22), ('benchmarking.InductorBenchmarker.benchmark_gpu', 21), ('generated_module_cache_miss', 20), ('select_algorithm_num_precompiles', 20), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2200618Z graph_break [] 2025-12-04T11:45:26.2200680Z aten_mm_info [('aten._scaled_mm.default_257_2048_32', 1)] 2025-12-04T11:45:26.2200755Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2200795Z Autotune Choices Stats: 2025-12-04T11:45:26.2201154Z {"num_choices": 21, "num_triton_choices": 20, "best_kernel": "triton_mm_48", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2201203Z AUTOTUNE scaled_mm(257x32, 32x2048, , ) 2025-12-04T11:45:26.2201243Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2201341Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2201569Z triton_mm_48 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2201815Z triton_mm_50 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2202043Z triton_mm_51 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2202269Z triton_mm_54 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2202494Z triton_mm_55 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2202719Z triton_mm_46 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2202946Z triton_mm_52 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2203179Z triton_mm_58 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2203441Z triton_mm_45 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2203689Z triton_mm_47 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=128, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2203836Z SingleProcess AUTOTUNE benchmarking takes 0.1443 seconds and 0.2243 seconds precompiling for 21 choices 2025-12-04T11:45:26.2204025Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ab02943352906c83.xml - 2025-12-04T11:45:26.2204087Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2204681Z FAILED [0.8550s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1136656384 and is now 1195376640. 2025-12-04T11:45:26.2204685Z 2025-12-04T11:45:26.2204757Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2205021Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2205023Z 2025-12-04T11:45:26.2205110Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2205174Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2205241Z ================== 1 failed, 187 deselected, 2 rerun in 4.11s ================== 2025-12-04T11:45:26.2205292Z Got exit code 1 2025-12-04T11:45:26.2205501Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2205629Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2205775Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-acd5c69b6e3f5dbb.xml 2025-12-04T11:45:26.2205832Z ============================= test session starts ============================== 2025-12-04T11:45:26.2205944Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2205986Z cachedir: .pytest_cache 2025-12-04T11:45:26.2206145Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2206190Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2206230Z configfile: pytest.ini 2025-12-04T11:45:26.2206390Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2206467Z collecting ... collected 188 items / 142 deselected / 46 selected 2025-12-04T11:45:26.2206521Z stepcurrent: skipping 142 already run items. 
2025-12-04T11:45:26.2206566Z Running 46 items in this shard 2025-12-04T11:45:26.2206569Z 2025-12-04T11:45:26.2206791Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1979s] [ 2%] 2025-12-04T11:45:26.2207021Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8020s] [ 2%] 2025-12-04T11:45:26.2207212Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7977s] [ 2%] 2025-12-04T11:45:26.2207216Z 2025-12-04T11:45:26.2207268Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2207423Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2207469Z Traceback (most recent call last): 2025-12-04T11:45:26.2207638Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2207680Z method(*args, **kwargs) 2025-12-04T11:45:26.2207833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2207874Z method(*args, **kwargs) 2025-12-04T11:45:26.2208025Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2208063Z with policy(): 2025-12-04T11:45:26.2208216Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2208259Z raise RuntimeError(msg) 2025-12-04T11:45:26.2208649Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1038090240. 2025-12-04T11:45:26.2208652Z 2025-12-04T11:45:26.2208725Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2208986Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2208998Z 2025-12-04T11:45:26.2209085Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2209159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2209201Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2209258Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2209739Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2209839Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2209875Z graph_break [] 2025-12-04T11:45:26.2209939Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2210012Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2210503Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2210553Z current_size = base.storage().size() 2025-12-04T11:45:26.2210593Z Autotune Choices Stats: 2025-12-04T11:45:26.2210960Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2211020Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2211063Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2211162Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2211406Z triton_mm_6 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2211644Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2211874Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2212103Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:26.2212328Z triton_mm_3 0.0067 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2212554Z triton_mm_4 0.0069 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2212778Z triton_mm_7 0.0073 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2213011Z triton_mm_5 0.0076 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2213231Z triton_mm_1 0.0078 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2213484Z triton_mm_0 0.0113 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2213614Z SingleProcess AUTOTUNE benchmarking takes 0.0487 seconds and 0.2051 seconds precompiling for 11 choices 2025-12-04T11:45:26.2213760Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2213807Z Traceback (most recent call last): 2025-12-04T11:45:26.2213961Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2214003Z method(*args, **kwargs) 2025-12-04T11:45:26.2214155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2214195Z method(*args, **kwargs) 2025-12-04T11:45:26.2214345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2214384Z with policy(): 2025-12-04T11:45:26.2214555Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2214596Z raise RuntimeError(msg) 2025-12-04T11:45:26.2214991Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1075838976. 
2025-12-04T11:45:26.2215008Z 2025-12-04T11:45:26.2215085Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2215371Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2215375Z 2025-12-04T11:45:26.2215462Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2215561Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2215603Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2215660Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2216146Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2216247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2216283Z graph_break [] 2025-12-04T11:45:26.2216346Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2216418Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2216908Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2216972Z current_size = base.storage().size() 2025-12-04T11:45:26.2217013Z Autotune Choices Stats: 2025-12-04T11:45:26.2217376Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2217423Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2217467Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2217566Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2217798Z triton_mm_6 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2218025Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2218256Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2218482Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2218716Z triton_mm_3 0.0067 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2218954Z triton_mm_4 0.0069 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2219186Z triton_mm_7 0.0073 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2219408Z triton_mm_5 0.0076 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2219629Z triton_mm_1 0.0078 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2219852Z triton_mm_0 0.0113 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2219981Z SingleProcess AUTOTUNE benchmarking takes 0.0487 seconds and 0.2051 seconds precompiling for 11 choices 2025-12-04T11:45:26.2220055Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2220097Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2220155Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2220254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2220738Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('async_compile_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2220786Z graph_break [] 2025-12-04T11:45:26.2220847Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2220923Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2220963Z Autotune Choices Stats: 2025-12-04T11:45:26.2221325Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2221373Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2221415Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2221512Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2221746Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2221974Z triton_mm_12 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2222211Z triton_mm_19 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2222434Z triton_mm_18 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2222664Z triton_mm_13 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2222898Z triton_mm_14 0.0070 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2223124Z triton_mm_17 0.0073 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2223384Z triton_mm_15 0.0076 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2223607Z triton_mm_11 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2223830Z triton_mm_10 0.0113 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2223959Z SingleProcess AUTOTUNE benchmarking takes 0.0647 seconds and 0.2050 seconds precompiling for 11 choices 
2025-12-04T11:45:26.2224012Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2224174Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2224221Z Traceback (most recent call last): 2025-12-04T11:45:26.2224379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2224420Z method(*args, **kwargs) 2025-12-04T11:45:26.2224573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2224613Z method(*args, **kwargs) 2025-12-04T11:45:26.2224768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2224806Z with policy(): 2025-12-04T11:45:26.2224958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2225000Z raise RuntimeError(msg) 2025-12-04T11:45:26.2225393Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712. 
2025-12-04T11:45:26.2225395Z 2025-12-04T11:45:26.2225469Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2225731Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2225746Z 2025-12-04T11:45:26.2225834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2225906Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2225951Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2226009Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2226503Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2226615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2226654Z graph_break [] 2025-12-04T11:45:26.2226717Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2226791Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2227279Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2227328Z current_size = base.storage().size() 2025-12-04T11:45:26.2227368Z Autotune Choices Stats: 2025-12-04T11:45:26.2227733Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2227781Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2227822Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2227922Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2228162Z triton_mm_6 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2228388Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2228620Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2228848Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2229071Z triton_mm_3 0.0067 ms 88.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2229299Z triton_mm_4 0.0069 ms 86.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2229523Z triton_mm_7 0.0073 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2229755Z triton_mm_5 0.0076 ms 78.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2229976Z triton_mm_1 0.0078 ms 76.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2230215Z triton_mm_0 0.0113 ms 52.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2230355Z SingleProcess AUTOTUNE benchmarking takes 0.0487 seconds and 0.2051 seconds precompiling for 11 choices 2025-12-04T11:45:26.2230429Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2230473Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2230529Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2230628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2231113Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('async_compile_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2231150Z graph_break [] 2025-12-04T11:45:26.2231212Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2231286Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2231327Z Autotune Choices Stats: 2025-12-04T11:45:26.2231692Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2231752Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2231793Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2231891Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2232123Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2232351Z triton_mm_12 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2232580Z triton_mm_19 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2232806Z triton_mm_18 0.0063 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2233030Z triton_mm_13 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2233291Z triton_mm_14 0.0070 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2233531Z triton_mm_17 0.0073 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2233766Z triton_mm_15 0.0076 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2234003Z triton_mm_11 0.0079 ms 76.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2234226Z triton_mm_10 0.0113 ms 53.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2234357Z SingleProcess AUTOTUNE benchmarking takes 0.0647 seconds and 0.2050 seconds precompiling for 11 choices 
2025-12-04T11:45:26.2234431Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2234472Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2234529Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2234629Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2235110Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2235147Z graph_break [] 2025-12-04T11:45:26.2235209Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2235281Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2235335Z Autotune Choices Stats: 2025-12-04T11:45:26.2235699Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_29", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2235746Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2235790Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2235889Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2236123Z triton_mm_29 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.2236350Z triton_mm_22 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2236575Z triton_mm_26 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2236798Z triton_mm_28 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2237031Z triton_mm_24 0.0067 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2237252Z triton_mm_23 0.0069 ms 87.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2237487Z triton_mm_27 0.0074 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2237720Z triton_mm_25 0.0075 ms 80.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2237944Z triton_mm_21 0.0077 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2238167Z triton_mm_20 0.0109 ms 54.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2238296Z SingleProcess AUTOTUNE benchmarking takes 0.0679 seconds and 0.2178 seconds precompiling for 11 choices 2025-12-04T11:45:26.2238488Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-acd5c69b6e3f5dbb.xml - 2025-12-04T11:45:26.2238548Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2239136Z FAILED [0.7977s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712. 2025-12-04T11:45:26.2239150Z 2025-12-04T11:45:26.2239224Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2239485Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2239488Z 2025-12-04T11:45:26.2239576Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2239638Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2239707Z ================== 1 failed, 142 deselected, 2 rerun in 3.82s ================== 2025-12-04T11:45:26.2239745Z Got exit code 1 2025-12-04T11:45:26.2239787Z Retrying single test... 2025-12-04T11:45:26.2239931Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c209f0833b5dd3ff.xml 2025-12-04T11:45:26.2239989Z ============================= test session starts ============================== 2025-12-04T11:45:26.2240099Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2240141Z cachedir: .pytest_cache 2025-12-04T11:45:26.2240299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2240346Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2240387Z configfile: pytest.ini 2025-12-04T11:45:26.2240547Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2240634Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2240888Z stepcurrent: skipping 142 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2240932Z Running 1 items in this shard 2025-12-04T11:45:26.2240934Z 2025-12-04T11:45:26.2241166Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0957s] [100%] 2025-12-04T11:45:26.2241392Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7641s] [100%] 2025-12-04T11:45:26.2241583Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6471s] [100%] 2025-12-04T11:45:26.2241586Z 2025-12-04T11:45:26.2241638Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2241783Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2241830Z Traceback (most recent call last): 2025-12-04T11:45:26.2241989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2242031Z method(*args, **kwargs) 2025-12-04T11:45:26.2242185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2242226Z method(*args, **kwargs) 2025-12-04T11:45:26.2242376Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2242414Z with policy(): 2025-12-04T11:45:26.2242568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2242610Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2242996Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1038090240. 2025-12-04T11:45:26.2243011Z 2025-12-04T11:45:26.2243085Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2243374Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2243376Z 2025-12-04T11:45:26.2243463Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2243537Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2243580Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2243638Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2244122Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2244223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2244260Z graph_break [] 2025-12-04T11:45:26.2244323Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2244396Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2244915Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2244963Z current_size = base.storage().size() 2025-12-04T11:45:26.2245004Z Autotune Choices Stats: 2025-12-04T11:45:26.2245401Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.2245446Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2245491Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2245589Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2245822Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2246049Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2246280Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2246507Z triton_mm_9 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2246731Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2246966Z triton_mm_4 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2247187Z triton_mm_7 0.0072 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2247410Z triton_mm_5 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2247631Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2247853Z triton_mm_0 0.0113 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2247981Z SingleProcess AUTOTUNE benchmarking takes 0.0418 seconds and 0.2081 seconds precompiling for 11 choices 2025-12-04T11:45:26.2248129Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2248177Z Traceback 
(most recent call last): 2025-12-04T11:45:26.2248345Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2248386Z method(*args, **kwargs) 2025-12-04T11:45:26.2248538Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2248579Z method(*args, **kwargs) 2025-12-04T11:45:26.2248729Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2248778Z with policy(): 2025-12-04T11:45:26.2248932Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2248973Z raise RuntimeError(msg) 2025-12-04T11:45:26.2249374Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1075838976. 
2025-12-04T11:45:26.2249378Z 2025-12-04T11:45:26.2249453Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2249712Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2249714Z 2025-12-04T11:45:26.2249802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2249874Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2249922Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2249980Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2250467Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2250578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2250614Z graph_break [] 2025-12-04T11:45:26.2250676Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2250751Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2251239Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2251286Z current_size = base.storage().size() 2025-12-04T11:45:26.2251327Z Autotune Choices Stats: 2025-12-04T11:45:26.2251689Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.2251740Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2251781Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2251881Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2252110Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2252346Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2252574Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2252808Z triton_mm_9 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2253045Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2253288Z triton_mm_4 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2253513Z triton_mm_7 0.0072 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2253735Z triton_mm_5 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2253959Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2254180Z triton_mm_0 0.0113 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2254324Z SingleProcess AUTOTUNE benchmarking takes 0.0418 seconds and 0.2081 seconds precompiling for 11 choices 2025-12-04T11:45:26.2254399Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2254442Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2254504Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2254603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2255082Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2255120Z graph_break [] 2025-12-04T11:45:26.2255186Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2255258Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2255299Z Autotune Choices Stats: 2025-12-04T11:45:26.2255661Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2255708Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2255764Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2255860Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2256089Z triton_mm_18 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2256329Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2256570Z triton_mm_19 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2256797Z triton_mm_16 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2256842Z _scaled_mm 0.0066 ms 93.3% 2025-12-04T11:45:26.2257062Z triton_mm_13 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2257291Z triton_mm_14 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2257516Z triton_mm_17 0.0073 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2257741Z triton_mm_15 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2257967Z triton_mm_11 0.0078 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2258115Z SingleProcess AUTOTUNE benchmarking takes 0.0481 seconds and 0.1001 seconds precompiling for 11 choices 2025-12-04T11:45:26.2258168Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2258315Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2258364Z Traceback (most recent call last): 2025-12-04T11:45:26.2258519Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2258563Z method(*args, **kwargs) 2025-12-04T11:45:26.2258714Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2258756Z method(*args, **kwargs) 2025-12-04T11:45:26.2258906Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2258945Z with policy(): 2025-12-04T11:45:26.2259099Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2259139Z raise RuntimeError(msg) 2025-12-04T11:45:26.2259536Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712. 
2025-12-04T11:45:26.2259549Z 2025-12-04T11:45:26.2259622Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2259882Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2259885Z 2025-12-04T11:45:26.2259971Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2260056Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2260100Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2260157Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2260645Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2260745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2260785Z graph_break [] 2025-12-04T11:45:26.2260846Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2260923Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2261411Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2261459Z current_size = base.storage().size() 2025-12-04T11:45:26.2261499Z Autotune Choices Stats: 2025-12-04T11:45:26.2261861Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060789999552071095, "best_triton_pos": 0} 2025-12-04T11:45:26.2261916Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2261958Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2262058Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2262289Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2262515Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2262743Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2262974Z triton_mm_9 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2263199Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2263452Z triton_mm_4 0.0069 ms 88.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2263687Z triton_mm_7 0.0072 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2263922Z triton_mm_5 0.0076 ms 80.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2264157Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2264378Z triton_mm_0 0.0113 ms 53.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2264507Z SingleProcess AUTOTUNE benchmarking takes 0.0418 seconds and 0.2081 seconds precompiling for 11 choices 2025-12-04T11:45:26.2264580Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2264624Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2264680Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2264781Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2265268Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2265306Z graph_break [] 2025-12-04T11:45:26.2265366Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2265453Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2265493Z Autotune Choices Stats: 2025-12-04T11:45:26.2265852Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2265900Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2265940Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2266037Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2266273Z triton_mm_18 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2266504Z triton_mm_12 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2266731Z triton_mm_19 0.0063 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2266956Z triton_mm_16 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2267008Z _scaled_mm 0.0066 ms 93.3% 2025-12-04T11:45:26.2267233Z triton_mm_13 0.0067 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2267467Z triton_mm_14 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2267698Z triton_mm_17 0.0073 ms 84.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2267922Z triton_mm_15 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2268146Z triton_mm_11 0.0078 ms 79.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2268276Z SingleProcess AUTOTUNE benchmarking takes 0.0481 seconds and 0.1001 seconds precompiling for 11 choices 2025-12-04T11:45:26.2268349Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2268392Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2268448Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2268548Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2269031Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2269079Z graph_break [] 2025-12-04T11:45:26.2269140Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2269214Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2269254Z Autotune Choices Stats: 2025-12-04T11:45:26.2269617Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_26", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:26.2269662Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2269704Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2269802Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2270032Z triton_mm_26 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2270264Z triton_mm_29 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2270492Z triton_mm_22 0.0064 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2270715Z triton_mm_28 0.0064 ms 98.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2270767Z _scaled_mm 0.0065 ms 96.9% 2025-12-04T11:45:26.2270993Z triton_mm_23 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2271229Z triton_mm_24 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2271467Z triton_mm_27 0.0073 ms 86.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2271691Z triton_mm_21 0.0078 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2271915Z triton_mm_25 0.0078 ms 80.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2272043Z SingleProcess AUTOTUNE benchmarking takes 0.0681 seconds and 
0.2131 seconds precompiling for 11 choices
2025-12-04T11:45:26.2272233Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c209f0833b5dd3ff.xml -
2025-12-04T11:45:26.2272294Z =========================== short test summary info ============================
2025-12-04T11:45:26.2272885Z FAILED [0.6471s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712.
2025-12-04T11:45:26.2272899Z
2025-12-04T11:45:26.2272972Z To execute this test, run the following from the base repo dir:
2025-12-04T11:45:26.2273236Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda
2025-12-04T11:45:26.2273238Z
2025-12-04T11:45:26.2273345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:45:26.2273408Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:45:26.2273476Z ================== 1 failed, 187 deselected, 2 rerun in 3.53s ==================
2025-12-04T11:45:26.2273514Z Got exit code 1
2025-12-04T11:45:26.2273554Z Retrying single test...
2025-12-04T11:45:26.2273699Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3e59037fddc0add6.xml
2025-12-04T11:45:26.2273757Z ============================= test session starts ==============================
2025-12-04T11:45:26.2273868Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T11:45:26.2273910Z cachedir: .pytest_cache
2025-12-04T11:45:26.2274068Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:45:26.2274114Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T11:45:26.2274155Z configfile: pytest.ini
2025-12-04T11:45:26.2274316Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:45:26.2274405Z collecting ... collected 188 items / 187 deselected / 1 selected
2025-12-04T11:45:26.2274662Z stepcurrent: skipping 142 already run items.
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2274708Z Running 1 items in this shard 2025-12-04T11:45:26.2274710Z 2025-12-04T11:45:26.2274942Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0952s] [100%] 2025-12-04T11:45:26.2275170Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7801s] [100%] 2025-12-04T11:45:26.2275363Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6578s] [100%] 2025-12-04T11:45:26.2275366Z 2025-12-04T11:45:26.2275417Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2275563Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2275609Z Traceback (most recent call last): 2025-12-04T11:45:26.2275767Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2275809Z method(*args, **kwargs) 2025-12-04T11:45:26.2275962Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2276002Z method(*args, **kwargs) 2025-12-04T11:45:26.2276153Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2276190Z with policy(): 2025-12-04T11:45:26.2276343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2276386Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2276776Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1038090240. 2025-12-04T11:45:26.2276793Z 2025-12-04T11:45:26.2276868Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2277128Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2277131Z 2025-12-04T11:45:26.2277217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2277291Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2277334Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2277390Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2277878Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2277976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2278015Z graph_break [] 2025-12-04T11:45:26.2278076Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2278149Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2278645Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2278694Z current_size = base.storage().size() 2025-12-04T11:45:26.2278744Z Autotune Choices Stats: 2025-12-04T11:45:26.2279123Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2279170Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2279212Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2279311Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2279542Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2279777Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2280008Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2280230Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2280453Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2280685Z triton_mm_4 0.0068 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2280908Z triton_mm_7 0.0073 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2281129Z triton_mm_5 0.0074 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2281352Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2281574Z triton_mm_0 0.0113 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2281703Z SingleProcess AUTOTUNE benchmarking takes 0.0463 seconds and 0.2116 seconds precompiling for 11 choices 2025-12-04T11:45:26.2281850Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2281905Z Traceback 
(most recent call last): 2025-12-04T11:45:26.2282061Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2282102Z method(*args, **kwargs) 2025-12-04T11:45:26.2282254Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2282295Z method(*args, **kwargs) 2025-12-04T11:45:26.2282455Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2282492Z with policy(): 2025-12-04T11:45:26.2282645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2282697Z raise RuntimeError(msg) 2025-12-04T11:45:26.2283090Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1075838976. 
2025-12-04T11:45:26.2283094Z 2025-12-04T11:45:26.2283167Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2283462Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2283464Z 2025-12-04T11:45:26.2283553Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2283626Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2283670Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2283729Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2284211Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2284324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2284362Z graph_break [] 2025-12-04T11:45:26.2284424Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2284498Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2284986Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2285035Z current_size = base.storage().size() 2025-12-04T11:45:26.2285076Z Autotune Choices Stats: 2025-12-04T11:45:26.2285439Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2285486Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2285528Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2285627Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2285857Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2286101Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2286352Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2286587Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2286809Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2287031Z triton_mm_4 0.0068 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2287254Z triton_mm_7 0.0073 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2287475Z triton_mm_5 0.0074 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2287697Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2287918Z triton_mm_0 0.0113 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2288058Z SingleProcess AUTOTUNE benchmarking takes 0.0463 seconds and 0.2116 seconds precompiling for 11 choices 2025-12-04T11:45:26.2288134Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2288178Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2288234Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2288337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2288818Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2288857Z graph_break [] 2025-12-04T11:45:26.2288919Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2288991Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2289032Z Autotune Choices Stats: 2025-12-04T11:45:26.2289392Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2289439Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2289491Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2289589Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2289817Z triton_mm_18 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2290054Z triton_mm_12 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2290291Z triton_mm_16 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2290520Z triton_mm_19 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2290744Z triton_mm_13 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2290788Z _scaled_mm 0.0068 ms 91.2% 2025-12-04T11:45:26.2291012Z triton_mm_14 0.0069 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2291234Z triton_mm_17 0.0073 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2291459Z triton_mm_15 0.0076 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2291681Z triton_mm_11 0.0078 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2291821Z SingleProcess AUTOTUNE benchmarking takes 0.0452 seconds and 0.1104 seconds precompiling for 11 choices 2025-12-04T11:45:26.2291874Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2292020Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2292066Z Traceback (most recent call last): 2025-12-04T11:45:26.2292224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2292266Z method(*args, **kwargs) 2025-12-04T11:45:26.2292419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2292462Z method(*args, **kwargs) 2025-12-04T11:45:26.2292611Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2292648Z with policy(): 2025-12-04T11:45:26.2292799Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2292841Z raise RuntimeError(msg) 2025-12-04T11:45:26.2293230Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712. 
2025-12-04T11:45:26.2293243Z 2025-12-04T11:45:26.2293345Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2293604Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2293607Z 2025-12-04T11:45:26.2293695Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2293782Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2293827Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2293886Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2294379Z inductor [('triton_bundler_save_kernel', 88), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2294479Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2294518Z graph_break [] 2025-12-04T11:45:26.2294581Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2294654Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2295139Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2295186Z current_size = base.storage().size() 2025-12-04T11:45:26.2295228Z Autotune Choices Stats: 2025-12-04T11:45:26.2295592Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2295655Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2295697Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2295796Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2296026Z triton_mm_8 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2296259Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2296488Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2296716Z triton_mm_6 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2296940Z triton_mm_3 0.0066 ms 92.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2297171Z triton_mm_4 0.0068 ms 88.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2297392Z triton_mm_7 0.0073 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2297622Z triton_mm_5 0.0074 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2297853Z triton_mm_1 0.0076 ms 79.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2298075Z triton_mm_0 0.0113 ms 53.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2298203Z SingleProcess AUTOTUNE benchmarking takes 0.0463 seconds and 0.2116 seconds precompiling for 11 choices 2025-12-04T11:45:26.2298279Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2298321Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2298379Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2298478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2298961Z inductor [('triton_bundler_save_kernel', 88), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2298998Z graph_break [] 2025-12-04T11:45:26.2299062Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2299148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2299189Z Autotune Choices Stats: 2025-12-04T11:45:26.2299550Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2299598Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2299640Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2299739Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2299972Z triton_mm_18 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2300202Z triton_mm_12 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2300427Z triton_mm_16 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2300657Z triton_mm_19 0.0063 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2300899Z triton_mm_13 0.0067 ms 92.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2300943Z _scaled_mm 0.0068 ms 91.2% 2025-12-04T11:45:26.2301176Z triton_mm_14 0.0069 ms 89.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2301412Z triton_mm_17 0.0073 ms 85.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2301635Z triton_mm_15 0.0076 ms 81.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2301859Z triton_mm_11 0.0078 ms 79.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2301989Z SingleProcess AUTOTUNE benchmarking takes 0.0452 seconds and 0.1104 seconds precompiling for 11 choices 2025-12-04T11:45:26.2302065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2302108Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2302167Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2302265Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2302751Z inductor [('triton_bundler_save_kernel', 88), ('async_compile_cache_miss', 12), ('benchmarking.InductorBenchmarker.benchmark_gpu', 11), ('generated_module_cache_miss', 10), ('select_algorithm_num_precompiles', 10), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2302799Z graph_break [] 2025-12-04T11:45:26.2302859Z aten_mm_info [('aten._scaled_mm.default_33_16_1024', 1)] 2025-12-04T11:45:26.2302933Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2302974Z Autotune Choices Stats: 2025-12-04T11:45:26.2303375Z {"num_choices": 11, "num_triton_choices": 10, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2303422Z AUTOTUNE scaled_mm(33x1024, 1024x16, , ) 2025-12-04T11:45:26.2303464Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2303561Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2303794Z triton_mm_22 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2304024Z triton_mm_29 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2304251Z triton_mm_28 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2304488Z triton_mm_26 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2304715Z triton_mm_23 0.0065 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2304957Z triton_mm_24 0.0073 ms 81.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2305193Z triton_mm_25 0.0077 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2305418Z triton_mm_27 0.0077 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2305640Z triton_mm_21 0.0078 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2305865Z triton_mm_20 0.0117 ms 51.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2305998Z SingleProcess AUTOTUNE benchmarking takes 0.0649 seconds and 0.2113 seconds precompiling for 11 choices 2025-12-04T11:45:26.2306188Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3e59037fddc0add6.xml - 2025-12-04T11:45:26.2306251Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2306844Z FAILED [0.6578s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1075838976 and is now 1113587712. 2025-12-04T11:45:26.2306858Z 2025-12-04T11:45:26.2306934Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2307195Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2307199Z 2025-12-04T11:45:26.2307286Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2307349Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2307416Z ================== 1 failed, 187 deselected, 2 rerun in 3.55s ================== 2025-12-04T11:45:26.2307457Z Got exit code 1 2025-12-04T11:45:26.2307663Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2307792Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2307935Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b5c479c03dd5277e.xml 2025-12-04T11:45:26.2307995Z ============================= test session starts ============================== 2025-12-04T11:45:26.2308105Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2308158Z cachedir: .pytest_cache 2025-12-04T11:45:26.2308315Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2308362Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2308402Z configfile: pytest.ini 2025-12-04T11:45:26.2308564Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2308651Z collecting ... collected 188 items / 143 deselected / 45 selected 2025-12-04T11:45:26.2308705Z stepcurrent: skipping 143 already run items. 
2025-12-04T11:45:26.2308750Z Running 45 items in this shard 2025-12-04T11:45:26.2308752Z 2025-12-04T11:45:26.2308986Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.9354s] [ 2%] 2025-12-04T11:45:26.2309206Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.1424s] [ 2%] 2025-12-04T11:45:26.2309400Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.9502s] [ 2%] 2025-12-04T11:45:26.2309403Z 2025-12-04T11:45:26.2309453Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2309604Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2309653Z Traceback (most recent call last): 2025-12-04T11:45:26.2309812Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2309854Z method(*args, **kwargs) 2025-12-04T11:45:26.2310006Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2310048Z method(*args, **kwargs) 2025-12-04T11:45:26.2310198Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2310247Z with policy(): 2025-12-04T11:45:26.2310399Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2310441Z raise RuntimeError(msg) 2025-12-04T11:45:26.2310834Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1088421888. 2025-12-04T11:45:26.2310836Z 2025-12-04T11:45:26.2310911Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2311177Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2311180Z 2025-12-04T11:45:26.2311270Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2311344Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2311388Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2311445Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2311934Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2312042Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2312079Z graph_break [] 2025-12-04T11:45:26.2312145Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2312220Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2312718Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2312780Z current_size = base.storage().size() 2025-12-04T11:45:26.2312821Z Autotune Choices Stats: 2025-12-04T11:45:26.2313189Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2313240Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2313299Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2313399Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2313641Z triton_mm_31 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2313875Z triton_mm_32 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2314105Z triton_mm_19 0.0068 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2314347Z triton_mm_27 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.2314575Z triton_mm_8 0.0078 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2314800Z triton_mm_20 0.0081 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2315023Z triton_mm_15 0.0082 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2315246Z triton_mm_13 0.0083 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2315470Z triton_mm_14 0.0087 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2315697Z triton_mm_28 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2315837Z SingleProcess AUTOTUNE benchmarking takes 0.1298 seconds and 0.6175 seconds precompiling for 35 choices 2025-12-04T11:45:26.2315986Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2316032Z Traceback (most recent call last): 2025-12-04T11:45:26.2316189Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2316241Z method(*args, **kwargs) 2025-12-04T11:45:26.2316396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2316435Z method(*args, **kwargs) 2025-12-04T11:45:26.2316600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2316638Z with policy(): 2025-12-04T11:45:26.2316793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2316833Z raise RuntimeError(msg) 2025-12-04T11:45:26.2317225Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1088421888 and is now 1176502272. 
2025-12-04T11:45:26.2317229Z 2025-12-04T11:45:26.2317303Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2317570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2317572Z 2025-12-04T11:45:26.2317659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2317734Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2317778Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2317836Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2318335Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2318433Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2318471Z graph_break [] 2025-12-04T11:45:26.2318534Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2318608Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2319095Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2319143Z current_size = base.storage().size() 2025-12-04T11:45:26.2319185Z Autotune Choices Stats: 2025-12-04T11:45:26.2319553Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2319603Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2319657Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2319757Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2319994Z triton_mm_31 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2320235Z triton_mm_32 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2320470Z triton_mm_19 0.0068 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2320694Z triton_mm_27 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2320921Z triton_mm_8 0.0078 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2321145Z triton_mm_20 0.0081 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2321370Z triton_mm_15 0.0082 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2321592Z triton_mm_13 0.0083 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2321816Z triton_mm_14 0.0087 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2322055Z triton_mm_28 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2322185Z SingleProcess AUTOTUNE benchmarking takes 0.1298 seconds and 0.6175 seconds precompiling for 35 choices 2025-12-04T11:45:26.2322259Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2322301Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2322358Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2322457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2322943Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2322981Z graph_break [] 2025-12-04T11:45:26.2323045Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2323118Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2323159Z Autotune Choices Stats: 2025-12-04T11:45:26.2323551Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2323614Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2323656Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2323754Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2324001Z triton_mm_65 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2324244Z triton_mm_66 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2324472Z triton_mm_53 0.0068 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2324697Z triton_mm_61 0.0073 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2324926Z triton_mm_42 0.0075 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2324969Z _scaled_mm 0.0081 ms 74.8% 2025-12-04T11:45:26.2325192Z triton_mm_54 0.0081 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2325415Z triton_mm_47 0.0082 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2325655Z triton_mm_49 0.0084 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2325880Z triton_mm_48 0.0087 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2326010Z SingleProcess AUTOTUNE benchmarking takes 0.2065 seconds and 0.3321 seconds precompiling for 35 choices 2025-12-04T11:45:26.2326066Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2326215Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2326262Z Traceback (most recent call last): 2025-12-04T11:45:26.2326418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2326460Z method(*args, **kwargs) 2025-12-04T11:45:26.2326614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2326655Z method(*args, **kwargs) 2025-12-04T11:45:26.2326806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2326843Z with policy(): 2025-12-04T11:45:26.2326995Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2327048Z raise RuntimeError(msg) 2025-12-04T11:45:26.2327441Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 
2025-12-04T11:45:26.2327444Z 2025-12-04T11:45:26.2327529Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2327794Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2327813Z 2025-12-04T11:45:26.2327902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2327976Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2328020Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2328077Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2328559Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2328660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2328696Z graph_break [] 2025-12-04T11:45:26.2328761Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2328834Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2329319Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2329376Z current_size = base.storage().size() 2025-12-04T11:45:26.2329419Z Autotune Choices Stats: 2025-12-04T11:45:26.2329791Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2329838Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2329880Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2329979Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2330214Z triton_mm_31 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2330449Z triton_mm_32 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2330675Z triton_mm_19 0.0068 ms 89.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2330898Z triton_mm_27 0.0073 ms 83.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2331136Z triton_mm_8 0.0078 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2331370Z triton_mm_20 0.0081 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2331601Z triton_mm_15 0.0082 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2331825Z triton_mm_13 0.0083 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2332048Z triton_mm_14 0.0087 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2332274Z triton_mm_28 0.0088 ms 68.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2332403Z SingleProcess AUTOTUNE benchmarking takes 0.1298 seconds and 0.6175 seconds precompiling for 35 choices 2025-12-04T11:45:26.2332478Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2332521Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2332579Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2332679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2333163Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2333213Z graph_break [] 2025-12-04T11:45:26.2333310Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2333386Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2333427Z Autotune Choices Stats: 2025-12-04T11:45:26.2333792Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2333841Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2333884Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2333982Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2334215Z triton_mm_65 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2334444Z triton_mm_66 0.0067 ms 90.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2334684Z triton_mm_53 0.0068 ms 88.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2334911Z triton_mm_61 0.0073 ms 82.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2335150Z triton_mm_42 0.0075 ms 80.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2335193Z _scaled_mm 0.0081 ms 74.8% 2025-12-04T11:45:26.2335429Z triton_mm_54 0.0081 ms 74.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2335655Z triton_mm_47 0.0082 ms 73.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2335877Z triton_mm_49 0.0084 ms 71.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2336102Z triton_mm_48 0.0087 ms 69.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2336232Z SingleProcess AUTOTUNE benchmarking takes 0.2065 seconds and 0.3321 seconds precompiling for 35 choices 2025-12-04T11:45:26.2336305Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2336349Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2336405Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2336506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2337007Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2337045Z graph_break [] 2025-12-04T11:45:26.2337108Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2337183Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2337224Z Autotune Choices Stats: 2025-12-04T11:45:26.2337590Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2337638Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2337683Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2337781Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2338015Z triton_mm_99 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2338244Z triton_mm_100 0.0065 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2338483Z triton_mm_87 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2338722Z triton_mm_95 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2338960Z triton_mm_76 0.0079 ms 77.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2339184Z triton_mm_81 0.0081 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2339409Z triton_mm_88 0.0082 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2339635Z triton_mm_83 0.0086 ms 70.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2339860Z triton_mm_91 0.0088 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2340083Z triton_mm_96 0.0088 ms 69.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2340215Z SingleProcess AUTOTUNE benchmarking takes 0.1859 seconds and 0.1832 seconds precompiling for 35 choices 2025-12-04T11:45:26.2340417Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b5c479c03dd5277e.xml - 2025-12-04T11:45:26.2340477Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2341080Z FAILED [0.9502s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 2025-12-04T11:45:26.2341085Z 2025-12-04T11:45:26.2341159Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2341421Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2341424Z 2025-12-04T11:45:26.2341512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2341576Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2341643Z ================== 1 failed, 143 deselected, 2 rerun in 5.05s ================== 2025-12-04T11:45:26.2341682Z Got exit code 1 2025-12-04T11:45:26.2341722Z Retrying single test... 
2025-12-04T11:45:26.2341866Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5da540c73b3262e6.xml 2025-12-04T11:45:26.2341934Z ============================= test session starts ============================== 2025-12-04T11:45:26.2342045Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2342085Z cachedir: .pytest_cache 2025-12-04T11:45:26.2342245Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2342290Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2342333Z configfile: pytest.ini 2025-12-04T11:45:26.2342511Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2342587Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2342859Z stepcurrent: skipping 143 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2342904Z Running 1 items in this shard 2025-12-04T11:45:26.2342906Z 2025-12-04T11:45:26.2343126Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.8219s] [100%] 2025-12-04T11:45:26.2343378Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.0574s] [100%] 2025-12-04T11:45:26.2343574Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda FAILED [1.0135s] [100%] 2025-12-04T11:45:26.2343577Z 2025-12-04T11:45:26.2343629Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2343777Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2343825Z Traceback (most recent call last): 2025-12-04T11:45:26.2343985Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2344025Z method(*args, **kwargs) 2025-12-04T11:45:26.2344194Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2344234Z method(*args, **kwargs) 2025-12-04T11:45:26.2344387Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2344424Z with policy(): 2025-12-04T11:45:26.2344577Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2344619Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2345014Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1088421888. 2025-12-04T11:45:26.2345018Z 2025-12-04T11:45:26.2345092Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2345359Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2345362Z 2025-12-04T11:45:26.2345449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2345522Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2345566Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2345622Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2346108Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2346220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2346257Z graph_break [] 2025-12-04T11:45:26.2346334Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2346410Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2346919Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2346968Z current_size = base.storage().size() 2025-12-04T11:45:26.2347008Z Autotune Choices Stats: 2025-12-04T11:45:26.2347384Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2347432Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2347474Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2347573Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2347810Z triton_mm_31 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2348042Z triton_mm_32 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2348278Z triton_mm_19 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2348503Z triton_mm_27 0.0074 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2348729Z triton_mm_8 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2348953Z triton_mm_14 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2349179Z triton_mm_20 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2349405Z triton_mm_13 0.0084 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2349627Z triton_mm_15 0.0087 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2349859Z triton_mm_23 0.0090 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2349989Z SingleProcess AUTOTUNE benchmarking takes 0.1351 seconds and 0.6261 seconds precompiling for 35 choices 2025-12-04T11:45:26.2350147Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2350194Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2350358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2350399Z method(*args, **kwargs) 2025-12-04T11:45:26.2350551Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2350592Z method(*args, **kwargs) 2025-12-04T11:45:26.2350743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2350781Z with policy(): 2025-12-04T11:45:26.2350934Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2350976Z raise RuntimeError(msg) 2025-12-04T11:45:26.2351373Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1088421888 and is now 1176502272. 
2025-12-04T11:45:26.2351376Z 2025-12-04T11:45:26.2351449Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2351716Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2351718Z 2025-12-04T11:45:26.2351815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2351889Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2351931Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2351991Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2352478Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2352577Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2352614Z graph_break [] 2025-12-04T11:45:26.2352676Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2352751Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2353237Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2353334Z current_size = base.storage().size() 2025-12-04T11:45:26.2353375Z Autotune Choices Stats: 2025-12-04T11:45:26.2353748Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2353814Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2353856Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2353953Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2354201Z triton_mm_31 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2354444Z triton_mm_32 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2354670Z triton_mm_19 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2354897Z triton_mm_27 0.0074 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2355124Z triton_mm_8 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2355348Z triton_mm_14 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2355573Z triton_mm_20 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2355812Z triton_mm_13 0.0084 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2356146Z triton_mm_15 0.0087 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2356369Z triton_mm_23 0.0090 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2356499Z SingleProcess AUTOTUNE benchmarking takes 0.1351 seconds and 0.6261 seconds precompiling for 35 choices 2025-12-04T11:45:26.2356572Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2356616Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2356672Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2356772Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2357257Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2357294Z graph_break [] 2025-12-04T11:45:26.2359013Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2359091Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2359133Z Autotune Choices Stats: 2025-12-04T11:45:26.2359515Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2359565Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2359607Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2359719Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2359954Z triton_mm_65 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2360183Z triton_mm_66 0.0069 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2360410Z triton_mm_53 0.0073 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2360637Z triton_mm_61 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2360679Z _scaled_mm 0.0082 ms 75.6% 2025-12-04T11:45:26.2360905Z triton_mm_42 0.0084 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2361130Z triton_mm_47 0.0085 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2361366Z triton_mm_54 0.0085 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2361594Z triton_mm_49 0.0088 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2361819Z triton_mm_48 0.0088 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2361950Z SingleProcess AUTOTUNE benchmarking takes 0.2000 seconds and 0.3249 seconds precompiling for 35 choices 2025-12-04T11:45:26.2362004Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2362154Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2362201Z Traceback (most recent call last): 2025-12-04T11:45:26.2362362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2362404Z method(*args, **kwargs) 2025-12-04T11:45:26.2362559Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2362609Z method(*args, **kwargs) 2025-12-04T11:45:26.2362762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2362799Z with policy(): 2025-12-04T11:45:26.2362954Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2362997Z raise RuntimeError(msg) 2025-12-04T11:45:26.2363446Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 
2025-12-04T11:45:26.2363449Z 2025-12-04T11:45:26.2363542Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2363807Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2363810Z 2025-12-04T11:45:26.2363899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2363974Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2364017Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2364076Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2364558Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2364657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2364697Z graph_break [] 2025-12-04T11:45:26.2364760Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2364835Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2365339Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2365386Z current_size = base.storage().size() 2025-12-04T11:45:26.2365427Z Autotune Choices Stats: 2025-12-04T11:45:26.2365795Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2365844Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2365887Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2365987Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2366226Z triton_mm_31 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2366460Z triton_mm_32 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2366686Z triton_mm_19 0.0070 ms 88.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2366923Z triton_mm_27 0.0074 ms 83.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2367160Z triton_mm_8 0.0082 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2367396Z triton_mm_14 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2367621Z triton_mm_20 0.0084 ms 74.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2367844Z triton_mm_13 0.0084 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2368073Z triton_mm_15 0.0087 ms 71.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2368301Z triton_mm_23 0.0090 ms 68.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2368429Z SingleProcess AUTOTUNE benchmarking takes 0.1351 seconds and 0.6261 seconds precompiling for 35 choices 2025-12-04T11:45:26.2368504Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2368546Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2368615Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2368714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2369203Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2369239Z graph_break [] 2025-12-04T11:45:26.2369303Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2369377Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2369418Z Autotune Choices Stats: 2025-12-04T11:45:26.2369782Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2369832Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2369874Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2369972Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2370207Z triton_mm_65 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2370446Z triton_mm_66 0.0069 ms 90.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2370671Z triton_mm_53 0.0073 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2370904Z triton_mm_61 0.0077 ms 80.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2370957Z _scaled_mm 0.0082 ms 75.6% 2025-12-04T11:45:26.2371184Z triton_mm_42 0.0084 ms 73.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2371410Z triton_mm_47 0.0085 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2371636Z triton_mm_54 0.0085 ms 72.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2371859Z triton_mm_49 0.0088 ms 70.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2372083Z triton_mm_48 0.0088 ms 70.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2372213Z SingleProcess AUTOTUNE benchmarking takes 0.2000 seconds and 0.3249 seconds precompiling for 35 choices 2025-12-04T11:45:26.2372287Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2372350Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2372407Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2372507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2372993Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2373031Z graph_break [] 2025-12-04T11:45:26.2373094Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2373167Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2373209Z Autotune Choices Stats: 2025-12-04T11:45:26.2373589Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2373637Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2373680Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2373777Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2374011Z triton_mm_99 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2374261Z triton_mm_100 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2374507Z triton_mm_87 0.0070 ms 89.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2374744Z triton_mm_95 0.0075 ms 83.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2374971Z triton_mm_76 0.0079 ms 79.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2375195Z triton_mm_81 0.0080 ms 78.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2375419Z triton_mm_88 0.0081 ms 77.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2375461Z _scaled_mm 0.0083 ms 75.8% 2025-12-04T11:45:26.2375683Z triton_mm_83 0.0086 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2375908Z triton_mm_82 0.0087 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2376048Z SingleProcess AUTOTUNE benchmarking takes 0.2059 seconds and 
0.1834 seconds precompiling for 35 choices 2025-12-04T11:45:26.2376242Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-5da540c73b3262e6.xml - 2025-12-04T11:45:26.2376304Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2376898Z FAILED [1.0135s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 2025-12-04T11:45:26.2376902Z 2025-12-04T11:45:26.2376977Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2377242Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2377244Z 2025-12-04T11:45:26.2377333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2377396Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2377466Z ================== 1 failed, 187 deselected, 2 rerun in 4.91s ================== 2025-12-04T11:45:26.2377503Z Got exit code 1 2025-12-04T11:45:26.2377544Z Retrying single test... 
2025-12-04T11:45:26.2377688Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db6d8117087b5438.xml 2025-12-04T11:45:26.2377756Z ============================= test session starts ============================== 2025-12-04T11:45:26.2377868Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2377912Z cachedir: .pytest_cache 2025-12-04T11:45:26.2378071Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2378128Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2378169Z configfile: pytest.ini 2025-12-04T11:45:26.2378334Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2378420Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2378680Z stepcurrent: skipping 143 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2378725Z Running 1 items in this shard 2025-12-04T11:45:26.2378727Z 2025-12-04T11:45:26.2378949Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.8111s] [100%] 2025-12-04T11:45:26.2379168Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.0324s] [100%] 2025-12-04T11:45:26.2379363Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.9654s] [100%] 2025-12-04T11:45:26.2379366Z 2025-12-04T11:45:26.2379418Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2379565Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2379613Z Traceback (most recent call last): 2025-12-04T11:45:26.2379771Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2379823Z method(*args, **kwargs) 2025-12-04T11:45:26.2379976Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2380017Z method(*args, **kwargs) 2025-12-04T11:45:26.2380168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2380206Z with policy(): 2025-12-04T11:45:26.2380360Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2380403Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2380798Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1088421888. 2025-12-04T11:45:26.2380803Z 2025-12-04T11:45:26.2380876Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2381144Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2381145Z 2025-12-04T11:45:26.2381233Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2381307Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2381349Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2381417Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2381904Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2382014Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2382051Z graph_break [] 2025-12-04T11:45:26.2382115Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2382197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2382687Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2382735Z current_size = base.storage().size() 2025-12-04T11:45:26.2382776Z Autotune Choices Stats: 2025-12-04T11:45:26.2383150Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.2383198Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2383241Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2383378Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2383621Z triton_mm_31 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2383850Z triton_mm_19 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2384100Z triton_mm_32 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2384328Z triton_mm_27 0.0079 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2384555Z triton_mm_8 0.0081 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2384779Z triton_mm_20 0.0085 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2385006Z triton_mm_13 0.0086 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2385232Z triton_mm_15 0.0087 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2385468Z triton_mm_14 0.0088 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2385690Z triton_mm_23 0.0090 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2385832Z SingleProcess AUTOTUNE benchmarking takes 0.1355 seconds and 0.6054 seconds precompiling for 35 choices 2025-12-04T11:45:26.2385981Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2386027Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2386201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2386244Z method(*args, **kwargs) 2025-12-04T11:45:26.2386397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2386437Z method(*args, **kwargs) 2025-12-04T11:45:26.2386588Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2386627Z with policy(): 2025-12-04T11:45:26.2386781Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2386823Z raise RuntimeError(msg) 2025-12-04T11:45:26.2387223Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1088421888 and is now 1176502272. 
2025-12-04T11:45:26.2387226Z 2025-12-04T11:45:26.2387300Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2387564Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2387577Z 2025-12-04T11:45:26.2387665Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2387739Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2387782Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2387840Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2388323Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2388423Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2388460Z graph_break [] 2025-12-04T11:45:26.2388524Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2388597Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2389084Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2389131Z current_size = base.storage().size() 2025-12-04T11:45:26.2389183Z Autotune Choices Stats: 2025-12-04T11:45:26.2389553Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.2389602Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2389643Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2389753Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2390000Z triton_mm_31 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2390228Z triton_mm_19 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2390458Z triton_mm_32 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2390682Z triton_mm_27 0.0079 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2390909Z triton_mm_8 0.0081 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2391133Z triton_mm_20 0.0085 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2391361Z triton_mm_13 0.0086 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2391597Z triton_mm_15 0.0087 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2391820Z triton_mm_14 0.0088 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2392042Z triton_mm_23 0.0090 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2392172Z SingleProcess AUTOTUNE benchmarking takes 0.1355 seconds and 0.6054 seconds precompiling for 35 choices 2025-12-04T11:45:26.2392250Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2392291Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2392349Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2392449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2392937Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2392986Z graph_break [] 2025-12-04T11:45:26.2393049Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2393122Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2393163Z Autotune Choices Stats: 2025-12-04T11:45:26.2393568Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:26.2393616Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2393672Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2393770Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2394008Z triton_mm_65 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2394233Z triton_mm_53 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2394462Z triton_mm_66 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2394687Z triton_mm_61 0.0076 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2394913Z triton_mm_42 0.0082 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2395150Z triton_mm_47 0.0084 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2395374Z triton_mm_49 0.0084 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2395601Z triton_mm_54 0.0087 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2395824Z triton_mm_48 0.0089 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2396050Z triton_mm_57 0.0092 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2396182Z SingleProcess AUTOTUNE benchmarking takes 0.2020 seconds and 0.3269 seconds precompiling for 35 choices 
2025-12-04T11:45:26.2396234Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2396383Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2396429Z Traceback (most recent call last): 2025-12-04T11:45:26.2396586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2396641Z method(*args, **kwargs) 2025-12-04T11:45:26.2396794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2396835Z method(*args, **kwargs) 2025-12-04T11:45:26.2396987Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2397024Z with policy(): 2025-12-04T11:45:26.2397187Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2397227Z raise RuntimeError(msg) 2025-12-04T11:45:26.2397643Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 
2025-12-04T11:45:26.2397647Z 2025-12-04T11:45:26.2397722Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2397985Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2397988Z 2025-12-04T11:45:26.2398077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2398150Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2398193Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2398251Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2398735Z inductor [('triton_bundler_save_kernel', 280), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2398846Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2398884Z graph_break [] 2025-12-04T11:45:26.2398946Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2399020Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2399507Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2399555Z current_size = base.storage().size() 2025-12-04T11:45:26.2399596Z Autotune Choices Stats: 2025-12-04T11:45:26.2399965Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_31", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006639999803155661, "best_triton_pos": 0} 2025-12-04T11:45:26.2400015Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2400057Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2400157Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2400395Z triton_mm_31 0.0066 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2400631Z triton_mm_19 0.0070 ms 95.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2400860Z triton_mm_32 0.0071 ms 93.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2401102Z triton_mm_27 0.0079 ms 84.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2401337Z triton_mm_8 0.0081 ms 82.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2401564Z triton_mm_20 0.0085 ms 77.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2401790Z triton_mm_13 0.0086 ms 77.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2402018Z triton_mm_15 0.0087 ms 76.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2402244Z triton_mm_14 0.0088 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2402467Z triton_mm_23 0.0090 ms 73.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2402606Z SingleProcess AUTOTUNE benchmarking takes 0.1355 seconds and 0.6054 seconds precompiling for 35 choices 2025-12-04T11:45:26.2402679Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2402722Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2402779Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2402880Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2403395Z inductor [('triton_bundler_save_kernel', 280), 
('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2403432Z graph_break [] 2025-12-04T11:45:26.2403496Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2403570Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2403610Z Autotune Choices Stats: 2025-12-04T11:45:26.2403979Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_65", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:26.2404026Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2404068Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2404179Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2404412Z triton_mm_65 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2404652Z triton_mm_53 0.0070 ms 90.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2404892Z triton_mm_66 0.0071 ms 88.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2405118Z triton_mm_61 0.0076 ms 82.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2405345Z triton_mm_42 0.0082 ms 77.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2405568Z triton_mm_47 0.0084 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2405792Z triton_mm_49 0.0084 ms 75.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2406018Z triton_mm_54 0.0087 ms 72.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2406245Z triton_mm_48 0.0089 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2406484Z triton_mm_57 0.0092 ms 68.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2406612Z SingleProcess AUTOTUNE benchmarking takes 0.2020 seconds and 0.3269 seconds precompiling for 35 choices 
2025-12-04T11:45:26.2406686Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2406728Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2406786Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2406887Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2407371Z inductor [('triton_bundler_save_kernel', 280), ('async_compile_cache_miss', 36), ('benchmarking.InductorBenchmarker.benchmark_gpu', 35), ('generated_module_cache_miss', 34), ('select_algorithm_num_precompiles', 34), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2407409Z graph_break [] 2025-12-04T11:45:26.2407473Z aten_mm_info [('aten._scaled_mm.default_33_2048_1024', 1)] 2025-12-04T11:45:26.2407546Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2407588Z Autotune Choices Stats: 2025-12-04T11:45:26.2407955Z {"num_choices": 35, "num_triton_choices": 34, "best_kernel": "triton_mm_99", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006320000160485506, "best_triton_pos": 0} 2025-12-04T11:45:26.2408013Z AUTOTUNE scaled_mm(33x1024, 1024x2048, , ) 2025-12-04T11:45:26.2408055Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2408153Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2408397Z triton_mm_99 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.2408638Z triton_mm_100 0.0068 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2408863Z triton_mm_87 0.0068 ms 92.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2409088Z triton_mm_95 0.0075 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2409317Z triton_mm_76 0.0080 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2409542Z triton_mm_88 0.0082 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2409585Z _scaled_mm 0.0082 ms 77.1% 2025-12-04T11:45:26.2409806Z triton_mm_81 0.0084 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2410042Z triton_mm_83 0.0084 ms 75.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2410267Z triton_mm_82 0.0086 ms 73.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=64, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2410397Z SingleProcess AUTOTUNE benchmarking takes 0.2101 seconds and 0.1827 seconds precompiling for 35 choices 2025-12-04T11:45:26.2410589Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-db6d8117087b5438.xml - 2025-12-04T11:45:26.2410651Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2411249Z FAILED [0.9654s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1176502272 and is now 1264582656. 2025-12-04T11:45:26.2411253Z 2025-12-04T11:45:26.2411327Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2411589Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2411601Z 2025-12-04T11:45:26.2411690Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2411752Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2411823Z ================== 1 failed, 187 deselected, 2 rerun in 4.83s ================== 2025-12-04T11:45:26.2411861Z Got exit code 1 2025-12-04T11:45:26.2412083Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2412213Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2412372Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c15b75bd6d17a764.xml 2025-12-04T11:45:26.2412430Z ============================= test session starts ============================== 2025-12-04T11:45:26.2412543Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2412584Z cachedir: .pytest_cache 2025-12-04T11:45:26.2412745Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2412791Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2412832Z configfile: pytest.ini 2025-12-04T11:45:26.2412994Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2413072Z collecting ... collected 188 items / 144 deselected / 44 selected 2025-12-04T11:45:26.2413125Z stepcurrent: skipping 144 already run items. 
2025-12-04T11:45:26.2413170Z Running 44 items in this shard 2025-12-04T11:45:26.2413173Z 2025-12-04T11:45:26.2413497Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7450s] [ 2%] 2025-12-04T11:45:26.2413715Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3582s] [ 2%] 2025-12-04T11:45:26.2413927Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3435s] [ 2%] 2025-12-04T11:45:26.2413929Z 2025-12-04T11:45:26.2413980Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2414127Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2414172Z Traceback (most recent call last): 2025-12-04T11:45:26.2414334Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2414374Z method(*args, **kwargs) 2025-12-04T11:45:26.2414530Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2414569Z method(*args, **kwargs) 2025-12-04T11:45:26.2414721Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2414759Z with policy(): 2025-12-04T11:45:26.2414912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2414953Z raise RuntimeError(msg) 2025-12-04T11:45:26.2415342Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2415344Z 2025-12-04T11:45:26.2415418Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2415700Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2415703Z 2025-12-04T11:45:26.2415790Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2415863Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2415918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2415975Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2416044Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2416157Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2416194Z graph_break [] 2025-12-04T11:45:26.2416254Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2416398Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2416443Z Traceback (most recent call last): 2025-12-04T11:45:26.2416600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2416640Z method(*args, **kwargs) 2025-12-04T11:45:26.2416794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2416835Z method(*args, **kwargs) 2025-12-04T11:45:26.2416986Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2417022Z with policy(): 2025-12-04T11:45:26.2417176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2417216Z raise RuntimeError(msg) 2025-12-04T11:45:26.2417603Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2417618Z 2025-12-04T11:45:26.2417693Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2417953Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2417955Z 2025-12-04T11:45:26.2418042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2418118Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2418160Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2418217Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2418283Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2418383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2418420Z graph_break [] 2025-12-04T11:45:26.2418481Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2418554Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2418597Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2418652Z 
stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2418751Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2418817Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2418853Z graph_break [] 2025-12-04T11:45:26.2418913Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2418976Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2419118Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2419164Z Traceback (most recent call last): 2025-12-04T11:45:26.2419321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2419361Z method(*args, **kwargs) 2025-12-04T11:45:26.2419525Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2419565Z method(*args, **kwargs) 2025-12-04T11:45:26.2419724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2419761Z with policy(): 2025-12-04T11:45:26.2419914Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2419956Z raise RuntimeError(msg) 2025-12-04T11:45:26.2420341Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2420344Z 2025-12-04T11:45:26.2420416Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2420676Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2420678Z 2025-12-04T11:45:26.2420765Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2420840Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2420883Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2420939Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2421006Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2421116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2421153Z graph_break [] 2025-12-04T11:45:26.2421212Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2421285Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2421328Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2421382Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2421479Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2421544Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2421581Z graph_break [] 2025-12-04T11:45:26.2421639Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2421713Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2421754Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2421809Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2421904Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2421968Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2422004Z graph_break [] 2025-12-04T11:45:26.2422062Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2422255Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-c15b75bd6d17a764.xml - 2025-12-04T11:45:26.2422315Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2422894Z FAILED [0.3435s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2422908Z 2025-12-04T11:45:26.2422981Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2423279Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2423281Z 2025-12-04T11:45:26.2423383Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2423448Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2423516Z ================== 1 failed, 144 deselected, 2 rerun in 2.47s ================== 2025-12-04T11:45:26.2423554Z Got exit code 1 2025-12-04T11:45:26.2423594Z Retrying single test... 
2025-12-04T11:45:26.2423738Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-39f439f0154bc133.xml 2025-12-04T11:45:26.2423796Z ============================= test session starts ============================== 2025-12-04T11:45:26.2423908Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2423948Z cachedir: .pytest_cache 2025-12-04T11:45:26.2424106Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2424153Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2424195Z configfile: pytest.ini 2025-12-04T11:45:26.2424357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2424433Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2424689Z stepcurrent: skipping 144 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2424747Z Running 1 items in this shard 2025-12-04T11:45:26.2424749Z 2025-12-04T11:45:26.2424963Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6415s] [100%] 2025-12-04T11:45:26.2425177Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2554s] [100%] 2025-12-04T11:45:26.2425367Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2109s] [100%] 2025-12-04T11:45:26.2425370Z 2025-12-04T11:45:26.2425420Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2425563Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2425610Z Traceback (most recent call last): 2025-12-04T11:45:26.2425768Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2425809Z method(*args, **kwargs) 2025-12-04T11:45:26.2425963Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2426002Z method(*args, **kwargs) 2025-12-04T11:45:26.2426155Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2426191Z with policy(): 2025-12-04T11:45:26.2426358Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2426399Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2426783Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2426798Z 2025-12-04T11:45:26.2426873Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2427140Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2427143Z 2025-12-04T11:45:26.2427231Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2427304Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2427346Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2427403Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2427471Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2427570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2427606Z graph_break [] 2025-12-04T11:45:26.2427667Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2427809Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2427854Z Traceback (most recent call last): 2025-12-04T11:45:26.2428010Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2428051Z method(*args, **kwargs) 2025-12-04T11:45:26.2428201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2428241Z method(*args, **kwargs) 2025-12-04T11:45:26.2428391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2428445Z with policy(): 2025-12-04T11:45:26.2428599Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2428640Z raise RuntimeError(msg) 2025-12-04T11:45:26.2429023Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2429026Z 2025-12-04T11:45:26.2429100Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2429356Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2429359Z 2025-12-04T11:45:26.2429446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2429519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2429562Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2429618Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2429684Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2429787Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2429823Z graph_break [] 2025-12-04T11:45:26.2429882Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2429967Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.2430008Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2430064Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2430160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2430224Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2430259Z graph_break [] 2025-12-04T11:45:26.2430328Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2430380Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2430537Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2430582Z Traceback (most recent call last): 2025-12-04T11:45:26.2430737Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2430777Z method(*args, **kwargs) 2025-12-04T11:45:26.2430928Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2430969Z method(*args, **kwargs) 2025-12-04T11:45:26.2431118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2431156Z with policy(): 2025-12-04T11:45:26.2431309Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2431350Z raise RuntimeError(msg) 2025-12-04T11:45:26.2431737Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2431739Z 2025-12-04T11:45:26.2431813Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2432071Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2432084Z 2025-12-04T11:45:26.2432173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2432246Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2432288Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2432344Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2432410Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2432506Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2432544Z graph_break [] 2025-12-04T11:45:26.2432603Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2432676Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2432718Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2432773Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2432867Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2432933Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2432969Z graph_break [] 2025-12-04T11:45:26.2433027Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2433099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2433142Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2433196Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2433338Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2433401Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2433438Z graph_break [] 2025-12-04T11:45:26.2433495Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2433691Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-39f439f0154bc133.xml - 2025-12-04T11:45:26.2433752Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2434367Z FAILED [0.2109s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2434371Z 2025-12-04T11:45:26.2434443Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2434701Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2434704Z 2025-12-04T11:45:26.2434792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2434856Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2434925Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:26.2434961Z Got exit code 1 2025-12-04T11:45:26.2435003Z Retrying single test... 
2025-12-04T11:45:26.2435149Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cdd450d1acb7a1cc.xml 2025-12-04T11:45:26.2435207Z ============================= test session starts ============================== 2025-12-04T11:45:26.2435317Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2435358Z cachedir: .pytest_cache 2025-12-04T11:45:26.2435531Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2435577Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2435617Z configfile: pytest.ini 2025-12-04T11:45:26.2435779Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2435855Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2436110Z stepcurrent: skipping 144 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2436154Z Running 1 items in this shard 2025-12-04T11:45:26.2436156Z 2025-12-04T11:45:26.2436374Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6452s] [100%] 2025-12-04T11:45:26.2436587Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2547s] [100%] 2025-12-04T11:45:26.2436777Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2125s] [100%] 2025-12-04T11:45:26.2436780Z 2025-12-04T11:45:26.2436831Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2436973Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2437034Z Traceback (most recent call last): 2025-12-04T11:45:26.2437190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2437231Z method(*args, **kwargs) 2025-12-04T11:45:26.2437383Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2437425Z method(*args, **kwargs) 2025-12-04T11:45:26.2437586Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2437623Z with policy(): 2025-12-04T11:45:26.2437775Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2437826Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2438213Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2438216Z 2025-12-04T11:45:26.2438291Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2438550Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2438552Z 2025-12-04T11:45:26.2438640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2438712Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2438756Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2438811Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2438878Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2438977Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2439014Z graph_break [] 2025-12-04T11:45:26.2439072Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2439216Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2439271Z Traceback (most recent call last): 2025-12-04T11:45:26.2439425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2439466Z method(*args, **kwargs) 2025-12-04T11:45:26.2439616Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2439658Z method(*args, **kwargs) 2025-12-04T11:45:26.2439808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2439848Z with policy(): 2025-12-04T11:45:26.2440001Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2440043Z raise RuntimeError(msg) 2025-12-04T11:45:26.2440430Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2440432Z 2025-12-04T11:45:26.2440507Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2440766Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2440768Z 2025-12-04T11:45:26.2440868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2440942Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2440985Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2441042Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2441108Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2441206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2441253Z graph_break [] 2025-12-04T11:45:26.2441313Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2441386Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.2441427Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2441499Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2441595Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2441661Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2441696Z graph_break [] 2025-12-04T11:45:26.2441754Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2441806Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2441949Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2441993Z Traceback (most recent call last): 2025-12-04T11:45:26.2442149Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2442189Z method(*args, **kwargs) 2025-12-04T11:45:26.2442341Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2442381Z method(*args, **kwargs) 2025-12-04T11:45:26.2442532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2442569Z with policy(): 2025-12-04T11:45:26.2442722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2442773Z raise RuntimeError(msg) 2025-12-04T11:45:26.2443162Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2443165Z 2025-12-04T11:45:26.2443238Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2443530Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2443533Z 2025-12-04T11:45:26.2443622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2443695Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2443739Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2443794Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2443860Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2443957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2443995Z graph_break [] 2025-12-04T11:45:26.2444053Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2444129Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2444170Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2444225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2444338Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2444403Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2444439Z graph_break [] 2025-12-04T11:45:26.2444499Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2444571Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2444613Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2444681Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2444777Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2444841Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2444890Z graph_break [] 2025-12-04T11:45:26.2444950Z aten_mm_info [('aten._scaled_mm.default_33_16_16', 1)] 2025-12-04T11:45:26.2445147Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-cdd450d1acb7a1cc.xml - 2025-12-04T11:45:26.2445207Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2445786Z FAILED [0.2125s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2445789Z 2025-12-04T11:45:26.2445865Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2446124Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2446127Z 2025-12-04T11:45:26.2446214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2446276Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2446357Z ================== 1 failed, 187 deselected, 2 rerun in 2.13s ================== 2025-12-04T11:45:26.2446393Z Got exit code 1 2025-12-04T11:45:26.2446603Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2446730Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2446879Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2af6b61cbb199ccf.xml 2025-12-04T11:45:26.2446935Z ============================= test session starts ============================== 2025-12-04T11:45:26.2447048Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2447108Z cachedir: .pytest_cache 2025-12-04T11:45:26.2447269Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2447315Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2447357Z configfile: pytest.ini 2025-12-04T11:45:26.2447517Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2447595Z collecting ... collected 188 items / 145 deselected / 43 selected 2025-12-04T11:45:26.2447649Z stepcurrent: skipping 145 already run items. 
2025-12-04T11:45:26.2447693Z Running 43 items in this shard 2025-12-04T11:45:26.2447696Z 2025-12-04T11:45:26.2447919Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8313s] [ 2%] 2025-12-04T11:45:26.2448152Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3624s] [ 2%] 2025-12-04T11:45:26.2448361Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3252s] [ 2%] 2025-12-04T11:45:26.2448363Z 2025-12-04T11:45:26.2448424Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2448570Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2448615Z Traceback (most recent call last): 2025-12-04T11:45:26.2448786Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2448827Z method(*args, **kwargs) 2025-12-04T11:45:26.2448983Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2449022Z method(*args, **kwargs) 2025-12-04T11:45:26.2449175Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2449212Z with policy(): 2025-12-04T11:45:26.2449365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2449406Z raise RuntimeError(msg) 2025-12-04T11:45:26.2449798Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2449801Z 2025-12-04T11:45:26.2449875Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2450134Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2450147Z 2025-12-04T11:45:26.2450234Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2450307Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2450350Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2450406Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2450472Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2450571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2450607Z graph_break [] 2025-12-04T11:45:26.2450669Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2450816Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2450862Z Traceback (most recent call last): 2025-12-04T11:45:26.2451018Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2451058Z method(*args, **kwargs) 2025-12-04T11:45:26.2451210Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2451251Z method(*args, **kwargs) 
2025-12-04T11:45:26.2451402Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2451438Z with policy(): 2025-12-04T11:45:26.2451592Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2451644Z raise RuntimeError(msg) 2025-12-04T11:45:26.2452033Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2452036Z 2025-12-04T11:45:26.2452108Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2452379Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2452381Z 2025-12-04T11:45:26.2452479Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2452552Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2452595Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2452652Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2452718Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2452816Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2452853Z graph_break [] 2025-12-04T11:45:26.2452914Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2452989Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2453031Z frames [('total', 1), 
('ok', 1)] 2025-12-04T11:45:26.2453086Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2453183Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2453292Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2453329Z graph_break [] 2025-12-04T11:45:26.2453389Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2453442Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2453588Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2453662Z Traceback (most recent call last): 2025-12-04T11:45:26.2453819Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2453859Z method(*args, **kwargs) 2025-12-04T11:45:26.2454013Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2454052Z method(*args, **kwargs) 2025-12-04T11:45:26.2454203Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2454239Z with policy(): 2025-12-04T11:45:26.2454392Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2454434Z raise RuntimeError(msg) 2025-12-04T11:45:26.2454820Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2454824Z 2025-12-04T11:45:26.2454898Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2455159Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2455162Z 2025-12-04T11:45:26.2455248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2455321Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2455387Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2455442Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2455509Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2455607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2455643Z graph_break [] 2025-12-04T11:45:26.2455703Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2455810Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2455852Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2455908Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2456022Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2456088Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2456124Z graph_break [] 2025-12-04T11:45:26.2456184Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2456256Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2456297Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2456353Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2456449Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2456513Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2456549Z graph_break [] 2025-12-04T11:45:26.2456608Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2456802Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-2af6b61cbb199ccf.xml - 2025-12-04T11:45:26.2456862Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2457447Z FAILED [0.3252s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2457461Z 2025-12-04T11:45:26.2457534Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2457793Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2457796Z 2025-12-04T11:45:26.2457883Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2457945Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2458014Z ================== 1 failed, 145 deselected, 2 rerun in 2.54s ================== 2025-12-04T11:45:26.2458051Z Got exit code 1 2025-12-04T11:45:26.2458092Z Retrying single test... 
2025-12-04T11:45:26.2458239Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-746cb47d0c584970.xml 2025-12-04T11:45:26.2458297Z ============================= test session starts ============================== 2025-12-04T11:45:26.2458407Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2458448Z cachedir: .pytest_cache 2025-12-04T11:45:26.2458605Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2458650Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2458690Z configfile: pytest.ini 2025-12-04T11:45:26.2458851Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2458937Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2459195Z stepcurrent: skipping 145 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2459240Z Running 1 items in this shard 2025-12-04T11:45:26.2459242Z 2025-12-04T11:45:26.2459470Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6538s] [100%] 2025-12-04T11:45:26.2459699Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2602s] [100%] 2025-12-04T11:45:26.2459891Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2164s] [100%] 2025-12-04T11:45:26.2459894Z 2025-12-04T11:45:26.2459945Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2460091Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2460137Z Traceback (most recent call last): 2025-12-04T11:45:26.2460295Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2460336Z method(*args, **kwargs) 2025-12-04T11:45:26.2460489Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2460530Z method(*args, **kwargs) 2025-12-04T11:45:26.2460681Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2460719Z with policy(): 2025-12-04T11:45:26.2460873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2461017Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2461460Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2461462Z 2025-12-04T11:45:26.2461537Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2461797Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2461799Z 2025-12-04T11:45:26.2461888Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2461964Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2462006Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2462066Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2462133Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2462232Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2462268Z graph_break [] 2025-12-04T11:45:26.2462330Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2462475Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2462522Z Traceback (most recent call last): 2025-12-04T11:45:26.2462677Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2462733Z method(*args, **kwargs) 2025-12-04T11:45:26.2462885Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.2462926Z method(*args, **kwargs) 2025-12-04T11:45:26.2463082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2463119Z with policy(): 2025-12-04T11:45:26.2463325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2463367Z raise RuntimeError(msg) 2025-12-04T11:45:26.2463764Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2463768Z 2025-12-04T11:45:26.2463842Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2464101Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2464104Z 2025-12-04T11:45:26.2464192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2464266Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2464309Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2464364Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2464431Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2464528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2464566Z graph_break [] 2025-12-04T11:45:26.2464626Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2464701Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.2464742Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2464812Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2464909Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2464975Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2465012Z graph_break [] 2025-12-04T11:45:26.2465072Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2465127Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2465275Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2465320Z Traceback (most recent call last): 2025-12-04T11:45:26.2465477Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2465517Z method(*args, **kwargs) 2025-12-04T11:45:26.2465668Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2465709Z method(*args, **kwargs) 2025-12-04T11:45:26.2465858Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2465896Z with policy(): 2025-12-04T11:45:26.2466048Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2466089Z raise RuntimeError(msg) 2025-12-04T11:45:26.2466476Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2466491Z 2025-12-04T11:45:26.2466565Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2466824Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2466826Z 2025-12-04T11:45:26.2466924Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2466998Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2467040Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2467106Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2467173Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2467272Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2467309Z graph_break [] 2025-12-04T11:45:26.2467368Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2467443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2467485Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2467540Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2467637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2467702Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2467738Z graph_break [] 2025-12-04T11:45:26.2467798Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2467871Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2467912Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2467970Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2468066Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2468129Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2468178Z graph_break [] 2025-12-04T11:45:26.2468237Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2468430Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-746cb47d0c584970.xml - 2025-12-04T11:45:26.2468491Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2469074Z FAILED [0.2164s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2469077Z 2025-12-04T11:45:26.2469150Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2469411Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2469413Z 2025-12-04T11:45:26.2469500Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2469562Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2469630Z ================== 1 failed, 187 deselected, 2 rerun in 2.15s ================== 2025-12-04T11:45:26.2469667Z Got exit code 1 2025-12-04T11:45:26.2469708Z Retrying single test... 
2025-12-04T11:45:26.2469852Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c0a6c15f361d4a2.xml 2025-12-04T11:45:26.2469930Z ============================= test session starts ============================== 2025-12-04T11:45:26.2470039Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2470081Z cachedir: .pytest_cache 2025-12-04T11:45:26.2470238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2470295Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2470336Z configfile: pytest.ini 2025-12-04T11:45:26.2470497Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2470582Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2470838Z stepcurrent: skipping 145 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2470883Z Running 1 items in this shard 2025-12-04T11:45:26.2470885Z 2025-12-04T11:45:26.2471103Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7679s] [100%] 2025-12-04T11:45:26.2471320Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3741s] [100%] 2025-12-04T11:45:26.2471511Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3437s] [100%] 2025-12-04T11:45:26.2471514Z 2025-12-04T11:45:26.2471566Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2471710Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2471758Z Traceback (most recent call last): 2025-12-04T11:45:26.2471915Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2471968Z method(*args, **kwargs) 2025-12-04T11:45:26.2472119Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2472160Z method(*args, **kwargs) 2025-12-04T11:45:26.2472311Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2472348Z with policy(): 2025-12-04T11:45:26.2472503Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2472545Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2472934Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2472939Z 2025-12-04T11:45:26.2473012Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2473314Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2473316Z 2025-12-04T11:45:26.2473402Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2473476Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2473518Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2473575Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2473661Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2473761Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2473797Z graph_break [] 2025-12-04T11:45:26.2473859Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2474003Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2474049Z Traceback (most recent call last): 2025-12-04T11:45:26.2474222Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2474263Z method(*args, **kwargs) 2025-12-04T11:45:26.2474425Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.2474465Z method(*args, **kwargs) 2025-12-04T11:45:26.2474614Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2474653Z with policy(): 2025-12-04T11:45:26.2474806Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2474849Z raise RuntimeError(msg) 2025-12-04T11:45:26.2475234Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2475237Z 2025-12-04T11:45:26.2475310Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2475570Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2475573Z 2025-12-04T11:45:26.2475659Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2475733Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2475791Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2475847Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2475913Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2476013Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2476049Z graph_break [] 2025-12-04T11:45:26.2476110Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2476184Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.2476226Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2476280Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2476379Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2476443Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2476481Z graph_break [] 2025-12-04T11:45:26.2476540Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2476593Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2476739Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2476788Z Traceback (most recent call last): 2025-12-04T11:45:26.2476944Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2476986Z method(*args, **kwargs) 2025-12-04T11:45:26.2477139Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2477189Z method(*args, **kwargs) 2025-12-04T11:45:26.2477338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2477375Z with policy(): 2025-12-04T11:45:26.2477529Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2477570Z raise RuntimeError(msg) 2025-12-04T11:45:26.2477964Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2477975Z 2025-12-04T11:45:26.2478050Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2478309Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2478313Z 2025-12-04T11:45:26.2478398Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2478473Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2478514Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2478570Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2478635Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2478733Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2478769Z graph_break [] 2025-12-04T11:45:26.2478830Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2478905Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2478950Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2479004Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2479102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2479177Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2479214Z graph_break [] 2025-12-04T11:45:26.2479274Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2479348Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2479389Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2479444Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2479544Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2479611Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2479648Z graph_break [] 2025-12-04T11:45:26.2479709Z aten_mm_info [('aten._scaled_mm.default_33_2048_16', 1)] 2025-12-04T11:45:26.2479899Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3c0a6c15f361d4a2.xml - 2025-12-04T11:45:26.2479960Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2480547Z FAILED [0.3437s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2480550Z 2025-12-04T11:45:26.2480623Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2480880Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2480898Z 2025-12-04T11:45:26.2480984Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2481048Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2481114Z ================== 1 failed, 187 deselected, 2 rerun in 2.50s ================== 2025-12-04T11:45:26.2481152Z Got exit code 1 2025-12-04T11:45:26.2481370Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2481513Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2481657Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b32c5159c3158538.xml 2025-12-04T11:45:26.2481718Z ============================= test session starts ============================== 2025-12-04T11:45:26.2481828Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2481869Z cachedir: .pytest_cache 2025-12-04T11:45:26.2482031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2482078Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2482119Z configfile: pytest.ini 2025-12-04T11:45:26.2482284Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2482360Z collecting ... collected 188 items / 146 deselected / 42 selected 2025-12-04T11:45:26.2482416Z stepcurrent: skipping 146 already run items. 
2025-12-04T11:45:26.2482459Z Running 42 items in this shard 2025-12-04T11:45:26.2482461Z 2025-12-04T11:45:26.2482680Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0168s] [ 2%] 2025-12-04T11:45:26.2482893Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5689s] [ 2%] 2025-12-04T11:45:26.2483102Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.6415s] [ 2%] 2025-12-04T11:45:26.2483105Z 2025-12-04T11:45:26.2483156Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2483323Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2483370Z Traceback (most recent call last): 2025-12-04T11:45:26.2483527Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2483570Z method(*args, **kwargs) 2025-12-04T11:45:26.2483722Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2483763Z method(*args, **kwargs) 2025-12-04T11:45:26.2483912Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2483952Z with policy(): 2025-12-04T11:45:26.2484107Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2484149Z raise RuntimeError(msg) 2025-12-04T11:45:26.2484537Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2484555Z 2025-12-04T11:45:26.2484630Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2484889Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2484892Z 2025-12-04T11:45:26.2484980Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2485065Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2485109Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2485165Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2485662Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2485763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2485801Z graph_break [] 2025-12-04T11:45:26.2485862Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2485935Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2486426Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2486475Z current_size = base.storage().size() 2025-12-04T11:45:26.2486517Z Autotune Choices Stats: 2025-12-04T11:45:26.2486886Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005799000151455402, "best_triton_pos": 0} 2025-12-04T11:45:26.2486945Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2486987Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2487088Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2487325Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2487554Z triton_mm_2 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2487781Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2488004Z triton_mm_3 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.2488047Z _scaled_mm 0.0217 ms 26.7% 2025-12-04T11:45:26.2488176Z SingleProcess AUTOTUNE benchmarking takes 0.0257 seconds and 0.1064 seconds precompiling for 5 choices 2025-12-04T11:45:26.2488320Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2488375Z Traceback (most recent call last): 2025-12-04T11:45:26.2488532Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2488572Z method(*args, **kwargs) 2025-12-04T11:45:26.2488727Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2488767Z method(*args, **kwargs) 2025-12-04T11:45:26.2488929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2488966Z with policy(): 2025-12-04T11:45:26.2489130Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2489171Z raise RuntimeError(msg) 2025-12-04T11:45:26.2489560Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2489564Z 2025-12-04T11:45:26.2489640Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2489900Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2489902Z 2025-12-04T11:45:26.2489989Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2490063Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2490109Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2490164Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2490651Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2490760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2490798Z graph_break [] 2025-12-04T11:45:26.2490858Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2490933Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2491423Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2491473Z current_size = base.storage().size() 2025-12-04T11:45:26.2491514Z Autotune Choices Stats: 2025-12-04T11:45:26.2491882Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005799000151455402, "best_triton_pos": 0} 2025-12-04T11:45:26.2491927Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2491967Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2492068Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2492299Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2492536Z triton_mm_2 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2492768Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2493003Z triton_mm_3 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2493047Z _scaled_mm 0.0217 ms 26.7% 2025-12-04T11:45:26.2493178Z SingleProcess AUTOTUNE benchmarking takes 0.0257 seconds and 
0.1064 seconds precompiling for 5 choices 2025-12-04T11:45:26.2493290Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2493332Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2493390Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2493490Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2493975Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2494012Z graph_break [] 2025-12-04T11:45:26.2494074Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2494148Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2494189Z Autotune Choices Stats: 2025-12-04T11:45:26.2494547Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2494606Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2494647Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2494747Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2494974Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=4 2025-12-04T11:45:26.2495200Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2495428Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2495654Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2495696Z _scaled_mm 0.0070 ms 86.9% 2025-12-04T11:45:26.2495825Z SingleProcess AUTOTUNE benchmarking takes 0.0238 seconds and 0.0716 seconds precompiling for 5 choices 2025-12-04T11:45:26.2495878Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2496034Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2496081Z Traceback (most recent call last): 2025-12-04T11:45:26.2496239Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2496281Z method(*args, **kwargs) 2025-12-04T11:45:26.2496447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2496487Z method(*args, **kwargs) 2025-12-04T11:45:26.2496637Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2496693Z with policy(): 2025-12-04T11:45:26.2496847Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2496889Z raise RuntimeError(msg) 2025-12-04T11:45:26.2497279Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2497282Z 2025-12-04T11:45:26.2497358Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2497622Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2497624Z 2025-12-04T11:45:26.2497713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2497787Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2497830Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2497887Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2498370Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2498480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2498516Z graph_break [] 2025-12-04T11:45:26.2498577Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2498651Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2499138Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2499186Z current_size = base.storage().size() 2025-12-04T11:45:26.2499229Z Autotune Choices Stats: 2025-12-04T11:45:26.2499591Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005799000151455402, "best_triton_pos": 0} 2025-12-04T11:45:26.2499636Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2499678Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2499777Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2500023Z triton_mm_0 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2500249Z triton_mm_2 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2500480Z triton_mm_1 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2500713Z triton_mm_3 0.0059 ms 98.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2500756Z _scaled_mm 0.0217 ms 26.7% 2025-12-04T11:45:26.2500884Z SingleProcess AUTOTUNE benchmarking takes 0.0257 seconds and 0.1064 seconds precompiling for 5 choices 2025-12-04T11:45:26.2500960Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2501002Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2501060Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2501160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2501643Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2501682Z graph_break [] 2025-12-04T11:45:26.2501743Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2501817Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2501857Z Autotune Choices Stats: 2025-12-04T11:45:26.2502231Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_4", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2502276Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 
2025-12-04T11:45:26.2502317Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2502418Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2502650Z triton_mm_4 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2502876Z triton_mm_6 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2503103Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2503350Z triton_mm_5 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2503390Z _scaled_mm 0.0070 ms 86.9% 2025-12-04T11:45:26.2503519Z SingleProcess AUTOTUNE benchmarking takes 0.0238 seconds and 0.0716 seconds precompiling for 5 choices 2025-12-04T11:45:26.2503607Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2503651Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2503708Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2503809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2504309Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2504348Z graph_break [] 2025-12-04T11:45:26.2504408Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2504484Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2504523Z Autotune Choices Stats: 2025-12-04T11:45:26.2504881Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.2504927Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2504969Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2505066Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2505296Z triton_mm_9 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2505522Z triton_mm_10 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2505761Z triton_mm_8 0.0060 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2505990Z triton_mm_11 0.0060 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2506030Z _scaled_mm 0.0186 ms 31.7% 2025-12-04T11:45:26.2506159Z SingleProcess AUTOTUNE benchmarking takes 0.0317 seconds and 0.1797 seconds precompiling for 5 choices 2025-12-04T11:45:26.2506349Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b32c5159c3158538.xml - 2025-12-04T11:45:26.2506411Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2506995Z FAILED [0.6415s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2506999Z 2025-12-04T11:45:26.2507073Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2507336Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2507349Z 2025-12-04T11:45:26.2507436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2507499Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2507569Z ================== 1 failed, 146 deselected, 2 rerun in 3.25s ================== 2025-12-04T11:45:26.2507607Z Got exit code 1 2025-12-04T11:45:26.2507646Z Retrying single test... 
2025-12-04T11:45:26.2507802Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-feac90814b000fa7.xml 2025-12-04T11:45:26.2507859Z ============================= test session starts ============================== 2025-12-04T11:45:26.2507982Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2508024Z cachedir: .pytest_cache 2025-12-04T11:45:26.2508183Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2508229Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2508269Z configfile: pytest.ini 2025-12-04T11:45:26.2508432Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2508509Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2508766Z stepcurrent: skipping 146 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2508810Z Running 1 items in this shard 2025-12-04T11:45:26.2508812Z 2025-12-04T11:45:26.2509026Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8990s] [100%] 2025-12-04T11:45:26.2509239Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4653s] [100%] 2025-12-04T11:45:26.2509430Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5303s] [100%] 2025-12-04T11:45:26.2509442Z 2025-12-04T11:45:26.2509492Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2509636Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2509681Z Traceback (most recent call last): 2025-12-04T11:45:26.2509842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2509884Z method(*args, **kwargs) 2025-12-04T11:45:26.2510043Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2510084Z method(*args, **kwargs) 2025-12-04T11:45:26.2510238Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2510277Z with policy(): 2025-12-04T11:45:26.2510433Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2510473Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2510866Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2510869Z 2025-12-04T11:45:26.2510942Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2511201Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2511213Z 2025-12-04T11:45:26.2511301Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2511375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2511418Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2511475Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2511984Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2512083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2512120Z graph_break [] 2025-12-04T11:45:26.2512179Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2512253Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2512742Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2512790Z current_size = base.storage().size() 2025-12-04T11:45:26.2512830Z Autotune Choices Stats: 2025-12-04T11:45:26.2513198Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2513243Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2513334Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2513435Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2513669Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2513896Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2514123Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2514348Z triton_mm_3 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2514389Z _scaled_mm 0.0198 ms 30.3% 2025-12-04T11:45:26.2514521Z SingleProcess AUTOTUNE benchmarking takes 0.0214 seconds and 0.1085 seconds precompiling for 5 choices 2025-12-04T11:45:26.2514664Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2514710Z Traceback (most recent call last): 2025-12-04T11:45:26.2514868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2514927Z method(*args, **kwargs) 2025-12-04T11:45:26.2515082Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2515122Z method(*args, **kwargs) 2025-12-04T11:45:26.2515277Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2515313Z with policy(): 2025-12-04T11:45:26.2515481Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2515522Z raise RuntimeError(msg) 2025-12-04T11:45:26.2515926Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2515929Z 2025-12-04T11:45:26.2516003Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2516264Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2516267Z 2025-12-04T11:45:26.2516353Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2516428Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2516471Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2516530Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2517013Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2517114Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2517168Z graph_break [] 2025-12-04T11:45:26.2517229Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2517304Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2517790Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2517838Z current_size = base.storage().size() 2025-12-04T11:45:26.2517879Z Autotune Choices Stats: 2025-12-04T11:45:26.2518242Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2518286Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2518329Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2518430Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2518664Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2518890Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2519125Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2519365Z triton_mm_3 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2519407Z _scaled_mm 0.0198 ms 30.3% 2025-12-04T11:45:26.2519546Z SingleProcess AUTOTUNE benchmarking takes 0.0214 seconds and 
0.1085 seconds precompiling for 5 choices 2025-12-04T11:45:26.2519620Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2519662Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2519719Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2519819Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2520301Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2520339Z graph_break [] 2025-12-04T11:45:26.2520400Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2520475Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2520514Z Autotune Choices Stats: 2025-12-04T11:45:26.2520871Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2520928Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2520968Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2521067Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2521297Z triton_mm_7 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.2521527Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2521752Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2521976Z triton_mm_6 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2522017Z _scaled_mm 0.0152 ms 39.4% 2025-12-04T11:45:26.2522146Z SingleProcess AUTOTUNE benchmarking takes 0.0201 seconds and 0.0812 seconds precompiling for 5 choices 2025-12-04T11:45:26.2522199Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2522344Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2522389Z Traceback (most recent call last): 2025-12-04T11:45:26.2522558Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2522598Z method(*args, **kwargs) 2025-12-04T11:45:26.2522753Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2522794Z method(*args, **kwargs) 2025-12-04T11:45:26.2522955Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2522993Z with policy(): 2025-12-04T11:45:26.2523147Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2523189Z raise RuntimeError(msg) 2025-12-04T11:45:26.2523658Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2523661Z 2025-12-04T11:45:26.2523738Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2524003Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2524006Z 2025-12-04T11:45:26.2524097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2524169Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2524213Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2524271Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2524756Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2524870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2524907Z graph_break [] 2025-12-04T11:45:26.2524967Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2525045Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2525534Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2525582Z current_size = base.storage().size() 2025-12-04T11:45:26.2525623Z Autotune Choices Stats: 2025-12-04T11:45:26.2525988Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2526034Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2526074Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2526174Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2526407Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2526652Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2526876Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 
2025-12-04T11:45:26.2527114Z triton_mm_3 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2527165Z _scaled_mm 0.0198 ms 30.3% 2025-12-04T11:45:26.2527294Z SingleProcess AUTOTUNE benchmarking takes 0.0214 seconds and 0.1085 seconds precompiling for 5 choices 2025-12-04T11:45:26.2527369Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2527410Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2527471Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2527570Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2528058Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2528094Z graph_break [] 2025-12-04T11:45:26.2530009Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2530087Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2530129Z Autotune Choices Stats: 2025-12-04T11:45:26.2530494Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2530559Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 
2025-12-04T11:45:26.2530600Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2530701Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2530933Z triton_mm_7 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2531161Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2531389Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2531617Z triton_mm_6 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2531658Z _scaled_mm 0.0152 ms 39.4% 2025-12-04T11:45:26.2531788Z SingleProcess AUTOTUNE benchmarking takes 0.0201 seconds and 0.0812 seconds precompiling for 5 choices 2025-12-04T11:45:26.2531864Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2531918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2531977Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2532076Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2532572Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2532610Z graph_break [] 2025-12-04T11:45:26.2532671Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2532756Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2532800Z Autotune Choices Stats: 2025-12-04T11:45:26.2533163Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_10", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2533208Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2533284Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2533384Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2533622Z triton_mm_10 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2533850Z triton_mm_8 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2534082Z triton_mm_11 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2534326Z triton_mm_9 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2534369Z _scaled_mm 0.0066 ms 90.9% 2025-12-04T11:45:26.2534501Z SingleProcess AUTOTUNE benchmarking takes 0.0293 seconds and 0.1740 seconds precompiling for 5 choices 2025-12-04T11:45:26.2534700Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-feac90814b000fa7.xml - 2025-12-04T11:45:26.2534763Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2535353Z FAILED [0.5303s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2535357Z 2025-12-04T11:45:26.2535434Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2535701Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2535703Z 2025-12-04T11:45:26.2535793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2535870Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2535939Z ================== 1 failed, 187 deselected, 2 rerun in 2.91s ================== 2025-12-04T11:45:26.2535977Z Got exit code 1 2025-12-04T11:45:26.2536017Z Retrying single test... 
2025-12-04T11:45:26.2536164Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0870823328cc57cd.xml 2025-12-04T11:45:26.2536223Z ============================= test session starts ============================== 2025-12-04T11:45:26.2536352Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2536394Z cachedir: .pytest_cache 2025-12-04T11:45:26.2536568Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2536615Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2536655Z configfile: pytest.ini 2025-12-04T11:45:26.2536824Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2536899Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2537159Z stepcurrent: skipping 146 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2537204Z Running 1 items in this shard 2025-12-04T11:45:26.2537206Z 2025-12-04T11:45:26.2537424Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9066s] [100%] 2025-12-04T11:45:26.2537639Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.4868s] [100%] 2025-12-04T11:45:26.2537829Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda FAILED [0.5358s] [100%] 2025-12-04T11:45:26.2537832Z 2025-12-04T11:45:26.2537884Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2538042Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2538089Z Traceback (most recent call last): 2025-12-04T11:45:26.2538250Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2538294Z method(*args, **kwargs) 2025-12-04T11:45:26.2538449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2538491Z method(*args, **kwargs) 2025-12-04T11:45:26.2538645Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2538686Z with policy(): 2025-12-04T11:45:26.2538842Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2538885Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2539280Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2539283Z 2025-12-04T11:45:26.2539357Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2539622Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2539635Z 2025-12-04T11:45:26.2539723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2539797Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2539839Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2539898Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2540397Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2540510Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2540547Z graph_break [] 2025-12-04T11:45:26.2540610Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2540685Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2541184Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2541235Z current_size = base.storage().size() 2025-12-04T11:45:26.2541276Z Autotune Choices Stats: 2025-12-04T11:45:26.2541647Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2541692Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2541732Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2541834Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2542086Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2542314Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2542543Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2542774Z triton_mm_3 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2542815Z _scaled_mm 0.0202 ms 29.6% 2025-12-04T11:45:26.2542946Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1040 seconds precompiling for 5 choices 2025-12-04T11:45:26.2543093Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2543139Z Traceback (most recent call last): 2025-12-04T11:45:26.2543338Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2543381Z method(*args, **kwargs) 2025-12-04T11:45:26.2543535Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2543590Z method(*args, **kwargs) 2025-12-04T11:45:26.2543743Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2543781Z with policy(): 2025-12-04T11:45:26.2543937Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2543982Z raise RuntimeError(msg) 2025-12-04T11:45:26.2544390Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2544393Z 2025-12-04T11:45:26.2544481Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2544746Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2544749Z 2025-12-04T11:45:26.2544838Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2544913Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2544957Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2545013Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2545504Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2545605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2545643Z graph_break [] 2025-12-04T11:45:26.2545705Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2545779Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2546287Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2546335Z current_size = base.storage().size() 2025-12-04T11:45:26.2546380Z Autotune Choices Stats: 2025-12-04T11:45:26.2546746Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2546791Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2546832Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2546934Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2547172Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2547401Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2547632Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2547868Z triton_mm_3 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2547910Z _scaled_mm 0.0202 ms 29.6% 2025-12-04T11:45:26.2548050Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 
0.1040 seconds precompiling for 5 choices 2025-12-04T11:45:26.2548126Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2548168Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2548238Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2548339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2548829Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2548868Z graph_break [] 2025-12-04T11:45:26.2548928Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2549003Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2549043Z Autotune Choices Stats: 2025-12-04T11:45:26.2549407Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2549451Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2549492Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2549590Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2549837Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.2550063Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2550290Z triton_mm_6 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2550516Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2550559Z _scaled_mm 0.0085 ms 72.3% 2025-12-04T11:45:26.2550689Z SingleProcess AUTOTUNE benchmarking takes 0.0223 seconds and 0.0846 seconds precompiling for 5 choices 2025-12-04T11:45:26.2550745Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2550890Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2550936Z Traceback (most recent call last): 2025-12-04T11:45:26.2551097Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2551139Z method(*args, **kwargs) 2025-12-04T11:45:26.2551308Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2551348Z method(*args, **kwargs) 2025-12-04T11:45:26.2551501Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2551539Z with policy(): 2025-12-04T11:45:26.2551694Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2551748Z raise RuntimeError(msg) 2025-12-04T11:45:26.2552156Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2552158Z 2025-12-04T11:45:26.2552234Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2552497Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2552500Z 2025-12-04T11:45:26.2552589Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2552663Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2552707Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2552766Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2553304Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2553405Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2553443Z graph_break [] 2025-12-04T11:45:26.2553517Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2553592Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2554085Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2554134Z current_size = base.storage().size() 2025-12-04T11:45:26.2554175Z Autotune Choices Stats: 2025-12-04T11:45:26.2554543Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2554587Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2554628Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2554728Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2554962Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2555190Z triton_mm_1 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2555429Z triton_mm_0 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 
2025-12-04T11:45:26.2555668Z triton_mm_3 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2555709Z _scaled_mm 0.0202 ms 29.6% 2025-12-04T11:45:26.2555838Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1040 seconds precompiling for 5 choices 2025-12-04T11:45:26.2555930Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2555974Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2556030Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2556132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2556618Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2556658Z graph_break [] 2025-12-04T11:45:26.2556717Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2556794Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2556834Z Autotune Choices Stats: 2025-12-04T11:45:26.2557196Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2557241Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 
2025-12-04T11:45:26.2557294Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2557392Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2557623Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2557851Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2558075Z triton_mm_6 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2558302Z triton_mm_5 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2558344Z _scaled_mm 0.0085 ms 72.3% 2025-12-04T11:45:26.2558475Z SingleProcess AUTOTUNE benchmarking takes 0.0223 seconds and 0.0846 seconds precompiling for 5 choices 2025-12-04T11:45:26.2558548Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2558590Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2558647Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2558748Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2559249Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), 
('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2559288Z graph_break [] 2025-12-04T11:45:26.2559348Z aten_mm_info [('aten._scaled_mm.default_33_16_32', 1)] 2025-12-04T11:45:26.2559431Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2559472Z Autotune Choices Stats: 2025-12-04T11:45:26.2559843Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005760000087320805, "best_triton_pos": 0} 2025-12-04T11:45:26.2559889Z AUTOTUNE scaled_mm(33x32, 32x16, , ) 2025-12-04T11:45:26.2559929Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2560029Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2560263Z triton_mm_8 0.0058 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2560495Z triton_mm_11 0.0061 ms 94.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2560722Z triton_mm_9 0.0062 ms 92.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2560951Z triton_mm_10 0.0063 ms 91.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, 
BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2561003Z _scaled_mm 0.0184 ms 31.2% 2025-12-04T11:45:26.2561130Z SingleProcess AUTOTUNE benchmarking takes 0.0302 seconds and 0.1737 seconds precompiling for 5 choices 2025-12-04T11:45:26.2561322Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0870823328cc57cd.xml - 2025-12-04T11:45:26.2561382Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2561975Z FAILED [0.5358s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2561979Z 2025-12-04T11:45:26.2562053Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2562318Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2562321Z 2025-12-04T11:45:26.2562410Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2562475Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2562544Z ================== 1 failed, 187 deselected, 2 rerun in 2.95s ================== 2025-12-04T11:45:26.2562580Z Got exit code 1 2025-12-04T11:45:26.2562802Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2562930Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2563077Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95f29abf4672c06.xml 2025-12-04T11:45:26.2563136Z ============================= test session starts ============================== 2025-12-04T11:45:26.2563360Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2563401Z cachedir: .pytest_cache 2025-12-04T11:45:26.2563576Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2563622Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2563662Z configfile: pytest.ini 2025-12-04T11:45:26.2563826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2563904Z collecting ... collected 188 items / 147 deselected / 41 selected 2025-12-04T11:45:26.2563959Z stepcurrent: skipping 147 already run items. 
2025-12-04T11:45:26.2564006Z Running 41 items in this shard 2025-12-04T11:45:26.2564008Z 2025-12-04T11:45:26.2564231Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1852s] [ 2%] 2025-12-04T11:45:26.2564449Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7435s] [ 2%] 2025-12-04T11:45:26.2564641Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.6902s] [ 2%] 2025-12-04T11:45:26.2564645Z 2025-12-04T11:45:26.2564696Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2564843Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2564904Z Traceback (most recent call last): 2025-12-04T11:45:26.2565066Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2565107Z method(*args, **kwargs) 2025-12-04T11:45:26.2565263Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2565303Z method(*args, **kwargs) 2025-12-04T11:45:26.2565458Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2565496Z with policy(): 2025-12-04T11:45:26.2565651Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2565691Z raise RuntimeError(msg) 2025-12-04T11:45:26.2566087Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1050673152. 2025-12-04T11:45:26.2566090Z 2025-12-04T11:45:26.2566166Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2566434Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2566436Z 2025-12-04T11:45:26.2566524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2566611Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2566654Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2566712Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2567223Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2567324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2567372Z graph_break [] 2025-12-04T11:45:26.2567436Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2567510Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2568005Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2568054Z current_size = base.storage().size() 2025-12-04T11:45:26.2568095Z Autotune Choices Stats: 2025-12-04T11:45:26.2568471Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2568516Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2568558Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2568659Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2568899Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2569143Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2569373Z triton_mm_7 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2569601Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=4 2025-12-04T11:45:26.2569829Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2570059Z triton_mm_10 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2570285Z triton_mm_15 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2570513Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2570750Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2570993Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2571125Z SingleProcess AUTOTUNE benchmarking takes 0.0672 seconds and 0.2843 seconds precompiling for 17 choices 2025-12-04T11:45:26.2571283Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2571330Z Traceback (most recent call last): 2025-12-04T11:45:26.2571490Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2571531Z method(*args, **kwargs) 2025-12-04T11:45:26.2571684Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2571726Z method(*args, **kwargs) 2025-12-04T11:45:26.2571877Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2571917Z with policy(): 2025-12-04T11:45:26.2572070Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2572112Z raise RuntimeError(msg) 2025-12-04T11:45:26.2572504Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1101004800. 
2025-12-04T11:45:26.2572508Z 2025-12-04T11:45:26.2572583Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2572858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2572860Z 2025-12-04T11:45:26.2572950Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2573025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2573068Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2573127Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2573659Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2573763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2573800Z graph_break [] 2025-12-04T11:45:26.2573863Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2573938Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2574433Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2574495Z current_size = base.storage().size() 2025-12-04T11:45:26.2574535Z Autotune Choices Stats: 2025-12-04T11:45:26.2574907Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2574966Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2575008Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2575107Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2575357Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2575586Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2575816Z triton_mm_7 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2576047Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2576275Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2576504Z triton_mm_10 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2576746Z triton_mm_15 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2576974Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2577200Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2577426Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2577559Z SingleProcess AUTOTUNE benchmarking takes 0.0672 seconds and 0.2843 seconds precompiling for 17 choices 2025-12-04T11:45:26.2577633Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2577679Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2577736Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2577837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2578325Z inductor [('triton_bundler_save_kernel', 136), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('async_compile_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2578378Z graph_break [] 2025-12-04T11:45:26.2578442Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2578517Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2578558Z Autotune Choices Stats: 2025-12-04T11:45:26.2578949Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2578998Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2579045Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2579143Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2579378Z triton_mm_18 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2579613Z triton_mm_19 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2579842Z triton_mm_24 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2580074Z triton_mm_25 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2580302Z triton_mm_21 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2580543Z triton_mm_22 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2580769Z triton_mm_23 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2580996Z triton_mm_26 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2581222Z triton_mm_27 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2581451Z triton_mm_30 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2581582Z SingleProcess AUTOTUNE benchmarking takes 0.0927 seconds and 0.2005 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2581636Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2581785Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2581840Z Traceback (most recent call last): 2025-12-04T11:45:26.2581998Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2582040Z method(*args, **kwargs) 2025-12-04T11:45:26.2582195Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2582234Z method(*args, **kwargs) 2025-12-04T11:45:26.2582397Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2582435Z with policy(): 2025-12-04T11:45:26.2582602Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2582643Z raise RuntimeError(msg) 2025-12-04T11:45:26.2583038Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 
2025-12-04T11:45:26.2583043Z 2025-12-04T11:45:26.2583117Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2583423Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2583425Z 2025-12-04T11:45:26.2583514Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2583587Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2583631Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2583688Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2584181Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2584296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2584336Z graph_break [] 2025-12-04T11:45:26.2584398Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2584472Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2584960Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2585008Z current_size = base.storage().size() 2025-12-04T11:45:26.2585049Z Autotune Choices Stats: 2025-12-04T11:45:26.2585420Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2585464Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2585504Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2585604Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2585838Z triton_mm_0 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2586088Z triton_mm_5 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2586333Z triton_mm_7 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2586576Z triton_mm_4 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2586804Z triton_mm_12 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2587032Z triton_mm_10 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2587260Z triton_mm_15 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2587488Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2587713Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2587941Z triton_mm_3 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2588085Z SingleProcess AUTOTUNE benchmarking takes 0.0672 seconds and 0.2843 seconds precompiling for 17 choices 2025-12-04T11:45:26.2588159Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2588201Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2588259Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2588365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2588856Z inductor [('triton_bundler_save_kernel', 136), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('async_compile_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2588895Z graph_break [] 2025-12-04T11:45:26.2588957Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2589030Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2589072Z Autotune Choices Stats: 2025-12-04T11:45:26.2589436Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2589492Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2589532Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2589631Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2589862Z triton_mm_18 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2590107Z triton_mm_19 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2590346Z triton_mm_24 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2590580Z triton_mm_25 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2590808Z triton_mm_21 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2591036Z triton_mm_22 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2591264Z triton_mm_23 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2591493Z triton_mm_26 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2591732Z triton_mm_27 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2591962Z triton_mm_30 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2592095Z SingleProcess AUTOTUNE benchmarking takes 0.0927 seconds and 0.2005 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2592170Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2592213Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2592270Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2592369Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2592864Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2592901Z graph_break [] 2025-12-04T11:45:26.2592964Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2593038Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2593080Z Autotune Choices Stats: 2025-12-04T11:45:26.2593492Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_39", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005880000069737434, "best_triton_pos": 0} 2025-12-04T11:45:26.2593537Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2593578Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2593690Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2593924Z triton_mm_39 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2594164Z triton_mm_43 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2594395Z triton_mm_44 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2594629Z triton_mm_33 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2594858Z triton_mm_45 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2595086Z triton_mm_47 0.0059 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2595315Z triton_mm_32 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2595554Z triton_mm_34 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2595780Z triton_mm_40 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2596007Z triton_mm_46 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2596137Z SingleProcess AUTOTUNE benchmarking takes 0.1144 seconds and 0.2283 seconds precompiling for 17 choices 2025-12-04T11:45:26.2596330Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-b95f29abf4672c06.xml - 2025-12-04T11:45:26.2596393Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2596994Z FAILED [0.6902s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 2025-12-04T11:45:26.2597011Z 2025-12-04T11:45:26.2597086Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2597352Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2597355Z 2025-12-04T11:45:26.2597446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2597520Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2597589Z ================== 1 failed, 147 deselected, 2 rerun in 3.64s ================== 2025-12-04T11:45:26.2597627Z Got exit code 1 2025-12-04T11:45:26.2597667Z Retrying single test... 2025-12-04T11:45:26.2597824Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a731e10220d101c2.xml 2025-12-04T11:45:26.2597882Z ============================= test session starts ============================== 2025-12-04T11:45:26.2597996Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2598038Z cachedir: .pytest_cache 2025-12-04T11:45:26.2598199Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2598245Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2598286Z configfile: pytest.ini 2025-12-04T11:45:26.2598450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2598527Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2598787Z stepcurrent: skipping 147 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2598832Z Running 1 items in this shard 2025-12-04T11:45:26.2598835Z 2025-12-04T11:45:26.2599057Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.2283s] [100%] 2025-12-04T11:45:26.2599273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8607s] [100%] 2025-12-04T11:45:26.2599484Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.8050s] [100%] 2025-12-04T11:45:26.2599486Z 2025-12-04T11:45:26.2599539Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2599685Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2599733Z Traceback (most recent call last): 2025-12-04T11:45:26.2599897Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2599938Z method(*args, **kwargs) 2025-12-04T11:45:26.2600094Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2600136Z method(*args, **kwargs) 2025-12-04T11:45:26.2600290Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2600329Z with policy(): 2025-12-04T11:45:26.2600483Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2600524Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2600918Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1050673152. 2025-12-04T11:45:26.2600931Z 2025-12-04T11:45:26.2601005Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2601269Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2601271Z 2025-12-04T11:45:26.2601369Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2601443Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2601486Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2601562Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2602057Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2602161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2602198Z graph_break [] 2025-12-04T11:45:26.2602261Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2602334Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2602829Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2602878Z current_size = base.storage().size() 2025-12-04T11:45:26.2602918Z Autotune Choices Stats: 2025-12-04T11:45:26.2603320Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2603381Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2603422Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2603523Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2603762Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2603993Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2604223Z triton_mm_11 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2604451Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2604678Z triton_mm_15 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2604921Z triton_mm_0 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2605149Z triton_mm_3 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2605391Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2605630Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2605859Z triton_mm_14 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2605992Z SingleProcess AUTOTUNE benchmarking takes 0.0614 seconds and 0.2977 seconds precompiling for 17 choices 2025-12-04T11:45:26.2606139Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2606187Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2606346Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2606387Z method(*args, **kwargs) 2025-12-04T11:45:26.2606542Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2606583Z method(*args, **kwargs) 2025-12-04T11:45:26.2606734Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2606771Z with policy(): 2025-12-04T11:45:26.2606925Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2606976Z raise RuntimeError(msg) 2025-12-04T11:45:26.2607374Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1101004800. 
2025-12-04T11:45:26.2607376Z 2025-12-04T11:45:26.2607452Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2607718Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2607722Z 2025-12-04T11:45:26.2607810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2607885Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2607927Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2607985Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2608478Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2608578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2608626Z graph_break [] 2025-12-04T11:45:26.2608687Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2608760Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2609262Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2609309Z current_size = base.storage().size() 2025-12-04T11:45:26.2609350Z Autotune Choices Stats: 2025-12-04T11:45:26.2609732Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2609779Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2609821Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2609921Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2610158Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2610388Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2610617Z triton_mm_11 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2610847Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2611089Z triton_mm_15 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2611317Z triton_mm_0 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2611545Z triton_mm_3 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2611773Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2612003Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2612235Z triton_mm_14 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2612365Z SingleProcess AUTOTUNE benchmarking takes 0.0614 seconds and 0.2977 seconds precompiling for 17 choices 2025-12-04T11:45:26.2612451Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2612493Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2612550Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2612651Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2613167Z inductor [('triton_bundler_save_kernel', 136), 
('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2613205Z graph_break [] 2025-12-04T11:45:26.2613320Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2613396Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2613435Z Autotune Choices Stats: 2025-12-04T11:45:26.2613801Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2613848Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2613888Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2613986Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2614223Z triton_mm_25 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2614453Z triton_mm_27 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2614699Z triton_mm_20 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2614927Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2615152Z triton_mm_18 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2615379Z triton_mm_24 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2615605Z triton_mm_26 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2615832Z triton_mm_29 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2616064Z triton_mm_30 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2616305Z triton_mm_21 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2616436Z SingleProcess AUTOTUNE benchmarking takes 0.1000 seconds and 0.2294 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2616489Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2616657Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2616702Z Traceback (most recent call last): 2025-12-04T11:45:26.2616873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2616913Z method(*args, **kwargs) 2025-12-04T11:45:26.2617067Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2617107Z method(*args, **kwargs) 2025-12-04T11:45:26.2617259Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2617297Z with policy(): 2025-12-04T11:45:26.2617451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2617491Z raise RuntimeError(msg) 2025-12-04T11:45:26.2617886Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 
2025-12-04T11:45:26.2617888Z 2025-12-04T11:45:26.2617963Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2618227Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2618229Z 2025-12-04T11:45:26.2618328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2618401Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2618445Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2618502Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2618998Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2619097Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2619134Z graph_break [] 2025-12-04T11:45:26.2619195Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2619270Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2619761Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2619809Z current_size = base.storage().size() 2025-12-04T11:45:26.2619849Z Autotune Choices Stats: 2025-12-04T11:45:26.2620220Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2620276Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2620317Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2620417Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2620661Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2620903Z triton_mm_7 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2621131Z triton_mm_11 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2621360Z triton_mm_12 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2621587Z triton_mm_15 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2621818Z triton_mm_0 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2622048Z triton_mm_3 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2622284Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2622512Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2622741Z triton_mm_14 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2622872Z SingleProcess AUTOTUNE benchmarking takes 0.0614 seconds and 0.2977 seconds precompiling for 17 choices 2025-12-04T11:45:26.2622944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2622989Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2623045Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2623146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2623672Z inductor [('triton_bundler_save_kernel', 136), 
('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2623709Z graph_break [] 2025-12-04T11:45:26.2623791Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2623864Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2623906Z Autotune Choices Stats: 2025-12-04T11:45:26.2624291Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_25", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2624338Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2624377Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2624490Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2624731Z triton_mm_25 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2624965Z triton_mm_27 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2625194Z triton_mm_20 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2625423Z triton_mm_17 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2625651Z triton_mm_18 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2625877Z triton_mm_24 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2626121Z triton_mm_26 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2626347Z triton_mm_29 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2626577Z triton_mm_30 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2626808Z triton_mm_21 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2626939Z SingleProcess AUTOTUNE benchmarking takes 0.1000 seconds and 0.2294 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2627014Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2627055Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2627115Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2627217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2627711Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2627761Z graph_break [] 2025-12-04T11:45:26.2627823Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2627895Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2627953Z Autotune Choices Stats: 2025-12-04T11:45:26.2628331Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_43", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2628379Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2628420Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2628518Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2628753Z triton_mm_43 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2628987Z triton_mm_40 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2629218Z triton_mm_45 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2629447Z triton_mm_46 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2629673Z triton_mm_39 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2629910Z triton_mm_44 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2630138Z triton_mm_47 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2630364Z triton_mm_34 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2630591Z triton_mm_35 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2630822Z triton_mm_42 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2630952Z SingleProcess AUTOTUNE benchmarking takes 0.1178 seconds and 0.2374 seconds precompiling for 17 choices 2025-12-04T11:45:26.2631146Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a731e10220d101c2.xml - 2025-12-04T11:45:26.2631217Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2631818Z FAILED [0.8050s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 2025-12-04T11:45:26.2631822Z 2025-12-04T11:45:26.2631897Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2632170Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2632172Z 2025-12-04T11:45:26.2632261Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2632325Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2632396Z ================== 1 failed, 187 deselected, 2 rerun in 3.91s ================== 2025-12-04T11:45:26.2632435Z Got exit code 1 2025-12-04T11:45:26.2632475Z Retrying single test... 2025-12-04T11:45:26.2632623Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d6b1676373f99ef1.xml 2025-12-04T11:45:26.2632681Z ============================= test session starts ============================== 2025-12-04T11:45:26.2632793Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2632835Z cachedir: .pytest_cache 2025-12-04T11:45:26.2633000Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2633047Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2633089Z configfile: pytest.ini 2025-12-04T11:45:26.2633291Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2633381Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2633639Z stepcurrent: skipping 147 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2633685Z Running 1 items in this shard 2025-12-04T11:45:26.2633687Z 2025-12-04T11:45:26.2633907Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.2186s] [100%] 2025-12-04T11:45:26.2634125Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7717s] [100%] 2025-12-04T11:45:26.2634318Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7179s] [100%] 2025-12-04T11:45:26.2634321Z 2025-12-04T11:45:26.2634374Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2634520Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2634569Z Traceback (most recent call last): 2025-12-04T11:45:26.2634732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2634773Z method(*args, **kwargs) 2025-12-04T11:45:26.2634929Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2634970Z method(*args, **kwargs) 2025-12-04T11:45:26.2635136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2635173Z with policy(): 2025-12-04T11:45:26.2635330Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2635373Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2635786Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1050673152. 2025-12-04T11:45:26.2635791Z 2025-12-04T11:45:26.2635883Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2636147Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2636151Z 2025-12-04T11:45:26.2636237Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2636311Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2636355Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2636412Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2636901Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2637002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2637040Z graph_break [] 2025-12-04T11:45:26.2637103Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2637176Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2637682Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2637729Z current_size = base.storage().size() 2025-12-04T11:45:26.2637772Z Autotune Choices Stats: 2025-12-04T11:45:26.2638147Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:26.2638192Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2638233Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2638335Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2638573Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2638806Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2639036Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2639276Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2639518Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2639769Z triton_mm_8 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2639998Z triton_mm_12 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2640225Z triton_mm_15 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2640452Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2640681Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2640811Z SingleProcess AUTOTUNE benchmarking takes 0.0622 seconds and 0.2855 seconds precompiling for 17 choices 2025-12-04T11:45:26.2640960Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2641006Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2641161Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2641216Z method(*args, **kwargs) 2025-12-04T11:45:26.2641369Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2641410Z method(*args, **kwargs) 2025-12-04T11:45:26.2641561Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2641600Z with policy(): 2025-12-04T11:45:26.2641756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2641799Z raise RuntimeError(msg) 2025-12-04T11:45:26.2642193Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1101004800. 
2025-12-04T11:45:26.2642196Z 2025-12-04T11:45:26.2642271Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2642533Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2642535Z 2025-12-04T11:45:26.2642625Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2642699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2642762Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2642818Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2643338Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2643455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2643493Z graph_break [] 2025-12-04T11:45:26.2643558Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2643645Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2644142Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2644191Z current_size = base.storage().size() 2025-12-04T11:45:26.2644232Z Autotune Choices Stats: 2025-12-04T11:45:26.2644602Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:26.2644648Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2644688Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2644789Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2645026Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2645271Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2645500Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2645728Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2645960Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2646188Z triton_mm_8 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2646415Z triton_mm_12 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2646643Z triton_mm_15 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2646881Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2647106Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2647246Z SingleProcess AUTOTUNE benchmarking takes 0.0622 seconds and 0.2855 seconds precompiling for 17 choices 2025-12-04T11:45:26.2647320Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2647371Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2647429Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2647529Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2648022Z inductor [('triton_bundler_save_kernel', 136), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('async_compile_cache_miss', 14), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2648060Z graph_break [] 2025-12-04T11:45:26.2648123Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2648197Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2648238Z Autotune Choices Stats: 2025-12-04T11:45:26.2648745Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_28", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2648790Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2648830Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2648941Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2649178Z triton_mm_28 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2649406Z triton_mm_18 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2649636Z triton_mm_19 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2649863Z triton_mm_22 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2650091Z triton_mm_23 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2650321Z triton_mm_26 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2650549Z triton_mm_27 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2650788Z triton_mm_20 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2651024Z triton_mm_21 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2651263Z triton_mm_24 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2651394Z SingleProcess AUTOTUNE benchmarking takes 0.0823 seconds and 0.1979 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2651449Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2651595Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2651643Z Traceback (most recent call last): 2025-12-04T11:45:26.2651800Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2651841Z method(*args, **kwargs) 2025-12-04T11:45:26.2651996Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2652036Z method(*args, **kwargs) 2025-12-04T11:45:26.2652190Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2652227Z with policy(): 2025-12-04T11:45:26.2652382Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2652424Z raise RuntimeError(msg) 2025-12-04T11:45:26.2652823Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 
2025-12-04T11:45:26.2652839Z 2025-12-04T11:45:26.2652915Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2653178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2653180Z 2025-12-04T11:45:26.2653305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2653382Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2653424Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2653481Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2653969Z inductor [('triton_bundler_save_kernel', 136), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2654070Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2654109Z graph_break [] 2025-12-04T11:45:26.2654171Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2654244Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2654750Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2654797Z current_size = base.storage().size() 2025-12-04T11:45:26.2654837Z Autotune Choices Stats: 2025-12-04T11:45:26.2655237Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006159000098705292, "best_triton_pos": 0} 2025-12-04T11:45:26.2655281Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2655322Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2655422Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2655657Z triton_mm_1 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2655890Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2656119Z triton_mm_5 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2656351Z triton_mm_9 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2656581Z triton_mm_10 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2656835Z triton_mm_8 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2657067Z triton_mm_12 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2657294Z triton_mm_15 0.0062 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2657521Z triton_mm_2 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2657746Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2657878Z SingleProcess AUTOTUNE benchmarking takes 0.0622 seconds and 0.2855 seconds precompiling for 17 choices 2025-12-04T11:45:26.2657951Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2657994Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2658050Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2658161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2658645Z inductor [('triton_bundler_save_kernel', 136), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('async_compile_cache_miss', 14), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2658695Z graph_break [] 2025-12-04T11:45:26.2658756Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2658833Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2658882Z Autotune Choices Stats: 2025-12-04T11:45:26.2659249Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_28", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2659295Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2659336Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2659437Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2659669Z triton_mm_28 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2659896Z triton_mm_18 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2660124Z triton_mm_19 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2660348Z triton_mm_22 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2660586Z triton_mm_23 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2660814Z triton_mm_26 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2661046Z triton_mm_27 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2661274Z triton_mm_20 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2661502Z triton_mm_21 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2661726Z triton_mm_24 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2661868Z SingleProcess AUTOTUNE benchmarking takes 0.0823 seconds and 0.1979 seconds precompiling for 17 choices 
2025-12-04T11:45:26.2661941Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2661984Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2662041Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2662142Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2662652Z inductor [('triton_bundler_save_kernel', 136), ('async_compile_cache_miss', 18), ('benchmarking.InductorBenchmarker.benchmark_gpu', 17), ('generated_module_cache_miss', 16), ('select_algorithm_num_precompiles', 16), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2662689Z graph_break [] 2025-12-04T11:45:26.2662754Z aten_mm_info [('aten._scaled_mm.default_33_2048_32', 1)] 2025-12-04T11:45:26.2662829Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2662870Z Autotune Choices Stats: 2025-12-04T11:45:26.2663239Z {"num_choices": 17, "num_triton_choices": 16, "best_kernel": "triton_mm_43", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2663313Z AUTOTUNE scaled_mm(33x32, 32x2048, , ) 2025-12-04T11:45:26.2663353Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2663453Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2663685Z triton_mm_43 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 
2025-12-04T11:45:26.2663917Z triton_mm_34 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2664158Z triton_mm_39 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2664386Z triton_mm_45 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=32, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2664611Z triton_mm_47 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2664840Z triton_mm_32 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2665073Z triton_mm_35 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2665302Z triton_mm_36 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2665532Z triton_mm_33 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2665773Z triton_mm_37 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=64, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2665903Z SingleProcess AUTOTUNE benchmarking takes 0.1066 seconds and 0.2326 seconds precompiling for 17 choices 2025-12-04T11:45:26.2666112Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d6b1676373f99ef1.xml - 2025-12-04T11:45:26.2666172Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2666775Z FAILED [0.7179s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1101004800 and is now 1151336448. 2025-12-04T11:45:26.2666779Z 2025-12-04T11:45:26.2666854Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2667121Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2667124Z 2025-12-04T11:45:26.2667215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2667278Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2667348Z ================== 1 failed, 187 deselected, 2 rerun in 3.73s ================== 2025-12-04T11:45:26.2667386Z Got exit code 1 2025-12-04T11:45:26.2667597Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2667726Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2667884Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3da400c901119374.xml 2025-12-04T11:45:26.2667941Z ============================= test session starts ============================== 2025-12-04T11:45:26.2668059Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2668100Z cachedir: .pytest_cache 2025-12-04T11:45:26.2668262Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2668308Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2668350Z configfile: pytest.ini 2025-12-04T11:45:26.2668512Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2668590Z collecting ... collected 188 items / 148 deselected / 40 selected 2025-12-04T11:45:26.2668644Z stepcurrent: skipping 148 already run items. 
2025-12-04T11:45:26.2668691Z Running 40 items in this shard 2025-12-04T11:45:26.2668693Z 2025-12-04T11:45:26.2668918Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9610s] [ 2%] 2025-12-04T11:45:26.2669136Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5181s] [ 2%] 2025-12-04T11:45:26.2669328Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6242s] [ 2%] 2025-12-04T11:45:26.2669343Z 2025-12-04T11:45:26.2669394Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2669542Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2669588Z Traceback (most recent call last): 2025-12-04T11:45:26.2669751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2669791Z method(*args, **kwargs) 2025-12-04T11:45:26.2669958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2669999Z method(*args, **kwargs) 2025-12-04T11:45:26.2670169Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2670206Z with policy(): 2025-12-04T11:45:26.2670362Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2670404Z raise RuntimeError(msg) 2025-12-04T11:45:26.2670795Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2670798Z 2025-12-04T11:45:26.2670873Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2671135Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2671139Z 2025-12-04T11:45:26.2671226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2671299Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2671343Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2671402Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2671897Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2672007Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2672049Z graph_break [] 2025-12-04T11:45:26.2672111Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2672185Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2672675Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2672725Z current_size = base.storage().size() 2025-12-04T11:45:26.2672765Z Autotune Choices Stats: 2025-12-04T11:45:26.2673143Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2673189Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2673233Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2673374Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2673614Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2673863Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2674105Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2674331Z triton_mm_0 0.0074 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.2674376Z _scaled_mm 0.0197 ms 30.7% 2025-12-04T11:45:26.2674507Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1429 seconds precompiling for 5 choices 2025-12-04T11:45:26.2674653Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2674701Z Traceback (most recent call last): 2025-12-04T11:45:26.2674860Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2674901Z method(*args, **kwargs) 2025-12-04T11:45:26.2675055Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2675096Z method(*args, **kwargs) 2025-12-04T11:45:26.2675247Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2675286Z with policy(): 2025-12-04T11:45:26.2675439Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2675494Z raise RuntimeError(msg) 2025-12-04T11:45:26.2675890Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2675894Z 2025-12-04T11:45:26.2675970Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2676239Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2676242Z 2025-12-04T11:45:26.2676330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2676406Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2676449Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2676510Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2676997Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2677099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2677136Z graph_break [] 2025-12-04T11:45:26.2677219Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2677292Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2677780Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2677841Z current_size = base.storage().size() 2025-12-04T11:45:26.2677881Z Autotune Choices Stats: 2025-12-04T11:45:26.2678264Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2678313Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2678355Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2678455Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2678695Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2678928Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2679157Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2679384Z triton_mm_0 0.0074 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2679438Z _scaled_mm 0.0197 ms 30.7% 2025-12-04T11:45:26.2679569Z SingleProcess AUTOTUNE benchmarking takes 0.0236 
seconds and 0.1429 seconds precompiling for 5 choices 2025-12-04T11:45:26.2679644Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2679687Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2679743Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2679845Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2680332Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2680373Z graph_break [] 2025-12-04T11:45:26.2680433Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2680507Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2680548Z Autotune Choices Stats: 2025-12-04T11:45:26.2680913Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2680959Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2681013Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2681114Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2681347Z triton_mm_6 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2681587Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2681829Z triton_mm_7 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2682057Z triton_mm_4 0.0075 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2682098Z _scaled_mm 0.0186 ms 33.3% 2025-12-04T11:45:26.2682229Z SingleProcess AUTOTUNE benchmarking takes 0.0273 seconds and 0.1015 seconds precompiling for 5 choices 2025-12-04T11:45:26.2682283Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2682428Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2682477Z Traceback (most recent call last): 2025-12-04T11:45:26.2682635Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2682678Z method(*args, **kwargs) 2025-12-04T11:45:26.2682834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2682875Z method(*args, **kwargs) 2025-12-04T11:45:26.2683027Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2683063Z with policy(): 2025-12-04T11:45:26.2683230Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2683300Z raise RuntimeError(msg) 2025-12-04T11:45:26.2683696Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2683699Z 2025-12-04T11:45:26.2683775Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2684036Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2684039Z 2025-12-04T11:45:26.2684128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2684203Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2684248Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2684305Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2684793Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2684906Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2684944Z graph_break [] 2025-12-04T11:45:26.2685003Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2685077Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2685587Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2685635Z current_size = base.storage().size() 2025-12-04T11:45:26.2685690Z Autotune Choices Stats: 2025-12-04T11:45:26.2686064Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2686115Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2686159Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2686260Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2686498Z triton_mm_1 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2686733Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2686960Z triton_mm_2 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.2687188Z triton_mm_0 0.0074 ms 82.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2687243Z _scaled_mm 0.0197 ms 30.7% 2025-12-04T11:45:26.2687373Z SingleProcess AUTOTUNE benchmarking takes 0.0236 seconds and 0.1429 seconds precompiling for 5 choices 2025-12-04T11:45:26.2687447Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2687490Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2687549Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2687651Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2688140Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2688178Z graph_break [] 2025-12-04T11:45:26.2688241Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2688315Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2688356Z Autotune Choices Stats: 2025-12-04T11:45:26.2688715Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_6", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006200000178068876, "best_triton_pos": 0} 2025-12-04T11:45:26.2688774Z AUTOTUNE scaled_mm(3x1024, 
1024x16, , ) 2025-12-04T11:45:26.2688815Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2688914Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2689159Z triton_mm_6 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2689401Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2689632Z triton_mm_7 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2689859Z triton_mm_4 0.0075 ms 82.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2689902Z _scaled_mm 0.0186 ms 33.3% 2025-12-04T11:45:26.2690031Z SingleProcess AUTOTUNE benchmarking takes 0.0273 seconds and 0.1015 seconds precompiling for 5 choices 2025-12-04T11:45:26.2690104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2690146Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2690204Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2690304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2690789Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2690840Z graph_break [] 2025-12-04T11:45:26.2690901Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2690973Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2691016Z Autotune Choices Stats: 2025-12-04T11:45:26.2691391Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2691440Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2691482Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2691579Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2691816Z triton_mm_11 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2692046Z triton_mm_9 0.0061 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2692275Z triton_mm_10 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2692511Z triton_mm_8 0.0073 ms 83.1% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2692552Z _scaled_mm 0.0198 ms 30.6% 2025-12-04T11:45:26.2692682Z SingleProcess AUTOTUNE benchmarking takes 0.0313 seconds and 0.2205 seconds precompiling for 5 choices 2025-12-04T11:45:26.2692885Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-3da400c901119374.xml - 2025-12-04T11:45:26.2692945Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2693594Z FAILED [0.6242s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2693598Z 2025-12-04T11:45:26.2693673Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2693936Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2693938Z 2025-12-04T11:45:26.2694028Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2694090Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2694160Z ================== 1 failed, 148 deselected, 2 rerun in 3.12s ================== 2025-12-04T11:45:26.2694197Z Got exit code 1 2025-12-04T11:45:26.2694238Z Retrying single test... 
2025-12-04T11:45:26.2694385Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e3b78bd533c52247.xml 2025-12-04T11:45:26.2694446Z ============================= test session starts ============================== 2025-12-04T11:45:26.2694559Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2694614Z cachedir: .pytest_cache 2025-12-04T11:45:26.2694774Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2694823Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2694865Z configfile: pytest.ini 2025-12-04T11:45:26.2695030Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2695106Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2695364Z stepcurrent: skipping 148 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2695411Z Running 1 items in this shard 2025-12-04T11:45:26.2695413Z 2025-12-04T11:45:26.2695632Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.9914s] [100%] 2025-12-04T11:45:26.2695852Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.5722s] [100%] 2025-12-04T11:45:26.2696044Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.6346s] [100%] 2025-12-04T11:45:26.2696047Z 2025-12-04T11:45:26.2696098Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2696242Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2696303Z Traceback (most recent call last): 2025-12-04T11:45:26.2696462Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2696508Z method(*args, **kwargs) 2025-12-04T11:45:26.2696661Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2696719Z method(*args, **kwargs) 2025-12-04T11:45:26.2696873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2696912Z with policy(): 2025-12-04T11:45:26.2697079Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2697122Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2697515Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2697520Z 2025-12-04T11:45:26.2697594Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2697857Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2697859Z 2025-12-04T11:45:26.2697948Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2698025Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2698069Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2698129Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2698619Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2698732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2698769Z graph_break [] 2025-12-04T11:45:26.2698832Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2698904Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2699399Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2699449Z current_size = base.storage().size() 2025-12-04T11:45:26.2699490Z Autotune Choices Stats: 2025-12-04T11:45:26.2699866Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2699914Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2699961Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2700060Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2700299Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2700549Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2700788Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2701027Z triton_mm_0 0.0073 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2701071Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:26.2701200Z SingleProcess AUTOTUNE benchmarking takes 0.0235 seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:26.2701345Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2701392Z Traceback (most recent call last): 2025-12-04T11:45:26.2701550Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2701592Z method(*args, **kwargs) 2025-12-04T11:45:26.2701747Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2701789Z method(*args, **kwargs) 2025-12-04T11:45:26.2701942Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2701981Z with policy(): 2025-12-04T11:45:26.2702136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2702177Z raise RuntimeError(msg) 2025-12-04T11:45:26.2702577Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2702591Z 2025-12-04T11:45:26.2702668Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2702931Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2702933Z 2025-12-04T11:45:26.2703023Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2703099Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2703145Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2703202Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2703714Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2703816Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2703854Z graph_break [] 2025-12-04T11:45:26.2703916Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2703989Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2704497Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2704545Z current_size = base.storage().size() 2025-12-04T11:45:26.2704588Z Autotune Choices Stats: 2025-12-04T11:45:26.2704992Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2705040Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2705083Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2705183Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2705417Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2705651Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2705879Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2706103Z triton_mm_0 0.0073 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2706146Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:26.2706274Z SingleProcess AUTOTUNE benchmarking takes 0.0235 
seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:26.2706362Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2706405Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2706463Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2706564Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2707052Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2707089Z graph_break [] 2025-12-04T11:45:26.2707152Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2707225Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2707268Z Autotune Choices Stats: 2025-12-04T11:45:26.2707636Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2707684Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2707726Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2707825Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2708071Z triton_mm_7 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2708301Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2708542Z triton_mm_6 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2708778Z triton_mm_4 0.0074 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2708824Z _scaled_mm 0.0177 ms 33.7% 2025-12-04T11:45:26.2708953Z SingleProcess AUTOTUNE benchmarking takes 0.0203 seconds and 0.1181 seconds precompiling for 5 choices 2025-12-04T11:45:26.2709009Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2709154Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2709202Z Traceback (most recent call last): 2025-12-04T11:45:26.2709363Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2709403Z method(*args, **kwargs) 2025-12-04T11:45:26.2709560Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2709601Z method(*args, **kwargs) 2025-12-04T11:45:26.2709757Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2709795Z with policy(): 2025-12-04T11:45:26.2709950Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2710002Z raise RuntimeError(msg) 2025-12-04T11:45:26.2710398Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2710401Z 2025-12-04T11:45:26.2710475Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2710741Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2710745Z 2025-12-04T11:45:26.2710834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2710908Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2710952Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2711012Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2711497Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2711600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2711649Z graph_break [] 2025-12-04T11:45:26.2711710Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2711784Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2712284Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2712333Z current_size = base.storage().size() 2025-12-04T11:45:26.2712374Z Autotune Choices Stats: 2025-12-04T11:45:26.2712755Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2712803Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2712846Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2712944Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2713183Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2713451Z triton_mm_3 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2713679Z triton_mm_2 0.0061 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.2713908Z triton_mm_0 0.0073 ms 83.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2713972Z _scaled_mm 0.0228 ms 26.7% 2025-12-04T11:45:26.2714101Z SingleProcess AUTOTUNE benchmarking takes 0.0235 seconds and 0.1370 seconds precompiling for 5 choices 2025-12-04T11:45:26.2714174Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2714218Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2714275Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2714376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2714863Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2714906Z graph_break [] 2025-12-04T11:45:26.2714965Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2715039Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2715081Z Autotune Choices Stats: 2025-12-04T11:45:26.2715451Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_7", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2715512Z AUTOTUNE scaled_mm(3x1024, 
1024x16, , ) 2025-12-04T11:45:26.2715553Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2715654Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2715885Z triton_mm_7 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2716130Z triton_mm_5 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2716368Z triton_mm_6 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2716597Z triton_mm_4 0.0074 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2716638Z _scaled_mm 0.0177 ms 33.7% 2025-12-04T11:45:26.2716769Z SingleProcess AUTOTUNE benchmarking takes 0.0203 seconds and 0.1181 seconds precompiling for 5 choices 2025-12-04T11:45:26.2716841Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2716887Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2716943Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2717045Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2717528Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2717566Z graph_break [] 2025-12-04T11:45:26.2717630Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2717718Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2717760Z Autotune Choices Stats: 2025-12-04T11:45:26.2718127Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039000116288662, "best_triton_pos": 0} 2025-12-04T11:45:26.2718173Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2718213Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2718315Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2718551Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2718784Z triton_mm_9 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2719013Z triton_mm_10 0.0062 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2719240Z triton_mm_8 0.0077 ms 78.2% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2719293Z _scaled_mm 0.0191 ms 31.6% 2025-12-04T11:45:26.2719421Z SingleProcess AUTOTUNE benchmarking takes 0.0288 seconds and 0.2148 seconds precompiling for 5 choices 2025-12-04T11:45:26.2719617Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e3b78bd533c52247.xml - 2025-12-04T11:45:26.2719677Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2720290Z FAILED [0.6346s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2720294Z 2025-12-04T11:45:26.2720368Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2720633Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2720636Z 2025-12-04T11:45:26.2720725Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2720789Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2720859Z ================== 1 failed, 187 deselected, 2 rerun in 3.22s ================== 2025-12-04T11:45:26.2720898Z Got exit code 1 2025-12-04T11:45:26.2720940Z Retrying single test... 
2025-12-04T11:45:26.2721087Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d36f74b2c32fe994.xml 2025-12-04T11:45:26.2721146Z ============================= test session starts ============================== 2025-12-04T11:45:26.2721257Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2721298Z cachedir: .pytest_cache 2025-12-04T11:45:26.2721469Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2721516Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2721558Z configfile: pytest.ini 2025-12-04T11:45:26.2721724Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2721799Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2722062Z stepcurrent: skipping 148 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2722106Z Running 1 items in this shard 2025-12-04T11:45:26.2722108Z 2025-12-04T11:45:26.2722327Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0885s] [100%] 2025-12-04T11:45:26.2722543Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.7478s] [100%] 2025-12-04T11:45:26.2722735Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda FAILED [0.7524s] [100%] 2025-12-04T11:45:26.2722737Z 2025-12-04T11:45:26.2722788Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2722933Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2722996Z Traceback (most recent call last): 2025-12-04T11:45:26.2723158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2723201Z method(*args, **kwargs) 2025-12-04T11:45:26.2723381Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2723423Z method(*args, **kwargs) 2025-12-04T11:45:26.2723593Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2723633Z with policy(): 2025-12-04T11:45:26.2723788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2723842Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2724242Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1025507328. 2025-12-04T11:45:26.2724245Z 2025-12-04T11:45:26.2724325Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2724590Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2724592Z 2025-12-04T11:45:26.2724683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2724755Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2724802Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2724859Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2725350Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2725465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2725502Z graph_break [] 2025-12-04T11:45:26.2725564Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2725637Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2726127Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2726175Z current_size = base.storage().size() 2025-12-04T11:45:26.2726217Z Autotune Choices Stats: 2025-12-04T11:45:26.2726595Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2726646Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2726687Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2726790Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2727030Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2727273Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2727514Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2727740Z triton_mm_0 0.0075 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2727803Z _scaled_mm 0.0162 ms 36.5% 2025-12-04T11:45:26.2727933Z SingleProcess AUTOTUNE benchmarking takes 0.0250 seconds and 0.1345 seconds precompiling for 5 choices 2025-12-04T11:45:26.2728080Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2728127Z Traceback (most recent call last): 2025-12-04T11:45:26.2728287Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2728328Z method(*args, **kwargs) 2025-12-04T11:45:26.2728485Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2728524Z method(*args, **kwargs) 2025-12-04T11:45:26.2728678Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2728715Z with policy(): 2025-12-04T11:45:26.2728873Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2728914Z raise RuntimeError(msg) 2025-12-04T11:45:26.2729307Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1025507328 and is now 1050673152. 
2025-12-04T11:45:26.2729320Z 2025-12-04T11:45:26.2729395Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2731195Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2731197Z 2025-12-04T11:45:26.2731289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2731365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2731410Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2731469Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2731949Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2732049Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2732087Z graph_break [] 2025-12-04T11:45:26.2732148Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2732226Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2732714Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2732779Z current_size = base.storage().size() 2025-12-04T11:45:26.2732819Z Autotune Choices Stats: 2025-12-04T11:45:26.2733202Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2733298Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2733340Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2733442Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2733677Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2733903Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2734129Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2734354Z triton_mm_0 0.0075 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2734396Z _scaled_mm 0.0162 ms 36.5% 2025-12-04T11:45:26.2734525Z SingleProcess AUTOTUNE benchmarking takes 0.0250 
seconds and 0.1345 seconds precompiling for 5 choices 2025-12-04T11:45:26.2734600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2734659Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2734716Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2734818Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2735296Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2735335Z graph_break [] 2025-12-04T11:45:26.2735397Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2735470Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2735510Z Autotune Choices Stats: 2025-12-04T11:45:26.2735874Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2735922Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2735963Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2736064Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2736297Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2736540Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2736781Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2737016Z triton_mm_4 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2737059Z _scaled_mm 0.0229 ms 26.6% 2025-12-04T11:45:26.2737188Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1232 seconds precompiling for 5 choices 2025-12-04T11:45:26.2737242Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2737386Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2737434Z Traceback (most recent call last): 2025-12-04T11:45:26.2737591Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2737634Z method(*args, **kwargs) 2025-12-04T11:45:26.2737788Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2737830Z method(*args, **kwargs) 2025-12-04T11:45:26.2737981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2738019Z with policy(): 2025-12-04T11:45:26.2738173Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2738215Z raise RuntimeError(msg) 2025-12-04T11:45:26.2738603Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2738617Z 2025-12-04T11:45:26.2738693Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2738956Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2738960Z 2025-12-04T11:45:26.2739048Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2739124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2739168Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2739225Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2739709Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2739810Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2739849Z graph_break [] 2025-12-04T11:45:26.2739910Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2739983Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2740481Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2740528Z current_size = base.storage().size() 2025-12-04T11:45:26.2740580Z Autotune Choices Stats: 2025-12-04T11:45:26.2740958Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2741005Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2741048Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2741147Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2741382Z triton_mm_3 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2741607Z triton_mm_2 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2741836Z triton_mm_1 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=1 2025-12-04T11:45:26.2742058Z triton_mm_0 0.0075 ms 78.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2742102Z _scaled_mm 0.0162 ms 36.5% 2025-12-04T11:45:26.2742240Z SingleProcess AUTOTUNE benchmarking takes 0.0250 seconds and 0.1345 seconds precompiling for 5 choices 2025-12-04T11:45:26.2742314Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2742356Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2742415Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2742515Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2742996Z inductor [('triton_bundler_save_kernel', 40), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), ('select_algorithm_num_precompiles', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2743036Z graph_break [] 2025-12-04T11:45:26.2743097Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2743171Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2743211Z Autotune Choices Stats: 2025-12-04T11:45:26.2743613Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_5", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2743659Z AUTOTUNE scaled_mm(3x1024, 
1024x16, , ) 2025-12-04T11:45:26.2743700Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2743813Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2744046Z triton_mm_5 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2744295Z triton_mm_7 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2744520Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2744756Z triton_mm_4 0.0075 ms 81.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2744799Z _scaled_mm 0.0229 ms 26.6% 2025-12-04T11:45:26.2744926Z SingleProcess AUTOTUNE benchmarking takes 0.0237 seconds and 0.1232 seconds precompiling for 5 choices 2025-12-04T11:45:26.2744999Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2745042Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2745099Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2745201Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2745679Z inductor [('triton_bundler_save_kernel', 40), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 5), ('generated_module_cache_miss', 4), 
('select_algorithm_num_precompiles', 4), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2745719Z graph_break [] 2025-12-04T11:45:26.2745778Z aten_mm_info [('aten._scaled_mm.default_3_16_1024', 1)] 2025-12-04T11:45:26.2745852Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2745905Z Autotune Choices Stats: 2025-12-04T11:45:26.2746265Z {"num_choices": 5, "num_triton_choices": 4, "best_kernel": "triton_mm_11", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005998999811708927, "best_triton_pos": 0} 2025-12-04T11:45:26.2746310Z AUTOTUNE scaled_mm(3x1024, 1024x16, , ) 2025-12-04T11:45:26.2746352Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2746450Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2746683Z triton_mm_11 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2746910Z triton_mm_9 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2747138Z triton_mm_10 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2747365Z triton_mm_8 0.0074 ms 80.6% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2747405Z _scaled_mm 0.0209 ms 28.7% 2025-12-04T11:45:26.2747545Z SingleProcess AUTOTUNE benchmarking takes 0.0344 seconds and 0.2193 seconds precompiling for 5 choices 2025-12-04T11:45:26.2747736Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d36f74b2c32fe994.xml - 2025-12-04T11:45:26.2747798Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2748403Z FAILED [0.7524s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1050673152 and is now 1075838976. 2025-12-04T11:45:26.2748407Z 2025-12-04T11:45:26.2748480Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2748745Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2748749Z 2025-12-04T11:45:26.2748836Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2748899Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2748968Z ================== 1 failed, 187 deselected, 2 rerun in 3.61s ================== 2025-12-04T11:45:26.2749008Z Got exit code 1 2025-12-04T11:45:26.2749218Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2749345Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2749490Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-14ed5193b1cb07ff.xml 2025-12-04T11:45:26.2749549Z ============================= test session starts ============================== 2025-12-04T11:45:26.2749661Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2749716Z cachedir: .pytest_cache 2025-12-04T11:45:26.2749874Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2749922Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2749963Z configfile: pytest.ini 2025-12-04T11:45:26.2750125Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2750204Z collecting ... collected 188 items / 149 deselected / 39 selected 2025-12-04T11:45:26.2750261Z stepcurrent: skipping 149 already run items. 
2025-12-04T11:45:26.2750306Z Running 39 items in this shard 2025-12-04T11:45:26.2750308Z 2025-12-04T11:45:26.2750532Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4284s] [ 2%] 2025-12-04T11:45:26.2750753Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8655s] [ 2%] 2025-12-04T11:45:26.2750948Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.8677s] [ 2%] 2025-12-04T11:45:26.2750950Z 2025-12-04T11:45:26.2751002Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2751152Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2751201Z Traceback (most recent call last): 2025-12-04T11:45:26.2751372Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2751415Z method(*args, **kwargs) 2025-12-04T11:45:26.2751569Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2751611Z method(*args, **kwargs) 2025-12-04T11:45:26.2751774Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2751812Z with policy(): 2025-12-04T11:45:26.2751966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2752007Z raise RuntimeError(msg) 2025-12-04T11:45:26.2752412Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.2752415Z 2025-12-04T11:45:26.2752492Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2752755Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2752758Z 2025-12-04T11:45:26.2752847Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2752921Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2752963Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2753021Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2753545Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2753659Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2753696Z graph_break [] 2025-12-04T11:45:26.2753762Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2753835Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2754324Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2754372Z current_size = base.storage().size() 2025-12-04T11:45:26.2754414Z Autotune Choices Stats: 2025-12-04T11:45:26.2754783Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2754833Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2754875Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2754975Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2755212Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2755455Z triton_mm_16 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2755699Z triton_mm_7 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2755924Z triton_mm_12 0.0070 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, 
num_stages=2, num_warps=2 2025-12-04T11:45:26.2756174Z triton_mm_9 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2756405Z triton_mm_6 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2756632Z triton_mm_14 0.0081 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2756858Z triton_mm_10 0.0084 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2757082Z triton_mm_5 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2757313Z triton_mm_18 0.0095 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2757456Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3422 seconds precompiling for 20 choices 2025-12-04T11:45:26.2757608Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2757654Z Traceback (most recent call last): 2025-12-04T11:45:26.2757812Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2757853Z method(*args, **kwargs) 2025-12-04T11:45:26.2758007Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2758049Z method(*args, **kwargs) 2025-12-04T11:45:26.2758202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2758240Z with policy(): 2025-12-04T11:45:26.2758395Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2758437Z raise RuntimeError(msg) 2025-12-04T11:45:26.2758831Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.2758835Z 2025-12-04T11:45:26.2758910Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2759170Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2759190Z 2025-12-04T11:45:26.2759279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2759354Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2759397Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2759454Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2759962Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2760065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2760102Z graph_break [] 2025-12-04T11:45:26.2760165Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2760237Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2760729Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2760776Z current_size = base.storage().size() 2025-12-04T11:45:26.2760818Z Autotune Choices Stats: 2025-12-04T11:45:26.2761184Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2761233Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2761285Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2761386Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2761620Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2761850Z triton_mm_16 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2762082Z triton_mm_7 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2762305Z triton_mm_12 0.0070 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2762539Z triton_mm_9 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2762771Z triton_mm_6 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2763007Z triton_mm_14 0.0081 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2763229Z triton_mm_10 0.0084 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2763497Z triton_mm_5 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2763740Z triton_mm_18 0.0095 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2763870Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3422 seconds precompiling for 20 choices 2025-12-04T11:45:26.2763944Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2763986Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2764045Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2764144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2764636Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2764675Z graph_break [] 2025-12-04T11:45:26.2764740Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2764813Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2764854Z Autotune Choices Stats: 2025-12-04T11:45:26.2765218Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2765279Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2765321Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2765419Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2765654Z triton_mm_36 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2765884Z triton_mm_35 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2766115Z triton_mm_26 0.0067 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2766338Z triton_mm_31 0.0068 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2766569Z triton_mm_25 0.0075 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2766814Z triton_mm_28 0.0078 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2767050Z triton_mm_33 0.0081 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2767284Z triton_mm_29 0.0083 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2767507Z triton_mm_24 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2767737Z triton_mm_37 0.0088 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2767868Z SingleProcess AUTOTUNE benchmarking takes 0.1205 seconds and 0.2576 seconds precompiling for 20 choices 
2025-12-04T11:45:26.2767923Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2768070Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2768117Z Traceback (most recent call last): 2025-12-04T11:45:26.2768276Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2768318Z method(*args, **kwargs) 2025-12-04T11:45:26.2768472Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2768512Z method(*args, **kwargs) 2025-12-04T11:45:26.2768664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2768712Z with policy(): 2025-12-04T11:45:26.2768868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2768911Z raise RuntimeError(msg) 2025-12-04T11:45:26.2769310Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.2769312Z 2025-12-04T11:45:26.2769387Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2769650Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2769653Z 2025-12-04T11:45:26.2769739Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2769813Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2769856Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2769914Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2770398Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2770507Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2770545Z graph_break [] 2025-12-04T11:45:26.2770609Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2770683Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2771189Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2771237Z current_size = base.storage().size() 2025-12-04T11:45:26.2771277Z Autotune Choices Stats: 2025-12-04T11:45:26.2771647Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2771694Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2771736Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2771836Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2772074Z triton_mm_17 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2772303Z triton_mm_16 0.0063 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2772532Z triton_mm_7 0.0068 ms 91.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2772777Z triton_mm_12 0.0070 ms 89.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2773006Z triton_mm_9 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2773238Z triton_mm_6 0.0074 ms 84.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2773488Z triton_mm_14 0.0081 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2773717Z triton_mm_10 0.0084 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2773942Z triton_mm_5 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2774171Z triton_mm_18 0.0095 ms 66.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2774317Z SingleProcess AUTOTUNE benchmarking takes 0.0870 seconds and 0.3422 seconds precompiling for 20 choices 2025-12-04T11:45:26.2774391Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2774435Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2774491Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2774604Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2775103Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2775144Z graph_break [] 2025-12-04T11:45:26.2775206Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2775282Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2775324Z Autotune Choices Stats: 2025-12-04T11:45:26.2775687Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2775734Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2775777Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2775877Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2776108Z triton_mm_36 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2776337Z triton_mm_35 0.0065 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2776580Z triton_mm_26 0.0067 ms 93.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2776807Z triton_mm_31 0.0068 ms 92.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2777035Z triton_mm_25 0.0075 ms 84.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2777265Z triton_mm_28 0.0078 ms 80.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2777492Z triton_mm_33 0.0081 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2777716Z triton_mm_29 0.0083 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2777950Z triton_mm_24 0.0086 ms 73.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2778178Z triton_mm_37 0.0088 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2778320Z SingleProcess AUTOTUNE benchmarking takes 0.1205 seconds and 0.2576 seconds precompiling for 20 choices 
2025-12-04T11:45:26.2778393Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2778436Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2778503Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2778603Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2779092Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2779130Z graph_break [] 2025-12-04T11:45:26.2779193Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2779268Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2779309Z Autotune Choices Stats: 2025-12-04T11:45:26.2779670Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2779719Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2779759Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2779858Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2780102Z triton_mm_55 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.2780332Z triton_mm_54 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2780560Z triton_mm_45 0.0065 ms 91.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2780786Z triton_mm_50 0.0066 ms 89.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2780830Z _scaled_mm 0.0072 ms 82.3% 2025-12-04T11:45:26.2781059Z triton_mm_47 0.0073 ms 81.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2781292Z triton_mm_44 0.0074 ms 80.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2781517Z triton_mm_52 0.0078 ms 76.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2781752Z triton_mm_48 0.0079 ms 75.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2781991Z triton_mm_43 0.0083 ms 72.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2782121Z SingleProcess AUTOTUNE benchmarking takes 0.1358 seconds and 0.2431 seconds precompiling for 20 choices 2025-12-04T11:45:26.2782324Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-14ed5193b1cb07ff.xml - 2025-12-04T11:45:26.2782385Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2782982Z FAILED [0.8677s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.2782985Z 2025-12-04T11:45:26.2783060Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2783357Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2783359Z 2025-12-04T11:45:26.2783449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2783514Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2783583Z ================== 1 failed, 149 deselected, 2 rerun in 4.18s ================== 2025-12-04T11:45:26.2783620Z Got exit code 1 2025-12-04T11:45:26.2783676Z Retrying single test... 
2025-12-04T11:45:26.2783822Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a11c50adadea2884.xml 2025-12-04T11:45:26.2783880Z ============================= test session starts ============================== 2025-12-04T11:45:26.2783993Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2784034Z cachedir: .pytest_cache 2025-12-04T11:45:26.2784196Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2784242Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2784282Z configfile: pytest.ini 2025-12-04T11:45:26.2784445Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2784521Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2784782Z stepcurrent: skipping 149 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2784825Z Running 1 items in this shard 2025-12-04T11:45:26.2784827Z 2025-12-04T11:45:26.2785050Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.5335s] [100%] 2025-12-04T11:45:26.2785268Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9659s] [100%] 2025-12-04T11:45:26.2785476Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.8157s] [100%] 2025-12-04T11:45:26.2785478Z 2025-12-04T11:45:26.2785530Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2785679Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2785725Z Traceback (most recent call last): 2025-12-04T11:45:26.2785898Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2785941Z method(*args, **kwargs) 2025-12-04T11:45:26.2786112Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2786154Z method(*args, **kwargs) 2025-12-04T11:45:26.2786304Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2786343Z with policy(): 2025-12-04T11:45:26.2786495Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2786539Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2786930Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.2786932Z 2025-12-04T11:45:26.2787008Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2787270Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2787273Z 2025-12-04T11:45:26.2787362Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2787435Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2787490Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2787547Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2788037Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2788139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2788176Z graph_break [] 2025-12-04T11:45:26.2788241Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2788314Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2788802Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2788849Z current_size = base.storage().size() 2025-12-04T11:45:26.2788891Z Autotune Choices Stats: 2025-12-04T11:45:26.2789258Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2789317Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2789358Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2789460Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2789708Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2789942Z triton_mm_17 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2790189Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2790417Z triton_mm_12 0.0064 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2790649Z triton_mm_6 0.0069 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2790877Z triton_mm_9 0.0074 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2791102Z triton_mm_14 0.0077 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2791327Z triton_mm_10 0.0079 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2791562Z triton_mm_5 0.0082 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2791792Z triton_mm_18 0.0090 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2791921Z SingleProcess AUTOTUNE benchmarking takes 0.0804 seconds and 0.3654 seconds precompiling for 20 choices 2025-12-04T11:45:26.2792069Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2792114Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2792273Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2792315Z method(*args, **kwargs) 2025-12-04T11:45:26.2792470Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2792510Z method(*args, **kwargs) 2025-12-04T11:45:26.2792663Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2792701Z with policy(): 2025-12-04T11:45:26.2792855Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2792896Z raise RuntimeError(msg) 2025-12-04T11:45:26.2793339Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.2793343Z 2025-12-04T11:45:26.2793417Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2793694Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2793696Z 2025-12-04T11:45:26.2793785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2793872Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2793918Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2793974Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2794460Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2794561Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2794598Z graph_break [] 2025-12-04T11:45:26.2794660Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2794736Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2795220Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2795282Z current_size = base.storage().size() 2025-12-04T11:45:26.2795323Z Autotune Choices Stats: 2025-12-04T11:45:26.2795690Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2795738Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2795780Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2795880Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2796114Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2796345Z triton_mm_17 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2796577Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2796806Z triton_mm_12 0.0064 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2797048Z triton_mm_6 0.0069 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2797277Z triton_mm_9 0.0074 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2797512Z triton_mm_14 0.0077 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2797747Z triton_mm_10 0.0079 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2797973Z triton_mm_5 0.0082 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2798202Z triton_mm_18 0.0090 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2798333Z SingleProcess AUTOTUNE benchmarking takes 0.0804 seconds and 0.3654 seconds precompiling for 20 choices 2025-12-04T11:45:26.2798407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2798449Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2798507Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2798607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2799095Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2799143Z graph_break [] 2025-12-04T11:45:26.2799205Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2799280Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2799320Z Autotune Choices Stats: 2025-12-04T11:45:26.2799684Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:26.2799732Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2799773Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2799872Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2800107Z triton_mm_36 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2800336Z triton_mm_35 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2800568Z triton_mm_26 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2800805Z triton_mm_31 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2801056Z triton_mm_25 0.0069 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2801294Z triton_mm_28 0.0071 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2801518Z triton_mm_29 0.0077 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2801743Z triton_mm_33 0.0078 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2801970Z triton_mm_24 0.0081 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2802199Z triton_mm_37 0.0095 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2802327Z SingleProcess AUTOTUNE benchmarking takes 0.1148 seconds and 0.2591 seconds precompiling for 20 choices 
2025-12-04T11:45:26.2802381Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2802528Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2802575Z Traceback (most recent call last): 2025-12-04T11:45:26.2802744Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2802785Z method(*args, **kwargs) 2025-12-04T11:45:26.2802939Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2802981Z method(*args, **kwargs) 2025-12-04T11:45:26.2803133Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2803171Z with policy(): 2025-12-04T11:45:26.2803359Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2803402Z raise RuntimeError(msg) 2025-12-04T11:45:26.2803793Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.2803797Z 2025-12-04T11:45:26.2803872Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2804135Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2804138Z 2025-12-04T11:45:26.2804227Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2804300Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2804359Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2804416Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2804916Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2805017Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2805054Z graph_break [] 2025-12-04T11:45:26.2805130Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2805204Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2805692Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2805740Z current_size = base.storage().size() 2025-12-04T11:45:26.2805781Z Autotune Choices Stats: 2025-12-04T11:45:26.2806151Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.2806197Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2806239Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2806338Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2806573Z triton_mm_16 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2806818Z triton_mm_17 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2807051Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2807277Z triton_mm_12 0.0064 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2807510Z triton_mm_6 0.0069 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2807741Z triton_mm_9 0.0074 ms 80.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2807967Z triton_mm_14 0.0077 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2808190Z triton_mm_10 0.0079 ms 75.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2808424Z triton_mm_5 0.0082 ms 71.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2808663Z triton_mm_18 0.0090 ms 65.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2808794Z SingleProcess AUTOTUNE benchmarking takes 0.0804 seconds and 0.3654 seconds precompiling for 20 choices 2025-12-04T11:45:26.2808878Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2808921Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2808978Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2809080Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2809567Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2809606Z graph_break [] 2025-12-04T11:45:26.2809668Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2809743Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2809783Z Autotune Choices Stats: 2025-12-04T11:45:26.2810147Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:26.2810193Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2810248Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2810345Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2810578Z triton_mm_36 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2810807Z triton_mm_35 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2811035Z triton_mm_26 0.0064 ms 92.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2811260Z triton_mm_31 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2811490Z triton_mm_25 0.0069 ms 85.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2811720Z triton_mm_28 0.0071 ms 83.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2811955Z triton_mm_29 0.0077 ms 76.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2812179Z triton_mm_33 0.0078 ms 75.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2812415Z triton_mm_24 0.0081 ms 72.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2812654Z triton_mm_37 0.0095 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2812786Z SingleProcess AUTOTUNE benchmarking takes 0.1148 seconds and 0.2591 seconds precompiling for 20 choices 
2025-12-04T11:45:26.2812859Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2812901Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2812960Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2813063Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2813593Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2813631Z graph_break [] 2025-12-04T11:45:26.2813694Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2813770Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2813810Z Autotune Choices Stats: 2025-12-04T11:45:26.2814174Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.006479999981820583, "best_triton_pos": 0} 2025-12-04T11:45:26.2814240Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2814281Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2814381Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2814615Z triton_mm_55 0.0065 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=2 2025-12-04T11:45:26.2814659Z _scaled_mm 0.0066 ms 98.8% 2025-12-04T11:45:26.2814886Z triton_mm_54 0.0066 ms 97.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2815122Z triton_mm_44 0.0070 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2815351Z triton_mm_45 0.0070 ms 92.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2815577Z triton_mm_50 0.0071 ms 91.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2815821Z triton_mm_52 0.0076 ms 85.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2816065Z triton_mm_47 0.0079 ms 81.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2816309Z triton_mm_48 0.0084 ms 77.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2816534Z triton_mm_43 0.0086 ms 75.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2816665Z SingleProcess AUTOTUNE benchmarking takes 0.1440 seconds and 0.2444 seconds precompiling for 20 choices 2025-12-04T11:45:26.2816857Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-a11c50adadea2884.xml - 2025-12-04T11:45:26.2816918Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2817516Z FAILED [0.8157s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.2817519Z 2025-12-04T11:45:26.2817593Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2817858Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2817871Z 2025-12-04T11:45:26.2817960Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2818025Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2818092Z ================== 1 failed, 187 deselected, 2 rerun in 4.33s ================== 2025-12-04T11:45:26.2818131Z Got exit code 1 2025-12-04T11:45:26.2818170Z Retrying single test... 
2025-12-04T11:45:26.2818317Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7038232b4d32e08b.xml 2025-12-04T11:45:26.2818375Z ============================= test session starts ============================== 2025-12-04T11:45:26.2818488Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2818529Z cachedir: .pytest_cache 2025-12-04T11:45:26.2818692Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2818737Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2818779Z configfile: pytest.ini 2025-12-04T11:45:26.2818942Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2819017Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2819276Z stepcurrent: skipping 149 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2819330Z Running 1 items in this shard 2025-12-04T11:45:26.2819333Z 2025-12-04T11:45:26.2819555Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.4714s] [100%] 2025-12-04T11:45:26.2819774Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.9427s] [100%] 2025-12-04T11:45:26.2819982Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda FAILED [0.7978s] [100%] 2025-12-04T11:45:26.2819984Z 2025-12-04T11:45:26.2820035Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2820195Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2820243Z Traceback (most recent call last): 2025-12-04T11:45:26.2820404Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2820444Z method(*args, **kwargs) 2025-12-04T11:45:26.2820598Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2820638Z method(*args, **kwargs) 2025-12-04T11:45:26.2820791Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2820828Z with policy(): 2025-12-04T11:45:26.2820981Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2821022Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2821419Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1056964608. 2025-12-04T11:45:26.2821422Z 2025-12-04T11:45:26.2821500Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2821776Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2821778Z 2025-12-04T11:45:26.2821867Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2821940Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2821983Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2822042Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2822527Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2822628Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2822666Z graph_break [] 2025-12-04T11:45:26.2822730Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2822806Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2823335Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2823397Z current_size = base.storage().size() 2025-12-04T11:45:26.2823437Z Autotune Choices Stats: 2025-12-04T11:45:26.2823819Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2823869Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2823910Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2824011Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2824263Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2824496Z triton_mm_17 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2824727Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2824954Z triton_mm_12 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2825185Z triton_mm_6 0.0070 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2825413Z triton_mm_9 0.0074 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2825651Z triton_mm_10 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2825875Z triton_mm_14 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2826105Z triton_mm_5 0.0082 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2826337Z triton_mm_18 0.0088 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2826470Z SingleProcess AUTOTUNE benchmarking takes 0.0868 seconds and 0.3692 seconds precompiling for 20 choices 2025-12-04T11:45:26.2826619Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2826666Z 
Traceback (most recent call last): 2025-12-04T11:45:26.2826828Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2826869Z method(*args, **kwargs) 2025-12-04T11:45:26.2827022Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2827073Z method(*args, **kwargs) 2025-12-04T11:45:26.2827224Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2827262Z with policy(): 2025-12-04T11:45:26.2827418Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2827459Z raise RuntimeError(msg) 2025-12-04T11:45:26.2827871Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1056964608 and is now 1113587712. 
2025-12-04T11:45:26.2827874Z 2025-12-04T11:45:26.2827949Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2828209Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2828212Z 2025-12-04T11:45:26.2828304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2828379Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2828422Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2828480Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2828970Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2829070Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2829110Z graph_break [] 2025-12-04T11:45:26.2829172Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2829268Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2829755Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2829802Z current_size = base.storage().size() 2025-12-04T11:45:26.2829843Z Autotune Choices Stats: 2025-12-04T11:45:26.2830212Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2830260Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2830302Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2830399Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2830638Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2830873Z triton_mm_17 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2831113Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2831338Z triton_mm_12 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2831576Z triton_mm_6 0.0070 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2831816Z triton_mm_9 0.0074 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2832044Z triton_mm_10 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2832267Z triton_mm_14 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2832495Z triton_mm_5 0.0082 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2832731Z triton_mm_18 0.0088 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2832861Z SingleProcess AUTOTUNE benchmarking takes 0.0868 seconds and 0.3692 seconds precompiling for 20 choices 2025-12-04T11:45:26.2832936Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2832992Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2833048Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2833150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2833680Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2833721Z graph_break [] 2025-12-04T11:45:26.2833786Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2833858Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2833902Z Autotune Choices Stats: 2025-12-04T11:45:26.2834267Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2834317Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2834359Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2834460Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2834693Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2834941Z triton_mm_35 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2835184Z triton_mm_26 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2835420Z triton_mm_31 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2835463Z _scaled_mm 0.0066 ms 93.9% 2025-12-04T11:45:26.2835690Z triton_mm_25 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2835922Z triton_mm_28 0.0074 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2836147Z triton_mm_29 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2836373Z triton_mm_33 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2836597Z triton_mm_24 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2836730Z SingleProcess AUTOTUNE benchmarking takes 0.1188 seconds and 0.2560 seconds precompiling for 20 choices 2025-12-04T11:45:26.2836798Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2836947Z _ 
TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2836997Z Traceback (most recent call last): 2025-12-04T11:45:26.2837157Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2837201Z method(*args, **kwargs) 2025-12-04T11:45:26.2837355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2837398Z method(*args, **kwargs) 2025-12-04T11:45:26.2837547Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2837585Z with policy(): 2025-12-04T11:45:26.2837738Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2837781Z raise RuntimeError(msg) 2025-12-04T11:45:26.2838175Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 
2025-12-04T11:45:26.2838177Z 2025-12-04T11:45:26.2838253Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2838516Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2838529Z 2025-12-04T11:45:26.2838617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2838691Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2838735Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2838791Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2839302Z inductor [('triton_bundler_save_kernel', 160), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2839402Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2839441Z graph_break [] 2025-12-04T11:45:26.2839504Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2839577Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2840063Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2840109Z current_size = base.storage().size() 2025-12-04T11:45:26.2840150Z Autotune Choices Stats: 2025-12-04T11:45:26.2840516Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_16", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.2840564Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2840617Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2840717Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2840952Z triton_mm_16 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2841186Z triton_mm_17 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2841417Z triton_mm_7 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2841641Z triton_mm_12 0.0066 ms 90.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2841872Z triton_mm_6 0.0070 ms 84.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, 
BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2842100Z triton_mm_9 0.0074 ms 81.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2842324Z triton_mm_10 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2842559Z triton_mm_14 0.0080 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2842795Z triton_mm_5 0.0082 ms 72.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2843050Z triton_mm_18 0.0088 ms 67.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2843182Z SingleProcess AUTOTUNE benchmarking takes 0.0868 seconds and 0.3692 seconds precompiling for 20 choices 2025-12-04T11:45:26.2843293Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2843336Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2843394Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2843494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2843982Z inductor [('triton_bundler_save_kernel', 160), 
('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2844021Z graph_break [] 2025-12-04T11:45:26.2844084Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2844159Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2844200Z Autotune Choices Stats: 2025-12-04T11:45:26.2844566Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_36", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2844629Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2844673Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2844771Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2845004Z triton_mm_36 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2845236Z triton_mm_35 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2845468Z triton_mm_26 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2845693Z triton_mm_31 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2845736Z _scaled_mm 0.0066 ms 93.9% 2025-12-04T11:45:26.2845967Z triton_mm_25 0.0070 ms 88.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2846209Z triton_mm_28 0.0074 ms 82.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2846451Z triton_mm_29 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2846689Z triton_mm_33 0.0078 ms 78.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2846914Z triton_mm_24 0.0080 ms 77.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2847043Z SingleProcess AUTOTUNE benchmarking takes 0.1188 seconds and 0.2560 seconds precompiling for 20 choices 2025-12-04T11:45:26.2847117Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2847160Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2847217Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2847317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2847806Z inductor [('triton_bundler_save_kernel', 160), ('async_compile_cache_miss', 21), ('benchmarking.InductorBenchmarker.benchmark_gpu', 20), ('generated_module_cache_miss', 19), ('select_algorithm_num_precompiles', 19), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2847845Z graph_break [] 2025-12-04T11:45:26.2847907Z aten_mm_info [('aten._scaled_mm.default_3_2048_1024', 1)] 2025-12-04T11:45:26.2847981Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2848031Z Autotune Choices Stats: 2025-12-04T11:45:26.2848393Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "triton_mm_55", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2848439Z AUTOTUNE scaled_mm(3x1024, 1024x2048, , ) 2025-12-04T11:45:26.2848480Z strides: [1024, 1], [1, 1024], [], [] 2025-12-04T11:45:26.2848579Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2848814Z triton_mm_55 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2849042Z triton_mm_54 0.0064 ms 96.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=256, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2849273Z triton_mm_50 0.0064 ms 95.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2849503Z triton_mm_45 0.0066 ms 93.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2849733Z triton_mm_44 0.0072 ms 86.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2849973Z triton_mm_47 0.0075 ms 82.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2850211Z triton_mm_52 0.0076 ms 81.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2850446Z triton_mm_48 0.0079 ms 78.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=64, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2850669Z triton_mm_43 0.0081 ms 75.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2850899Z triton_mm_56 0.0096 ms 64.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=128, BLOCK_M=16, BLOCK_N=256, 
EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2851030Z SingleProcess AUTOTUNE benchmarking takes 0.1363 seconds and 0.2403 seconds precompiling for 20 choices 2025-12-04T11:45:26.2851220Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-7038232b4d32e08b.xml - 2025-12-04T11:45:26.2851282Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2851879Z FAILED [0.7978s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1113587712 and is now 1170210816. 2025-12-04T11:45:26.2851897Z 2025-12-04T11:45:26.2851973Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2852235Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2852238Z 2025-12-04T11:45:26.2852325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2852389Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2852457Z ================== 1 failed, 187 deselected, 2 rerun in 4.23s ================== 2025-12-04T11:45:26.2852497Z Got exit code 1 2025-12-04T11:45:26.2852707Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2852835Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2852979Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d8f4d1d2b4cfb2c.xml 2025-12-04T11:45:26.2853038Z ============================= test session starts ============================== 2025-12-04T11:45:26.2853150Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2853192Z cachedir: .pytest_cache 2025-12-04T11:45:26.2853401Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2853461Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2853502Z configfile: pytest.ini 2025-12-04T11:45:26.2853664Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2853741Z collecting ... collected 188 items / 150 deselected / 38 selected 2025-12-04T11:45:26.2853796Z stepcurrent: skipping 150 already run items. 
2025-12-04T11:45:26.2853840Z Running 38 items in this shard 2025-12-04T11:45:26.2853842Z 2025-12-04T11:45:26.2854075Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7486s] [ 2%] 2025-12-04T11:45:26.2854302Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3588s] [ 2%] 2025-12-04T11:45:26.2854491Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.3028s] [ 2%] 2025-12-04T11:45:26.2854494Z 2025-12-04T11:45:26.2854546Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2854689Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2854736Z Traceback (most recent call last): 2025-12-04T11:45:26.2854895Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2854938Z method(*args, **kwargs) 2025-12-04T11:45:26.2855091Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2855133Z method(*args, **kwargs) 2025-12-04T11:45:26.2855283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2855324Z with policy(): 2025-12-04T11:45:26.2855476Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2855517Z raise RuntimeError(msg) 2025-12-04T11:45:26.2855924Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2855926Z 2025-12-04T11:45:26.2856001Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2856262Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2856264Z 2025-12-04T11:45:26.2856354Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2856427Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2856471Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2856529Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2856599Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2856698Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2856735Z graph_break [] 2025-12-04T11:45:26.2856796Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2856939Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2856987Z Traceback (most recent call last): 2025-12-04T11:45:26.2857140Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2857192Z method(*args, **kwargs) 2025-12-04T11:45:26.2857343Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2857384Z method(*args, **kwargs) 2025-12-04T11:45:26.2857535Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2857572Z with policy(): 2025-12-04T11:45:26.2857740Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2857782Z raise RuntimeError(msg) 2025-12-04T11:45:26.2858181Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2858184Z 2025-12-04T11:45:26.2858260Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2858519Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2858522Z 2025-12-04T11:45:26.2858610Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2858685Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2858727Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2858784Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2858852Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2858951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2858990Z graph_break [] 2025-12-04T11:45:26.2859050Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2859124Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2859164Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2859220Z stats 
[('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2859328Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2859394Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2859429Z graph_break [] 2025-12-04T11:45:26.2859489Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2859540Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2859683Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2859729Z Traceback (most recent call last): 2025-12-04T11:45:26.2859884Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2859925Z method(*args, **kwargs) 2025-12-04T11:45:26.2860075Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2860115Z method(*args, **kwargs) 2025-12-04T11:45:26.2860269Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2860306Z with policy(): 2025-12-04T11:45:26.2860460Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2860502Z raise RuntimeError(msg) 2025-12-04T11:45:26.2860885Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2860898Z 2025-12-04T11:45:26.2860975Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2861230Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2861233Z 2025-12-04T11:45:26.2861337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2861411Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2861454Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2861510Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2861586Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2861685Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2861723Z graph_break [] 2025-12-04T11:45:26.2861781Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2861856Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2861898Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2861953Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2862048Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2862114Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2862150Z graph_break [] 2025-12-04T11:45:26.2862210Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2862284Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2862326Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2862381Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2862479Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2862542Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2862580Z graph_break [] 2025-12-04T11:45:26.2862637Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2862841Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4d8f4d1d2b4cfb2c.xml - 2025-12-04T11:45:26.2862901Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2863514Z FAILED [0.3028s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2863518Z 2025-12-04T11:45:26.2863592Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2863848Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2863851Z 2025-12-04T11:45:26.2863939Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2864001Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2864069Z ================== 1 failed, 150 deselected, 2 rerun in 2.43s ================== 2025-12-04T11:45:26.2864106Z Got exit code 1 2025-12-04T11:45:26.2864148Z Retrying single test... 
2025-12-04T11:45:26.2864294Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0cc0ff881c32dc85.xml 2025-12-04T11:45:26.2864368Z ============================= test session starts ============================== 2025-12-04T11:45:26.2864478Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2864519Z cachedir: .pytest_cache 2025-12-04T11:45:26.2864685Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2864731Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2864771Z configfile: pytest.ini 2025-12-04T11:45:26.2864946Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2865021Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2865290Z stepcurrent: skipping 150 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2865335Z Running 1 items in this shard 2025-12-04T11:45:26.2865338Z 2025-12-04T11:45:26.2865550Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6674s] [100%] 2025-12-04T11:45:26.2865766Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2759s] [100%] 2025-12-04T11:45:26.2865954Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2231s] [100%] 2025-12-04T11:45:26.2865956Z 2025-12-04T11:45:26.2866008Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2866149Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2866198Z Traceback (most recent call last): 2025-12-04T11:45:26.2866355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2866397Z method(*args, **kwargs) 2025-12-04T11:45:26.2866564Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2866605Z method(*args, **kwargs) 2025-12-04T11:45:26.2866756Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2866794Z with policy(): 2025-12-04T11:45:26.2866947Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2866990Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2867376Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2867381Z 2025-12-04T11:45:26.2867456Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2867714Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2867717Z 2025-12-04T11:45:26.2867804Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2867878Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2867921Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2867978Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2868044Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2868154Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2868190Z graph_break [] 2025-12-04T11:45:26.2868249Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2868391Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2868437Z Traceback (most recent call last): 2025-12-04T11:45:26.2868601Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2868643Z method(*args, **kwargs) 2025-12-04T11:45:26.2868795Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2868845Z method(*args, **kwargs) 2025-12-04T11:45:26.2868997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2869038Z with policy(): 2025-12-04T11:45:26.2869191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2869232Z raise RuntimeError(msg) 2025-12-04T11:45:26.2869619Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2869622Z 2025-12-04T11:45:26.2869696Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2869951Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2869954Z 2025-12-04T11:45:26.2870040Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2870115Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2870157Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2870226Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2870292Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2870391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2870427Z graph_break [] 2025-12-04T11:45:26.2870491Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2870566Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.2870611Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2870666Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2870764Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2870829Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2870867Z graph_break [] 2025-12-04T11:45:26.2870926Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2870981Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2871121Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2871171Z Traceback (most recent call last): 2025-12-04T11:45:26.2871329Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2871371Z method(*args, **kwargs) 2025-12-04T11:45:26.2871523Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2871564Z method(*args, **kwargs) 2025-12-04T11:45:26.2871732Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2871769Z with policy(): 2025-12-04T11:45:26.2871921Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2871964Z raise RuntimeError(msg) 2025-12-04T11:45:26.2872356Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2872358Z 2025-12-04T11:45:26.2872437Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2872703Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2872708Z 2025-12-04T11:45:26.2872795Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2872867Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2872909Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2872966Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2873032Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2873135Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2873172Z graph_break [] 2025-12-04T11:45:26.2873232Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2873338Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2873381Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2873436Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2873535Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2873601Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2873637Z graph_break [] 2025-12-04T11:45:26.2873713Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2873786Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2873827Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2873886Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2873981Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2874047Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2874085Z graph_break [] 2025-12-04T11:45:26.2874142Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2874333Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-0cc0ff881c32dc85.xml - 2025-12-04T11:45:26.2874394Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2874970Z FAILED [0.2231s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2874973Z 2025-12-04T11:45:26.2875048Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2875304Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2875321Z 2025-12-04T11:45:26.2875412Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2875474Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2875547Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.2875585Z Got exit code 1 2025-12-04T11:45:26.2875625Z Retrying single test... 
2025-12-04T11:45:26.2875785Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad83cd965fb9ba4b.xml 2025-12-04T11:45:26.2875843Z ============================= test session starts ============================== 2025-12-04T11:45:26.2875970Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2876012Z cachedir: .pytest_cache 2025-12-04T11:45:26.2876171Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2876217Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2876258Z configfile: pytest.ini 2025-12-04T11:45:26.2876417Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2876492Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2876749Z stepcurrent: skipping 150 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2876796Z Running 1 items in this shard 2025-12-04T11:45:26.2876798Z 2025-12-04T11:45:26.2877014Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6758s] [100%] 2025-12-04T11:45:26.2877227Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2602s] [100%] 2025-12-04T11:45:26.2877417Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda FAILED [0.2109s] [100%] 2025-12-04T11:45:26.2877430Z 2025-12-04T11:45:26.2877487Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2877630Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2877679Z Traceback (most recent call last): 2025-12-04T11:45:26.2877835Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2877880Z method(*args, **kwargs) 2025-12-04T11:45:26.2878034Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2878076Z method(*args, **kwargs) 2025-12-04T11:45:26.2878228Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2878266Z with policy(): 2025-12-04T11:45:26.2878422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2878462Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2878855Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2878857Z 2025-12-04T11:45:26.2878934Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2879191Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2879203Z 2025-12-04T11:45:26.2879290Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2879365Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2879406Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2879463Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2879539Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2879638Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2879674Z graph_break [] 2025-12-04T11:45:26.2879746Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2879888Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2879936Z Traceback (most recent call last): 2025-12-04T11:45:26.2880090Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2880130Z method(*args, **kwargs) 2025-12-04T11:45:26.2880283Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2880324Z method(*args, **kwargs) 2025-12-04T11:45:26.2880474Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2880512Z with policy(): 2025-12-04T11:45:26.2880664Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2880706Z raise RuntimeError(msg) 2025-12-04T11:45:26.2881092Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2881095Z 2025-12-04T11:45:26.2881179Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2881436Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2881439Z 2025-12-04T11:45:26.2881526Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2881602Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2881644Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2881702Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2881768Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2881870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2881906Z graph_break [] 2025-12-04T11:45:26.2881968Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2882043Z ----------------------------- Captured stdout call ----------------------------- 
2025-12-04T11:45:26.2882085Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2882140Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2882241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2882305Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2882342Z graph_break [] 2025-12-04T11:45:26.2882402Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2882455Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2882595Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2882655Z Traceback (most recent call last): 2025-12-04T11:45:26.2882808Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2882851Z method(*args, **kwargs) 2025-12-04T11:45:26.2883002Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2883053Z method(*args, **kwargs) 2025-12-04T11:45:26.2883204Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2883243Z with policy(): 2025-12-04T11:45:26.2883451Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2883498Z raise RuntimeError(msg) 2025-12-04T11:45:26.2883881Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2883886Z 2025-12-04T11:45:26.2883960Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2884217Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2884219Z 2025-12-04T11:45:26.2884304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2884381Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2884423Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2884481Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2884548Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2884648Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2884705Z graph_break [] 2025-12-04T11:45:26.2884765Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2884837Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2884880Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2884935Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2885034Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2885100Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2885137Z graph_break [] 2025-12-04T11:45:26.2885196Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2885270Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2885314Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2885369Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2885466Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2885532Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2885567Z graph_break [] 2025-12-04T11:45:26.2885625Z aten_mm_info [('aten._scaled_mm.default_3_16_16', 1)] 2025-12-04T11:45:26.2885818Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-ad83cd965fb9ba4b.xml - 2025-12-04T11:45:26.2885879Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2886456Z FAILED [0.2109s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2886473Z 2025-12-04T11:45:26.2886546Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2886818Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2886820Z 2025-12-04T11:45:26.2886907Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2886981Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2887050Z ================== 1 failed, 187 deselected, 2 rerun in 2.16s ================== 2025-12-04T11:45:26.2887090Z Got exit code 1 2025-12-04T11:45:26.2887296Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2887426Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2887571Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e6aeb0c99529b7d.xml 2025-12-04T11:45:26.2887630Z ============================= test session starts ============================== 2025-12-04T11:45:26.2887740Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2887781Z cachedir: .pytest_cache 2025-12-04T11:45:26.2887938Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2887985Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2888027Z configfile: pytest.ini 2025-12-04T11:45:26.2888189Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2888267Z collecting ... collected 188 items / 151 deselected / 37 selected 2025-12-04T11:45:26.2888334Z stepcurrent: skipping 151 already run items. 
2025-12-04T11:45:26.2888378Z Running 37 items in this shard 2025-12-04T11:45:26.2888380Z 2025-12-04T11:45:26.2888602Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.7604s] [ 2%] 2025-12-04T11:45:26.2888822Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3640s] [ 2%] 2025-12-04T11:45:26.2889014Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.3226s] [ 2%] 2025-12-04T11:45:26.2889017Z 2025-12-04T11:45:26.2889070Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2889213Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2889261Z Traceback (most recent call last): 2025-12-04T11:45:26.2889419Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2889461Z method(*args, **kwargs) 2025-12-04T11:45:26.2889612Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2889653Z method(*args, **kwargs) 2025-12-04T11:45:26.2889805Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2889843Z with policy(): 2025-12-04T11:45:26.2890009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2890052Z raise RuntimeError(msg) 2025-12-04T11:45:26.2890436Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2890440Z 2025-12-04T11:45:26.2890526Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2890802Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2890806Z 2025-12-04T11:45:26.2890894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2890973Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2891014Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2891073Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2891141Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2891242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2891280Z graph_break [] 2025-12-04T11:45:26.2891344Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2891487Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2891535Z Traceback (most recent call last): 2025-12-04T11:45:26.2891689Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2891729Z method(*args, **kwargs) 2025-12-04T11:45:26.2891881Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2891923Z method(*args, **kwargs) 2025-12-04T11:45:26.2892076Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2892128Z with policy(): 2025-12-04T11:45:26.2892279Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2892321Z raise RuntimeError(msg) 2025-12-04T11:45:26.2892708Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2892710Z 2025-12-04T11:45:26.2892786Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2893045Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2893048Z 2025-12-04T11:45:26.2893135Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2893209Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2893283Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2893340Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2893409Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2893511Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2893547Z graph_break [] 2025-12-04T11:45:26.2893609Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2893699Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2893742Z frames [('total', 1), ('ok', 1)] 
2025-12-04T11:45:26.2893798Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2893897Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2893962Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2893998Z graph_break [] 2025-12-04T11:45:26.2894072Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2894125Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2894280Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2894329Z Traceback (most recent call last): 2025-12-04T11:45:26.2894484Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2894528Z method(*args, **kwargs) 2025-12-04T11:45:26.2894679Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2894722Z method(*args, **kwargs) 2025-12-04T11:45:26.2894872Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2894911Z with policy(): 2025-12-04T11:45:26.2895065Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2895107Z raise RuntimeError(msg) 2025-12-04T11:45:26.2895496Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2895499Z 2025-12-04T11:45:26.2895574Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2895831Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2895848Z 2025-12-04T11:45:26.2895935Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2896008Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2896051Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2896107Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2896177Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2896275Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2896313Z graph_break [] 2025-12-04T11:45:26.2896372Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2896445Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2896487Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2896546Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2896643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2896714Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2896751Z graph_break [] 2025-12-04T11:45:26.2896810Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2896883Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2896926Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2896981Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2897080Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2897156Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2897193Z graph_break [] 2025-12-04T11:45:26.2897251Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2897445Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-9e6aeb0c99529b7d.xml - 2025-12-04T11:45:26.2897506Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2898118Z FAILED [0.3226s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2898121Z 2025-12-04T11:45:26.2898194Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2898452Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2898455Z 2025-12-04T11:45:26.2898543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2898605Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2898673Z ================== 1 failed, 151 deselected, 2 rerun in 2.47s ================== 2025-12-04T11:45:26.2898710Z Got exit code 1 2025-12-04T11:45:26.2898753Z Retrying single test... 
2025-12-04T11:45:26.2898899Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f170aeea117a4540.xml 2025-12-04T11:45:26.2898959Z ============================= test session starts ============================== 2025-12-04T11:45:26.2899069Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2899113Z cachedir: .pytest_cache 2025-12-04T11:45:26.2899287Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2899335Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2899375Z configfile: pytest.ini 2025-12-04T11:45:26.2899537Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2899614Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2899874Z stepcurrent: skipping 151 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2899921Z Running 1 items in this shard 2025-12-04T11:45:26.2899923Z 2025-12-04T11:45:26.2900137Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6712s] [100%] 2025-12-04T11:45:26.2900354Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2647s] [100%] 2025-12-04T11:45:26.2900544Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2234s] [100%] 2025-12-04T11:45:26.2900546Z 2025-12-04T11:45:26.2900598Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2900743Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2900802Z Traceback (most recent call last): 2025-12-04T11:45:26.2900958Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2900999Z method(*args, **kwargs) 2025-12-04T11:45:26.2901151Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2901192Z method(*args, **kwargs) 2025-12-04T11:45:26.2901353Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2901391Z with policy(): 2025-12-04T11:45:26.2901544Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2901596Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2901986Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2901989Z 2025-12-04T11:45:26.2902063Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2902327Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2902329Z 2025-12-04T11:45:26.2902417Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2902490Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2902533Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2902592Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2902659Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2902760Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2902795Z graph_break [] 2025-12-04T11:45:26.2902856Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2903010Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2903058Z Traceback (most recent call last): 2025-12-04T11:45:26.2903212Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2903296Z method(*args, **kwargs) 2025-12-04T11:45:26.2903446Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.2903488Z method(*args, **kwargs) 2025-12-04T11:45:26.2903639Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2903679Z with policy(): 2025-12-04T11:45:26.2903832Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2903873Z raise RuntimeError(msg) 2025-12-04T11:45:26.2904262Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2904265Z 2025-12-04T11:45:26.2904337Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2904598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2904601Z 2025-12-04T11:45:26.2904703Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2904779Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2904821Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2904878Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2904944Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2905044Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2905094Z graph_break [] 2025-12-04T11:45:26.2905154Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2905227Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.2905285Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2905340Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2905438Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2905503Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2905541Z graph_break [] 2025-12-04T11:45:26.2905598Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2905652Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2905794Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2905842Z Traceback (most recent call last): 2025-12-04T11:45:26.2905997Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2906040Z method(*args, **kwargs) 2025-12-04T11:45:26.2906191Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2906232Z method(*args, **kwargs) 2025-12-04T11:45:26.2906384Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2906421Z with policy(): 2025-12-04T11:45:26.2906574Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2906633Z raise RuntimeError(msg) 2025-12-04T11:45:26.2907018Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2907021Z 2025-12-04T11:45:26.2907095Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2907354Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2907357Z 2025-12-04T11:45:26.2907445Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2907520Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2907562Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2907620Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2907684Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2907784Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2907820Z graph_break [] 2025-12-04T11:45:26.2907879Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2907952Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2907996Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2908050Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2908157Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2908221Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2908260Z graph_break [] 2025-12-04T11:45:26.2908320Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2908394Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2908434Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2908502Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2908599Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2908663Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2908710Z graph_break [] 2025-12-04T11:45:26.2908770Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2908960Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-f170aeea117a4540.xml - 2025-12-04T11:45:26.2909021Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2909602Z FAILED [0.2234s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2909606Z 2025-12-04T11:45:26.2909681Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2909939Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2909942Z 2025-12-04T11:45:26.2910027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2910090Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2911615Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.2911655Z Got exit code 1 2025-12-04T11:45:26.2911696Z Retrying single test... 
2025-12-04T11:45:26.2911844Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0494c938948fcfc.xml 2025-12-04T11:45:26.2911903Z ============================= test session starts ============================== 2025-12-04T11:45:26.2912018Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2912059Z cachedir: .pytest_cache 2025-12-04T11:45:26.2912221Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2912267Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2912308Z configfile: pytest.ini 2025-12-04T11:45:26.2912471Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2912548Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2912807Z stepcurrent: skipping 151 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2912852Z Running 1 items in this shard 2025-12-04T11:45:26.2912854Z 2025-12-04T11:45:26.2913071Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.6683s] [100%] 2025-12-04T11:45:26.2913326Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.2651s] [100%] 2025-12-04T11:45:26.2913535Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda FAILED [0.2266s] [100%] 2025-12-04T11:45:26.2913540Z 2025-12-04T11:45:26.2913590Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2913755Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2913801Z Traceback (most recent call last): 2025-12-04T11:45:26.2913961Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2914018Z method(*args, **kwargs) 2025-12-04T11:45:26.2914173Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2914214Z method(*args, **kwargs) 2025-12-04T11:45:26.2914367Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2914403Z with policy(): 2025-12-04T11:45:26.2914563Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2914603Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.2914999Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1094713344. 2025-12-04T11:45:26.2915001Z 2025-12-04T11:45:26.2915075Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2915336Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2915340Z 2025-12-04T11:45:26.2915427Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2915519Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2915561Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2915619Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2915686Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2915786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2915822Z graph_break [] 2025-12-04T11:45:26.2915884Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2916027Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2916075Z Traceback (most recent call last): 2025-12-04T11:45:26.2916229Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2916271Z method(*args, **kwargs) 2025-12-04T11:45:26.2916422Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 
3329, in wrapper 2025-12-04T11:45:26.2916461Z method(*args, **kwargs) 2025-12-04T11:45:26.2916613Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2916650Z with policy(): 2025-12-04T11:45:26.2916807Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2916848Z raise RuntimeError(msg) 2025-12-04T11:45:26.2917239Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1094713344 and is now 1109393408. 2025-12-04T11:45:26.2917254Z 2025-12-04T11:45:26.2917328Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2917598Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2917600Z 2025-12-04T11:45:26.2917687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2917770Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2917812Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2917869Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2917936Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2918034Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2918070Z graph_break [] 2025-12-04T11:45:26.2918130Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2918204Z ----------------------------- Captured stdout call 
----------------------------- 2025-12-04T11:45:26.2918248Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2918302Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2918401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2918465Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2918503Z graph_break [] 2025-12-04T11:45:26.2918561Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2918613Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2918757Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2918804Z Traceback (most recent call last): 2025-12-04T11:45:26.2918959Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2919012Z method(*args, **kwargs) 2025-12-04T11:45:26.2919162Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2919203Z method(*args, **kwargs) 2025-12-04T11:45:26.2919355Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2919392Z with policy(): 2025-12-04T11:45:26.2919546Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2919587Z raise RuntimeError(msg) 2025-12-04T11:45:26.2919974Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 
2025-12-04T11:45:26.2919978Z 2025-12-04T11:45:26.2920050Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2920311Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2920313Z 2025-12-04T11:45:26.2920400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2920475Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2920516Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2920589Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2920654Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2920753Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2920790Z graph_break [] 2025-12-04T11:45:26.2920849Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2920924Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2920976Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2921031Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2921129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2921206Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2921244Z graph_break [] 2025-12-04T11:45:26.2921301Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2921375Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2921415Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2921470Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2921566Z aot_autograd [('total', 1), 
('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2921632Z inductor [('fxgraph_cache_miss', 1), ('extern_calls', 1)] 2025-12-04T11:45:26.2921668Z graph_break [] 2025-12-04T11:45:26.2921726Z aten_mm_info [('aten._scaled_mm.default_3_2048_16', 1)] 2025-12-04T11:45:26.2921918Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-e0494c938948fcfc.xml - 2025-12-04T11:45:26.2921980Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2922566Z FAILED [0.2266s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1109393408 and is now 1124073472. 2025-12-04T11:45:26.2922580Z 2025-12-04T11:45:26.2922653Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2922913Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2922915Z 2025-12-04T11:45:26.2923001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2923065Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2923133Z ================== 1 failed, 187 deselected, 2 rerun in 2.18s ================== 2025-12-04T11:45:26.2923172Z Got exit code 1 2025-12-04T11:45:26.2923430Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2923561Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2923706Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16434e8ee56f6d4.xml 2025-12-04T11:45:26.2923767Z ============================= test session starts ============================== 2025-12-04T11:45:26.2923878Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2923922Z cachedir: .pytest_cache 2025-12-04T11:45:26.2924082Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2924144Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2924183Z configfile: pytest.ini 2025-12-04T11:45:26.2924347Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2924425Z collecting ... collected 188 items / 152 deselected / 36 selected 2025-12-04T11:45:26.2924479Z stepcurrent: skipping 152 already run items. 
2025-12-04T11:45:26.2924523Z Running 36 items in this shard 2025-12-04T11:45:26.2924526Z 2025-12-04T11:45:26.2924755Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8337s] [ 2%] 2025-12-04T11:45:26.2924982Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3888s] [ 2%] 2025-12-04T11:45:26.2925172Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.4509s] [ 2%] 2025-12-04T11:45:26.2925175Z 2025-12-04T11:45:26.2925228Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2925374Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2925420Z Traceback (most recent call last): 2025-12-04T11:45:26.2925581Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2925624Z method(*args, **kwargs) 2025-12-04T11:45:26.2925776Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2925817Z method(*args, **kwargs) 2025-12-04T11:45:26.2925968Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2926006Z with policy(): 2025-12-04T11:45:26.2926159Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2926201Z raise RuntimeError(msg) 2025-12-04T11:45:26.2926604Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.2926608Z 2025-12-04T11:45:26.2926681Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2926940Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2926942Z 2025-12-04T11:45:26.2927030Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2927104Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2927146Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2927204Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2927700Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2927802Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2927839Z graph_break [] 2025-12-04T11:45:26.2927900Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2927993Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2928487Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2928538Z current_size = base.storage().size() 2025-12-04T11:45:26.2928588Z Autotune Choices Stats: 2025-12-04T11:45:26.2928974Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2929021Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2929063Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2929164Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2929401Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2929444Z _scaled_mm 0.0192 ms 31.8% 2025-12-04T11:45:26.2929574Z SingleProcess AUTOTUNE benchmarking takes 0.0119 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2929718Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2929765Z Traceback (most recent call last): 2025-12-04T11:45:26.2929922Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2929964Z method(*args, **kwargs) 2025-12-04T11:45:26.2930118Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.2930159Z method(*args, **kwargs) 2025-12-04T11:45:26.2930321Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2930359Z with policy(): 2025-12-04T11:45:26.2930514Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2930558Z raise RuntimeError(msg) 2025-12-04T11:45:26.2930946Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.2930950Z 2025-12-04T11:45:26.2931026Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2931287Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2931290Z 2025-12-04T11:45:26.2931377Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2931453Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2931495Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2931551Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2932041Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 
2025-12-04T11:45:26.2932150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2932188Z graph_break [] 2025-12-04T11:45:26.2932248Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2932322Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2932832Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2932880Z current_size = base.storage().size() 2025-12-04T11:45:26.2932923Z Autotune Choices Stats: 2025-12-04T11:45:26.2933337Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2933382Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2933423Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2933525Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2933761Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2933805Z _scaled_mm 0.0192 ms 31.8% 2025-12-04T11:45:26.2933935Z SingleProcess AUTOTUNE benchmarking takes 0.0119 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2934009Z 
----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2934052Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2934126Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2934226Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2934714Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2934752Z graph_break [] 2025-12-04T11:45:26.2934810Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2934887Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2934927Z Autotune Choices Stats: 2025-12-04T11:45:26.2935292Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006318999920040369, "best_triton_pos": 0} 2025-12-04T11:45:26.2935338Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2935378Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2935478Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2935713Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2935772Z _scaled_mm 0.0185 ms 
34.2% 2025-12-04T11:45:26.2935900Z SingleProcess AUTOTUNE benchmarking takes 0.0106 seconds and 0.0542 seconds precompiling for 2 choices 2025-12-04T11:45:26.2935954Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2936097Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2936144Z Traceback (most recent call last): 2025-12-04T11:45:26.2936312Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2936353Z method(*args, **kwargs) 2025-12-04T11:45:26.2936521Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2936562Z method(*args, **kwargs) 2025-12-04T11:45:26.2936715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2936753Z with policy(): 2025-12-04T11:45:26.2936907Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2936949Z raise RuntimeError(msg) 2025-12-04T11:45:26.2937338Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.2937340Z 2025-12-04T11:45:26.2937415Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2937675Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2937680Z 2025-12-04T11:45:26.2937766Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2937839Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2937896Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2937953Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2938435Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2938534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2938572Z graph_break [] 2025-12-04T11:45:26.2938632Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2938705Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2939200Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2939249Z current_size = base.storage().size() 2025-12-04T11:45:26.2939290Z Autotune Choices Stats: 2025-12-04T11:45:26.2939655Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2939711Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2939752Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2939852Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2940085Z triton_mm_0 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2940135Z _scaled_mm 0.0192 ms 31.8% 2025-12-04T11:45:26.2940265Z SingleProcess AUTOTUNE benchmarking takes 0.0119 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2940351Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2940395Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2940451Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2940552Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2941043Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2941081Z graph_break [] 2025-12-04T11:45:26.2941141Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2941215Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2941255Z Autotune Choices Stats: 2025-12-04T11:45:26.2941621Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006318999920040369, "best_triton_pos": 0} 2025-12-04T11:45:26.2941666Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2941706Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2941823Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2942060Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2942101Z _scaled_mm 0.0185 ms 34.2% 2025-12-04T11:45:26.2942229Z SingleProcess AUTOTUNE benchmarking takes 0.0106 seconds and 0.0542 seconds precompiling for 2 choices 2025-12-04T11:45:26.2942303Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2942344Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2942402Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2942500Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2942986Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), 
('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2943023Z graph_break [] 2025-12-04T11:45:26.2943082Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2943155Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2943197Z Autotune Choices Stats: 2025-12-04T11:45:26.2943608Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2943672Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2943712Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2943810Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2944056Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2944111Z _scaled_mm 0.0200 ms 30.5% 2025-12-04T11:45:26.2944239Z SingleProcess AUTOTUNE benchmarking takes 0.0109 seconds and 0.0474 seconds precompiling for 2 choices 2025-12-04T11:45:26.2944432Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d16434e8ee56f6d4.xml - 2025-12-04T11:45:26.2944495Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2945077Z FAILED [0.4509s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.2945080Z 2025-12-04T11:45:26.2945158Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2945419Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2945423Z 2025-12-04T11:45:26.2945511Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2945573Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2945655Z ================== 1 failed, 152 deselected, 2 rerun in 2.69s ================== 2025-12-04T11:45:26.2945692Z Got exit code 1 2025-12-04T11:45:26.2945732Z Retrying single test... 
2025-12-04T11:45:26.2945878Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-83c0553e271454d2.xml 2025-12-04T11:45:26.2945935Z ============================= test session starts ============================== 2025-12-04T11:45:26.2946049Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2946090Z cachedir: .pytest_cache 2025-12-04T11:45:26.2946250Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2946296Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2946335Z configfile: pytest.ini 2025-12-04T11:45:26.2946499Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2946576Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2946834Z stepcurrent: skipping 152 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2946878Z Running 1 items in this shard 2025-12-04T11:45:26.2946880Z 2025-12-04T11:45:26.2947095Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8075s] [100%] 2025-12-04T11:45:26.2947319Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3895s] [100%] 2025-12-04T11:45:26.2947508Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.3412s] [100%] 2025-12-04T11:45:26.2947511Z 2025-12-04T11:45:26.2947562Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2947719Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2947766Z Traceback (most recent call last): 2025-12-04T11:45:26.2947935Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2947977Z method(*args, **kwargs) 2025-12-04T11:45:26.2948131Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2948173Z method(*args, **kwargs) 2025-12-04T11:45:26.2948325Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2948366Z with policy(): 2025-12-04T11:45:26.2948522Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2948564Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2948952Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.2948955Z 2025-12-04T11:45:26.2949028Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2949287Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2949290Z 2025-12-04T11:45:26.2949376Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2949463Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2949505Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2949564Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2950054Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2950155Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2950191Z graph_break [] 2025-12-04T11:45:26.2950252Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2950324Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2950817Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2950864Z current_size = base.storage().size() 2025-12-04T11:45:26.2950905Z Autotune Choices Stats: 2025-12-04T11:45:26.2951276Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2951332Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2951373Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2951473Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2951719Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2951760Z _scaled_mm 0.0202 ms 29.9% 2025-12-04T11:45:26.2951904Z SingleProcess AUTOTUNE benchmarking takes 0.0116 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2952047Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2952094Z Traceback (most recent call last): 2025-12-04T11:45:26.2952251Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2952294Z method(*args, **kwargs) 
2025-12-04T11:45:26.2952447Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2952488Z method(*args, **kwargs) 2025-12-04T11:45:26.2952641Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2952678Z with policy(): 2025-12-04T11:45:26.2952834Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2952875Z raise RuntimeError(msg) 2025-12-04T11:45:26.2953295Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.2953323Z 2025-12-04T11:45:26.2953399Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2953658Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2953660Z 2025-12-04T11:45:26.2953747Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2953821Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2953863Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2953920Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2954403Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2954505Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2954542Z graph_break [] 2025-12-04T11:45:26.2954602Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2954675Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2955167Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2955230Z current_size = base.storage().size() 2025-12-04T11:45:26.2955270Z Autotune Choices Stats: 2025-12-04T11:45:26.2955649Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2955693Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2955733Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2955852Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2956085Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2956128Z _scaled_mm 0.0202 ms 29.9% 2025-12-04T11:45:26.2956257Z SingleProcess 
AUTOTUNE benchmarking takes 0.0116 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2956330Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2956373Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2956430Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2956531Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2957015Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2957053Z graph_break [] 2025-12-04T11:45:26.2957112Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2957186Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2957239Z Autotune Choices Stats: 2025-12-04T11:45:26.2957601Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2957647Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2957687Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2957788Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2958021Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2958063Z _scaled_mm 0.0187 ms 32.5% 2025-12-04T11:45:26.2958190Z SingleProcess AUTOTUNE benchmarking takes 0.0101 seconds and 0.0511 seconds precompiling for 2 choices 2025-12-04T11:45:26.2958243Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2958387Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2958433Z Traceback (most recent call last): 2025-12-04T11:45:26.2958589Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2958630Z method(*args, **kwargs) 2025-12-04T11:45:26.2958784Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2958839Z method(*args, **kwargs) 2025-12-04T11:45:26.2958990Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2959029Z with policy(): 2025-12-04T11:45:26.2959182Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2959224Z raise RuntimeError(msg) 2025-12-04T11:45:26.2959633Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.2959637Z 2025-12-04T11:45:26.2959711Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2959972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2959976Z 2025-12-04T11:45:26.2960062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2960135Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2960177Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2960235Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2960720Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2960820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2960857Z graph_break [] 2025-12-04T11:45:26.2960917Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2961004Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2961496Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2961546Z current_size = base.storage().size() 2025-12-04T11:45:26.2961586Z Autotune Choices Stats: 2025-12-04T11:45:26.2961954Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006039999891072512, "best_triton_pos": 0} 2025-12-04T11:45:26.2962000Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2962040Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2962140Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2962373Z triton_mm_0 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2962415Z _scaled_mm 0.0202 ms 29.9% 2025-12-04T11:45:26.2962543Z SingleProcess AUTOTUNE benchmarking takes 0.0116 seconds and 0.0616 seconds precompiling for 2 choices 2025-12-04T11:45:26.2962631Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2962672Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2962729Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2962829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2963380Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2963420Z graph_break [] 2025-12-04T11:45:26.2963492Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2963568Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2963608Z Autotune Choices Stats: 2025-12-04T11:45:26.2963970Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.2964016Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2964056Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2964156Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2964391Z triton_mm_1 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2964432Z _scaled_mm 0.0187 ms 32.5% 2025-12-04T11:45:26.2964560Z SingleProcess AUTOTUNE benchmarking takes 0.0101 seconds and 0.0511 seconds precompiling for 2 choices 2025-12-04T11:45:26.2964634Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2964675Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2964746Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2964846Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2965332Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), 
('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2965369Z graph_break [] 2025-12-04T11:45:26.2965428Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2965502Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2965543Z Autotune Choices Stats: 2025-12-04T11:45:26.2965904Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.2965951Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2965990Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2966089Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2966320Z triton_mm_2 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2966374Z _scaled_mm 0.0192 ms 31.2% 2025-12-04T11:45:26.2966501Z SingleProcess AUTOTUNE benchmarking takes 0.0100 seconds and 0.0490 seconds precompiling for 2 choices 2025-12-04T11:45:26.2966692Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-83c0553e271454d2.xml - 2025-12-04T11:45:26.2966753Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2967360Z FAILED [0.3412s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.2967364Z 2025-12-04T11:45:26.2967440Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2967702Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2967706Z 2025-12-04T11:45:26.2967793Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2967856Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.2967925Z ================== 1 failed, 187 deselected, 2 rerun in 2.56s ================== 2025-12-04T11:45:26.2967962Z Got exit code 1 2025-12-04T11:45:26.2968001Z Retrying single test... 
2025-12-04T11:45:26.2968150Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bd65e6e665b5f5c8.xml 2025-12-04T11:45:26.2968206Z ============================= test session starts ============================== 2025-12-04T11:45:26.2968319Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2968359Z cachedir: .pytest_cache 2025-12-04T11:45:26.2968519Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2968576Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2968617Z configfile: pytest.ini 2025-12-04T11:45:26.2968781Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2968857Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.2969115Z stepcurrent: skipping 152 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2969158Z Running 1 items in this shard 2025-12-04T11:45:26.2969161Z 2025-12-04T11:45:26.2969376Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [1.8092s] [100%] 2025-12-04T11:45:26.2969589Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.3940s] [100%] 2025-12-04T11:45:26.2969782Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda FAILED [0.4669s] [100%] 2025-12-04T11:45:26.2969784Z 2025-12-04T11:45:26.2969835Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2969978Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2970025Z Traceback (most recent call last): 2025-12-04T11:45:26.2970184Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2970236Z method(*args, **kwargs) 2025-12-04T11:45:26.2970391Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2970432Z method(*args, **kwargs) 2025-12-04T11:45:26.2970585Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2970621Z with policy(): 2025-12-04T11:45:26.2970794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2970836Z raise RuntimeError(msg) 
2025-12-04T11:45:26.2971235Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1019215872. 2025-12-04T11:45:26.2971239Z 2025-12-04T11:45:26.2971313Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2971573Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2971576Z 2025-12-04T11:45:26.2971663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2971738Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2971780Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2971837Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2972323Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2972425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2972478Z graph_break [] 2025-12-04T11:45:26.2972537Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2972611Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2973101Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2973150Z current_size = base.storage().size() 2025-12-04T11:45:26.2973189Z Autotune Choices Stats: 2025-12-04T11:45:26.2973603Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2973648Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2973689Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2973789Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2974024Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2974080Z _scaled_mm 0.0200 ms 30.9% 2025-12-04T11:45:26.2974209Z SingleProcess AUTOTUNE benchmarking takes 0.0132 seconds and 0.0642 seconds precompiling for 2 choices 2025-12-04T11:45:26.2974351Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2974398Z Traceback (most recent call last): 2025-12-04T11:45:26.2974553Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2974594Z method(*args, **kwargs) 
2025-12-04T11:45:26.2974762Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2974802Z method(*args, **kwargs) 2025-12-04T11:45:26.2974966Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2975004Z with policy(): 2025-12-04T11:45:26.2975158Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2975200Z raise RuntimeError(msg) 2025-12-04T11:45:26.2975590Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1019215872 and is now 1038090240. 2025-12-04T11:45:26.2975593Z 2025-12-04T11:45:26.2975668Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2975931Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2975933Z 2025-12-04T11:45:26.2976019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2976094Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2976136Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2976192Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2976682Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2976798Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2976836Z graph_break [] 2025-12-04T11:45:26.2976896Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2976970Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2977456Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2977504Z current_size = base.storage().size() 2025-12-04T11:45:26.2977544Z Autotune Choices Stats: 2025-12-04T11:45:26.2977914Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2977958Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2978012Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2978111Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2978346Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2978387Z _scaled_mm 0.0200 ms 30.9% 2025-12-04T11:45:26.2978526Z SingleProcess 
AUTOTUNE benchmarking takes 0.0132 seconds and 0.0642 seconds precompiling for 2 choices 2025-12-04T11:45:26.2978600Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2978642Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2978698Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2978811Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2979300Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2979339Z graph_break [] 2025-12-04T11:45:26.2979398Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2979472Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2979513Z Autotune Choices Stats: 2025-12-04T11:45:26.2979876Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2979921Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2979961Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2980060Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2980301Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2980343Z _scaled_mm 0.0184 ms 34.1% 2025-12-04T11:45:26.2980471Z SingleProcess AUTOTUNE benchmarking takes 0.0108 seconds and 0.0517 seconds precompiling for 2 choices 2025-12-04T11:45:26.2980526Z =================================== FAILURES =================================== 2025-12-04T11:45:26.2980668Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2980715Z Traceback (most recent call last): 2025-12-04T11:45:26.2980871Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2980911Z method(*args, **kwargs) 2025-12-04T11:45:26.2981064Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2981105Z method(*args, **kwargs) 2025-12-04T11:45:26.2981257Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2981295Z with policy(): 2025-12-04T11:45:26.2981449Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2981491Z raise RuntimeError(msg) 2025-12-04T11:45:26.2981884Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 
2025-12-04T11:45:26.2981897Z 2025-12-04T11:45:26.2981971Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2982233Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2982235Z 2025-12-04T11:45:26.2982333Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2982407Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2982448Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2982515Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2983000Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2983102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2983139Z graph_break [] 2025-12-04T11:45:26.2983199Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2983311Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2983804Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2983853Z current_size = base.storage().size() 2025-12-04T11:45:26.2983893Z Autotune Choices Stats: 2025-12-04T11:45:26.2984262Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_0", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0061599998734891415, "best_triton_pos": 0} 2025-12-04T11:45:26.2984321Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2984362Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2984461Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2984697Z triton_mm_0 0.0062 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2984738Z _scaled_mm 0.0200 ms 30.9% 2025-12-04T11:45:26.2984866Z SingleProcess AUTOTUNE benchmarking takes 0.0132 seconds and 0.0642 seconds precompiling for 2 choices 2025-12-04T11:45:26.2984939Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2984982Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2985038Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2985140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2985634Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), 
('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2985692Z graph_break [] 2025-12-04T11:45:26.2985752Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2985824Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2985865Z Autotune Choices Stats: 2025-12-04T11:45:26.2986243Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_1", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006279999855905771, "best_triton_pos": 0} 2025-12-04T11:45:26.2986288Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2986327Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2986438Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2986672Z triton_mm_1 0.0063 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2986714Z _scaled_mm 0.0184 ms 34.1% 2025-12-04T11:45:26.2986841Z SingleProcess AUTOTUNE benchmarking takes 0.0108 seconds and 0.0517 seconds precompiling for 2 choices 2025-12-04T11:45:26.2986916Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2986957Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2987016Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2987114Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2987601Z inductor [('triton_bundler_save_kernel', 16), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), 
('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('generated_module_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_num_precompiles', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2987638Z graph_break [] 2025-12-04T11:45:26.2987698Z aten_mm_info [('aten._scaled_mm.default_3_16_32', 1)] 2025-12-04T11:45:26.2987783Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2987825Z Autotune Choices Stats: 2025-12-04T11:45:26.2988189Z {"num_choices": 2, "num_triton_choices": 1, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.006359000224620104, "best_triton_pos": 0} 2025-12-04T11:45:26.2988235Z AUTOTUNE scaled_mm(3x32, 32x16, , ) 2025-12-04T11:45:26.2988277Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2988375Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2988607Z triton_mm_2 0.0064 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2988648Z _scaled_mm 0.0193 ms 32.9% 2025-12-04T11:45:26.2988775Z SingleProcess AUTOTUNE benchmarking takes 0.0110 seconds and 0.0528 seconds precompiling for 2 choices 2025-12-04T11:45:26.2988969Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-bd65e6e665b5f5c8.xml - 2025-12-04T11:45:26.2989032Z =========================== short test summary info ============================ 2025-12-04T11:45:26.2989617Z FAILED [0.4669s] 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1038090240 and is now 1056964608. 2025-12-04T11:45:26.2989632Z 2025-12-04T11:45:26.2989705Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2989972Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2989986Z 2025-12-04T11:45:26.2990074Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2990136Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 
2025-12-04T11:45:26.2990212Z ================== 1 failed, 187 deselected, 2 rerun in 2.69s ================== 2025-12-04T11:45:26.2990253Z Got exit code 1 2025-12-04T11:45:26.2990462Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda 2025-12-04T11:45:26.2990592Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.2990739Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4353e0a3a83171ad.xml 2025-12-04T11:45:26.2990796Z ============================= test session starts ============================== 2025-12-04T11:45:26.2990910Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.2990953Z cachedir: .pytest_cache 2025-12-04T11:45:26.2991114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.2991159Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.2991198Z configfile: pytest.ini 2025-12-04T11:45:26.2991362Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.2991440Z collecting ... collected 188 items / 153 deselected / 35 selected 2025-12-04T11:45:26.2991497Z stepcurrent: skipping 153 already run items. 
2025-12-04T11:45:26.2991551Z Running 35 items in this shard 2025-12-04T11:45:26.2991553Z 2025-12-04T11:45:26.2991776Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0097s] [ 2%] 2025-12-04T11:45:26.2991994Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.6952s] [ 2%] 2025-12-04T11:45:26.2992188Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.5878s] [ 2%] 2025-12-04T11:45:26.2992192Z 2025-12-04T11:45:26.2992243Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.2992386Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2992435Z Traceback (most recent call last): 2025-12-04T11:45:26.2992597Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2992639Z method(*args, **kwargs) 2025-12-04T11:45:26.2992793Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2992834Z method(*args, **kwargs) 2025-12-04T11:45:26.2992986Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.2993025Z with policy(): 2025-12-04T11:45:26.2993178Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2993232Z raise RuntimeError(msg) 2025-12-04T11:45:26.2993675Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.2993679Z 2025-12-04T11:45:26.2993771Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.2994034Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.2994051Z 2025-12-04T11:45:26.2994140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.2994214Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.2994257Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.2994315Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.2994804Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.2994904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.2994941Z graph_break [] 2025-12-04T11:45:26.2995004Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.2995078Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.2995568Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.2995630Z current_size = base.storage().size() 2025-12-04T11:45:26.2995671Z Autotune Choices Stats: 2025-12-04T11:45:26.2996038Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.2996084Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.2996124Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.2996228Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.2996463Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2996699Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2996933Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2997160Z triton_mm_0 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, 
num_warps=8 2025-12-04T11:45:26.2997403Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.2997640Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.2997875Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.2998101Z triton_mm_3 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.2998144Z _scaled_mm 0.0205 ms 29.8% 2025-12-04T11:45:26.2998273Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1746 seconds precompiling for 9 choices 2025-12-04T11:45:26.2998420Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.2998465Z Traceback (most recent call last): 2025-12-04T11:45:26.2998623Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2998664Z method(*args, **kwargs) 2025-12-04T11:45:26.2998818Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.2998858Z method(*args, **kwargs) 2025-12-04T11:45:26.2999009Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 
2025-12-04T11:45:26.2999048Z with policy(): 2025-12-04T11:45:26.2999201Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.2999265Z raise RuntimeError(msg) 2025-12-04T11:45:26.2999659Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.2999664Z 2025-12-04T11:45:26.2999739Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3000003Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3000006Z 2025-12-04T11:45:26.3000093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3000168Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3000210Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3000269Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3000755Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3000857Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3000895Z graph_break [] 2025-12-04T11:45:26.3000959Z aten_mm_info 
[('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3001043Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3001534Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3001591Z current_size = base.storage().size() 2025-12-04T11:45:26.3001633Z Autotune Choices Stats: 2025-12-04T11:45:26.3002008Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.3002055Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3002096Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3002196Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3002430Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3002657Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3002883Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3003114Z triton_mm_0 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3003374Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3003601Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3003825Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3004051Z triton_mm_3 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3004093Z _scaled_mm 0.0205 ms 29.8% 2025-12-04T11:45:26.3004222Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1746 seconds precompiling for 9 choices 2025-12-04T11:45:26.3004296Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3004338Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3004394Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3004495Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3004977Z inductor [('triton_bundler_save_kernel', 72), 
('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3005032Z graph_break [] 2025-12-04T11:45:26.3005093Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3005169Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3005209Z Autotune Choices Stats: 2025-12-04T11:45:26.3005601Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.3005646Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3005687Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3005786Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3006017Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3006250Z triton_mm_14 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3006478Z triton_mm_8 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3006706Z triton_mm_12 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3006933Z triton_mm_10 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3007176Z triton_mm_15 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3007404Z triton_mm_11 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3007629Z triton_mm_13 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3007671Z _scaled_mm 0.0211 ms 29.0% 2025-12-04T11:45:26.3007799Z SingleProcess AUTOTUNE benchmarking takes 0.0377 seconds and 0.0799 seconds precompiling for 9 choices 2025-12-04T11:45:26.3007853Z =================================== FAILURES =================================== 2025-12-04T11:45:26.3007997Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3008044Z Traceback (most recent call last): 2025-12-04T11:45:26.3008200Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 
2025-12-04T11:45:26.3008242Z method(*args, **kwargs) 2025-12-04T11:45:26.3008396Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3008449Z method(*args, **kwargs) 2025-12-04T11:45:26.3008600Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3008639Z with policy(): 2025-12-04T11:45:26.3008794Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3008838Z raise RuntimeError(msg) 2025-12-04T11:45:26.3009245Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3009258Z 2025-12-04T11:45:26.3009333Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3009596Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3009599Z 2025-12-04T11:45:26.3009687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3009762Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3009804Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3009861Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3010348Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3010449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3010486Z graph_break [] 2025-12-04T11:45:26.3010547Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3010619Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3011118Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3011166Z current_size = base.storage().size() 2025-12-04T11:45:26.3011207Z Autotune Choices Stats: 2025-12-04T11:45:26.3011575Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_2", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.3011620Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3011662Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3011760Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3011997Z triton_mm_2 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3012225Z triton_mm_4 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3012452Z triton_mm_5 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3012689Z triton_mm_0 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3012925Z triton_mm_1 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3013170Z triton_mm_7 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3013431Z triton_mm_6 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3013660Z triton_mm_3 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3013702Z _scaled_mm 0.0205 ms 29.8% 2025-12-04T11:45:26.3013832Z SingleProcess AUTOTUNE benchmarking takes 0.0387 seconds and 0.1746 seconds precompiling for 9 choices 2025-12-04T11:45:26.3013905Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3013948Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3014005Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3014106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3014593Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3014646Z graph_break [] 2025-12-04T11:45:26.3014708Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 
2025-12-04T11:45:26.3014781Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3014822Z Autotune Choices Stats: 2025-12-04T11:45:26.3015184Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_9", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006120000034570694, "best_triton_pos": 0} 2025-12-04T11:45:26.3015230Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3015270Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3015368Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3015602Z triton_mm_9 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3015833Z triton_mm_14 0.0062 ms 99.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3016059Z triton_mm_8 0.0063 ms 97.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3016301Z triton_mm_12 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3016541Z triton_mm_10 0.0064 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3016781Z triton_mm_15 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3017009Z triton_mm_11 0.0065 ms 94.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3017235Z triton_mm_13 0.0066 ms 92.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3017278Z _scaled_mm 0.0211 ms 29.0% 2025-12-04T11:45:26.3017406Z SingleProcess AUTOTUNE benchmarking takes 0.0377 seconds and 0.0799 seconds precompiling for 9 choices 2025-12-04T11:45:26.3017481Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3017522Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3017580Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3017680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3018166Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3018216Z graph_break [] 2025-12-04T11:45:26.3018276Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3018349Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3018390Z Autotune Choices Stats: 2025-12-04T11:45:26.3018758Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_22", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2", "best_time": 0.005919000133872032, "best_triton_pos": 0} 2025-12-04T11:45:26.3018803Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3018843Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3018941Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3019175Z triton_mm_22 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3019404Z triton_mm_21 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3019634Z triton_mm_16 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3019876Z triton_mm_23 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3020106Z triton_mm_19 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3020344Z triton_mm_18 0.0064 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3020581Z triton_mm_20 0.0064 ms 91.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3020809Z triton_mm_17 0.0065 ms 90.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3020850Z _scaled_mm 0.0229 ms 25.9% 2025-12-04T11:45:26.3020980Z SingleProcess AUTOTUNE benchmarking takes 0.0520 seconds and 0.1870 seconds precompiling for 9 choices 2025-12-04T11:45:26.3021172Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-4353e0a3a83171ad.xml - 2025-12-04T11:45:26.3021233Z =========================== short test summary info ============================ 2025-12-04T11:45:26.3021828Z FAILED [0.5878s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3021831Z 2025-12-04T11:45:26.3021906Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3022182Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3022184Z 2025-12-04T11:45:26.3022272Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3022336Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.3022404Z ================== 1 failed, 153 deselected, 2 rerun in 3.31s ================== 2025-12-04T11:45:26.3022442Z Got exit code 1 2025-12-04T11:45:26.3022482Z Retrying single test... 2025-12-04T11:45:26.3022628Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-113abd9ec1197890.xml 2025-12-04T11:45:26.3022684Z ============================= test session starts ============================== 2025-12-04T11:45:26.3022796Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.3022838Z cachedir: .pytest_cache 2025-12-04T11:45:26.3022999Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.3023045Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.3023085Z configfile: pytest.ini 2025-12-04T11:45:26.3023246Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.3023360Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.3023620Z stepcurrent: skipping 153 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3023679Z Running 1 items in this shard 2025-12-04T11:45:26.3023681Z 2025-12-04T11:45:26.3023899Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.0114s] [100%] 2025-12-04T11:45:26.3024131Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8113s] [100%] 2025-12-04T11:45:26.3024325Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7332s] [100%] 2025-12-04T11:45:26.3024340Z 2025-12-04T11:45:26.3024392Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.3024537Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3024583Z Traceback (most recent call last): 2025-12-04T11:45:26.3024742Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3024784Z method(*args, **kwargs) 2025-12-04T11:45:26.3024940Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3024979Z method(*args, **kwargs) 2025-12-04T11:45:26.3025134Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3025170Z with policy(): 2025-12-04T11:45:26.3025326Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3025367Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.3025760Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.3025777Z 2025-12-04T11:45:26.3025852Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3026113Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3026115Z 2025-12-04T11:45:26.3026204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3026278Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3026321Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3026377Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3026863Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3026963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3027001Z graph_break [] 2025-12-04T11:45:26.3027062Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3027135Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3027626Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3027684Z current_size = base.storage().size() 2025-12-04T11:45:26.3027726Z Autotune Choices Stats: 2025-12-04T11:45:26.3028111Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.3028158Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3028197Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3028310Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3028546Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3028778Z triton_mm_7 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3029006Z triton_mm_0 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3029233Z triton_mm_5 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3029457Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3029680Z triton_mm_2 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3029922Z triton_mm_1 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3030148Z triton_mm_4 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3030190Z _scaled_mm 0.0198 ms 30.2% 2025-12-04T11:45:26.3030319Z SingleProcess AUTOTUNE benchmarking takes 0.0383 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.3030465Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3030511Z Traceback (most recent call last): 2025-12-04T11:45:26.3030669Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3030710Z method(*args, **kwargs) 2025-12-04T11:45:26.3030868Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3030908Z method(*args, **kwargs) 2025-12-04T11:45:26.3031062Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3031102Z with policy(): 2025-12-04T11:45:26.3031255Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3031308Z raise RuntimeError(msg) 2025-12-04T11:45:26.3031700Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.3031703Z 2025-12-04T11:45:26.3031791Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3032066Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3032068Z 2025-12-04T11:45:26.3032159Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3032233Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3032277Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3032334Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3032823Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3032926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3032963Z graph_break [] 2025-12-04T11:45:26.3033025Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3033100Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3033630Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3033694Z current_size = base.storage().size() 2025-12-04T11:45:26.3033735Z Autotune Choices Stats: 2025-12-04T11:45:26.3034226Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.3034277Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3034317Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3034419Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3034652Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3034880Z triton_mm_7 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3035106Z triton_mm_0 0.0062 ms 96.8% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3035336Z triton_mm_5 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3035580Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3035817Z triton_mm_2 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3036061Z triton_mm_1 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3036287Z triton_mm_4 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3036331Z _scaled_mm 0.0198 ms 30.2% 2025-12-04T11:45:26.3036461Z SingleProcess AUTOTUNE benchmarking takes 0.0383 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.3036536Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3036578Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3036635Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3036736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3037220Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3037259Z graph_break [] 2025-12-04T11:45:26.3037320Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3037395Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3037446Z Autotune Choices Stats: 2025-12-04T11:45:26.3037810Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.3037855Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3037898Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3037996Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3038228Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3038456Z triton_mm_10 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3038683Z triton_mm_14 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3038912Z triton_mm_12 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3039148Z triton_mm_13 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3039376Z triton_mm_15 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3039611Z triton_mm_9 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3039845Z triton_mm_11 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3039888Z _scaled_mm 0.0213 ms 28.0% 2025-12-04T11:45:26.3040016Z SingleProcess AUTOTUNE benchmarking takes 0.0409 seconds and 0.1007 seconds precompiling for 9 choices 2025-12-04T11:45:26.3040070Z =================================== FAILURES =================================== 2025-12-04T11:45:26.3040215Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3040262Z Traceback (most recent call last): 2025-12-04T11:45:26.3040421Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3040464Z method(*args, **kwargs) 2025-12-04T11:45:26.3040618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3040658Z method(*args, **kwargs) 2025-12-04T11:45:26.3040811Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3040850Z with policy(): 2025-12-04T11:45:26.3041004Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3041059Z raise RuntimeError(msg) 2025-12-04T11:45:26.3041458Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3041460Z 2025-12-04T11:45:26.3041535Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3041800Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3041802Z 2025-12-04T11:45:26.3041890Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3041965Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3042010Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3042067Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3042554Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3042657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3042694Z graph_break [] 2025-12-04T11:45:26.3042773Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3042846Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3043396Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3043444Z current_size = base.storage().size() 2025-12-04T11:45:26.3043485Z Autotune Choices Stats: 2025-12-04T11:45:26.3043869Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.3043916Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3043956Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3044056Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3044291Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3044520Z triton_mm_7 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3044749Z triton_mm_0 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3044974Z triton_mm_5 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3045212Z triton_mm_6 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3045436Z triton_mm_2 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3045664Z triton_mm_1 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3045893Z triton_mm_4 0.0063 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3045934Z _scaled_mm 0.0198 ms 30.2% 2025-12-04T11:45:26.3046064Z SingleProcess AUTOTUNE benchmarking takes 0.0383 seconds and 0.1651 seconds precompiling for 9 choices 2025-12-04T11:45:26.3046141Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3046183Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3046241Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3046342Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3046825Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3046878Z graph_break [] 2025-12-04T11:45:26.3046939Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 
2025-12-04T11:45:26.3047014Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3047053Z Autotune Choices Stats: 2025-12-04T11:45:26.3047440Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_8", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.3047484Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3047526Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3047624Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3047860Z triton_mm_8 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3048091Z triton_mm_10 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3048317Z triton_mm_14 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3048543Z triton_mm_12 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3048767Z triton_mm_13 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3049006Z triton_mm_15 0.0063 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3049234Z triton_mm_9 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3049459Z triton_mm_11 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3049502Z _scaled_mm 0.0213 ms 28.0% 2025-12-04T11:45:26.3049631Z SingleProcess AUTOTUNE benchmarking takes 0.0409 seconds and 0.1007 seconds precompiling for 9 choices 2025-12-04T11:45:26.3049705Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3049746Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3049804Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3049903Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3050384Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3050435Z graph_break [] 2025-12-04T11:45:26.3050496Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3050570Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3050612Z Autotune Choices Stats: 2025-12-04T11:45:26.3050984Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_18", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005919999908655882, "best_triton_pos": 0} 2025-12-04T11:45:26.3051041Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3051081Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3051179Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3051414Z triton_mm_18 0.0059 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3051643Z triton_mm_23 0.0061 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3051871Z triton_mm_20 0.0061 ms 96.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3052104Z triton_mm_19 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3052330Z triton_mm_21 0.0062 ms 96.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3052567Z triton_mm_22 0.0062 ms 95.5% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3052798Z triton_mm_16 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3053027Z triton_mm_17 0.0062 ms 94.9% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3053068Z _scaled_mm 0.0065 ms 90.8% 2025-12-04T11:45:26.3053198Z SingleProcess AUTOTUNE benchmarking takes 0.0554 seconds and 0.1804 seconds precompiling for 9 choices 2025-12-04T11:45:26.3053434Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-113abd9ec1197890.xml - 2025-12-04T11:45:26.3053495Z =========================== short test summary info ============================ 2025-12-04T11:45:26.3054093Z FAILED [0.7332s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3054109Z 2025-12-04T11:45:26.3054184Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3054449Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3054452Z 2025-12-04T11:45:26.3054539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3054617Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.3054685Z ================== 1 failed, 187 deselected, 2 rerun in 3.57s ================== 2025-12-04T11:45:26.3054725Z Got exit code 1 2025-12-04T11:45:26.3054765Z Retrying single test... 2025-12-04T11:45:26.3054925Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-10c5615ef83d21aa.xml 2025-12-04T11:45:26.3054982Z ============================= test session starts ============================== 2025-12-04T11:45:26.3055096Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.3055137Z cachedir: .pytest_cache 2025-12-04T11:45:26.3055298Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.3055343Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.3055384Z configfile: pytest.ini 2025-12-04T11:45:26.3055549Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.3055625Z collecting ... collected 188 items / 187 deselected / 1 selected 2025-12-04T11:45:26.3055884Z stepcurrent: skipping 153 already run items. 
Running only test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3055930Z Running 1 items in this shard 2025-12-04T11:45:26.3055932Z 2025-12-04T11:45:26.3056149Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [2.1569s] [100%] 2025-12-04T11:45:26.3056388Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda ('RERUN', {'yellow': True}) [0.8084s] [100%] 2025-12-04T11:45:26.3056582Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda FAILED [0.7116s] [100%] 2025-12-04T11:45:26.3056586Z 2025-12-04T11:45:26.3056636Z ==================================== RERUNS ==================================== 2025-12-04T11:45:26.3056781Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3056826Z Traceback (most recent call last): 2025-12-04T11:45:26.3056989Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3057030Z method(*args, **kwargs) 2025-12-04T11:45:26.3057185Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3057226Z method(*args, **kwargs) 2025-12-04T11:45:26.3057379Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3057416Z with policy(): 2025-12-04T11:45:26.3057573Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3057614Z raise 
RuntimeError(msg) 2025-12-04T11:45:26.3058003Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 807403520 and is now 1033895936. 2025-12-04T11:45:26.3058016Z 2025-12-04T11:45:26.3058090Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3058353Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3058355Z 2025-12-04T11:45:26.3058453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3058528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3058572Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3058641Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3059135Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3059237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3059275Z graph_break [] 2025-12-04T11:45:26.3059334Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3059409Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3059901Z 
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3059950Z current_size = base.storage().size() 2025-12-04T11:45:26.3059990Z Autotune Choices Stats: 2025-12-04T11:45:26.3060359Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.3060417Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3060459Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3060559Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3060795Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3061031Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3061258Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3061484Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, 
BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3061712Z triton_mm_4 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3061949Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3062173Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3062410Z triton_mm_1 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3062463Z _scaled_mm 0.0228 ms 26.1% 2025-12-04T11:45:26.3062593Z SingleProcess AUTOTUNE benchmarking takes 0.0455 seconds and 0.1736 seconds precompiling for 9 choices 2025-12-04T11:45:26.3062739Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3062784Z Traceback (most recent call last): 2025-12-04T11:45:26.3062941Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3062983Z method(*args, **kwargs) 2025-12-04T11:45:26.3063138Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3063180Z method(*args, **kwargs) 2025-12-04T11:45:26.3063374Z 
File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3063412Z with policy(): 2025-12-04T11:45:26.3063568Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3063609Z raise RuntimeError(msg) 2025-12-04T11:45:26.3064005Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1033895936 and is now 1067450368. 2025-12-04T11:45:26.3064024Z 2025-12-04T11:45:26.3064098Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3064363Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3064365Z 2025-12-04T11:45:26.3064453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3064528Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3064571Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3064629Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3065115Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3065217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3065256Z graph_break [] 2025-12-04T11:45:26.3065316Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3065390Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3065882Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3065946Z current_size = base.storage().size() 2025-12-04T11:45:26.3065986Z Autotune Choices Stats: 2025-12-04T11:45:26.3066366Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.3066411Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3066464Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3066565Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3066801Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3067030Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3067256Z triton_mm_7 0.0060 ms 99.3% 
ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3067485Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3067712Z triton_mm_4 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3067937Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3068172Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3068396Z triton_mm_1 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3068441Z _scaled_mm 0.0228 ms 26.1% 2025-12-04T11:45:26.3068569Z SingleProcess AUTOTUNE benchmarking takes 0.0455 seconds and 0.1736 seconds precompiling for 9 choices 2025-12-04T11:45:26.3068643Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3068688Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3068745Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3068846Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), 
('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3069332Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3069369Z graph_break [] 2025-12-04T11:45:26.3069441Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3069514Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3069556Z Autotune Choices Stats: 2025-12-04T11:45:26.3069933Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.3069978Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3070018Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3070133Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3070367Z triton_mm_15 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3070595Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3070824Z triton_mm_8 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, 
BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3071050Z triton_mm_14 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3071276Z triton_mm_11 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3071508Z triton_mm_12 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3071746Z triton_mm_9 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3071971Z triton_mm_13 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3072012Z _scaled_mm 0.0217 ms 28.0% 2025-12-04T11:45:26.3072141Z SingleProcess AUTOTUNE benchmarking takes 0.0439 seconds and 0.1653 seconds precompiling for 9 choices 2025-12-04T11:45:26.3072195Z =================================== FAILURES =================================== 2025-12-04T11:45:26.3072339Z _ TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda _ 2025-12-04T11:45:26.3072386Z Traceback (most recent call last): 2025-12-04T11:45:26.3072544Z File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3072584Z method(*args, **kwargs) 2025-12-04T11:45:26.3072739Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T11:45:26.3072780Z method(*args, **kwargs) 2025-12-04T11:45:26.3072933Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T11:45:26.3072970Z with policy(): 2025-12-04T11:45:26.3073136Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T11:45:26.3073176Z raise RuntimeError(msg) 2025-12-04T11:45:26.3073598Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3073602Z 2025-12-04T11:45:26.3073695Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3073974Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3073976Z 2025-12-04T11:45:26.3074066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3074142Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3074185Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3074243Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3074728Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3074828Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3074867Z graph_break [] 2025-12-04T11:45:26.3074928Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3075002Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3075493Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_inductor/select_algorithm.py:3433: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2025-12-04T11:45:26.3075556Z current_size = base.storage().size() 2025-12-04T11:45:26.3075598Z Autotune Choices Stats: 2025-12-04T11:45:26.3075966Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_3", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4", "best_time": 0.005960000213235617, "best_triton_pos": 0} 2025-12-04T11:45:26.3076011Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3076051Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3076151Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3076385Z triton_mm_3 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3076615Z triton_mm_0 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3076842Z triton_mm_7 0.0060 ms 99.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3077068Z triton_mm_2 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3077308Z triton_mm_4 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, 
GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3077543Z triton_mm_5 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3077779Z triton_mm_6 0.0060 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3078011Z triton_mm_1 0.0063 ms 94.3% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3078054Z _scaled_mm 0.0228 ms 26.1% 2025-12-04T11:45:26.3078181Z SingleProcess AUTOTUNE benchmarking takes 0.0455 seconds and 0.1736 seconds precompiling for 9 choices 2025-12-04T11:45:26.3078258Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3078299Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3078357Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3078457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3078940Z inductor [('triton_bundler_save_kernel', 72), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3078980Z graph_break [] 2025-12-04T11:45:26.3079041Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 
2025-12-04T11:45:26.3079128Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3079169Z Autotune Choices Stats: 2025-12-04T11:45:26.3079534Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_15", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1", "best_time": 0.0060800001956522465, "best_triton_pos": 0} 2025-12-04T11:45:26.3079579Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3079621Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3079720Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3079954Z triton_mm_15 0.0061 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3080182Z triton_mm_10 0.0062 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3080408Z triton_mm_8 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3080638Z triton_mm_14 0.0062 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3080879Z triton_mm_11 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, 
matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3081107Z triton_mm_12 0.0063 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3081343Z triton_mm_9 0.0064 ms 95.6% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3081586Z triton_mm_13 0.0064 ms 95.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3081628Z _scaled_mm 0.0217 ms 28.0% 2025-12-04T11:45:26.3081758Z SingleProcess AUTOTUNE benchmarking takes 0.0439 seconds and 0.1653 seconds precompiling for 9 choices 2025-12-04T11:45:26.3081832Z ----------------------------- Captured stdout call ----------------------------- 2025-12-04T11:45:26.3081875Z frames [('total', 1), ('ok', 1)] 2025-12-04T11:45:26.3081931Z stats [('calls_captured', 1), ('unique_graphs', 1)] 2025-12-04T11:45:26.3082032Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)] 2025-12-04T11:45:26.3082515Z inductor [('triton_bundler_save_kernel', 72), ('async_compile_cache_miss', 10), ('benchmarking.InductorBenchmarker.benchmark_gpu', 9), ('generated_module_cache_miss', 8), ('select_algorithm_num_precompiles', 8), ('fxgraph_cache_miss', 1), ('select_algorithm_precompile', 1), ('select_algorithm_autotune', 1), ('benchmarking.InductorBenchmarker.benchmark', 1), ('triton_bundler_save_static_autotuner', 1)] 2025-12-04T11:45:26.3082552Z graph_break [] 2025-12-04T11:45:26.3082613Z aten_mm_info [('aten._scaled_mm.default_3_2048_32', 1)] 2025-12-04T11:45:26.3082687Z 
----------------------------- Captured stderr call ----------------------------- 2025-12-04T11:45:26.3082740Z Autotune Choices Stats: 2025-12-04T11:45:26.3083103Z {"num_choices": 9, "num_triton_choices": 8, "best_kernel": "triton_mm_17", "best_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8", "best_time": 0.006000000052154064, "best_triton_pos": 0} 2025-12-04T11:45:26.3083148Z AUTOTUNE scaled_mm(3x32, 32x2048, , ) 2025-12-04T11:45:26.3083187Z strides: [32, 1], [1, 32], [], [] 2025-12-04T11:45:26.3083327Z dtypes: torch.float8_e4m3fnuz, torch.float8_e4m3fnuz, torch.float32, torch.float32 2025-12-04T11:45:26.3083563Z triton_mm_17 0.0060 ms 100.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3083792Z triton_mm_22 0.0061 ms 98.7% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3084020Z triton_mm_21 0.0061 ms 98.1% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=32, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=2 2025-12-04T11:45:26.3084253Z triton_mm_16 0.0061 ms 98.0% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=8 2025-12-04T11:45:26.3084480Z triton_mm_18 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, 
waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3084721Z triton_mm_23 0.0062 ms 97.4% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=16, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=1 2025-12-04T11:45:26.3084979Z triton_mm_20 0.0062 ms 96.8% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3085220Z triton_mm_19 0.0062 ms 96.2% ACC_TYPE='tl.float32', ALLOW_TF32=True, BLOCK_K=32, BLOCK_M=16, BLOCK_N=256, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=True, kpack=2, matrix_instr_nonkdim=16, waves_per_eu=0, num_stages=2, num_warps=4 2025-12-04T11:45:26.3085262Z _scaled_mm 0.0198 ms 30.4% 2025-12-04T11:45:26.3085390Z SingleProcess AUTOTUNE benchmarking takes 0.0562 seconds and 0.1810 seconds precompiling for 9 choices 2025-12-04T11:45:26.3085584Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-10c5615ef83d21aa.xml - 2025-12-04T11:45:26.3085645Z =========================== short test summary info ============================ 2025-12-04T11:45:26.3086240Z FAILED [0.7116s] inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda! Caching allocator allocated memory was 0 and is now reported as 1024 on device 0. CUDA driver allocated memory was 1067450368 and is now 1101004800. 
2025-12-04T11:45:26.3086243Z 2025-12-04T11:45:26.3086319Z To execute this test, run the following from the base repo dir: 2025-12-04T11:45:26.3086588Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_fp8.py TestFP8LoweringCUDA.test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3086589Z 2025-12-04T11:45:26.3086693Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T11:45:26.3086755Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T11:45:26.3086823Z ================== 1 failed, 187 deselected, 2 rerun in 3.70s ================== 2025-12-04T11:45:26.3086861Z Got exit code 1 2025-12-04T11:45:26.3087073Z FAILED CONSISTENTLY: test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda 2025-12-04T11:45:26.3087201Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T11:45:26.3087346Z Test results will be stored in test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4420156a0957c67.xml 2025-12-04T11:45:26.3087404Z ============================= test session starts ============================== 2025-12-04T11:45:26.3087515Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T11:45:26.3087558Z cachedir: .pytest_cache 2025-12-04T11:45:26.3087719Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T11:45:26.3087765Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T11:45:26.3087804Z configfile: pytest.ini 2025-12-04T11:45:26.3087967Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T11:45:26.3088044Z collecting ... 
collected 188 items / 154 deselected / 34 selected 2025-12-04T11:45:26.3088098Z stepcurrent: skipping 154 already run items. 2025-12-04T11:45:26.3088142Z Running 34 items in this shard 2025-12-04T11:45:26.3088155Z 2025-12-04T11:45:26.3088464Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.0465s] (XPU does not support use_fast_accum=True for now) [ 2%] 2025-12-04T11:45:26.3088781Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1790s] (XPU does not support use_fast_accum=True for now) [ 5%] 2025-12-04T11:45:26.3089090Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1755s] (XPU does not support use_fast_accum=True for now) [ 8%] 2025-12-04T11:45:26.3089385Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1323s] (XPU does not support use_fast_accum=True for now) [ 11%] 2025-12-04T11:45:26.3089678Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1165s] (XPU does not support use_fast_accum=True for now) [ 14%] 2025-12-04T11:45:26.3089969Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1114s] (XPU does not support use_fast_accum=True for now) [ 17%] 2025-12-04T11:45:26.3090258Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1103s] (XPU does not support use_fast_accum=True for now) [ 20%] 2025-12-04T11:45:26.3090545Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1117s] (XPU does not support use_fast_accum=True for now) [ 23%] 2025-12-04T11:45:26.3090834Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1111s] (XPU does not support use_fast_accum=True for now) [ 26%] 2025-12-04T11:45:26.3091136Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1112s] (XPU does not support use_fast_accum=True for now) [ 29%] 2025-12-04T11:45:26.3091424Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1113s] (XPU does not support use_fast_accum=True for now) [ 32%] 2025-12-04T11:45:26.3091709Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_bfloat16_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_bfloat16 SKIPPED [0.1108s] (XPU does not support use_fast_accum=True for now) [ 35%] 2025-12-04T11:45:26.3092005Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.1122s] (XPU does not support use_fast_accum=True for now) [ 38%] 2025-12-04T11:45:26.3092300Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.1117s] (XPU does not support use_fast_accum=True for now) [ 41%] 2025-12-04T11:45:26.3092602Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.1085s] (bias is not supported when output dtype is float32) [ 44%] 2025-12-04T11:45:26.3092908Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0016s] (bias is not supported when output dtype is float32) [ 47%] 2025-12-04T11:45:26.3093195Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0027s] (XPU does not support use_fast_accum=True for now) [ 50%] 2025-12-04T11:45:26.3093541Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.1342s] (XPU does not support use_fast_accum=True for now) [ 52%] 2025-12-04T11:45:26.3093841Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.1296s] (bias is not supported when output dtype is float32) [ 55%] 2025-12-04T11:45:26.3094129Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0017s] (bias is not supported when output dtype is float32) [ 58%] 2025-12-04T11:45:26.3094416Z 
inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.0029s] (XPU does not support use_fast_accum=True for now) [ 61%] 2025-12-04T11:45:26.3094701Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.1160s] (XPU does not support use_fast_accum=True for now) [ 64%] 2025-12-04T11:45:26.3094988Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda_float32 SKIPPED [0.1109s] (bias is not supported when output dtype is float32) [ 67%] 2025-12-04T11:45:26.3095273Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_float32_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda_float32 SKIPPED [0.0017s] (bias is not supported when output dtype is float32) [ 70%] 2025-12-04T11:45:26.3095549Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 73%] 2025-12-04T11:45:26.3095807Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_1024,1024,512_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 76%] 2025-12-04T11:45:26.3096056Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_False_cuda_bfloat16 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 79%] 2025-12-04T11:45:26.3096305Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_bfloat16_shape_16,32,32_use_fast_accum_True_cuda_bfloat16 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 82%] 
2025-12-04T11:45:26.3096558Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_False_cuda_float32 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 85%] 2025-12-04T11:45:26.3096811Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_1024,1024,512_use_fast_accum_True_cuda_float32 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 88%] 2025-12-04T11:45:26.3097055Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_False_cuda_float32 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 91%] 2025-12-04T11:45:26.3097312Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_tma_template_float32_shape_16,32,32_use_fast_accum_True_cuda_float32 SKIPPED [0.0001s] (Need device-side TMA support in Triton) [ 94%] 2025-12-04T11:45:26.3097600Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_input_dims_cuda E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default 2025-12-04T11:45:26.3097760Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last): 2025-12-04T11:45:26.3098038Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T11:45:26.3098176Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] r = func(*args, **kwargs) 2025-12-04T11:45:26.3098309Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3098528Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:45:26.3098673Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return self._op(*args, **kwargs) 2025-12-04T11:45:26.3098812Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3099063Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm 2025-12-04T11:45:26.3099203Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return _check_scaled_mm_sizes( 2025-12-04T11:45:26.3099339Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3099599Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_meta_registrations.py", line 6384, in _check_scaled_mm_sizes 2025-12-04T11:45:26.3099731Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] torch._check( 2025-12-04T11:45:26.3099953Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 1734, in _check 2025-12-04T11:45:26.3100157Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] _check_with(RuntimeError, cond, message) # pyrefly: ignore [bad-argument-type] 2025-12-04T11:45:26.3100302Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3100531Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File 
"/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 1716, in _check_with 2025-12-04T11:45:26.3100680Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] raise error_type(message_evaluated) 2025-12-04T11:45:26.3100881Z E1204 11:45:23.154000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Expected self.size(1) to be divisible by 16, but got self.size(1)=15 2025-12-04T11:45:26.3100922Z PASSED [0.2956s] [ 97%] 2025-12-04T11:45:26.3101230Z inductor/test_fp8.py::TestFP8LoweringCUDA::test_unacceptable_scale_dims_rowwise_scaling_cuda E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] failed while attempting to run meta for aten._scaled_mm.default 2025-12-04T11:45:26.3101392Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] Traceback (most recent call last): 2025-12-04T11:45:26.3101647Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T11:45:26.3103189Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] r = func(*args, **kwargs) 2025-12-04T11:45:26.3103385Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3103619Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T11:45:26.3103764Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return self._op(*args, **kwargs) 2025-12-04T11:45:26.3103900Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3104152Z 
E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_meta_registrations.py", line 6528, in meta_scaled_mm 2025-12-04T11:45:26.3104295Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] return _check_scaled_mm_sizes( 2025-12-04T11:45:26.3104432Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3104691Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_meta_registrations.py", line 6498, in _check_scaled_mm_sizes 2025-12-04T11:45:26.3104812Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] torch._check( 2025-12-04T11:45:26.3105033Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 1734, in _check 2025-12-04T11:45:26.3105251Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] _check_with(RuntimeError, cond, message) # pyrefly: ignore [bad-argument-type] 2025-12-04T11:45:26.3105396Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T11:45:26.3105622Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/__init__.py", line 1716, in _check_with 2025-12-04T11:45:26.3105769Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] raise error_type(message_evaluated) 2025-12-04T11:45:26.3106362Z E1204 11:45:23.328000 1270530 site-packages/torch/_subclasses/fake_tensor.py:2827] [0/0] RuntimeError: Invalid scaling configuration. 
For tensorwise scaling, both scales should be scalar. For rowwise scaling, scale_a should be (233, 1), scale_b should be (1, 128). For (BlockWise1x128, BlockWise128x128), scale_a should be (233, 1), scale_b should be (1, 1). For (BlockWise1x128, BlockWise1x128), scale_a should be (233, 1), scale_b should be (1, 128). Got scale_a.size()=(1, 128) and scale_b.size()=(233, 1) 2025-12-04T11:45:26.3106406Z PASSED [0.1582s] [100%] 2025-12-04T11:45:26.3106408Z 2025-12-04T11:45:26.3106601Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_fp8/inductor.test_fp8-d4420156a0957c67.xml - 2025-12-04T11:45:26.3106677Z ================ 2 passed, 32 skipped, 154 deselected in 2.76s ================= 2025-12-04T11:45:26.3119628Z The following tests failed consistently: ['test/inductor/test_fp8.py::TestFP8TypesCUDA::test_eager_fallback_float16_cuda_float16', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda', 
'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda', 
'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_1024,1024,512_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda', 
'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,16,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_False_use_fast_accum_True_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_False_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_rowwise_scaling_shape_16,32,32_has_bias_True_use_fast_accum_True_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1024_K_32_N_2048_persistent_matmul_False_cuda', 
'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_1_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_257_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_1024_N_2048_persistent_matmul_False_cuda', 
'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_33_K_32_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_1024_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_16_N_2048_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_16_persistent_matmul_False_cuda', 'test/inductor/test_fp8.py::TestFP8LoweringCUDA::test_tensorwise_scaling_acceptable_input_dims_M_3_K_32_N_2048_persistent_matmul_False_cuda'] 2025-12-04T11:45:26.3119701Z 2025-12-04T11:45:26.3119847Z FINISHED PRINTING LOG FILE of inductor/test_fp8 1/1 (test/test-reports/inductor.test_fp8_1.1_6df9ffee87f5c527_.log) 2025-12-04T11:45:26.3119849Z 2025-12-04T11:45:26.3119953Z Finished inductor/test_fp8 1/1 ... 
[2025-12-04 11:45:24.325276][2252708.591938067], took 36.95min 2025-12-04T11:45:26.3120203Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:45:26.3120293Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:45:26.3120387Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T11:45:26.3120435Z Uploading artifacts took 0.00 seconds 2025-12-04T11:45:26.3120480Z inductor/test_fp8 1/1 failed! 2025-12-04T11:45:26.3120573Z Running inductor/test_pad_mm 1/1 ... [2025-12-04 11:45:24.331723][2252708.598394103] 2025-12-04T11:45:26.3120622Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:45:26.3120918Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_pad_mm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:45:24.331923] 2025-12-04T11:45:57.1901925Z 2025-12-04T11:45:57.1902989Z inductor/test_pad_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_pad_mm_1.1_c30445c453aba64e_.log 2025-12-04T11:45:57.1907666Z Running 19 items in this shard: test/inductor/test_pad_mm.py::PadMMTest::test_cat_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_cat_padding, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_padding, test/inductor/test_pad_mm.py::PadMMTest::test_no_autocast_in_pad_bmm_joint_graph_pass, test/inductor/test_pad_mm.py::PadMMTest::test_original_aten_preserved_pad_mm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_2d_bias, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_mn, test/inductor/test_pad_mm.py::PadMMTest::test_pad_batch, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_b, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_bm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_bf16, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_mnk, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_n, test/inductor/test_pad_mm.py::PadMMTest::test_pad_single_cat, test/inductor/test_pad_mm.py::PadMMTest::test_zero_dim 2025-12-04T11:45:57.1909961Z 2025-12-04T11:45:57.1910182Z Finished inductor/test_pad_mm 1/1 ... 
[2025-12-04 11:45:57.189918][2252741.456585652], took 0.55min 2025-12-04T11:45:57.1915414Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:45:57.1966512Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:45:57.1967928Z Running dynamo/test_utils 1/1 ... [2025-12-04 11:45:57.196635][2252741.463308284] 2025-12-04T11:45:57.1968144Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:45:57.1969646Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'dynamo/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:45:57.196812] 2025-12-04T11:46:14.8883377Z 2025-12-04T11:46:14.8884240Z dynamo/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_utils_1.1_f38c4a34f2be26bc_.log 2025-12-04T11:46:14.8887022Z Running 17 items in this shard: test/dynamo/test_utils.py::TestUtils::test_graph_break_counting, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_even_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_larger_multiplier_for_smaller_tensor, test/dynamo/test_utils.py::TestUtils::test_nan, test/dynamo/test_utils.py::TestUtils::test_traced_code_query, test/dynamo/test_utils.py::TestDynamoTimed::test_compiler_config, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamic_shape_feature_use, test/dynamo/test_utils.py::TestDynamoTimed::test_dynamo_timed, test/dynamo/test_utils.py::TestDynamoTimed::test_exception_stack_trace, test/dynamo/test_utils.py::TestDynamoTimed::test_graph_node_shapes, test/dynamo/test_utils.py::TestDynamoTimed::test_inductor_provenance, test/dynamo/test_utils.py::TestDynamoTimed::test_ir_count, test/dynamo/test_utils.py::TestDynamoTimed::test_log_dynamo_start, 
test/dynamo/test_utils.py::TestDynamoTimed::test_num_params, test/dynamo/test_utils.py::TestDynamoTimed::test_stack_trace, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_jsonify, test/dynamo/test_utils.py::TestInductorConfigParsingForLogging::test_inductor_config_parsing_non_conforming_items 2025-12-04T11:46:14.8889898Z 2025-12-04T11:46:14.8890061Z Finished dynamo/test_utils 1/1 ... [2025-12-04 11:46:14.888071][2252759.154740049], took 0.29min 2025-12-04T11:46:14.8894624Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:46:14.8946802Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:46:14.8947819Z Running inductor/test_mps_basic 1/1 ... [2025-12-04 11:46:14.894626][2252759.161299244] 2025-12-04T11:46:14.8948052Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:46:14.8949217Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_mps_basic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:14.894807] 2025-12-04T11:46:20.7654876Z 2025-12-04T11:46:20.7655844Z inductor/test_mps_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mps_basic_1.1_827f0a646cbf3d9a_.log 2025-12-04T11:46:20.7656490Z 2025-12-04T11:46:20.7656769Z Finished inductor/test_mps_basic 1/1 ... 
[2025-12-04 11:46:20.765119][2252765.031787225], took 0.10min 2025-12-04T11:46:20.7668974Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:46:20.7722182Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:46:20.7725245Z Running inductor/test_external_callables 1/1 ... [2025-12-04 11:46:20.772178][2252765.038851922] 2025-12-04T11:46:20.7725633Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:46:20.7726386Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_external_callables.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:20.772361] 2025-12-04T11:46:32.6579279Z 2025-12-04T11:46:32.6580432Z inductor/test_external_callables 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_external_callables_1.1_cf20e9b8ff78593c_.log 2025-12-04T11:46:32.6582448Z Running 3 items in this shard: test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cpu, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cuda, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_dup 2025-12-04T11:46:32.6583691Z 2025-12-04T11:46:32.6583969Z Finished inductor/test_external_callables 1/1 ... [2025-12-04 11:46:32.657604][2252776.924275601], took 0.20min 2025-12-04T11:46:32.6587258Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:46:32.6637030Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:46:32.6638059Z Running export/test_export_training_ir_to_run_decomp 1/1 ... 
[2025-12-04 11:46:32.663683][2252776.930356386] 2025-12-04T11:46:32.6638417Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:46:32.6640309Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'export/test_export_training_ir_to_run_decomp.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:46:32.663865] 2025-12-04T11:50:40.2727256Z 2025-12-04T11:50:40.2729402Z export/test_export_training_ir_to_run_decomp 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_export_training_ir_to_run_decomp_1.1_aab6e549389856b4_.log 2025-12-04T11:50:40.2936037Z Running 880 items in this shard: test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_assume_static_by_default_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_constraints_error_not_in_range_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_constraints_error_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_inline_constraints_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_slice_maxsize_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_slice_unbacked_dim1_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_export_strict_narrow_unbacked_expr_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_no_grad_param_inplace_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestDynamismExpression::test_reshape_view_backed_size_oblivious_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_assume_static_by_default_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_constraints_error_not_in_range_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_constraints_error_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_inline_constraints_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_slice_maxsize_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_slice_unbacked_dim1_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_export_strict_narrow_unbacked_expr_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_no_grad_param_inplace_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestDynamismExpression::test_reshape_view_backed_size_oblivious_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test__scaled_dot_product_flash_attention_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_additional_inputs_constants_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_allow_explicit_guards_as_runtime_asserts_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_annotate_on_assert_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_args_type_checked_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_aten_lift_fresh_copy_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_attention_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_attr_assignment_extra_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_automatic_constrain_size_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_automatic_dynamic_shapes_constant_relation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_automatic_dynamic_shapes_linear_relation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_automatic_dynamic_shapes_simple_equality_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_baddbmm_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_basic_non_strict_fake_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_basic_non_strict_real_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_basic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_bincount_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_buffer_util_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_capture_subclass_constructor_torch_ir_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_capture_subclass_constructor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_capture_subclass_wrong_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_ccode_python_mod_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cdist_forward_compute_mode_zero_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_check_specialized_int_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_checks_to_constrain_range_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cleanup_dynamic_markers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_colin_unbacked_backed_vr_sub_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_colon_parameter_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_compiling_state_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_access_identical_symint_closure_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_branches_return_constant_int_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_branches_return_same_int_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_buffers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_contains_unbacked_no_escape_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_int_closure_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_unflatten_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_with_module_stack_export_with_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cond_with_module_stack_export_with_unflatten_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_aliasing_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_input_naming_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_no_user_inp_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_output_dup_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_output_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_requires_grad_const_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_return_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_tensor_mutation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_tensor_with_non_functional_nested_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constant_tensor_with_non_functional_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constrain_decomp_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constrain_size_in_eager_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constrain_size_with_constrain_value_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_constrain_size_with_various_cases_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_conv_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_crop_like_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_cse_for_symint_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_op_auto_functionalize_pre_dispatch_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_op_auto_functionalize_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_op_auto_warn_pre_dispatch_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_op_preserve_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_pytree_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_custom_tag_metadata_re_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_decomp_batch_norm_functional_predispatch_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_decomp_item_in_prim_after_decomposition_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_decomp_item_in_prim_before_decomposition_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_default_decomposition_core_cia_ops_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_1_2_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_basic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_integer_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_nested_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_out_of_order_repeat_derived_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_out_of_order_simplified_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_out_of_order_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_derived_dim_repeat_derived_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_detect_leak_nonstrict_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_detect_leak_nonstrict_with_stacktrace_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_detect_leak_strict_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_device_to_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_device_to_gpu_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_device_to_mutation_float_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_device_to_mutation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_device_to_static_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_1_2_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_auto_and_dim_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_dynamic_divisibility_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_dynamic_specialization_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_dynamic_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_hint_range_violations_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dim_hint_ranges_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_disable_forced_specializations_errors_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_disable_forced_specializations_ok_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_distributed_all_gather_into_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_distributed_all_gather_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_distributed_all_reduce_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_distributed_all_to_all_single_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_distributed_reduce_scatter_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dont_duck_size_for_auto_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_double_lifted_constants_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_checks_aliasing_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_checks_mutation_list_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_checks_mutation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_checks_mutation_with_nan_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_fake_kernel_inference_errors_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_draft_export_infers_fake_kernel_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_duplicate_modules_with_non_persistent_buffers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_lr_shift_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_bounds_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_builder_basic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_builder_kwargs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_builder_pytree_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_dataclass_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_inferred_basic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_serdes_generic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_serdes_user_errors_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_serdes_various_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_spec_with_pytree_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_shapes_wrapped_with_shape_guards_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_dynamic_sym_round_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_ends_of_bounds_oblivious_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_enum_str_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_error_does_not_reference_eager_fallback_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_error_when_passing_mutating_primitive_op_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_exception_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_expand_copy_export_handles_implicit_true_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_api_with_dynamic_shapes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_as_backend_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_associative_scan_lifted_buffers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_associative_scan_symbol_dim_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_associative_scan_symbol_scandim_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_aten_to_unflatten_subclass_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_aten_to_unflatten_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_cond_symbool_pred_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_cond_warns_constant_pred_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_custom_decomp_table_basic_pop_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_custom_decomp_table_container_methods_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_custom_op_lib_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_custom_triton_kernel_mutable_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_custom_triton_kernel_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_cyclic_reference_leak_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_decomp_torture_case_1_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_decomp_torture_case_2_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_decomps_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_decomps_simple_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_dynamo_config_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_for_training_run_decomp_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_for_training_with_container_type_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_for_training_with_dynamic_shapes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_for_training_with_mutation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_for_training_with_state_dict_hooks_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_default_kwargs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_keyword_only_args_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_kwargs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_pytree_kwargs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_var_keyword_args_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_var_keyword_pytree_args_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_func_with_var_postional_args_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_function_schema_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_graph_with_no_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_input_mutation_bug_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_input_mutation_dynamic_shape_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_input_mutation_static_shape_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_leak_compile_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_linear_preserve_dynamic_shape_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_max_nonstrict_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_max_onnx_reported_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_method_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_mod_constraints_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_module_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_preserve_linear_at_aot_level_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_preserve_linear_but_not_custom_op_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_rnn_variants_with_warning_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_scan_pytree_output_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_script_module_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_statically_known_true_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_then_compile_tensor_ctor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_autocast_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_fake_tensor_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_inline_constraints_complex_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_inline_constraints_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_set_grad_enabled_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_export_with_wrong_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_external_call_non_strict_real_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_fake_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_fake_weights_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_filter_traceback_frames_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_flex_attention_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_float_conversion_from_int_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_float_conversion_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_fqn_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_from_node_metadata_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_full_on_scalar_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_function_holding_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_hints_wrapper_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_hoo_inline_users_issue_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_if_functional_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_if_post_autograd_op_preserved_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_inductor_backend_inside_nonstrict_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_inline_script_class_method_recursive_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_inline_script_class_method_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_inline_script_function_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_inline_script_method_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_int_shape_specialization_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_intermediate_shape_comp_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_invalid_pytree_dynamo_graph_capture_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_is_exporting_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_is_nonzero_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_isnonzero_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_issue_113041_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_issue_157289_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_issue_161902_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_istft_op_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_keep_composite_ops_invalid_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_keep_composite_ops_linear_convd_for_training_ir_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_keep_composite_ops_linear_convd_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_kwarg_dynamic_shapes_diff_order_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_kwargs_reorder_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_layer_norm_unbacked_normalized_shape_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_layer_sharing_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_lazy_module_kwargs_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_lifted_constants_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_linear_conv_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_malformed_fqn_from_source_name_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_map_buffers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_map_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_mask_nonzero_static_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_masked_select_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_math_pow_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_mismatched_dynamic_shapes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_mixed_input_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_dict_key_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_input_subclasses_parameterization_nested_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_input_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_list_slice_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_module_with_dict_container_inp_out_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_modules_access_for_deleted_submodule_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_more_multidimensional_slicing_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_multidimensional_slicing_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_multinomial_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_multiple_definitions_same_name_dim_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_namedtuple_input_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_native_multi_attention_head_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_dynamic_shapes_spec_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_module_fake_tensor_leak_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_module_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_module_with_constant_buffer_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_module_with_init_buffer_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nested_module_with_parameter_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nn_module_stack_shared_submodule_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nn_module_stack_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_check_is_size_error_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_suggested_fixes_for_data_dependent_errors_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_tensor_computation_2_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_tensor_computation_3_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_tensor_computation_4_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_no_tensor_computation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_arg_name_dynamic_shapes_api_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_persistent_buffer_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_strict_dynamic_shapes_suggested_fixes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_non_strict_dynamic_shapes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_none_buffers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nonstrict_retrace_preserves_metadata_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nonzero_2_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_nonzero_dynamic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_not_registered_parameter_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_operator_aten_tensor_mode_variant_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_output_node_name_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_pad_sequence_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_param_util_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_partial_patched_forward_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_placeholder_naming_collisions_hoo_subgraphs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_placeholder_naming_collisions_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_placeholder_naming_order_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_placeholder_naming_order_variadic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_placeholder_update_preserving_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_predispatch_cond_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_predispatch_grad_wrappers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_preserve_annotation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_preserve_module_call_signature_unflatten_specialization_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_preserve_requires_grad_placeholders_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_preserve_shape_dynamism_for_unused_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_profiling_code_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_python_asserts_with_sym_int_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_pytree_register_data_class_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_pytree_register_nested_data_class_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_range_constraints_with_replacement_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_real_tensor_alias_dtype_mismatch_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_real_tensor_bool_cast_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_real_tensor_errors_on_aliasing_custom_op_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_real_tensor_for_max_op_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_real_tensor_size_mismatch_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_redundant_assert_max_upper_bound_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_redundant_asserts_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_refine_dynamic_shapes_from_suggested_fixes_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_register_constant_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_repeat_interleave_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_replace_unbacked_with_very_large_upperbound_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_replaced_unbacked_bindings_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_reshape_view_helper_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_retracable_ep_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_retrace_pre_autograd_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_run_decomposition_supports_user_input_mutation_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_run_decompositions_keep_metadata_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_run_decompositions_keep_tensor_constant_metadata_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_runtime_assert_for_prim_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_runtime_assert_for_prm_str_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_runtime_assert_with_size_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_sdpa_gqa_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_sequential_slicing_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_set_example_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_set_grad_as_side_effect_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_set_grad_empty_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_set_grad_unflatten_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_setgrad_lifted_tensor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_shared_submodule_nn_module_stack_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_simple_export_for_training_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_simple_unbacked_view_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_size_input_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_slice_nn_module_stack_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_solver_unsupported_sympy_function_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_specialize_derived_dim_roots_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_split_const_gm_with_lifted_constants_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_stack_trace_make_fx_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_stack_trace_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_state_primitives_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_state_shape_attribute_assignment_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_state_tensors_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_static_dim_constraints_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_context_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_nested_attr_access_complicated_metadata_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_nested_attr_access_const_metadata_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_nested_attr_access_submodule_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclass_nested_attr_access_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclasses_parameterization_nested_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_subclasses_parameterization_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_suggest_torch_checks_with_non_negative_check_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_suggest_torch_checks_with_regular_check_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_suggested_fixes_for_data_dependent_errors_basic_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_suggested_fixes_new_roots_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_sym_float_operators_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_sym_or_sym_and_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_sym_sqrt_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symbool_item_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symfloat_item_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_input_additional_inputs_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_input_basic_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_input_ranges_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_input_shapes_collection_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_input_specialization_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_item_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_output_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_symint_tensor_return_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tag_ac_export_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tensor_attribute_zero_args_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tensor_constant_aten_to_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tensor_constant_with_wrapped_method_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_to_module_with_mutated_buffer_multiple_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_to_module_with_mutated_buffer_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tolist_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_torch_check_eq_commutativity_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_torch_fn_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_trace_under_fake_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_train_eval_on_exported_preautograd_module_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_tril_dynamic_diagonal_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_triu_dynamic_diagonal_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_3d_matmul_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_bincount_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_bindings_for_divisible_u_symint_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_deferred_runtime_retrace_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_expand_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_infer_size_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_kth_value_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_linear_layer_norm_input_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_noncontig_lin_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_pad_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_scalar_constructor_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_slice_forward_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_slice_simple_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_stack_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_to_cond_passthrough_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_to_cond_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unbacked_unsqueeze_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_asserts_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_buffer_update_child2parent_swap_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_closure_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_isinstance_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_multiple_graphs_dispatch_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_multiple_graphs_shared_submodule_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_multiple_graphs_state_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_no_unroll_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_placeholder_update_child2parent_swap_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_5_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_6_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_buf_8_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_const_preserving_3_1_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_const_preserving_3_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_4_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_6_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_9_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unflatten_random_dag_preserving_4_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unused_aliases_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_unused_constant_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_uplift_common_custom_meta_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_uplift_common_custom_meta_with_multiple_calls_training_ir_to_decomp_strict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_use_embedding_twice_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_user_input_and_buffer_mutation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_vmap_custom_autograd_function_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_vmap_to_assert_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_vmap_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_where_decomp_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_while_loop_assert_separation_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_while_loop_index_assertions_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_while_loop_simple_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_while_loop_tensor_constant_idx_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportTestExport::test_wrapper_module_training_ir_to_decomp_strict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test__scaled_dot_product_flash_attention_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_additional_inputs_constants_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_allow_explicit_guards_as_runtime_asserts_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_annotate_on_assert_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_args_type_checked_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_aten_lift_fresh_copy_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_attention_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_attr_assignment_extra_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_automatic_constrain_size_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_automatic_dynamic_shapes_constant_relation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_automatic_dynamic_shapes_linear_relation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_automatic_dynamic_shapes_simple_equality_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_baddbmm_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_basic_non_strict_fake_tensor_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_basic_non_strict_real_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_basic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_bincount_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_buffer_util_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_capture_subclass_constructor_torch_ir_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_capture_subclass_constructor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_capture_subclass_wrong_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_ccode_python_mod_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cdist_forward_compute_mode_zero_export_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_check_specialized_int_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_checks_to_constrain_range_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cleanup_dynamic_markers_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_colin_unbacked_backed_vr_sub_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_colon_parameter_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_compiling_state_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_access_identical_symint_closure_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_branches_return_constant_int_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_branches_return_same_int_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_buffers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_contains_unbacked_no_escape_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_int_closure_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_unflatten_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_with_module_stack_export_with_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cond_with_module_stack_export_with_unflatten_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_aliasing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_input_naming_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_no_user_inp_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_output_dup_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_output_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_requires_grad_const_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_return_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_tensor_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_tensor_with_non_functional_nested_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constant_tensor_with_non_functional_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constrain_decomp_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constrain_size_in_eager_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constrain_size_with_constrain_value_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_constrain_size_with_various_cases_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_conv_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_crop_like_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_cse_for_symint_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_op_auto_functionalize_pre_dispatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_op_auto_functionalize_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_op_auto_warn_pre_dispatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_op_preserve_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_pytree_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_custom_tag_metadata_re_export_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_decomp_batch_norm_functional_predispatch_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_decomp_item_in_prim_after_decomposition_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_decomp_item_in_prim_before_decomposition_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_default_decomposition_core_cia_ops_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_1_2_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_basic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_integer_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_nested_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_out_of_order_repeat_derived_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_out_of_order_simplified_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_out_of_order_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_derived_dim_repeat_derived_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_detect_leak_nonstrict_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_detect_leak_nonstrict_with_stacktrace_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_detect_leak_strict_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_device_to_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_device_to_gpu_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_device_to_mutation_float_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_device_to_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_device_to_static_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_1_2_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_auto_and_dim_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_dynamic_divisibility_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_dynamic_specialization_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_hint_range_violations_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dim_hint_ranges_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_disable_forced_specializations_errors_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_disable_forced_specializations_ok_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_distributed_all_gather_into_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_distributed_all_gather_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_distributed_all_reduce_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_distributed_all_to_all_single_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_distributed_reduce_scatter_tensor_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dont_duck_size_for_auto_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_double_lifted_constants_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_checks_aliasing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_checks_mutation_list_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_checks_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_checks_mutation_with_nan_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_fake_kernel_inference_errors_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_draft_export_infers_fake_kernel_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_duplicate_modules_with_non_persistent_buffers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_lr_shift_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_bounds_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_builder_basic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_builder_kwargs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_builder_pytree_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_dataclass_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_inferred_basic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_serdes_generic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_serdes_user_errors_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_serdes_various_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_spec_with_pytree_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_shapes_wrapped_with_shape_guards_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_dynamic_sym_round_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_ends_of_bounds_oblivious_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_enum_str_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_error_does_not_reference_eager_fallback_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_error_when_passing_mutating_primitive_op_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_exception_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_expand_copy_export_handles_implicit_true_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_api_with_dynamic_shapes_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_as_backend_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_associative_scan_lifted_buffers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_associative_scan_symbol_dim_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_associative_scan_symbol_scandim_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_aten_to_unflatten_subclass_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_aten_to_unflatten_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_cond_preserve_torch_fn_for_subgraphs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_cond_symbool_pred_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_cond_warns_constant_pred_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_custom_decomp_table_basic_pop_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_custom_decomp_table_container_methods_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_custom_op_lib_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_custom_triton_kernel_mutable_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_custom_triton_kernel_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_cyclic_reference_leak_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_decomp_torture_case_1_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_decomp_torture_case_2_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_decomps_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_decomps_simple_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_dynamo_config_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_for_training_run_decomp_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_for_training_with_container_type_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_for_training_with_dynamic_shapes_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_for_training_with_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_for_training_with_state_dict_hooks_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_default_kwargs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_keyword_only_args_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_kwargs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_pytree_kwargs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_var_keyword_args_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_var_keyword_pytree_args_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_func_with_var_postional_args_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_function_schema_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_graph_with_no_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_input_mutation_bug_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_input_mutation_dynamic_shape_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_input_mutation_static_shape_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_leak_compile_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_linear_preserve_dynamic_shape_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_max_nonstrict_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_max_onnx_reported_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_method_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_mod_constraints_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_module_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_preserve_linear_at_aot_level_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_preserve_linear_but_not_custom_op_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_rnn_variants_with_warning_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_scan_pytree_output_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_script_module_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_statically_known_true_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_then_compile_tensor_ctor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_autocast_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_fake_tensor_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_inline_constraints_complex_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_inline_constraints_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_set_grad_enabled_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_export_with_wrong_inputs_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_external_call_non_strict_real_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_fake_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_fake_weights_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_filter_traceback_frames_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_flex_attention_export_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_float_conversion_from_int_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_float_conversion_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_fqn_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_from_node_metadata_export_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_full_on_scalar_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_function_holding_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_hints_wrapper_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_hoo_inline_users_issue_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_if_functional_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_if_post_autograd_op_preserved_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_inductor_backend_inside_nonstrict_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_inline_script_class_method_recursive_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_inline_script_class_method_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_inline_script_function_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_inline_script_method_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_int_shape_specialization_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_intermediate_shape_comp_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_invalid_pytree_dynamo_graph_capture_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_is_exporting_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_is_nonzero_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_isnonzero_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_issue_113041_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_issue_157289_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_issue_161902_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_istft_op_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_keep_composite_ops_invalid_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_keep_composite_ops_linear_convd_for_training_ir_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_keep_composite_ops_linear_convd_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_kwarg_dynamic_shapes_diff_order_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_kwargs_reorder_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_layer_norm_unbacked_normalized_shape_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_layer_sharing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_lazy_module_kwargs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_lifted_constants_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_linear_conv_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_malformed_fqn_from_source_name_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_map_buffers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_map_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_mask_nonzero_static_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_masked_select_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_math_pow_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_mismatched_dynamic_shapes_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_mixed_input_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_dict_key_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_input_subclasses_parameterization_nested_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_input_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_list_slice_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_module_with_dict_container_inp_out_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_modules_access_for_deleted_submodule_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_more_multidimensional_slicing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_multidimensional_slicing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_multinomial_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_multiple_definitions_same_name_dim_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_namedtuple_input_export_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_native_multi_attention_head_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_dynamic_shapes_spec_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_module_fake_tensor_leak_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_module_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_module_with_constant_buffer_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_module_with_init_buffer_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nested_module_with_parameter_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nn_module_stack_shared_submodule_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nn_module_stack_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_check_is_size_error_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_suggested_fixes_for_data_dependent_errors_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_tensor_computation_2_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_tensor_computation_3_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_tensor_computation_4_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_no_tensor_computation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_persistent_buffer_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_strict_dynamic_shapes_suggested_fixes_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_non_strict_dynamic_shapes_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_none_buffers_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nonstrict_retrace_preserves_metadata_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nonzero_2_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_nonzero_dynamic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_not_registered_parameter_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_operator_aten_tensor_mode_variant_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_output_node_name_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_pad_sequence_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_param_util_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_partial_patched_forward_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_placeholder_naming_collisions_hoo_subgraphs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_placeholder_naming_collisions_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_placeholder_naming_order_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_placeholder_naming_order_variadic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_placeholder_update_preserving_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_predispatch_cond_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_predispatch_grad_wrappers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_preserve_annotation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_preserve_module_call_signature_unflatten_specialization_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_preserve_requires_grad_placeholders_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_preserve_shape_dynamism_for_unused_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_profiling_code_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_python_asserts_with_sym_int_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_pytree_register_data_class_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_pytree_register_nested_data_class_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_raise_user_error_when_guard_on_data_dependent_operation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_range_constraints_with_replacement_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_real_tensor_alias_dtype_mismatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_real_tensor_bool_cast_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_real_tensor_errors_on_aliasing_custom_op_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_real_tensor_for_max_op_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_real_tensor_size_mismatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_redundant_assert_max_upper_bound_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_redundant_asserts_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_refine_dynamic_shapes_from_suggested_fixes_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_register_constant_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_repeat_interleave_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_replace_unbacked_with_very_large_upperbound_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_replaced_unbacked_bindings_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_reshape_view_helper_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_retracable_ep_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_retrace_pre_autograd_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_run_decomposition_supports_user_input_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_run_decompositions_keep_metadata_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_run_decompositions_keep_tensor_constant_metadata_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_runtime_assert_for_prim_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_runtime_assert_for_prm_str_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_runtime_assert_with_size_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_sdpa_gqa_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_sequential_slicing_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_set_example_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_set_grad_as_side_effect_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_set_grad_empty_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_set_grad_unflatten_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_setgrad_lifted_tensor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_shared_submodule_nn_module_stack_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_simple_export_for_training_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_simple_unbacked_view_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_size_input_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_slice_nn_module_stack_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_solver_unsupported_sympy_function_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_specialize_derived_dim_roots_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_split_const_gm_with_lifted_constants_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_stack_trace_make_fx_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_stack_trace_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_state_primitives_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_state_shape_attribute_assignment_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_state_tensors_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_static_dim_constraints_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_context_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_nested_attr_access_complicated_metadata_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_nested_attr_access_const_metadata_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_nested_attr_access_submodule_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclass_nested_attr_access_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclasses_parameterization_nested_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_subclasses_parameterization_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_suggest_torch_checks_with_non_negative_check_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_suggest_torch_checks_with_regular_check_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_basic_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_suggested_fixes_new_roots_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_sym_float_operators_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_sym_or_sym_and_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_sym_sqrt_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symbool_item_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symfloat_item_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_input_additional_inputs_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_input_basic_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_input_ranges_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_input_shapes_collection_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_input_specialization_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_item_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_output_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_symint_tensor_return_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tag_ac_export_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tensor_attribute_zero_args_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tensor_constant_aten_to_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tensor_constant_with_wrapped_method_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_to_module_with_mutated_buffer_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tolist_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_torch_check_eq_commutativity_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_torch_fn_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_trace_under_fake_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_train_eval_on_exported_preautograd_module_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_tril_dynamic_diagonal_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_triu_dynamic_diagonal_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_3d_matmul_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_bincount_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_bindings_for_divisible_u_symint_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_deferred_runtime_retrace_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_expand_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_infer_size_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_kth_value_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_linear_layer_norm_input_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_noncontig_lin_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_pad_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_scalar_constructor_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_slice_forward_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_slice_simple_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_stack_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_to_cond_passthrough_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_to_cond_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unbacked_unsqueeze_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_asserts_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_buffer_update_child2parent_swap_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_closure_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_isinstance_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_multiple_graphs_dispatch_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_multiple_graphs_shared_submodule_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_multiple_graphs_state_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_no_unroll_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_placeholder_update_child2parent_swap_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_5_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_6_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_buf_8_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_1_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_const_preserving_3_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_4_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_6_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_9_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_10_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_4_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_5_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_mutating_buf_preserving_7_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unflatten_random_dag_preserving_4_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unused_aliases_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_unused_constant_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_uplift_common_custom_meta_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_uplift_common_custom_meta_with_multiple_calls_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_use_embedding_twice_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_user_input_and_buffer_mutation_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_vmap_custom_autograd_function_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_vmap_to_assert_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_vmap_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_where_decomp_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_while_loop_assert_separation_training_ir_to_decomp_nonstrict, 
test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_while_loop_index_assertions_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_while_loop_simple_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_while_loop_tensor_constant_idx_training_ir_to_decomp_nonstrict, test/export/test_export_training_ir_to_run_decomp.py::TrainingIRToRunDecompExportNonStrictTestExport::test_wrapper_module_training_ir_to_decomp_nonstrict 2025-12-04T11:50:40.3135089Z 2025-12-04T11:50:40.3135248Z Finished export/test_export_training_ir_to_run_decomp 1/1 ... [2025-12-04 11:50:40.273739][2253024.540409874], took 4.13min 2025-12-04T11:50:40.3135665Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:50:40.3136023Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:50:40.3136292Z Running inductor/test_async_compile 1/1 ... [2025-12-04 11:50:40.279852][2253024.546525945] 2025-12-04T11:50:40.3136487Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:50:40.3136888Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_async_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:50:40.280030] 2025-12-04T11:51:12.4401587Z 2025-12-04T11:51:12.4402971Z inductor/test_async_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_async_compile_1.1_d5be3c4bae4f07c8_.log 2025-12-04T11:51:12.4406695Z Running 8 items in this shard: test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_bad_kernel, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_wait_pool_ready 2025-12-04T11:51:12.4409237Z 2025-12-04T11:51:12.4409511Z Finished inductor/test_async_compile 1/1 ... [2025-12-04 11:51:12.439805][2253056.706474326], took 0.54min 2025-12-04T11:51:12.4414899Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:51:12.4468016Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:51:12.4469961Z Running inductor/test_compiled_optimizers 2/2 ... [2025-12-04 11:51:12.446827][2253056.713500272] 2025-12-04T11:51:12.4470317Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:51:12.4471557Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_compiled_optimizers.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 11:51:12.447006] 2025-12-04T11:59:59.9854016Z 2025-12-04T11:59:59.9855973Z inductor/test_compiled_optimizers 2/2 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compiled_optimizers_2.2_3e74edc6254202ea_.log 2025-12-04T11:59:59.9928723Z Running 353 items in this shard: test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_S429861, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_recompile, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_rho_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adadelta_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_initial_accumulator_value_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_lr_decay_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_cycliclr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cpu_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_tensor_lr_foreach_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adagrad_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_recompile, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_lambdalr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_tensor_lr_tensor_betas_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_amsgrad_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_recompile, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_tensor_lr_weight_decay_capturable_foreach_cuda_steplr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamax_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_amsgrad_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_constantlr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_amsgrad_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_lambdalr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_tensor_lr_tensor_betas_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_amsgrad_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_adamw_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_lambd_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_recompile_default, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_t0_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_tensor_lr_weight_decay_maximize_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_cpu, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_asgd_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_basic_shampoo, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_closure_graph_break, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_foreach_map_adam, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_get_value_on_static_address, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_recompile, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_polynomiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_tensor_lr_weight_decay_momentum_decay_decoupled_weight_decay_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_nadam_weight_decay_momentum_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_capturable_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_eps_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_tensor_lr_capturable_weight_decay_decoupled_weight_decay_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_decoupled_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_radam_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_tensor_lr_capturable_foreach_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_momentum_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_momentum_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_momentum_maximize_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_centered_momentum_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rmsprop_weight_decay_maximize_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_capturable_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_etas_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_recompile, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_step_sizes_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_lambdalr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_rprop_tensor_lr_capturable_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_dampening_cpu, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_foreach_cuda, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_momentum_nesterov_weight_decay_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_foreach, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_recompile_single, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cosineannealinglr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cpu_polynomiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cosineannealingwarmrestarts, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_cycliclr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_exponentiallr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_lambdalr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_cuda_steplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_constantlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_exponentiallr, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_linearlr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multiplicativelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_multisteplr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_onecyclelr, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_tensor_lr_foreach_cuda_reducelronplateau, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_maximize_foreach_cuda, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_ASGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adadelta_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adagrad_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_AdamW_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_AdamW_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adam_use_closure_True_cuda_float32, 
test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Adamax_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Muon_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_NAdam_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_RMSprop_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_Rprop_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SGD_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SGD_use_closure_True_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_False_cuda_float32, test/inductor/test_compiled_optimizers.py::CompiledOptimizerParityTestsCUDA::test_correctness_SparseAdam_use_closure_True_cuda_float32 2025-12-04T11:59:59.9990595Z 2025-12-04T11:59:59.9990730Z Finished inductor/test_compiled_optimizers 2/2 ... 
[2025-12-04 11:59:59.985538][2253584.252206424], took 8.79min 2025-12-04T11:59:59.9991154Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T11:59:59.9991511Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T11:59:59.9991768Z Running inductor/test_control_flow 4/4 ... [2025-12-04 11:59:59.992042][2253584.258715735] 2025-12-04T11:59:59.9991960Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T11:59:59.9992370Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:59:59.992230] 2025-12-04T12:07:18.6762533Z 2025-12-04T12:07:18.6763873Z inductor/test_control_flow 4/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_4.4_e0416db0b214922e_.log 2025-12-04T12:07:18.6845654Z Running 183 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_advanced_dynamic_shapes_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_recursive_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_decompose_ops_in_subgraph_recursive_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cuda, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cuda_dynamic_True, 
test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_outer_to_inner_device_cpu, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_models_with_mixed_device_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_nested_control_flow_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_mismatch_dynamic_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_mismatch_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_True, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_pytree_inputs_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cuda_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_0_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_0_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_True_autograd_False, 
test/inductor/test_control_flow.py::MapTests::test_map_pytree_in_out_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::MapTests::test_map_simple_linear_with_view_device_cuda_dynamic_True_autograd_False 2025-12-04T12:07:18.6903498Z 2025-12-04T12:07:18.6903667Z Finished inductor/test_control_flow 4/4 ... [2025-12-04 12:07:18.690240][2254022.956906415], took 7.31min 2025-12-04T12:07:18.6914169Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T12:07:18.6964834Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:07:18.6965807Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T12:07:18.6966305Z Uploading artifacts took 0.00 seconds 2025-12-04T12:07:18.6966847Z Running inductor/test_minifier_isolate 1/1 ... [2025-12-04 12:07:18.696520][2254022.963191391] 2025-12-04T12:07:18.6967378Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:18.6968537Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'inductor/test_minifier_isolate.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:07:18.696712] 2025-12-04T12:07:24.4554590Z 2025-12-04T12:07:24.4555614Z inductor/test_minifier_isolate 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_isolate_1.1_40efc5a31d81b9ac_.log 2025-12-04T12:07:24.4556127Z 2025-12-04T12:07:24.4556371Z Finished inductor/test_minifier_isolate 1/1 ... [2025-12-04 12:07:24.455091][2254028.7217596], took 0.10min 2025-12-04T12:07:24.4567340Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T12:07:24.4619722Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:07:24.4620839Z Running test_matmul_cuda 1/1 ... [2025-12-04 12:07:24.461946][2254028.728619178] 2025-12-04T12:07:24.4621135Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:07:24.4622736Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_matmul_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:07:24.462123] 2025-12-04T12:08:11.4463703Z 2025-12-04T12:08:11.4464377Z test_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_matmul_cuda_1.1_4ee74b08ecd6cab2_.log 2025-12-04T12:08:11.4796972Z Running 1584 items in this shard: test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_False_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_False_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_addmm_baddmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_broadcast_self_True_high_precision_self_True_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_alignment_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_bias_shapes_size_128_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_4_size_32768_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_4_size_32768_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_8_size_32768_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_no_reduced_precision_small_size_8_size_32768_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublaslt_cuda_bfloat16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_10000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_fp16_accumulate_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_10000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublas_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_reduced_precision_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_10000_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_1000_backend_cublaslt_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublas_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_addmm_size_100_backend_cublaslt_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_and_lt_reduced_precision_fp16_accumulate_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_baddbmm_large_input_2_100_100_100_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_batch_invariance_blackwell_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_batch_invariance_blackwell_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_1024_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_1024_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_1024_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_128_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_128_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_128_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_2048_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_2048_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_2048_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_256_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_256_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_256_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_32_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_32_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_32_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_4096_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_4096_cuda_float16, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_4096_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_512_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_512_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_512_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_64_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_64_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_64_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_8192_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_8192_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_cublas_deterministic_shape_8192_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_32_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_fp16_accum_and_fp32_out_failure_batch_size_32_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_greencontext_carveout_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_2d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_2d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_False_b_row_major_True_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_False_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_False_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_False_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_False_cuda_float32, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_bfloat16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float16, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_3d_3d_strided_True_a_row_major_True_b_row_major_True_cuda_float32, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/2d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_2d/3d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/2d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_False_max_autotune_False_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_False_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_False_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_False_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_True_max_autotune_False_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_grouped_gemm_compiled_op_3d/3d_a_row_major_True_b_row_major_True_max_autotune_True_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_input_dimension_checking_out_dtype_ops0_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_input_dimension_checking_out_dtype_ops1_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_input_dimension_checking_out_dtype_ops2_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_input_dimension_checking_out_dtype_ops3_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_bfloat16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float16_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_1_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_32_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_1_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_32_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_backend_cublas_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_1_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_16_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_32_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size0_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_16_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_16_backend_cublaslt_cuda, 
test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_backend_cublas_cuda, test/test_matmul_cuda.py::TestMatmulCudaCUDA::test_mm_bmm_dtype_overload_float32_M_64_N_64_K_64_batch_size_1_backend_cublaslt_cuda, test/test_matmul_cuda.py::TestMixedDtypesLinearCudaCUDA::test_mixed_dtypes_linear_cuda_bfloat16, test/test_matmul_cuda.py::TestMixedDtypesLinearCudaCUDA::test_mixed_dtypes_linear_cuda_float16 2025-12-04T12:08:11.5113065Z 2025-12-04T12:08:11.5113411Z Finished test_matmul_cuda 1/1 ... [2025-12-04 12:08:11.447494][2254075.714165825], took 0.78min 2025-12-04T12:08:11.5114056Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T12:08:11.5114570Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T12:08:11.5114864Z Running test_ops 3/5 ... [2025-12-04 12:08:11.453629][2254075.720302074] 2025-12-04T12:08:11.5115099Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T12:08:11.5115620Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_ops.py', '--shard-id=3', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 12:08:11.453805]
2025-12-04T13:20:27.6542282Z
2025-12-04T13:20:27.6543190Z PRINTING LOG FILE of test_ops 3/5 (test/test-reports/test_ops_3.5_0a83ba8be83064cd_.log)
2025-12-04T13:20:27.6544068Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-7371450e9d0a79a0.xml
2025-12-04T13:20:27.6545102Z ============================= test session starts ==============================
2025-12-04T13:20:27.6545645Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python
2025-12-04T13:20:27.6546116Z cachedir: .pytest_cache
2025-12-04T13:20:27.6546796Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T13:20:27.6547407Z rootdir: /var/lib/jenkins/pytorch
2025-12-04T13:20:27.6547706Z configfile: pytest.ini
2025-12-04T13:20:27.6548267Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T13:20:27.6548851Z collecting ...
collected 33666 items 2025-12-04T13:20:27.6549219Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T13:20:27.7313946Z Running 6702 items in this shard: test/test_ops.py::TestSelfKwarg::test_self_kwargs, test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_item_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_in_place_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unravel_index_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsafe_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_errors_T_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_arange_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_errors_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_item_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_maximum_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_mul_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_errors_roll_cuda, test/test_ops.py::TestCommonCUDA::test_errors_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout3_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hypot_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_matmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sort_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_permuted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argsort_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_in_place_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_normal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log1p_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_permute_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_add_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_put_accumulate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_alias_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_half_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_head_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linalg_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e4m3fnuz, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_complex_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_any_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__chunk_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_alias_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dstack_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vander_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signbit_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_copy_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_mean_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest-exact_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_geometric_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_item_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cat_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lu_solve_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_decimals_neg_3_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_to_size_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_xlogy_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logaddexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_rms_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_copy_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_multiple_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_block_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_item_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_renorm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_rms_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_static_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsafe_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__chunk_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_block_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cauchy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_geometric_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rad2deg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_geometric_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_3d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_permute_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_t_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__batch_norm_with_update_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsafe_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geometric_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__unsafe_masked_index_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_complex_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch__scaled_mm_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_bfloat16, 
test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_complex128, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, test/test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__chunk_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_count_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_geometric_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_normal__in_place_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_take_along_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_alias_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cauchy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_permuted_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_mm_reduce_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_trunc_rounding_cuda_float32 2025-12-04T13:20:27.8052276Z 2025-12-04T13:20:27.8052427Z test_ops.py::TestSelfKwarg::test_self_kwargs PASSED [0.0021s] [ 0%] 2025-12-04T13:20:27.8052741Z test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32 SKIPPED [0.0914s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8053132Z test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32 SKIPPED [0.0015s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8053568Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8053963Z test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8054374Z test_ops.py::TestCommonCUDA::test_compare_cpu__batch_norm_with_update_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8054816Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8055256Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8055691Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8056223Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8056700Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8057133Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8057561Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32 SKIPPED [0.0001s] (Overflow when downcasting signed type is undefined) [ 0%] 2025-12-04T13:20:27.8057983Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8058394Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8058801Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64 SKIPPED [0.0001s] (Skipped some inputs produce undefined outputs) [ 0%] 2025-12-04T13:20:27.8059253Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_block_diag_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8059665Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8060075Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8060548Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8060969Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8061349Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 0%] 2025-12-04T13:20:27.8061735Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8062140Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8062543Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8062944Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8065531Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8065957Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8066368Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logaddexp2_cuda_float32 SKIPPED [0.0013s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8066786Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32 SKIPPED [0.0011s] (test is slow; run 
with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8077697Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8078144Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8078563Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 0%] 2025-12-04T13:20:27.8078962Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8079398Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8079840Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8080296Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8080759Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_pixel_shuffle_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8081235Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8081684Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32 SKIPPED [0.0011s] (test is slow; run 
with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8082118Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8082519Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_permute_copy_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8082921Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8083379Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8083794Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8084196Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_copy_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8084606Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_take_along_dim_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8085075Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8085487Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_complex_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8085925Z test_ops.py::TestCommonCUDA::test_compare_cpu__unsafe_masked_index_put_accumulate_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
2025-12-04T13:20:27.8086354Z test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8086777Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8087184Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8087583Z test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8087963Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64 SKIPPED [0.0001s] (Some inputs produce undefined outputs) [ 0%] 2025-12-04T13:20:27.8088332Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64 SKIPPED [0.0001s] (Some inputs produce undefined outputs) [ 0%] 2025-12-04T13:20:27.8088730Z test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8089101Z test_ops.py::TestCommonCUDA::test_compare_cpu_cauchy_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 0%] 2025-12-04T13:20:27.8089456Z test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8089870Z test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8090269Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32 SKIPPED [0.0014s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
2025-12-04T13:20:27.8090663Z test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8091044Z test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8091427Z test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8091784Z test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 0%] 2025-12-04T13:20:27.8092132Z test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8092516Z test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2025-12-04T13:20:27.8092903Z test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8093292Z test_ops.py::TestCommonCUDA::test_compare_cpu_geometric_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 1%] 2025-12-04T13:20:27.8093651Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8094071Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amin_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8094482Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_prod_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8094885Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8095279Z test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64 SKIPPED [0.0013s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8095687Z test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8096091Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8096508Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8096908Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8097311Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8097724Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8098141Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32 SKIPPED [0.0005s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8098574Z test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8098942Z test_ops.py::TestCommonCUDA::test_compare_cpu_log_normal_cuda_float32 SKIPPED [0.0001s] (output is non-deterministic) [ 1%] 
2025-12-04T13:20:27.8099314Z test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8099753Z test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_tensor_overload_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8100164Z test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32 SKIPPED [0.0001s] (Overflow when downcasting signed type is undefined) [ 1%] 2025-12-04T13:20:27.8100560Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8100978Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8101390Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8101788Z test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8102176Z test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8102559Z test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8102966Z test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32 SKIPPED [0.0014s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8103372Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32 SKIPPED [0.0001s] (output is 
non-deterministic) [ 1%] 2025-12-04T13:20:27.8103743Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8104140Z test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8104594Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8105058Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8105500Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8105870Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv3d_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 1%] 2025-12-04T13:20:27.8106257Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8106708Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8107162Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8107642Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8108103Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8108601Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8109069Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8109531Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8109984Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8110459Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8110932Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8111386Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8111771Z test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_static_cuda_float32 SKIPPED [0.0005s] (Only runs on cpu) [ 1%] 2025-12-04T13:20:27.8112140Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8112531Z test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32 SKIPPED [0.0013s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8112919Z test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8113346Z test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8113734Z test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8114135Z test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8114538Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32 SKIPPED [0.0012s] (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8114947Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8115368Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8115785Z test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8116189Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8116613Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8117013Z test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8117429Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8117895Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8118360Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8118832Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW 
to enable test) [ 1%] 2025-12-04T13:20:27.8119259Z test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32 SKIPPED [0.0013s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8119637Z test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8120034Z test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8120435Z test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32 SKIPPED [0.0012s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2025-12-04T13:20:27.8120844Z test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 2%] 2025-12-04T13:20:27.8121230Z test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32 SKIPPED [0.0011s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 2%] 2025-12-04T13:20:27.8121572Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32 PASSED [1.0294s] [ 2%] 2025-12-04T13:20:27.8121864Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32 PASSED [0.8455s] [ 2%] 2025-12-04T13:20:27.8122157Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32 PASSED [1.1310s] [ 2%] 2025-12-04T13:20:27.8122468Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32 PASSED [0.7815s] [ 2%] 2025-12-04T13:20:27.8122778Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32 PASSED [0.7938s] [ 2%] 2025-12-04T13:20:27.8123077Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32 PASSED [0.7795s] [ 2%] 2025-12-04T13:20:27.8123416Z 
test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32 PASSED [0.7326s] [ 2%] 2025-12-04T13:20:27.8123717Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32 PASSED [0.7198s] [ 2%] 2025-12-04T13:20:27.8124012Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32 PASSED [4.5587s] [ 2%] 2025-12-04T13:20:27.8124318Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32 PASSED [0.8908s] [ 2%] 2025-12-04T13:20:27.8124614Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32 PASSED [0.7795s] [ 2%] 2025-12-04T13:20:27.8124913Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32 PASSED [0.7787s] [ 2%] 2025-12-04T13:20:27.8125294Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32 SKIPPED [0.0002s] (Expected: new_empty_strided is not comparable) [ 2%] 2025-12-04T13:20:27.8125885Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32 MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x720c7c201800 size: 768 2025-12-04T13:20:27.8126454Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x720c7c201800 size: 768 2025-12-04T13:20:27.8126871Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x720c7c200e00 size: 1024 2025-12-04T13:20:27.8127285Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x720c7c200e00 size: 1024 2025-12-04T13:20:27.8127552Z PASSED [3.3810s] [ 2%] 2025-12-04T13:20:27.8127953Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv3d_cuda_complex32 MIOpen(HIP): Warning 
[IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver , workspace required: 26400, provided ptr: 0x720460201600 size: 5888 2025-12-04T13:20:27.8128502Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 26400, provided ptr: 0x720460201600 size: 5888 2025-12-04T13:20:27.8128920Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver , workspace required: 52800, provided ptr: 0x720460207400 size: 11008 2025-12-04T13:20:27.8129336Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 52800, provided ptr: 0x720460207400 size: 11008 2025-12-04T13:20:27.8129753Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver , workspace required: 168960, provided ptr: 0x720460200000 size: 6656 2025-12-04T13:20:27.8130190Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 168960, provided ptr: 0x720460200000 size: 6656 2025-12-04T13:20:27.8130609Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback WTI] Solver , workspace required: 337920, provided ptr: 0x720460201600 size: 12544 2025-12-04T13:20:27.8131028Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 337920, provided ptr: 0x720460201600 size: 12544 2025-12-04T13:20:27.8131293Z PASSED [0.1146s] [ 2%] 2025-12-04T13:20:27.8131536Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32 PASSED [2.7087s] [ 2%] 2025-12-04T13:20:27.8131868Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32 PASSED [0.0059s] [ 2%] 2025-12-04T13:20:27.8132183Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32 SKIPPED [0.0001s] (Skipped!) 
[ 2%] 2025-12-04T13:20:27.8132493Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32 PASSED [0.0056s] [ 2%] 2025-12-04T13:20:27.8132811Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_copy_cuda_complex32 PASSED [0.0062s] [ 2%] 2025-12-04T13:20:27.8133130Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32 PASSED [0.0059s] [ 2%] 2025-12-04T13:20:27.8133469Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32 PASSED [0.0066s] [ 2%] 2025-12-04T13:20:27.8133758Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32 PASSED [0.0041s] [ 2%] 2025-12-04T13:20:27.8134016Z test_ops.py::TestCommonCUDA::test_dtypes_T_cuda PASSED [0.7313s] [ 2%] 2025-12-04T13:20:27.8134235Z test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda PASSED [0.8202s] [ 2%] 2025-12-04T13:20:27.8134453Z test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda PASSED [0.7607s] [ 2%] 2025-12-04T13:20:27.8134702Z test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda PASSED [0.7635s] [ 2%] 2025-12-04T13:20:27.8134961Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda PASSED [0.7437s] [ 2%] 2025-12-04T13:20:27.8135215Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda PASSED [0.7418s] [ 2%] 2025-12-04T13:20:27.8135445Z test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda PASSED [0.7849s] [ 2%] 2025-12-04T13:20:27.8135673Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda PASSED [0.7709s] [ 2%] 2025-12-04T13:20:27.8135890Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda PASSED [1.1172s] [ 2%] 2025-12-04T13:20:27.8136112Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda PASSED [0.7463s] [ 2%] 2025-12-04T13:20:27.8136356Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda PASSED [0.7671s] [ 2%] 2025-12-04T13:20:27.8136612Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda PASSED [0.7385s] [ 2%] 2025-12-04T13:20:27.8136865Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda PASSED [0.0332s] [ 2%] 2025-12-04T13:20:27.8137112Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda PASSED [0.0230s] [ 2%] 2025-12-04T13:20:27.8137353Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda PASSED [0.7603s] [ 2%] 2025-12-04T13:20:27.8137586Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda PASSED [0.8682s] [ 2%] 2025-12-04T13:20:27.8137809Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda PASSED [0.7391s] [ 2%] 2025-12-04T13:20:27.8138028Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda PASSED [1.1505s] [ 2%] 2025-12-04T13:20:27.8138245Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda PASSED [0.7536s] [ 2%] 2025-12-04T13:20:27.8138480Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda PASSED [0.0624s] [ 2%] 2025-12-04T13:20:27.8138728Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda PASSED [0.9069s] [ 2%] 2025-12-04T13:20:27.8138959Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_copy_cuda PASSED [0.7479s] [ 2%] 2025-12-04T13:20:27.8139187Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda PASSED [0.7534s] [ 2%] 2025-12-04T13:20:27.8139413Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda PASSED [10.3532s] [ 2%] 2025-12-04T13:20:27.8139642Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda PASSED [1.2762s] [ 2%] 2025-12-04T13:20:27.8139872Z test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda PASSED [1.3867s] [ 2%] 2025-12-04T13:20:27.8140116Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda PASSED [1.3073s] [ 2%] 2025-12-04T13:20:27.8140338Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda PASSED [1.3112s] [ 2%] 2025-12-04T13:20:27.8140547Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda PASSED [1.2943s] [ 2%] 2025-12-04T13:20:27.8140765Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda PASSED [1.2742s] [ 2%] 2025-12-04T13:20:27.8140985Z test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda PASSED [1.2938s] [ 2%] 2025-12-04T13:20:27.8141207Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda PASSED [1.3090s] [ 2%] 2025-12-04T13:20:27.8141431Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda PASSED [1.5808s] [ 2%] 2025-12-04T13:20:27.8141663Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda PASSED [1.2802s] [ 2%] 2025-12-04T13:20:27.8141887Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda PASSED [1.2540s] [ 2%] 2025-12-04T13:20:27.8142107Z test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda PASSED [7.6594s] [ 2%] 2025-12-04T13:20:27.8142319Z test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda PASSED [1.2946s] [ 2%] 2025-12-04T13:20:27.8142553Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda PASSED [1.9355s] [ 2%] 2025-12-04T13:20:27.8142817Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda PASSED [1.5208s] [ 2%] 2025-12-04T13:20:27.8143057Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda PASSED [1.6399s] [ 2%] 2025-12-04T13:20:27.8143324Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda PASSED [0.1085s] [ 2%] 2025-12-04T13:20:27.8143569Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda PASSED [1.3310s] [ 2%] 2025-12-04T13:20:27.8143828Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda PASSED [1.2781s] [ 3%] 2025-12-04T13:20:27.8144056Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda PASSED [1.9436s] [ 3%] 2025-12-04T13:20:27.8144285Z test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda PASSED [1.2916s] [ 3%] 2025-12-04T13:20:27.8144506Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda PASSED [1.3168s] [ 3%] 2025-12-04T13:20:27.8144717Z test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda PASSED [1.2849s] [ 3%] 2025-12-04T13:20:27.8144951Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda PASSED [1.2618s] [ 3%] 2025-12-04T13:20:27.8145201Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda PASSED [1.2979s] [ 3%] 2025-12-04T13:20:27.8145458Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda PASSED [1.4629s] [ 3%] 2025-12-04T13:20:27.8145695Z test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda PASSED [1.2856s] [ 3%] 2025-12-04T13:20:27.8145922Z test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda PASSED [1.3127s] [ 3%] 2025-12-04T13:20:27.8146145Z test_ops.py::TestCommonCUDA::test_dtypes__refs_renorm_cuda PASSED [1.2780s] [ 3%] 2025-12-04T13:20:27.8146365Z test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda PASSED [1.3234s] [ 3%] 2025-12-04T13:20:27.8146595Z test_ops.py::TestCommonCUDA::test_dtypes__refs_select_scatter_cuda PASSED [1.2721s] [ 3%] 2025-12-04T13:20:27.8146822Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda PASSED [1.2748s] [ 3%] 2025-12-04T13:20:27.8147063Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda PASSED [1.4660s] [ 3%] 2025-12-04T13:20:27.8147302Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda PASSED [1.2878s] [ 3%] 2025-12-04T13:20:27.8147539Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda PASSED [1.4105s] [ 3%] 2025-12-04T13:20:27.8147777Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda PASSED [1.2705s] [ 3%] 2025-12-04T13:20:27.8148011Z test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_copy_cuda PASSED [1.2908s] [ 3%] 2025-12-04T13:20:27.8148232Z test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda PASSED [1.2801s] [ 3%] 2025-12-04T13:20:27.8148458Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda PASSED [1.3311s] [ 3%] 2025-12-04T13:20:27.8148683Z test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_copy_cuda PASSED [1.2764s] [ 3%] 2025-12-04T13:20:27.8148920Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda PASSED [1.2850s] [ 3%] 2025-12-04T13:20:27.8149153Z test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda PASSED [1.2704s] [ 3%] 2025-12-04T13:20:27.8149388Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_copy_cuda PASSED [1.2902s] [ 3%] 2025-12-04T13:20:27.8149617Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda PASSED [1.3179s] [ 3%] 2025-12-04T13:20:27.8149838Z test_ops.py::TestCommonCUDA::test_dtypes__refs_vdot_cuda PASSED [1.2891s] [ 3%] 2025-12-04T13:20:27.8150068Z test_ops.py::TestCommonCUDA::test_dtypes__segment_reduce_lengths_cuda PASSED [1.5431s] [ 3%] 2025-12-04T13:20:27.8150298Z test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda PASSED [2.3568s] [ 3%] 2025-12-04T13:20:27.8150507Z test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda PASSED [1.9007s] [ 3%] 2025-12-04T13:20:27.8150715Z test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda PASSED [1.3285s] [ 3%] 2025-12-04T13:20:27.8150923Z test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda PASSED [0.0553s] [ 3%] 2025-12-04T13:20:27.8151147Z test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda PASSED [1.3406s] [ 3%] 2025-12-04T13:20:27.8151367Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda PASSED [1.2964s] [ 3%] 2025-12-04T13:20:27.8151587Z test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda PASSED [1.5981s] [ 3%] 2025-12-04T13:20:27.8151794Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda PASSED [1.2944s] [ 3%] 2025-12-04T13:20:27.8152020Z test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda PASSED [1.2995s] [ 3%] 2025-12-04T13:20:27.8152230Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda PASSED [1.2840s] [ 3%] 2025-12-04T13:20:27.8152445Z 
test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda PASSED [1.2956s] [ 3%] 2025-12-04T13:20:27.8152656Z test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda PASSED [1.3013s] [ 3%] 2025-12-04T13:20:27.8152864Z test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda PASSED [1.3027s] [ 3%] 2025-12-04T13:20:27.8153076Z test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda PASSED [1.3014s] [ 3%] 2025-12-04T13:20:27.8153316Z test_ops.py::TestCommonCUDA::test_dtypes_char_cuda PASSED [1.2908s] [ 3%] 2025-12-04T13:20:27.8153524Z test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda PASSED [1.2806s] [ 3%] 2025-12-04T13:20:27.8153736Z test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda PASSED [1.2879s] [ 3%] 2025-12-04T13:20:27.8153953Z test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda PASSED [1.2704s] [ 3%] 2025-12-04T13:20:27.8154163Z test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda PASSED [2.1039s] [ 3%] 2025-12-04T13:20:27.8154372Z test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda PASSED [2.4977s] [ 3%] 2025-12-04T13:20:27.8154578Z test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda PASSED [1.2898s] [ 3%] 2025-12-04T13:20:27.8154784Z test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda PASSED [1.2690s] [ 3%] 2025-12-04T13:20:27.8155006Z test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda PASSED [1.2759s] [ 3%] 2025-12-04T13:20:27.8155216Z test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda PASSED [1.2914s] [ 3%] 2025-12-04T13:20:27.8155424Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda PASSED [1.2968s] [ 3%] 2025-12-04T13:20:27.8155631Z test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda PASSED [1.3888s] [ 3%] 2025-12-04T13:20:27.8155854Z test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda PASSED [1.2843s] [ 3%] 2025-12-04T13:20:27.8156074Z test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda PASSED [1.2703s] [ 3%] 2025-12-04T13:20:27.8156300Z test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda PASSED [1.2963s] [ 3%] 
2025-12-04T13:20:27.8156512Z test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda PASSED [1.2698s] [ 3%] 2025-12-04T13:20:27.8156725Z test_ops.py::TestCommonCUDA::test_dtypes_exponential_cuda PASSED [1.2827s] [ 3%] 2025-12-04T13:20:27.8156941Z test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda PASSED [0.1376s] [ 3%] 2025-12-04T13:20:27.8157148Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda PASSED [3.1879s] [ 3%] 2025-12-04T13:20:27.8157357Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda PASSED [8.1280s] [ 3%] 2025-12-04T13:20:27.8157566Z test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda PASSED [1.2502s] [ 3%] 2025-12-04T13:20:27.8157773Z test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda PASSED [1.2742s] [ 3%] 2025-12-04T13:20:27.8157981Z test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda PASSED [1.2758s] [ 3%] 2025-12-04T13:20:27.8158187Z test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda PASSED [1.2771s] [ 3%] 2025-12-04T13:20:27.8158396Z test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda PASSED [1.2617s] [ 3%] 2025-12-04T13:20:27.8158603Z test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda PASSED [1.2506s] [ 4%] 2025-12-04T13:20:27.8158809Z test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda PASSED [1.3457s] [ 4%] 2025-12-04T13:20:27.8159032Z test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda PASSED [1.5046s] [ 4%] 2025-12-04T13:20:27.8159239Z test_ops.py::TestCommonCUDA::test_dtypes_item_cuda PASSED [1.2645s] [ 4%] 2025-12-04T13:20:27.8159446Z test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda PASSED [1.2665s] [ 4%] 2025-12-04T13:20:27.8159652Z test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda PASSED [2.7877s] [ 4%] 2025-12-04T13:20:27.8159885Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda PASSED [1.7226s] [ 4%] 2025-12-04T13:20:27.8160110Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda PASSED [2.6937s] [ 4%] 2025-12-04T13:20:27.8160330Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda 
PASSED [1.3184s] [ 4%] 2025-12-04T13:20:27.8160557Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda PASSED [1.9394s] [ 4%] 2025-12-04T13:20:27.8160792Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda PASSED [1.3469s] [ 4%] 2025-12-04T13:20:27.8161028Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda PASSED [1.7234s] [ 4%] 2025-12-04T13:20:27.8161278Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda PASSED [1.3670s] [ 4%] 2025-12-04T13:20:27.8161527Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda PASSED [1.8543s] [ 4%] 2025-12-04T13:20:27.8161749Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda PASSED [1.2739s] [ 4%] 2025-12-04T13:20:27.8161972Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda PASSED [1.2695s] [ 4%] 2025-12-04T13:20:27.8162199Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda PASSED [1.2559s] [ 4%] 2025-12-04T13:20:27.8162436Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda PASSED [1.2436s] [ 4%] 2025-12-04T13:20:27.8162676Z test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda PASSED [1.2627s] [ 4%] 2025-12-04T13:20:27.8162925Z test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda PASSED [1.2356s] [ 4%] 2025-12-04T13:20:27.8163146Z test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda PASSED [2.0086s] [ 4%] 2025-12-04T13:20:27.8163404Z test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda PASSED [2.0002s] [ 4%] 2025-12-04T13:20:27.8163619Z test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda PASSED [1.3787s] [ 4%] 2025-12-04T13:20:27.8163837Z test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda PASSED [1.3284s] [ 4%] 2025-12-04T13:20:27.8164066Z test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda PASSED [1.2685s] [ 4%] 2025-12-04T13:20:27.8164294Z test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda PASSED [1.2787s] [ 4%] 
2025-12-04T13:20:27.8164535Z test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda PASSED [1.3147s] [ 4%] 2025-12-04T13:20:27.8164757Z test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda PASSED [1.2589s] [ 4%] 2025-12-04T13:20:27.8164999Z test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda PASSED [3.7713s] [ 4%] 2025-12-04T13:20:27.8165242Z test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda PASSED [1.2665s] [ 4%] 2025-12-04T13:20:27.8165472Z test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda PASSED [1.2446s] [ 4%] 2025-12-04T13:20:27.8165702Z test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda PASSED [1.2429s] [ 4%] 2025-12-04T13:20:27.8165911Z test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda PASSED [1.2424s] [ 4%] 2025-12-04T13:20:27.8166118Z test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda PASSED [1.2336s] [ 4%] 2025-12-04T13:20:27.8166323Z test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda PASSED [1.7203s] [ 4%] 2025-12-04T13:20:27.8166538Z test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda PASSED [1.2800s] [ 4%] 2025-12-04T13:20:27.8166772Z test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda PASSED [1.2731s] [ 4%] 2025-12-04T13:20:27.8167037Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda PASSED [1.2426s] [ 4%] 2025-12-04T13:20:27.8167312Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda PASSED [3.5501s] [ 4%] 2025-12-04T13:20:27.8167732Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x72029e012600 size: 1024 2025-12-04T13:20:27.8168244Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x72029e012600 size: 1024 2025-12-04T13:20:27.8168668Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace 
required: 2400, provided ptr: 0x72029e012800 size: 1024 2025-12-04T13:20:27.8169096Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x72029e012800 size: 1024 2025-12-04T13:20:27.8169515Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x72029e032200 size: 1024 2025-12-04T13:20:27.8169921Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x72029e032200 size: 1024 2025-12-04T13:20:27.8170334Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x72029e032400 size: 1024 2025-12-04T13:20:27.8170755Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x72029e032400 size: 1024 2025-12-04T13:20:27.8171168Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x72029e021800 size: 768 2025-12-04T13:20:27.8171572Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x72029e021800 size: 768 2025-12-04T13:20:27.8171992Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x72029e021800 size: 1024 2025-12-04T13:20:27.8172397Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x72029e021800 size: 1024 2025-12-04T13:20:27.8172813Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x72029e021a00 size: 1024 2025-12-04T13:20:27.8173289Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x72029e021a00 size: 1024 2025-12-04T13:20:27.8173558Z PASSED [2.2986s] [ 4%] 2025-12-04T13:20:27.8173746Z 
test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda PASSED [1.2365s] [ 4%] 2025-12-04T13:20:27.8174025Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda PASSED [0.8128s] [ 4%] 2025-12-04T13:20:27.8174407Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda PASSED [0.8424s] [ 4%] 2025-12-04T13:20:27.8174674Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda PASSED [0.9439s] [ 4%] 2025-12-04T13:20:27.8174938Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda PASSED [0.8216s] [ 4%] 2025-12-04T13:20:27.8175202Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda PASSED [0.8476s] [ 4%] 2025-12-04T13:20:27.8175478Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda PASSED [0.8290s] [ 4%] 2025-12-04T13:20:27.8175756Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda PASSED [0.8456s] [ 4%] 2025-12-04T13:20:27.8176018Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda PASSED [3.2294s] [ 4%] 2025-12-04T13:20:27.8176268Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda PASSED [1.7630s] [ 4%] 2025-12-04T13:20:27.8176551Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda PASSED [0.8195s] [ 4%] 2025-12-04T13:20:27.8176815Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda PASSED [0.8012s] [ 4%] 2025-12-04T13:20:27.8177061Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda PASSED [0.8151s] [ 4%] 2025-12-04T13:20:27.8177326Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda PASSED [0.8157s] [ 4%] 2025-12-04T13:20:27.8177590Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda PASSED [0.8105s] [ 4%] 2025-12-04T13:20:27.8177831Z test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda PASSED [0.8542s] [ 4%] 
2025-12-04T13:20:27.8178043Z test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda PASSED [0.8037s] [ 4%] 2025-12-04T13:20:27.8178260Z test_ops.py::TestCommonCUDA::test_dtypes_normal_in_place_cuda PASSED [0.8108s] [ 4%] 2025-12-04T13:20:27.8178487Z test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda PASSED [0.8093s] [ 4%] 2025-12-04T13:20:27.8178696Z test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda PASSED [0.7872s] [ 4%] 2025-12-04T13:20:27.8178910Z test_ops.py::TestCommonCUDA::test_dtypes_permute_copy_cuda PASSED [0.7979s] [ 4%] 2025-12-04T13:20:27.8179161Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda SKIPPED [0.0002s] (Skipped!) [ 4%] 2025-12-04T13:20:27.8179409Z test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda PASSED [0.8385s] [ 4%] 2025-12-04T13:20:27.8179621Z test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda PASSED [0.8278s] [ 4%] 2025-12-04T13:20:27.8179836Z test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda PASSED [0.8129s] [ 4%] 2025-12-04T13:20:27.8180045Z test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda PASSED [0.8437s] [ 4%] 2025-12-04T13:20:27.8180256Z test_ops.py::TestCommonCUDA::test_dtypes_round_cuda PASSED [0.7859s] [ 4%] 2025-12-04T13:20:27.8180482Z test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda PASSED [1.0317s] [ 5%] 2025-12-04T13:20:27.8180706Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda PASSED [0.8576s] [ 5%] 2025-12-04T13:20:27.8180938Z test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda PASSED [1.2120s] [ 5%] 2025-12-04T13:20:27.8181152Z test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda PASSED [1.9767s] [ 5%] 2025-12-04T13:20:27.8181363Z test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda PASSED [0.8051s] [ 5%] 2025-12-04T13:20:27.8181593Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda PASSED [0.7995s] [ 5%] 2025-12-04T13:20:27.8181865Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda PASSED [0.8097s] [ 
5%] 2025-12-04T13:20:27.8182125Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda PASSED [0.8135s] [ 5%] 2025-12-04T13:20:27.8182370Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda PASSED [0.8166s] [ 5%] 2025-12-04T13:20:27.8182619Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda PASSED [0.8129s] [ 5%] 2025-12-04T13:20:27.8182860Z test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda PASSED [0.8072s] [ 5%] 2025-12-04T13:20:27.8183100Z test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda PASSED [0.9124s] [ 5%] 2025-12-04T13:20:27.8183394Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda PASSED [1.1868s] [ 5%] 2025-12-04T13:20:27.8183642Z test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda PASSED [1.3506s] [ 5%] 2025-12-04T13:20:27.8183884Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda PASSED [1.1587s] [ 5%] 2025-12-04T13:20:27.8184127Z test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda PASSED [0.8023s] [ 5%] 2025-12-04T13:20:27.8184480Z test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda PASSED [0.8207s] [ 5%] 2025-12-04T13:20:27.8184727Z test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda PASSED [1.0322s] [ 5%] 2025-12-04T13:20:27.8184941Z test_ops.py::TestCommonCUDA::test_dtypes_squeeze_copy_cuda PASSED [0.8391s] [ 5%] 2025-12-04T13:20:27.8185157Z test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda PASSED [0.8101s] [ 5%] 2025-12-04T13:20:27.8185365Z test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda PASSED [1.2113s] [ 5%] 2025-12-04T13:20:27.8185570Z test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda PASSED [0.8652s] [ 5%] 2025-12-04T13:20:27.8185793Z test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda PASSED [0.8164s] [ 5%] 2025-12-04T13:20:27.8186002Z test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda PASSED [0.8109s] [ 5%] 2025-12-04T13:20:27.8186228Z 
test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda PASSED [0.8176s] [ 5%] 2025-12-04T13:20:27.8186445Z test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda PASSED [0.7992s] [ 5%] 2025-12-04T13:20:27.8186662Z test_ops.py::TestCommonCUDA::test_dtypes_unravel_index_cuda PASSED [0.8282s] [ 5%] 2025-12-04T13:20:27.8186886Z test_ops.py::TestCommonCUDA::test_dtypes_unsafe_split_cuda PASSED [0.8032s] [ 5%] 2025-12-04T13:20:27.8187099Z test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda PASSED [0.8079s] [ 5%] 2025-12-04T13:20:27.8187307Z test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda PASSED [0.8323s] [ 5%] 2025-12-04T13:20:27.8187515Z test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda PASSED [0.7948s] [ 5%] 2025-12-04T13:20:27.8187726Z test_ops.py::TestCommonCUDA::test_errors_T_cuda PASSED [0.0026s] [ 5%] 2025-12-04T13:20:27.8187933Z test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda PASSED [0.7773s] [ 5%] 2025-12-04T13:20:27.8188139Z test_ops.py::TestCommonCUDA::test_errors_amin_cuda PASSED [0.7827s] [ 5%] 2025-12-04T13:20:27.8188345Z test_ops.py::TestCommonCUDA::test_errors_arange_cuda PASSED [0.0077s] [ 5%] 2025-12-04T13:20:27.8188555Z test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda PASSED [0.7691s] [ 5%] 2025-12-04T13:20:27.8188781Z test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda XFAIL [0.0043s] [ 5%] 2025-12-04T13:20:27.8188990Z test_ops.py::TestCommonCUDA::test_errors_complex_cuda PASSED [1.5596s] [ 5%] 2025-12-04T13:20:27.8189197Z test_ops.py::TestCommonCUDA::test_errors_copysign_cuda PASSED [0.7858s] [ 5%] 2025-12-04T13:20:27.8189405Z test_ops.py::TestCommonCUDA::test_errors_diag_cuda PASSED [0.7726s] [ 5%] 2025-12-04T13:20:27.8189621Z test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda PASSED [0.7760s] [ 5%] 2025-12-04T13:20:27.8189851Z test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda PASSED [0.0026s] [ 5%] 2025-12-04T13:20:27.8190092Z test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda PASSED 
[0.7708s] [ 5%] 2025-12-04T13:20:27.8190301Z test_ops.py::TestCommonCUDA::test_errors_eq_cuda PASSED [0.7784s] [ 5%] 2025-12-04T13:20:27.8190507Z test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda PASSED [0.7731s] [ 5%] 2025-12-04T13:20:27.8190717Z test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda PASSED [0.7674s] [ 5%] 2025-12-04T13:20:27.8190924Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda PASSED [0.7760s] [ 5%] 2025-12-04T13:20:27.8191131Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda PASSED [0.7770s] [ 5%] 2025-12-04T13:20:27.8191343Z test_ops.py::TestCommonCUDA::test_errors_float_power_cuda PASSED [0.0035s] [ 5%] 2025-12-04T13:20:27.8191561Z test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda PASSED [0.0021s] [ 5%] 2025-12-04T13:20:27.8191774Z test_ops.py::TestCommonCUDA::test_errors_fmin_cuda PASSED [0.7826s] [ 5%] 2025-12-04T13:20:27.8191985Z test_ops.py::TestCommonCUDA::test_errors_gather_cuda PASSED [0.7735s] [ 5%] 2025-12-04T13:20:27.8192193Z test_ops.py::TestCommonCUDA::test_errors_gradient_cuda PASSED [0.8029s] [ 5%] 2025-12-04T13:20:27.8192401Z test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda PASSED [0.7799s] [ 5%] 2025-12-04T13:20:27.8192611Z test_ops.py::TestCommonCUDA::test_errors_hypot_cuda PASSED [0.7886s] [ 5%] 2025-12-04T13:20:27.8192836Z test_ops.py::TestCommonCUDA::test_errors_igamma_cuda PASSED [0.7879s] [ 5%] 2025-12-04T13:20:27.8193043Z test_ops.py::TestCommonCUDA::test_errors_index_add_cuda PASSED [0.0038s] [ 5%] 2025-12-04T13:20:27.8193278Z test_ops.py::TestCommonCUDA::test_errors_item_cuda PASSED [0.7827s] [ 5%] 2025-12-04T13:20:27.8193510Z test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda PASSED [0.7731s] [ 5%] 2025-12-04T13:20:27.8193758Z test_ops.py::TestCommonCUDA::test_errors_linspace_cuda PASSED [0.7742s] [ 5%] 2025-12-04T13:20:27.8193972Z test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda PASSED [0.7728s] [ 5%] 2025-12-04T13:20:27.8194208Z 
test_ops.py::TestCommonCUDA::test_errors_logspace_tensor_overload_cuda PASSED [0.7752s] [ 5%] 2025-12-04T13:20:27.8194441Z test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda PASSED [0.7924s] [ 5%] 2025-12-04T13:20:27.8194651Z test_ops.py::TestCommonCUDA::test_errors_maximum_cuda PASSED [0.7869s] [ 5%] 2025-12-04T13:20:27.8194861Z test_ops.py::TestCommonCUDA::test_errors_mul_cuda PASSED [0.0027s] [ 5%] 2025-12-04T13:20:27.8195071Z test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda PASSED [0.7741s] [ 5%] 2025-12-04T13:20:27.8195295Z test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda PASSED [0.7820s] [ 5%] 2025-12-04T13:20:27.8195550Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_max_pool1d_cuda PASSED [0.7782s] [ 6%] 2025-12-04T13:20:27.8195812Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda PASSED [0.0075s] [ 6%] 2025-12-04T13:20:27.8196059Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda PASSED [0.7870s] [ 6%] 2025-12-04T13:20:27.8196316Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda PASSED [0.7744s] [ 6%] 2025-12-04T13:20:27.8196595Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_multilabel_margin_loss_cuda PASSED [0.7838s] [ 6%] 2025-12-04T13:20:27.8196878Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda PASSED [0.7809s] [ 6%] 2025-12-04T13:20:27.8197103Z test_ops.py::TestCommonCUDA::test_errors_pow_cuda PASSED [0.0036s] [ 6%] 2025-12-04T13:20:27.8197313Z test_ops.py::TestCommonCUDA::test_errors_remainder_cuda PASSED [0.0030s] [ 6%] 2025-12-04T13:20:27.8197520Z test_ops.py::TestCommonCUDA::test_errors_roll_cuda PASSED [0.7825s] [ 6%] 2025-12-04T13:20:27.8197726Z test_ops.py::TestCommonCUDA::test_errors_rot90_cuda PASSED [0.7930s] [ 6%] 2025-12-04T13:20:27.8197933Z test_ops.py::TestCommonCUDA::test_errors_scatter_cuda PASSED [0.7823s] [ 6%] 2025-12-04T13:20:27.8198185Z 
test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda PASSED [0.0068s] [ 6%] 2025-12-04T13:20:27.8198441Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda PASSED [0.0058s] [ 6%] 2025-12-04T13:20:27.8198678Z test_ops.py::TestCommonCUDA::test_errors_sparse_mul_layout3_cuda PASSED [0.0030s] [ 6%] 2025-12-04T13:20:27.8198914Z test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout4_cuda PASSED [0.0013s] [ 6%] 2025-12-04T13:20:27.8199164Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda PASSED [0.0028s] [ 6%] 2025-12-04T13:20:27.8199431Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda PASSED [0.0027s] [ 6%] 2025-12-04T13:20:27.8199668Z test_ops.py::TestCommonCUDA::test_errors_trace_cuda PASSED [0.7813s] [ 6%] 2025-12-04T13:20:27.8199877Z test_ops.py::TestCommonCUDA::test_errors_tril_cuda PASSED [0.7867s] [ 6%] 2025-12-04T13:20:27.8200090Z test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda PASSED [0.0026s] [ 6%] 2025-12-04T13:20:27.8200396Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__batch_norm_with_update_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8200767Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_abs_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8201131Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_addcmul_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8201476Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_any_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8201830Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_as_strided_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8202201Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_asinh_cuda_float32 SKIPPED [0.0011s] 
(Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8202545Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan2_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8202887Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8203231Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ceil_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8203612Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cumprod_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8203956Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfc_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8204300Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_erfinv_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8204646Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_fft_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8204995Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8205381Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ifft2_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8205737Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfft2_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8206090Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8206445Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_ihfftn_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 
2025-12-04T13:20:27.8206813Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_rfft2_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8207160Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8207505Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_frexp_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8207848Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_gt_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8208186Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_hypot_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8208523Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_i0_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8208861Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_ldexp_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8209210Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cond_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8209569Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_cross_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8209944Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_det_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8210387Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_householder_product_cuda_float32 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 6%] 2025-12-04T13:20:27.8210856Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_ldl_factor_ex_cuda_float32 SKIPPED 
[0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8211239Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_factor_ex_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8211618Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8212001Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_hermitian_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8212381Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_slogdet_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8212738Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log1p_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8213081Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8213460Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logical_not_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8213837Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logspace_tensor_overload_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8214234Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_solve_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8214585Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_lu_unpack_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8214942Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_masked_select_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8215297Z 
test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_matmul_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8215678Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_max_reduction_with_dim_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8216064Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_min_reduction_no_dim_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8216426Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mm_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8216767Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_msort_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8217107Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mv_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 6%] 2025-12-04T13:20:27.8217455Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanquantile_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8217816Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_narrow_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8218192Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_avg_pool2d_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8218563Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8218922Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8219285Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_normal_number_mean_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 
2025-12-04T13:20:27.8219648Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_rad2deg_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8220025Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_0_cuda_float32 SKIPPED [0.0014s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8220387Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_round_decimals_3_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 7%] 2025-12-04T13:20:27.8220754Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_amax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8221135Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_scatter_reduce_sum_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8221497Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sigmoid_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8221844Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_signbit_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8222192Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sort_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8222557Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sparse_sampled_addmm_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8222938Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_bessel_j1_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8223379Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_u_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8223794Z 
test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_v_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8224204Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8224603Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_i0e_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8224986Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_legendre_polynomial_p_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8225377Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_log_ndtr_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8225746Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_ndtri_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8226144Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8226536Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_topk_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8226884Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_tril_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8227234Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_true_divide_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8227599Z test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unsqueeze_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8227973Z 
test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_vdot_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 7%] 2025-12-04T13:20:27.8228318Z test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8228671Z test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8229003Z test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8229336Z test_ops.py::TestCommonCUDA::test_multiple_devices__chunk_cat_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8229689Z test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8230064Z test_ops.py::TestCommonCUDA::test_multiple_devices__upsample_bilinear2d_aa_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8230418Z test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8230743Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8231071Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8231398Z test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8231724Z test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8232061Z test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32 SKIPPED [0.0012s] (fewer than 2 
devices detected) [ 7%] 2025-12-04T13:20:27.8232393Z test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8232723Z test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8233051Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8233436Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_copy_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8233804Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8234156Z test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8234484Z test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8234811Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8235134Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8235460Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8235787Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8236121Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8236489Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8236830Z test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8237170Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8237526Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8237862Z test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8238199Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8238542Z test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8238879Z test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8239220Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8239574Z test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8239920Z test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8240252Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8240583Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64 SKIPPED [0.0012s] (fewer than 2 
devices detected) [ 7%] 2025-12-04T13:20:27.8240931Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 7%] 2025-12-04T13:20:27.8241280Z test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8241621Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8241964Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8242326Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8242665Z test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8242995Z test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8243422Z test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8243749Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8244103Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8244476Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8244828Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8245167Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8245524Z test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8245859Z test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8246198Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_permuted_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8246548Z test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8246871Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8247195Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8247517Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8247842Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8248162Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8248486Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8248817Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8249147Z test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 
2025-12-04T13:20:27.8249476Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8249826Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8250167Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8250500Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8250834Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8251183Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8251521Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8251863Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8252202Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8252535Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8252865Z test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8253315Z test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8253641Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64 SKIPPED [0.0011s] 
(fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8253969Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8254315Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8254642Z test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8254969Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8255314Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8255652Z test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8255994Z test_ops.py::TestCommonCUDA::test_multiple_devices_hash_tensor_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8256333Z test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8256663Z test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8257000Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8257342Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8257681Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8258027Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_amax_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8258382Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8258751Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8259087Z test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8259414Z test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8259737Z test_ops.py::TestCommonCUDA::test_multiple_devices_item_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8260109Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8260469Z test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8260795Z test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8261138Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8261492Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8261848Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8262204Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8262569Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8262924Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8263321Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8263672Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8264047Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 8%] 2025-12-04T13:20:27.8264402Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8264749Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8265113Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_tensor_overload_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8265467Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8265806Z test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8266148Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8266510Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8266892Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_tensor_overload_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8267254Z test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8267605Z test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8267940Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8268284Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8268626Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8268992Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8269356Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8269708Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8270060Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8270417Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8270786Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8271141Z test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8271477Z test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8271827Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8272219Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8272576Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8272910Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8273287Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8273625Z test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8273955Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8274319Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8274713Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8275125Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8275520Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8275897Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8276295Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8276723Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8277131Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8277521Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8277918Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8278295Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8278696Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8279103Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8279486Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8279891Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8280308Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8280687Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8281076Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8281467Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8281863Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8282241Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8282587Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_in_place_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8282947Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8283328Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8283660Z test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices 
detected) [ 9%] 2025-12-04T13:20:27.8284017Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8284371Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 9%] 2025-12-04T13:20:27.8284701Z test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8285038Z test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8285352Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64 SKIPPED [0.0001s] (Skipped!) [ 9%] 2025-12-04T13:20:27.8285682Z test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8286022Z test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8286361Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8286697Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8287055Z test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8287384Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8287729Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8288074Z test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64 SKIPPED [0.0011s] 
(fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8288403Z test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 9%] 2025-12-04T13:20:27.8288759Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8289124Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8289464Z test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8289807Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8290181Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8290563Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8290943Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8291327Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8291704Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8292074Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8298805Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32 
SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8299200Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8299555Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8299914Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8300273Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8300618Z test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8300989Z test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8301329Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8301658Z test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8301980Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8302336Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8302684Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8303028Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 10%] 
2025-12-04T13:20:27.8303399Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8303732Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32 SKIPPED [0.0014s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8304062Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8304388Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8304730Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8305063Z test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8305422Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8305765Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8306105Z test_ops.py::TestCommonCUDA::test_multiple_devices_unsafe_split_cuda_int64 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8306454Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64 SKIPPED [0.0012s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8306784Z test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32 SKIPPED [0.0011s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8307115Z test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64 SKIPPED [0.0013s] (fewer than 2 devices detected) [ 10%] 2025-12-04T13:20:27.8307410Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool PASSED [0.7866s] 
[ 10%] 2025-12-04T13:20:27.8307676Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool PASSED [0.0087s] [ 10%] 2025-12-04T13:20:27.8307943Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argsort_cuda_bool PASSED [0.8537s] [ 10%] 2025-12-04T13:20:27.8308219Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_copy_cuda_bool PASSED [0.7773s] [ 10%] 2025-12-04T13:20:27.8308497Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool PASSED [0.0079s] [ 10%] 2025-12-04T13:20:27.8308758Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool PASSED [0.7861s] [ 10%] 2025-12-04T13:20:27.8309020Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool PASSED [0.7834s] [ 10%] 2025-12-04T13:20:27.8309284Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool PASSED [0.7854s] [ 10%] 2025-12-04T13:20:27.8309562Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool PASSED [0.8018s] [ 10%] 2025-12-04T13:20:27.8309834Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool PASSED [0.7846s] [ 10%] 2025-12-04T13:20:27.8310103Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool PASSED [0.7982s] [ 10%] 2025-12-04T13:20:27.8310364Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool PASSED [0.7793s] [ 10%] 2025-12-04T13:20:27.8310625Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool PASSED [0.8072s] [ 10%] 2025-12-04T13:20:27.8310884Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool PASSED [0.7941s] [ 10%] 2025-12-04T13:20:27.8311169Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool PASSED [0.7834s] [ 10%] 2025-12-04T13:20:27.8311427Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool PASSED [0.7916s] [ 10%] 2025-12-04T13:20:27.8311691Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool PASSED [1.0406s] [ 10%] 2025-12-04T13:20:27.8311964Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool PASSED [1.1041s] [ 10%] 2025-12-04T13:20:27.8312231Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool PASSED [0.8162s] [ 10%] 2025-12-04T13:20:27.8312495Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool PASSED [0.7911s] [ 10%] 2025-12-04T13:20:27.8312766Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool PASSED [1.3991s] [ 10%] 2025-12-04T13:20:27.8313036Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool PASSED [0.9324s] [ 10%] 2025-12-04T13:20:27.8313337Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool PASSED [0.7875s] [ 10%] 2025-12-04T13:20:27.8313595Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool PASSED [0.7819s] [ 10%] 2025-12-04T13:20:27.8313880Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool PASSED [0.7880s] [ 10%] 2025-12-04T13:20:27.8314175Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool PASSED [0.2764s] [ 10%] 2025-12-04T13:20:27.8314465Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool PASSED [0.9635s] [ 10%] 2025-12-04T13:20:27.8314748Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool PASSED [0.7995s] [ 10%] 2025-12-04T13:20:27.8315055Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool PASSED [0.0072s] [ 10%] 2025-12-04T13:20:27.8315331Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool PASSED [0.7857s] [ 10%] 2025-12-04T13:20:27.8315600Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool PASSED [0.7847s] [ 10%] 
2025-12-04T13:20:27.8315861Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool PASSED [0.7870s] [ 10%] 2025-12-04T13:20:27.8316131Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool PASSED [1.4728s] [ 11%] 2025-12-04T13:20:27.8316402Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool PASSED [0.7889s] [ 11%] 2025-12-04T13:20:27.8316669Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool PASSED [0.7944s] [ 11%] 2025-12-04T13:20:27.8316934Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool PASSED [0.7813s] [ 11%] 2025-12-04T13:20:27.8317197Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool PASSED [0.7967s] [ 11%] 2025-12-04T13:20:27.8317502Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool SKIPPED [0.0002s] (Skipped!) [ 11%] 2025-12-04T13:20:27.8317843Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool SKIPPED [0.0001s] (Skipped!) [ 11%] 2025-12-04T13:20:27.8318162Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool PASSED [0.7822s] [ 11%] 2025-12-04T13:20:27.8318425Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool PASSED [0.7808s] [ 11%] 2025-12-04T13:20:27.8318695Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool PASSED [0.7877s] [ 11%] 2025-12-04T13:20:27.8318983Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool SKIPPED [0.0002s] (Skipped!) 
[ 11%] 2025-12-04T13:20:27.8319283Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool PASSED [0.9883s] [ 11%] 2025-12-04T13:20:27.8319588Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool PASSED [0.9639s] [ 11%] 2025-12-04T13:20:27.8319874Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool PASSED [0.7856s] [ 11%] 2025-12-04T13:20:27.8320152Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool PASSED [0.9433s] [ 11%] 2025-12-04T13:20:27.8320425Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool PASSED [0.9501s] [ 11%] 2025-12-04T13:20:27.8320718Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool PASSED [0.9477s] [ 11%] 2025-12-04T13:20:27.8321052Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool PASSED [0.5373s] [ 11%] 2025-12-04T13:20:27.8321371Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool PASSED [0.0068s] [ 11%] 2025-12-04T13:20:27.8321652Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool PASSED [0.5010s] [ 11%] 2025-12-04T13:20:27.8321920Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool PASSED [0.7821s] [ 11%] 2025-12-04T13:20:27.8322179Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool PASSED [0.7939s] [ 11%] 2025-12-04T13:20:27.8322447Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool PASSED [0.7831s] [ 11%] 2025-12-04T13:20:27.8322716Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool PASSED [0.7838s] [ 11%] 2025-12-04T13:20:27.8322988Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool PASSED [0.7924s] [ 11%] 2025-12-04T13:20:27.8323297Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool PASSED [0.7790s] [ 11%] 2025-12-04T13:20:27.8323581Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool PASSED [0.7897s] [ 11%] 2025-12-04T13:20:27.8323864Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool PASSED [0.1581s] [ 11%] 2025-12-04T13:20:27.8324151Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_chunk_cuda_bool PASSED [0.7798s] [ 11%] 2025-12-04T13:20:27.8324418Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool PASSED [0.7870s] [ 11%] 2025-12-04T13:20:27.8324677Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool PASSED [0.7883s] [ 11%] 2025-12-04T13:20:27.8324931Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64 PASSED [0.7799s] [ 11%] 2025-12-04T13:20:27.8325190Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32 PASSED [0.0144s] [ 11%] 2025-12-04T13:20:27.8325479Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_float32 PASSED [0.7997s] [ 11%] 2025-12-04T13:20:27.8325781Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples__unsafe_masked_index_cuda_int64 PASSED [0.7834s] [ 11%] 2025-12-04T13:20:27.8326062Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32 PASSED [0.7838s] [ 11%] 2025-12-04T13:20:27.8326326Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32 PASSED [0.7856s] [ 11%] 2025-12-04T13:20:27.8326604Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64 PASSED [0.7934s] [ 11%] 2025-12-04T13:20:27.8326869Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32 PASSED [0.7834s] [ 11%] 2025-12-04T13:20:27.8327151Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64 PASSED [0.7807s] [ 11%] 2025-12-04T13:20:27.8327437Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64 PASSED [0.7871s] [ 11%] 2025-12-04T13:20:27.8327712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64 PASSED [0.7961s] [ 11%] 2025-12-04T13:20:27.8327987Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32 PASSED [0.7843s] [ 11%] 2025-12-04T13:20:27.8328270Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32 PASSED [0.7892s] [ 11%] 2025-12-04T13:20:27.8328537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32 PASSED [0.0076s] [ 11%] 2025-12-04T13:20:27.8328807Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64 PASSED [0.0050s] [ 11%] 2025-12-04T13:20:27.8329070Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32 PASSED [0.7835s] [ 11%] 2025-12-04T13:20:27.8329331Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32 PASSED [0.7792s] [ 11%] 2025-12-04T13:20:27.8329590Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64 PASSED [0.7941s] [ 11%] 2025-12-04T13:20:27.8329855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64 PASSED [0.7859s] [ 11%] 2025-12-04T13:20:27.8330130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_int64 XFAIL [0.0051s] [ 11%] 2025-12-04T13:20:27.8330484Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64 SKIPPED [0.0001s] (Works for int64, fails for everything else) [ 11%] 2025-12-04T13:20:27.8330831Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32 PASSED [0.8083s] [ 11%] 2025-12-04T13:20:27.8331113Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32 PASSED [0.0114s] [ 11%] 2025-12-04T13:20:27.8331377Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64 PASSED [0.9140s] [ 11%] 2025-12-04T13:20:27.8331649Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32 PASSED [0.7335s] [ 11%] 2025-12-04T13:20:27.8331938Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64 PASSED [0.7275s] [ 11%] 2025-12-04T13:20:27.8332211Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64 PASSED [0.7213s] [ 11%] 2025-12-04T13:20:27.8332482Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64 PASSED [0.0065s] [ 11%] 2025-12-04T13:20:27.8332748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64 PASSED [0.7200s] [ 11%] 2025-12-04T13:20:27.8333026Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64 PASSED [0.7270s] [ 11%] 2025-12-04T13:20:27.8333358Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32 PASSED [0.7368s] [ 11%] 2025-12-04T13:20:27.8333632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64 PASSED [0.7241s] [ 11%] 2025-12-04T13:20:27.8333898Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64 PASSED [0.7220s] [ 11%] 2025-12-04T13:20:27.8334164Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64 PASSED [0.7223s] [ 11%] 2025-12-04T13:20:27.8334431Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64 PASSED [0.7261s] [ 11%] 2025-12-04T13:20:27.8334698Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64 PASSED [0.7296s] [ 11%] 2025-12-04T13:20:27.8334974Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32 PASSED [0.7374s] [ 12%] 2025-12-04T13:20:27.8335684Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64 SKIPPED [0.0004s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/165294 for platform(s) rocm. 
If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 12%] 2025-12-04T13:20:27.8336350Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32 PASSED [0.0152s] [ 12%] 2025-12-04T13:20:27.8336631Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32 PASSED [0.0059s] [ 12%] 2025-12-04T13:20:27.8336910Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64 PASSED [0.0047s] [ 12%] 2025-12-04T13:20:27.8337171Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64 PASSED [0.0034s] [ 12%] 2025-12-04T13:20:27.8337447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64 PASSED [0.0310s] [ 12%] 2025-12-04T13:20:27.8337739Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32 PASSED [0.7324s] [ 12%] 2025-12-04T13:20:27.8338010Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64 PASSED [0.7312s] [ 12%] 2025-12-04T13:20:27.8338273Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64 PASSED [1.4957s] [ 12%] 2025-12-04T13:20:27.8338534Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64 PASSED [0.7234s] [ 12%] 2025-12-04T13:20:27.8338808Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64 PASSED [0.7224s] [ 12%] 2025-12-04T13:20:27.8339090Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64 PASSED [0.7235s] [ 12%] 2025-12-04T13:20:27.8339366Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64 PASSED [0.7301s] [ 12%] 2025-12-04T13:20:27.8339644Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32 PASSED [0.7429s] [ 12%] 2025-12-04T13:20:27.8339933Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32 PASSED 
[0.7709s] [ 12%] 2025-12-04T13:20:27.8340196Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32 PASSED [0.7205s] [ 12%] 2025-12-04T13:20:27.8340457Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64 PASSED [0.7277s] [ 12%] 2025-12-04T13:20:27.8340737Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64 PASSED [0.7321s] [ 12%] 2025-12-04T13:20:27.8341032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_permuted_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 12%] 2025-12-04T13:20:27.8341358Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_strided_cuda_complex64 SKIPPED [0.0001s] (Skipped!) [ 12%] 2025-12-04T13:20:27.8341650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64 PASSED [0.7298s] [ 12%] 2025-12-04T13:20:27.8341910Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32 PASSED [0.7265s] [ 12%] 2025-12-04T13:20:27.8342174Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64 PASSED [0.7286s] [ 12%] 2025-12-04T13:20:27.8342443Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32 PASSED [0.7202s] [ 12%] 2025-12-04T13:20:27.8342707Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64 PASSED [0.7310s] [ 12%] 2025-12-04T13:20:27.8342984Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64 PASSED [0.7308s] [ 12%] 2025-12-04T13:20:27.8343302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64 PASSED [0.7368s] [ 12%] 2025-12-04T13:20:27.8343579Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64 PASSED [0.7334s] [ 12%] 2025-12-04T13:20:27.8343848Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64 PASSED [0.7439s] [ 12%] 2025-12-04T13:20:27.8344143Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32 PASSED [0.7581s] [ 12%] 
2025-12-04T13:20:27.8344416Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64 PASSED [0.7376s] [ 12%] 2025-12-04T13:20:27.8344686Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32 PASSED [1.0504s] [ 12%] 2025-12-04T13:20:27.8344956Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64 PASSED [0.7608s] [ 12%] 2025-12-04T13:20:27.8345228Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64 PASSED [0.0044s] [ 12%] 2025-12-04T13:20:27.8345510Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64 PASSED [0.7384s] [ 12%] 2025-12-04T13:20:27.8345774Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64 PASSED [0.7231s] [ 12%] 2025-12-04T13:20:27.8346035Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32 PASSED [0.0127s] [ 12%] 2025-12-04T13:20:27.8346295Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64 PASSED [0.0058s] [ 12%] 2025-12-04T13:20:27.8346560Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32 PASSED [0.8736s] [ 12%] 2025-12-04T13:20:27.8346825Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64 PASSED [0.7231s] [ 12%] 2025-12-04T13:20:27.8347087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64 PASSED [0.4215s] [ 12%] 2025-12-04T13:20:27.8347345Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32 PASSED [0.0056s] [ 12%] 2025-12-04T13:20:27.8347612Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geometric_cuda_float32 PASSED [0.7249s] [ 12%] 2025-12-04T13:20:27.8347886Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hash_tensor_cuda_float32 PASSED [0.7263s] [ 12%] 2025-12-04T13:20:27.8348157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64 PASSED [0.7200s] [ 12%] 2025-12-04T13:20:27.8348433Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32 PASSED [1.2926s] [ 12%] 2025-12-04T13:20:27.8348693Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32 PASSED [0.0077s] [ 12%] 2025-12-04T13:20:27.8348967Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64 PASSED [0.7279s] [ 12%] 2025-12-04T13:20:27.8349248Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64 PASSED [0.7393s] [ 12%] 2025-12-04T13:20:27.8349542Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32 PASSED [0.7367s] [ 12%] 2025-12-04T13:20:27.8349826Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_amax_cuda_float32 PASSED [0.7396s] [ 12%] 2025-12-04T13:20:27.8350104Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64 PASSED [0.7314s] [ 12%] 2025-12-04T13:20:27.8350365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32 PASSED [0.7275s] [ 12%] 2025-12-04T13:20:27.8350632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64 PASSED [0.7312s] [ 12%] 2025-12-04T13:20:27.8350901Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32 PASSED [0.7321s] [ 12%] 2025-12-04T13:20:27.8351161Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64 PASSED [0.7361s] [ 12%] 2025-12-04T13:20:27.8351420Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32 PASSED [0.7299s] [ 12%] 2025-12-04T13:20:27.8351687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32 PASSED [0.7309s] [ 12%] 2025-12-04T13:20:27.8351986Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64 PASSED [0.4095s] [ 12%] 2025-12-04T13:20:27.8352311Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32 PASSED [0.3863s] [ 12%] 2025-12-04T13:20:27.8352656Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [0.3806s] [ 12%] 2025-12-04T13:20:27.8352980Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64 PASSED [0.4419s] [ 12%] 2025-12-04T13:20:27.8353305Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32 PASSED [0.3797s] [ 12%] 2025-12-04T13:20:27.8353621Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64 PASSED [0.4013s] [ 12%] 2025-12-04T13:20:27.8353922Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64 PASSED [0.0050s] [ 12%] 2025-12-04T13:20:27.8354206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64 PASSED [0.9257s] [ 12%] 2025-12-04T13:20:27.8354470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32 PASSED [0.6348s] [ 13%] 2025-12-04T13:20:27.8354750Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32 PASSED [0.0152s] [ 13%] 2025-12-04T13:20:27.8355045Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64 PASSED [0.0092s] [ 13%] 2025-12-04T13:20:27.8355329Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64 PASSED [0.0060s] [ 13%] 2025-12-04T13:20:27.8355612Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64 PASSED [0.0118s] [ 13%] 2025-12-04T13:20:27.8355891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32 PASSED [0.0917s] [ 13%] 2025-12-04T13:20:27.8356171Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32 PASSED [0.0737s] [ 13%] 2025-12-04T13:20:27.8356464Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32 PASSED [0.0133s] [ 13%] 2025-12-04T13:20:27.8356757Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64 PASSED [0.2144s] [ 13%] 2025-12-04T13:20:27.8357050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64 PASSED [0.0486s] [ 13%] 2025-12-04T13:20:27.8357359Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64 PASSED [1.2353s] [ 13%] 2025-12-04T13:20:27.8357637Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32 PASSED [1.2406s] [ 13%] 2025-12-04T13:20:27.8357931Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64 PASSED [0.0222s] [ 13%] 2025-12-04T13:20:27.8358242Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64 PASSED [0.0282s] [ 13%] 2025-12-04T13:20:27.8358524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32 PASSED [0.0271s] [ 13%] 2025-12-04T13:20:27.8358811Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64 PASSED [0.0279s] [ 13%] 2025-12-04T13:20:27.8359113Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32 PASSED [0.2567s] [ 13%] 2025-12-04T13:20:27.8359412Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32 PASSED [0.0312s] [ 13%] 2025-12-04T13:20:27.8359702Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32 PASSED [0.0079s] [ 13%] 2025-12-04T13:20:27.8359992Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32 PASSED [0.0273s] [ 13%] 2025-12-04T13:20:27.8360282Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32 PASSED [0.1195s] [ 13%] 2025-12-04T13:20:27.8360560Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64 PASSED [0.0039s] [ 13%] 2025-12-04T13:20:27.8360846Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64 PASSED [0.0078s] [ 13%] 2025-12-04T13:20:27.8361141Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_complex64 PASSED [0.8597s] [ 13%] 2025-12-04T13:20:27.8361433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32 PASSED [0.0225s] [ 13%] 2025-12-04T13:20:27.8361710Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32 PASSED [0.0079s] [ 13%] 2025-12-04T13:20:27.8361989Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32 PASSED [0.0053s] [ 13%] 2025-12-04T13:20:27.8362262Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64 PASSED [0.0036s] [ 13%] 2025-12-04T13:20:27.8362540Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64 PASSED [0.0052s] [ 13%] 2025-12-04T13:20:27.8362835Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64 PASSED [0.0051s] [ 13%] 2025-12-04T13:20:27.8363111Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64 PASSED [0.0051s] [ 13%] 2025-12-04T13:20:27.8363408Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32 PASSED [0.0138s] [ 13%] 2025-12-04T13:20:27.8363681Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64 PASSED [0.0041s] [ 13%] 2025-12-04T13:20:27.8363941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32 PASSED [0.0515s] [ 13%] 2025-12-04T13:20:27.8364206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64 PASSED [0.0668s] [ 13%] 2025-12-04T13:20:27.8364482Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64 PASSED [0.0445s] [ 13%] 2025-12-04T13:20:27.8364749Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32 PASSED [1.2261s] [ 13%] 2025-12-04T13:20:27.8365007Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64 PASSED [1.2265s] [ 13%] 2025-12-04T13:20:27.8365265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64 PASSED [1.2524s] [ 13%] 2025-12-04T13:20:27.8365543Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64 PASSED [1.2379s] [ 13%] 2025-12-04T13:20:27.8365843Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64 PASSED [1.2231s] [ 13%] 2025-12-04T13:20:27.8366121Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64 PASSED [1.2238s] [ 13%] 2025-12-04T13:20:27.8366399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32 PASSED [0.3185s] [ 13%] 2025-12-04T13:20:27.8366699Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32 PASSED [0.0317s] [ 13%] 2025-12-04T13:20:27.8366987Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64 PASSED [0.0093s] [ 13%] 2025-12-04T13:20:27.8367265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64 PASSED [0.0503s] [ 13%] 2025-12-04T13:20:27.8367538Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64 PASSED [0.0640s] [ 13%] 2025-12-04T13:20:27.8367811Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32 PASSED [1.2392s] [ 13%] 2025-12-04T13:20:27.8368079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32 PASSED [0.0133s] [ 13%] 2025-12-04T13:20:27.8368349Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64 PASSED [0.0055s] [ 13%] 2025-12-04T13:20:27.8368617Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32 PASSED [0.0109s] [ 13%] 2025-12-04T13:20:27.8368882Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64 PASSED [1.2293s] [ 13%] 2025-12-04T13:20:27.8369141Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64 PASSED [1.3446s] [ 13%] 2025-12-04T13:20:27.8369411Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32 PASSED [1.2173s] [ 13%] 2025-12-04T13:20:27.8369678Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32 PASSED [1.2268s] [ 13%] 2025-12-04T13:20:27.8369974Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64 PASSED [1.2126s] [ 13%] 2025-12-04T13:20:27.8370249Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32 PASSED [1.2514s] [ 13%] 2025-12-04T13:20:27.8370526Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32 PASSED [1.2312s] [ 13%] 2025-12-04T13:20:27.8370802Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32 PASSED [1.2415s] [ 13%] 2025-12-04T13:20:27.8371080Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64 SKIPPED [0.0002s] (Skipped!) 
[ 13%] 2025-12-04T13:20:27.8371382Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64 PASSED [1.2213s] [ 13%] 2025-12-04T13:20:27.8371655Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32 PASSED [1.2097s] [ 13%] 2025-12-04T13:20:27.8371947Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32 PASSED [0.0169s] [ 13%] 2025-12-04T13:20:27.8372264Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32 PASSED [1.5984s] [ 13%] 2025-12-04T13:20:27.8372573Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32 PASSED [0.0177s] [ 13%] 2025-12-04T13:20:27.8372874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32 PASSED [0.0372s] [ 13%] 2025-12-04T13:20:27.8373191Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32 PASSED [1.2340s] [ 13%] 2025-12-04T13:20:27.8373566Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [1.2470s] [ 14%] 2025-12-04T13:20:27.8373902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32 PASSED [1.2592s] [ 14%] 2025-12-04T13:20:27.8374217Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32 PASSED [0.0219s] [ 14%] 2025-12-04T13:20:27.8374587Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [0.0097s] [ 14%] 2025-12-04T13:20:27.8374923Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32 PASSED [0.0211s] [ 14%] 2025-12-04T13:20:27.8375232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32 PASSED [0.0348s] [ 14%] 2025-12-04T13:20:27.8375574Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32 PASSED [1.2334s] [ 14%] 2025-12-04T13:20:27.8375893Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32 PASSED [1.2309s] [ 14%] 2025-12-04T13:20:27.8376222Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32 PASSED [0.0321s] [ 14%] 2025-12-04T13:20:27.8376541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32 PASSED [0.0190s] [ 14%] 2025-12-04T13:20:27.8376865Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32 PASSED [0.0229s] [ 14%] 2025-12-04T13:20:27.8377201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [1.2947s] [ 14%] 2025-12-04T13:20:27.8377536Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [1.2575s] [ 14%] 2025-12-04T13:20:27.8377884Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [1.2788s] [ 14%] 2025-12-04T13:20:27.8378221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64 PASSED [1.6592s] [ 14%] 2025-12-04T13:20:27.8378537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32 PASSED [1.2378s] [ 14%] 2025-12-04T13:20:27.8378887Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_negative_cuda_float32 PASSED [1.2615s] [ 14%] 2025-12-04T13:20:27.8379227Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64 PASSED [0.0081s] [ 14%] 2025-12-04T13:20:27.8379550Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64 PASSED [0.0043s] [ 14%] 2025-12-04T13:20:27.8379860Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64 PASSED [1.2352s] [ 14%] 2025-12-04T13:20:27.8380155Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64 PASSED [1.2466s] [ 14%] 2025-12-04T13:20:27.8380464Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32 PASSED [1.2526s] [ 14%] 2025-12-04T13:20:27.8380774Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32 PASSED [1.2782s] [ 14%] 2025-12-04T13:20:27.8381092Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64 PASSED [1.2450s] [ 14%] 2025-12-04T13:20:27.8381409Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32 PASSED [1.2423s] [ 14%] 2025-12-04T13:20:27.8381736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32 PASSED [1.2615s] [ 14%] 2025-12-04T13:20:27.8382063Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64 PASSED [1.3542s] [ 14%] 2025-12-04T13:20:27.8382355Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64 PASSED [0.0116s] [ 14%] 2025-12-04T13:20:27.8382657Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_int64 SKIPPED [0.0006s] (Only runs on cpu) [ 14%] 2025-12-04T13:20:27.8382959Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32 PASSED [1.2389s] [ 14%] 2025-12-04T13:20:27.8383338Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_in_place_cuda_complex64 SKIPPED [0.0002s] (Test expects tensor input) [ 14%] 2025-12-04T13:20:27.8383687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 14%] 2025-12-04T13:20:27.8383983Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32 PASSED [1.2432s] [ 14%] 2025-12-04T13:20:27.8384260Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64 PASSED [1.2593s] [ 14%] 2025-12-04T13:20:27.8384539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_copy_cuda_float32 PASSED [1.2499s] [ 14%] 2025-12-04T13:20:27.8384818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64 PASSED [1.2515s] [ 14%] 2025-12-04T13:20:27.8385085Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32 PASSED [0.0155s] [ 14%] 2025-12-04T13:20:27.8385392Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64 SKIPPED [0.0002s] (Skipped!) [ 14%] 2025-12-04T13:20:27.8385696Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32 PASSED [1.2406s] [ 14%] 2025-12-04T13:20:27.8385966Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64 PASSED [0.2963s] [ 14%] 2025-12-04T13:20:27.8386232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32 PASSED [1.2462s] [ 14%] 2025-12-04T13:20:27.8386498Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64 PASSED [1.2369s] [ 14%] 2025-12-04T13:20:27.8386801Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64 SKIPPED [0.0002s] (Test expects tensor input) [ 14%] 2025-12-04T13:20:27.8387144Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64 SKIPPED [0.0001s] (Test expects tensor input) [ 14%] 2025-12-04T13:20:27.8387449Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64 PASSED [1.2444s] [ 14%] 2025-12-04T13:20:27.8387727Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32 PASSED [1.2355s] [ 14%] 2025-12-04T13:20:27.8387994Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64 PASSED [1.2446s] [ 14%] 2025-12-04T13:20:27.8388280Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64 PASSED [1.2528s] [ 14%] 2025-12-04T13:20:27.8388572Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64 PASSED [1.2299s] [ 14%] 2025-12-04T13:20:27.8388854Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64 PASSED [1.2429s] [ 14%] 2025-12-04T13:20:27.8389141Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64 PASSED [1.2412s] [ 14%] 2025-12-04T13:20:27.8389417Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32 PASSED [1.2353s] [ 14%] 2025-12-04T13:20:27.8389698Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64 PASSED [1.6462s] [ 14%] 2025-12-04T13:20:27.8389996Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 14%] 2025-12-04T13:20:27.8390317Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 14%] 2025-12-04T13:20:27.8390625Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32 PASSED [0.0238s] [ 14%] 2025-12-04T13:20:27.8390915Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64 PASSED [0.0752s] [ 14%] 2025-12-04T13:20:27.8391199Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64 PASSED [0.0042s] [ 14%] 2025-12-04T13:20:27.8391470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64 PASSED [0.0038s] [ 14%] 2025-12-04T13:20:27.8391732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32 PASSED [1.2539s] [ 14%] 2025-12-04T13:20:27.8391858Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64 PASSED [1.2421s] [ 14%] 2025-12-04T13:20:27.8392021Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 14%] 2025-12-04T13:20:27.8392136Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64 PASSED [1.6703s] [ 14%] 2025-12-04T13:20:27.8392251Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64 PASSED [1.2436s] [ 14%] 2025-12-04T13:20:27.8392375Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32 PASSED [1.2492s] [ 14%] 2025-12-04T13:20:27.8392510Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64 PASSED [1.2528s] [ 14%] 2025-12-04T13:20:27.8392637Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32 PASSED [1.6225s] [ 14%] 2025-12-04T13:20:27.8392766Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32 PASSED [1.5469s] [ 15%] 2025-12-04T13:20:27.8392913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64 PASSED [0.0105s] [ 15%] 2025-12-04T13:20:27.8393060Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32 PASSED [0.7192s] [ 15%] 2025-12-04T13:20:27.8393204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64 PASSED [0.5614s] [ 15%] 2025-12-04T13:20:27.8393388Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64 PASSED [0.0079s] [ 15%] 2025-12-04T13:20:27.8393510Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32 PASSED [1.7944s] [ 15%] 2025-12-04T13:20:27.8393624Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64 PASSED [1.4490s] [ 15%] 2025-12-04T13:20:27.8393780Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64 PASSED [1.4589s] [ 15%] 2025-12-04T13:20:27.8393941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [1.9779s] [ 15%] 2025-12-04T13:20:27.8394089Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64 PASSED [1.4446s] [ 15%] 2025-12-04T13:20:27.8394246Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32 PASSED [0.1574s] [ 15%] 2025-12-04T13:20:27.8394405Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32 PASSED [0.2117s] [ 15%] 2025-12-04T13:20:27.8394541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32 PASSED [0.5953s] [ 15%] 2025-12-04T13:20:27.8394671Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64 PASSED [1.2376s] [ 15%] 2025-12-04T13:20:27.8394795Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64 PASSED [1.2503s] [ 15%] 2025-12-04T13:20:27.8394927Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_int64 PASSED [1.2332s] [ 15%] 2025-12-04T13:20:27.8395059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64 PASSED [1.2419s] [ 15%] 2025-12-04T13:20:27.8395183Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64 PASSED [1.2366s] [ 15%] 2025-12-04T13:20:27.8395303Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64 PASSED [1.2488s] [ 15%] 2025-12-04T13:20:27.8395419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32 PASSED [1.2564s] [ 15%] 2025-12-04T13:20:27.8395533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64 PASSED [1.2453s] [ 15%] 2025-12-04T13:20:27.8395647Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32 PASSED [1.2465s] [ 15%] 2025-12-04T13:20:27.8395795Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_complex64 PASSED [1.2485s] [ 15%] 2025-12-04T13:20:27.8395919Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_int64 PASSED [1.2378s] [ 15%] 2025-12-04T13:20:27.8396032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32 PASSED [1.2477s] [ 15%] 2025-12-04T13:20:27.8396148Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64 PASSED [1.2585s] [ 15%] 2025-12-04T13:20:27.8396280Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32 PASSED [1.2875s] [ 15%] 2025-12-04T13:20:27.8396393Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64 PASSED [1.3496s] [ 15%] 2025-12-04T13:20:27.8396508Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32 PASSED [1.3219s] [ 15%] 2025-12-04T13:20:27.8396628Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32 PASSED [0.1294s] [ 15%] 2025-12-04T13:20:27.8396742Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64 PASSED [0.0052s] [ 15%] 2025-12-04T13:20:27.8396847Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64 PASSED [0.0036s] [ 15%] 2025-12-04T13:20:27.8396974Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64 PASSED [1.2949s] [ 15%] 2025-12-04T13:20:27.8397087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32 PASSED [1.2885s] [ 15%] 2025-12-04T13:20:27.8397197Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32 PASSED [1.2708s] [ 15%] 2025-12-04T13:20:27.8397308Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64 PASSED [1.2642s] [ 15%] 2025-12-04T13:20:27.8397433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64 PASSED [1.2723s] [ 15%] 2025-12-04T13:20:27.8397548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64 PASSED [1.2779s] [ 15%] 2025-12-04T13:20:27.8397679Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32 SKIPPED [0.0002s] [ 15%] 2025-12-04T13:20:27.8397936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__efficient_attention_forward_cuda_float32 SKIPPED [0.0010s] (Efficient attention on ROCM doesn't support custom_mask_type==2) [ 15%] 2025-12-04T13:20:27.8398050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64 PASSED [1.2719s] [ 15%] 2025-12-04T13:20:27.8398166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32 PASSED [1.2672s] [ 15%] 2025-12-04T13:20:27.8398295Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32 PASSED [1.2718s] [ 15%] 2025-12-04T13:20:27.8398424Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32 PASSED [0.0234s] [ 15%] 2025-12-04T13:20:27.8398534Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64 
PASSED [1.2636s] [ 15%] 2025-12-04T13:20:27.8398660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_complex64 PASSED [1.2750s] [ 15%] 2025-12-04T13:20:27.8398783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64 PASSED [1.2847s] [ 15%] 2025-12-04T13:20:27.8398905Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unravel_index_cuda_int64 PASSED [1.2691s] [ 15%] 2025-12-04T13:20:27.8399122Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsafe_split_cuda_complex64 PASSED [1.2625s] [ 15%] 2025-12-04T13:20:27.8399238Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32 PASSED [1.2822s] [ 15%] 2025-12-04T13:20:27.8399362Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64 PASSED [1.2730s] [ 15%] 2025-12-04T13:20:27.8399481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32 PASSED [1.2807s] [ 15%] 2025-12-04T13:20:27.8399598Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64 PASSED [1.2796s] [ 15%] 2025-12-04T13:20:27.8399728Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64 PASSED [1.2911s] [ 15%] 2025-12-04T13:20:27.8399841Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64 PASSED [1.2786s] [ 15%] 2025-12-04T13:20:27.8399951Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64 PASSED [1.2770s] [ 15%] 2025-12-04T13:20:27.8400083Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32 PASSED [1.2790s] [ 15%] 2025-12-04T13:20:27.8400181Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64 PASSED [0.0064s] [ 15%] 2025-12-04T13:20:27.8400279Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64 PASSED [0.0048s] [ 15%] 2025-12-04T13:20:27.8400378Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64 PASSED [1.2739s] [ 15%] 2025-12-04T13:20:27.8400490Z 
test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64 PASSED [1.2689s] [ 15%] 2025-12-04T13:20:27.8400599Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128 PASSED [1.2673s] [ 15%] 2025-12-04T13:20:27.8400703Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64 PASSED [1.2704s] [ 15%] 2025-12-04T13:20:27.8400794Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64 PASSED [0.0077s] [ 15%] 2025-12-04T13:20:27.8400893Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128 XFAIL [0.0054s] [ 15%] 2025-12-04T13:20:27.8400986Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64 PASSED [1.2867s] [ 15%] 2025-12-04T13:20:27.8401081Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64 PASSED [1.2732s] [ 15%] 2025-12-04T13:20:27.8401174Z test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_float64 PASSED [1.2821s] [ 16%] 2025-12-04T13:20:27.8401277Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64 PASSED [1.2689s] [ 16%] 2025-12-04T13:20:27.8401400Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64 PASSED [1.2671s] [ 16%] 2025-12-04T13:20:27.8401522Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128 PASSED [1.2716s] [ 16%] 2025-12-04T13:20:27.8401637Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64 PASSED [1.2653s] [ 16%] 2025-12-04T13:20:27.8401767Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64 PASSED [1.2825s] [ 16%] 2025-12-04T13:20:27.8401879Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64 PASSED [1.2848s] [ 16%] 2025-12-04T13:20:27.8402014Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64 PASSED [1.2773s] [ 16%] 2025-12-04T13:20:27.8402119Z test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128 PASSED [1.2862s] [ 16%] 2025-12-04T13:20:27.8402216Z 
test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64 PASSED [0.0131s] [ 16%] 2025-12-04T13:20:27.8402315Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128 PASSED [1.2883s] [ 16%] 2025-12-04T13:20:27.8402441Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64 PASSED [0.0103s] [ 16%] 2025-12-04T13:20:27.8402570Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64 PASSED [0.0085s] [ 16%] 2025-12-04T13:20:27.8402683Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64 PASSED [0.0090s] [ 16%] 2025-12-04T13:20:27.8402788Z test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_copy_cuda_int64 PASSED [1.2698s] [ 16%] 2025-12-04T13:20:27.8402890Z test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_complex128 PASSED [1.2854s] [ 16%] 2025-12-04T13:20:27.8402999Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128 PASSED [1.2789s] [ 16%] 2025-12-04T13:20:27.8403100Z test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64 PASSED [0.0054s] [ 16%] 2025-12-04T13:20:27.8403213Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_copy_cuda_int64 PASSED [1.2664s] [ 16%] 2025-12-04T13:20:27.8403354Z test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64 PASSED [1.2707s] [ 16%] 2025-12-04T13:20:27.8403449Z test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32 PASSED [1.2680s] [ 16%] 2025-12-04T13:20:27.8403540Z test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32 PASSED [1.2741s] [ 16%] 2025-12-04T13:20:27.8403670Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32 PASSED [1.2662s] [ 16%] 2025-12-04T13:20:27.8403781Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32 PASSED [1.2763s] [ 16%] 2025-12-04T13:20:27.8403892Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32 PASSED [1.2701s] [ 16%] 2025-12-04T13:20:27.8404000Z 
test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32 PASSED [1.2347s] [ 16%] 2025-12-04T13:20:27.8404097Z test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32 PASSED [1.2617s] [ 16%] 2025-12-04T13:20:27.8404190Z test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32 PASSED [1.2303s] [ 16%] 2025-12-04T13:20:27.8404285Z test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32 PASSED [0.0283s] [ 16%] 2025-12-04T13:20:27.8404392Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64 PASSED [0.0105s] [ 16%] 2025-12-04T13:20:27.8404493Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64 PASSED [0.0101s] [ 16%] 2025-12-04T13:20:27.8404599Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32 PASSED [0.0027s] [ 16%] 2025-12-04T13:20:27.8404747Z test_ops.py::TestCommonCUDA::test_out__refs_cauchy_cuda_float32 SKIPPED [0.0001s] (Expected: cauchy is not comparable) [ 16%] 2025-12-04T13:20:27.8404843Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32 PASSED [0.0098s] [ 16%] 2025-12-04T13:20:27.8404940Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32 PASSED [0.0093s] [ 16%] 2025-12-04T13:20:27.8405060Z test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32 PASSED [0.0034s] [ 16%] 2025-12-04T13:20:27.8405154Z test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32 PASSED [1.2455s] [ 16%] 2025-12-04T13:20:27.8405264Z test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32 PASSED [0.0176s] [ 16%] 2025-12-04T13:20:27.8405354Z test_ops.py::TestCommonCUDA::test_out__refs_dot_cuda_float32 PASSED [1.2384s] [ 16%] 2025-12-04T13:20:27.8405516Z test_ops.py::TestCommonCUDA::test_out__refs_exponential_cuda_float32 SKIPPED [0.0002s] (Expected: exponential is not comparable) [ 16%] 2025-12-04T13:20:27.8405632Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32 PASSED [1.2405s] [ 16%] 
2025-12-04T13:20:27.8405731Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32 PASSED [1.2501s] [ 16%] 2025-12-04T13:20:27.8405826Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32 PASSED [1.2323s] [ 16%] 2025-12-04T13:20:27.8405927Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32 PASSED [1.2407s] [ 16%] 2025-12-04T13:20:27.8406023Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32 PASSED [1.2520s] [ 16%] 2025-12-04T13:20:27.8406118Z test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32 PASSED [1.2326s] [ 16%] 2025-12-04T13:20:27.8406211Z test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32 PASSED [1.2236s] [ 16%] 2025-12-04T13:20:27.8406306Z test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32 PASSED [0.0158s] [ 16%] 2025-12-04T13:20:27.8406460Z test_ops.py::TestCommonCUDA::test_out__refs_geometric_cuda_float32 SKIPPED [0.0002s] (Expected: geometric is not comparable) [ 16%] 2025-12-04T13:20:27.8406553Z test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32 PASSED [0.0105s] [ 16%] 2025-12-04T13:20:27.8406646Z test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32 PASSED [1.2242s] [ 16%] 2025-12-04T13:20:27.8406756Z test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32 PASSED [0.0155s] [ 16%] 2025-12-04T13:20:27.8406848Z test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32 PASSED [0.0217s] [ 16%] 2025-12-04T13:20:27.8406953Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_cross_cuda_float32 PASSED [1.2344s] [ 16%] 2025-12-04T13:20:27.8407063Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32 PASSED [1.3751s] [ 16%] 2025-12-04T13:20:27.8407174Z test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32 PASSED [0.0536s] [ 16%] 2025-12-04T13:20:27.8407328Z test_ops.py::TestCommonCUDA::test_out__refs_log_normal_cuda_float32 SKIPPED [0.0002s] (Expected: log_normal is not comparable) [ 16%] 
2025-12-04T13:20:27.8407442Z test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32 PASSED [1.2290s] [ 16%] 2025-12-04T13:20:27.8407544Z test_ops.py::TestCommonCUDA::test_out__refs_logaddexp2_cuda_float32 PASSED [1.2308s] [ 16%] 2025-12-04T13:20:27.8407648Z test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32 PASSED [1.2389s] [ 16%] 2025-12-04T13:20:27.8407767Z test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [1.2338s] [ 16%] 2025-12-04T13:20:27.8407868Z test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32 PASSED [1.2464s] [ 16%] 2025-12-04T13:20:27.8407965Z test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32 PASSED [1.2501s] [ 16%] 2025-12-04T13:20:27.8408063Z test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32 PASSED [1.2609s] [ 16%] 2025-12-04T13:20:27.8408192Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [1.2361s] [ 16%] 2025-12-04T13:20:27.8408307Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32 PASSED [1.2375s] [ 16%] 2025-12-04T13:20:27.8408426Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32 PASSED [1.2237s] [ 16%] 2025-12-04T13:20:27.8408568Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [1.2447s] [ 17%] 2025-12-04T13:20:27.8408663Z test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32 XFAIL [0.0059s] [ 17%] 2025-12-04T13:20:27.8408755Z test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32 PASSED [1.2352s] [ 17%] 2025-12-04T13:20:27.8408851Z test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32 PASSED [0.0036s] [ 17%] 2025-12-04T13:20:27.8408961Z test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32 PASSED [0.0201s] [ 17%] 2025-12-04T13:20:27.8409066Z 
test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32 PASSED [0.0056s] [ 17%] 2025-12-04T13:20:27.8409177Z test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32 PASSED [0.0047s] [ 17%] 2025-12-04T13:20:27.8409312Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [0.0123s] [ 17%] 2025-12-04T13:20:27.8409416Z test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32 PASSED [0.0066s] [ 17%] 2025-12-04T13:20:27.8409523Z test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32 PASSED [0.0121s] [ 17%] 2025-12-04T13:20:27.8409627Z test_ops.py::TestCommonCUDA::test_out__refs_squeeze_copy_cuda_float32 PASSED [0.0088s] [ 17%] 2025-12-04T13:20:27.8409720Z test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32 PASSED [0.0095s] [ 17%] 2025-12-04T13:20:27.8409813Z test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32 PASSED [0.0135s] [ 17%] 2025-12-04T13:20:27.8409905Z test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32 PASSED [0.0104s] [ 17%] 2025-12-04T13:20:27.8410008Z test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32 PASSED [0.0200s] [ 17%] 2025-12-04T13:20:27.8410112Z test_ops.py::TestCommonCUDA::test_out__refs_view_as_complex_cuda_float32 PASSED [0.0028s] [ 17%] 2025-12-04T13:20:27.8410207Z test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32 PASSED [1.2404s] [ 17%] 2025-12-04T13:20:27.8410316Z test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32 PASSED [1.2320s] [ 17%] 2025-12-04T13:20:27.8410410Z test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32 PASSED [1.2430s] [ 17%] 2025-12-04T13:20:27.8410520Z test_ops.py::TestCommonCUDA::test_out__segment_reduce_lengths_cuda_float32 PASSED [1.2174s] [ 17%] 2025-12-04T13:20:27.8410628Z test_ops.py::TestCommonCUDA::test_out__unsafe_masked_index_cuda_float32 PASSED [1.2397s] [ 17%] 2025-12-04T13:20:27.8410731Z 
test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32 PASSED [1.2461s] [ 17%] 2025-12-04T13:20:27.8410820Z test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32 PASSED [1.2093s] [ 17%] 2025-12-04T13:20:27.8410919Z test_ops.py::TestCommonCUDA::test_out_as_strided_copy_cuda_float32 PASSED [1.2341s] [ 17%] 2025-12-04T13:20:27.8411031Z test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32 PASSED [1.2558s] [ 17%] 2025-12-04T13:20:27.8411134Z test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32 PASSED [1.2229s] [ 17%] 2025-12-04T13:20:27.8411223Z test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32 PASSED [1.2345s] [ 17%] 2025-12-04T13:20:27.8411315Z test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32 PASSED [1.2263s] [ 17%] 2025-12-04T13:20:27.8411405Z test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64 PASSED [1.2405s] [ 17%] 2025-12-04T13:20:27.8411496Z test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64 PASSED [0.0086s] [ 17%] 2025-12-04T13:20:27.8411599Z test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32 PASSED [0.0028s] [ 17%] 2025-12-04T13:20:27.8411686Z test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32 PASSED [0.0028s] [ 17%] 2025-12-04T13:20:27.8411787Z test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32 XFAIL [0.0074s] [ 17%] 2025-12-04T13:20:27.8411884Z test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32 PASSED [1.2433s] [ 17%] 2025-12-04T13:20:27.8411990Z test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32 PASSED [0.0225s] [ 17%] 2025-12-04T13:20:27.8412075Z test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32 PASSED [0.0051s] [ 17%] 2025-12-04T13:20:27.8412165Z test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32 XFAIL [0.0040s] [ 17%] 2025-12-04T13:20:27.8412254Z test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32 PASSED [1.2382s] [ 17%] 2025-12-04T13:20:27.8412343Z 
test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32 PASSED [0.0154s] [ 17%] 2025-12-04T13:20:27.8412434Z test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32 PASSED [0.0029s] [ 17%] 2025-12-04T13:20:27.8412554Z test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32 PASSED [0.0099s] [ 17%] 2025-12-04T13:20:27.8412693Z test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 17%] 2025-12-04T13:20:27.8412785Z test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32 PASSED [0.0027s] [ 17%] 2025-12-04T13:20:27.8412876Z test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32 PASSED [0.0027s] [ 17%] 2025-12-04T13:20:27.8412962Z test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32 PASSED [1.2581s] [ 17%] 2025-12-04T13:20:27.8413057Z test_ops.py::TestCommonCUDA::test_out_exponential_cuda_float32 PASSED [1.2289s] [ 17%] 2025-12-04T13:20:27.8413146Z test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32 PASSED [1.2727s] [ 17%] 2025-12-04T13:20:27.8413239Z test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32 PASSED [1.2615s] [ 17%] 2025-12-04T13:20:27.8413366Z test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32 PASSED [1.2731s] [ 17%] 2025-12-04T13:20:27.8413457Z test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32 XFAIL [0.0092s] [ 17%] 2025-12-04T13:20:27.8413548Z test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32 PASSED [1.2638s] [ 17%] 2025-12-04T13:20:27.8413634Z test_ops.py::TestCommonCUDA::test_out_float_cuda_float32 PASSED [1.2446s] [ 17%] 2025-12-04T13:20:27.8413740Z test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32 PASSED [1.2535s] [ 17%] 2025-12-04T13:20:27.8413827Z test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32 PASSED [1.2567s] [ 17%] 2025-12-04T13:20:27.8413911Z test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64 PASSED [0.0094s] [ 17%] 2025-12-04T13:20:27.8414003Z 
test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32 PASSED [1.2705s] [ 17%] 2025-12-04T13:20:27.8414102Z test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32 PASSED [0.0085s] [ 17%] 2025-12-04T13:20:27.8414190Z test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32 PASSED [0.0094s] [ 17%] 2025-12-04T13:20:27.8414278Z test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64 PASSED [0.0028s] [ 17%] 2025-12-04T13:20:27.8414371Z test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32 PASSED [0.0028s] [ 17%] 2025-12-04T13:20:27.8414473Z test_ops.py::TestCommonCUDA::test_out_index_reduce_prod_cuda_float32 PASSED [1.2754s] [ 17%] 2025-12-04T13:20:27.8414561Z test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32 PASSED [1.2510s] [ 17%] 2025-12-04T13:20:27.8414648Z test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32 PASSED [1.2338s] [ 17%] 2025-12-04T13:20:27.8414742Z test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32 PASSED [0.1630s] [ 17%] 2025-12-04T13:20:27.8414839Z test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32 PASSED [0.0153s] [ 17%] 2025-12-04T13:20:27.8414942Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32 PASSED [0.0643s] [ 17%] 2025-12-04T13:20:27.8415040Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32 PASSED [0.0741s] [ 17%] 2025-12-04T13:20:27.8415137Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32 PASSED [0.0275s] [ 18%] 2025-12-04T13:20:27.8415250Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32 PASSED [0.0899s] [ 18%] 2025-12-04T13:20:27.8415353Z test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32 PASSED [0.0068s] [ 18%] 2025-12-04T13:20:27.8415464Z test_ops.py::TestCommonCUDA::test_out_log_normal_cuda_float32 PASSED [0.0094s] [ 18%] 2025-12-04T13:20:27.8415561Z test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32 PASSED [1.2813s] [ 18%] 2025-12-04T13:20:27.8415656Z 
test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32 PASSED [0.0087s] [ 18%] 2025-12-04T13:20:27.8415745Z test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32 PASSED [0.0273s] [ 18%] 2025-12-04T13:20:27.8415837Z test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32 PASSED [0.0661s] [ 18%] 2025-12-04T13:20:27.8415932Z test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32 PASSED [0.0029s] [ 18%] 2025-12-04T13:20:27.8416042Z test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32 PASSED [0.0099s] [ 18%] 2025-12-04T13:20:27.8416145Z test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32 PASSED [1.2813s] [ 18%] 2025-12-04T13:20:27.8416248Z test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32 PASSED [1.2885s] [ 18%] 2025-12-04T13:20:27.8416346Z test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32 PASSED [1.3209s] [ 18%] 2025-12-04T13:20:27.8416490Z test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 18%] 2025-12-04T13:20:27.8416589Z test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 18%] 2025-12-04T13:20:27.8416680Z test_ops.py::TestCommonCUDA::test_out_median_cuda_float32 PASSED [1.3231s] [ 18%] 2025-12-04T13:20:27.8416792Z test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32 PASSED [1.2720s] [ 18%] 2025-12-04T13:20:27.8416883Z test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32 PASSED [1.3013s] [ 18%] 2025-12-04T13:20:27.8416974Z test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32 PASSED [1.2696s] [ 18%] 2025-12-04T13:20:27.8417076Z test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32 PASSED [1.2930s] [ 18%] 2025-12-04T13:20:27.8417179Z test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32 PASSED [0.0305s] [ 18%] 2025-12-04T13:20:27.8417305Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [1.2990s] [ 18%] 2025-12-04T13:20:27.8417420Z test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32 PASSED [1.2926s] [ 18%] 2025-12-04T13:20:27.8417551Z test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [1.3082s] [ 18%] 2025-12-04T13:20:27.8417704Z test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [1.3006s] [ 18%] 2025-12-04T13:20:27.8417815Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv3d_cuda_float32 PASSED [1.2878s] [ 18%] 2025-12-04T13:20:27.8417956Z test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [1.2990s] [ 18%] 2025-12-04T13:20:27.8418069Z test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32 PASSED [1.2863s] [ 18%] 2025-12-04T13:20:27.8418179Z test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32 PASSED [1.3022s] [ 18%] 2025-12-04T13:20:27.8418302Z test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32 PASSED [1.2975s] [ 18%] 2025-12-04T13:20:27.8418417Z 
test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32 PASSED [1.2887s] [ 18%] 2025-12-04T13:20:27.8418529Z test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32 PASSED [1.3152s] [ 18%] 2025-12-04T13:20:27.8418642Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32 PASSED [0.0036s] [ 18%] 2025-12-04T13:20:27.8418769Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_negative_cuda_float32 PASSED [1.2867s] [ 18%] 2025-12-04T13:20:27.8418890Z test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32 PASSED [1.3189s] [ 18%] 2025-12-04T13:20:27.8419004Z test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32 PASSED [1.2939s] [ 18%] 2025-12-04T13:20:27.8419126Z test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32 PASSED [1.3021s] [ 18%] 2025-12-04T13:20:27.8419237Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32 PASSED [1.2988s] [ 18%] 2025-12-04T13:20:27.8419348Z test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32 PASSED [1.3056s] [ 18%] 2025-12-04T13:20:27.8419469Z test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32 PASSED [1.3044s] [ 18%] 2025-12-04T13:20:27.8419558Z test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32 PASSED [1.3199s] [ 18%] 2025-12-04T13:20:27.8419660Z test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32 PASSED [1.2862s] [ 18%] 2025-12-04T13:20:27.8419749Z test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32 PASSED [1.2861s] [ 18%] 2025-12-04T13:20:27.8419840Z test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32 PASSED [1.2891s] [ 18%] 2025-12-04T13:20:27.8419928Z test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32 PASSED [0.0111s] [ 18%] 2025-12-04T13:20:27.8420017Z test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32 XFAIL [0.0039s] [ 18%] 2025-12-04T13:20:27.8420110Z 
test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32 PASSED [1.2877s] [ 18%] 2025-12-04T13:20:27.8420197Z test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32 PASSED [0.0032s] [ 18%] 2025-12-04T13:20:27.8420284Z test_ops.py::TestCommonCUDA::test_out_real_cuda_float32 PASSED [0.0029s] [ 18%] 2025-12-04T13:20:27.8420371Z test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32 PASSED [1.2925s] [ 18%] 2025-12-04T13:20:27.8420486Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_complex64 PASSED [1.2970s] [ 18%] 2025-12-04T13:20:27.8420600Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_cuda_float32 PASSED [1.2932s] [ 18%] 2025-12-04T13:20:27.8420733Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_complex64 PASSED [1.4044s] [ 18%] 2025-12-04T13:20:27.8420874Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_decomposed_cuda_float32 PASSED [1.4222s] [ 18%] 2025-12-04T13:20:27.8420989Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmv_cuda_complex64 PASSED [0.9651s] [ 18%] 2025-12-04T13:20:27.8421112Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_alias_copy_cuda_complex64 PASSED [0.8479s] [ 18%] 2025-12-04T13:20:27.8421238Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_angle_cuda_complex64 PASSED [0.8439s] [ 18%] 2025-12-04T13:20:27.8421351Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_angle_cuda_float32 PASSED [0.8700s] [ 18%] 2025-12-04T13:20:27.8421465Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_complex64 PASSED [0.8541s] [ 18%] 2025-12-04T13:20:27.8421577Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_float32 PASSED [0.8417s] [ 18%] 2025-12-04T13:20:27.8421690Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cat_cuda_float32 PASSED [0.8618s] [ 18%] 2025-12-04T13:20:27.8421822Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cholesky_inverse_cuda_complex64 
PASSED [0.8602s] [ 18%] 2025-12-04T13:20:27.8421933Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cos_cuda_complex64 PASSED [0.8611s] [ 18%] 2025-12-04T13:20:27.8422052Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumprod_cuda_complex64 PASSED [0.8560s] [ 18%] 2025-12-04T13:20:27.8422168Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumsum_cuda_complex64 PASSED [0.8097s] [ 18%] 2025-12-04T13:20:27.8422284Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_complex64 PASSED [0.8054s] [ 18%] 2025-12-04T13:20:27.8422395Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_diff_cuda_float32 PASSED [0.8104s] [ 18%] 2025-12-04T13:20:27.8422512Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dstack_cuda_complex64 PASSED [0.8102s] [ 19%] 2025-12-04T13:20:27.8422633Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32 PASSED [0.7911s] [ 19%] 2025-12-04T13:20:27.8422747Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_exp2_cuda_complex64 PASSED [1.0126s] [ 19%] 2025-12-04T13:20:27.8422870Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_complex64 PASSED [1.3322s] [ 19%] 2025-12-04T13:20:27.8422985Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_complex64 PASSED [1.2582s] [ 19%] 2025-12-04T13:20:27.8423097Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_float32 PASSED [1.3332s] [ 19%] 2025-12-04T13:20:27.8423223Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft_cuda_complex64 PASSED [1.2961s] [ 19%] 2025-12-04T13:20:27.8423374Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_complex64 PASSED [1.4228s] [ 19%] 2025-12-04T13:20:27.8423490Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_float32 PASSED [1.4334s] [ 19%] 2025-12-04T13:20:27.8423612Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_hfft2_cuda_complex64 PASSED 
[1.3450s] [ 19%] 2025-12-04T13:20:27.8423727Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ifft2_cuda_float32 PASSED [1.3303s] [ 19%] 2025-12-04T13:20:27.8423846Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft2_cuda_float32 PASSED [1.3195s] [ 19%] 2025-12-04T13:20:27.8423963Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfftn_cuda_float32 PASSED [1.3087s] [ 19%] 2025-12-04T13:20:27.8424078Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_gather_cuda_float32 PASSED [1.2654s] [ 19%] 2025-12-04T13:20:27.8424195Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_add_cuda_float32 PASSED [1.2271s] [ 19%] 2025-12-04T13:20:27.8424308Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_kron_cuda_float32 PASSED [1.2371s] [ 19%] 2025-12-04T13:20:27.8424439Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ldexp_cuda_complex64 PASSED [1.2423s] [ 19%] 2025-12-04T13:20:27.8424638Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cross_cuda_float32 PASSED [1.2625s] [ 19%] 2025-12-04T13:20:27.8424755Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eig_cuda_float32 PASSED [1.2451s] [ 19%] 2025-12-04T13:20:27.8424877Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_inv_cuda_float32 PASSED [1.2372s] [ 19%] 2025-12-04T13:20:27.8425018Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_cuda_complex64 PASSED [1.2447s] [ 19%] 2025-12-04T13:20:27.8425150Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lu_factor_cuda_complex64 PASSED [1.2472s] [ 19%] 2025-12-04T13:20:27.8425272Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_cuda_complex64 PASSED [1.2661s] [ 19%] 2025-12-04T13:20:27.8425426Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [1.2790s] [ 19%] 2025-12-04T13:20:27.8425551Z 
test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_slogdet_cuda_float32 PASSED [1.2739s] [ 19%] 2025-12-04T13:20:27.8425674Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_cuda_complex64 PASSED [1.2405s] [ 19%] 2025-12-04T13:20:27.8425805Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_solve_ex_cuda_complex64 PASSED [1.2551s] [ 19%] 2025-12-04T13:20:27.8425932Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svdvals_cuda_complex64 PASSED [1.2469s] [ 19%] 2025-12-04T13:20:27.8426061Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_tensorinv_cuda_float32 PASSED [1.2652s] [ 19%] 2025-12-04T13:20:27.8426187Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vecdot_cuda_complex64 PASSED [1.2444s] [ 19%] 2025-12-04T13:20:27.8426317Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_vector_norm_cuda_float32 PASSED [1.2507s] [ 19%] 2025-12-04T13:20:27.8426456Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log1p_cuda_complex64 PASSED [1.2325s] [ 19%] 2025-12-04T13:20:27.8426570Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log1p_cuda_float32 PASSED [1.2405s] [ 19%] 2025-12-04T13:20:27.8426708Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_complex64 PASSED [1.2478s] [ 19%] 2025-12-04T13:20:27.8426845Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log_softmax_with_dtype_cuda_float32 PASSED [1.2237s] [ 19%] 2025-12-04T13:20:27.8426963Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logspace_cuda_complex64 PASSED [0.0036s] [ 19%] 2025-12-04T13:20:27.8427089Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_float32 PASSED [1.2509s] [ 19%] 2025-12-04T13:20:27.8427207Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_solve_cuda_complex64 PASSED [1.2569s] [ 19%] 2025-12-04T13:20:27.8427330Z 
test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_unpack_cuda_complex64 PASSED [1.2292s] [ 19%] 2025-12-04T13:20:27.8427446Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64 PASSED [1.2359s] [ 19%] 2025-12-04T13:20:27.8427559Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_float32 PASSED [1.2243s] [ 19%] 2025-12-04T13:20:27.8427676Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_binary_cuda_float32 PASSED [1.2539s] [ 19%] 2025-12-04T13:20:27.8427808Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_reduction_no_dim_cuda_float32 PASSED [1.2458s] [ 19%] 2025-12-04T13:20:27.8427920Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mm_cuda_float32 PASSED [1.2364s] [ 19%] 2025-12-04T13:20:27.8428030Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mul_cuda_float32 PASSED [1.2497s] [ 19%] 2025-12-04T13:20:27.8428166Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [1.2339s] [ 19%] 2025-12-04T13:20:27.8428317Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_avg_pool2d_cuda_float32 PASSED [1.2617s] [ 19%] 2025-12-04T13:20:27.8428456Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softshrink_cuda_float32 PASSED [1.2311s] [ 19%] 2025-12-04T13:20:27.8428575Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_nuc_cuda_complex64 PASSED [1.2338s] [ 19%] 2025-12-04T13:20:27.8428700Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_normal_cuda_float32 PASSED [1.2617s] [ 19%] 2025-12-04T13:20:27.8428824Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_permute_copy_cuda_complex64 PASSED [1.2648s] [ 19%] 2025-12-04T13:20:27.8428942Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_quantile_cuda_float32 PASSED [1.2709s] [ 19%] 2025-12-04T13:20:27.8429055Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_rad2deg_cuda_float32 
PASSED [1.2329s] [ 19%] 2025-12-04T13:20:27.8429201Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_3_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 19%] 2025-12-04T13:20:27.8429325Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_add_cuda_complex64 PASSED [1.2510s] [ 19%] 2025-12-04T13:20:27.8429457Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_amax_cuda_float32 PASSED [1.2472s] [ 19%] 2025-12-04T13:20:27.8429587Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_mean_cuda_float32 PASSED [1.2408s] [ 19%] 2025-12-04T13:20:27.8429718Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_reduce_sum_cuda_float32 PASSED [1.2428s] [ 19%] 2025-12-04T13:20:27.8429830Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_complex64 PASSED [1.2372s] [ 19%] 2025-12-04T13:20:27.8429944Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinc_cuda_complex64 PASSED [1.5049s] [ 19%] 2025-12-04T13:20:27.8430055Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sinc_cuda_float32 PASSED [1.4000s] [ 19%] 2025-12-04T13:20:27.8430180Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sort_cuda_float32 PASSED [1.2578s] [ 19%] 2025-12-04T13:20:27.8430318Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sparse_sampled_addmm_cuda_complex64 PASSED [1.2321s] [ 19%] 2025-12-04T13:20:27.8430431Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_square_cuda_float32 PASSED [1.2411s] [ 19%] 2025-12-04T13:20:27.8430557Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_squeeze_copy_cuda_complex64 PASSED [1.2467s] [ 19%] 2025-12-04T13:20:27.8430670Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_stack_cuda_complex64 PASSED [1.2528s] [ 19%] 2025-12-04T13:20:27.8430794Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tanh_cuda_complex64 PASSED [1.2638s] [ 19%] 2025-12-04T13:20:27.8430921Z 
test_ops.py::TestCommonCUDA::test_out_requires_grad_error_triangular_solve_cuda_float32 PASSED [1.2413s] [ 20%] 2025-12-04T13:20:27.8431034Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_trunc_cuda_float32 PASSED [1.2440s] [ 20%] 2025-12-04T13:20:27.8431157Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unbind_copy_cuda_complex64 PASSED [1.2530s] [ 20%] 2025-12-04T13:20:27.8431282Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_float32 PASSED [1.2445s] [ 20%] 2025-12-04T13:20:27.8431393Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_var_cuda_complex64 PASSED [1.2296s] [ 20%] 2025-12-04T13:20:27.8431506Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_where_cuda_float32 PASSED [1.2311s] [ 20%] 2025-12-04T13:20:27.8431620Z test_ops.py::TestCommonCUDA::test_out_requires_grad_error_zeros_cuda_complex64 PASSED [1.2381s] [ 20%] 2025-12-04T13:20:27.8431713Z test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32 PASSED [1.2359s] [ 20%] 2025-12-04T13:20:27.8431808Z test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32 PASSED [1.2364s] [ 20%] 2025-12-04T13:20:27.8431912Z test_ops.py::TestCommonCUDA::test_out_round_cuda_float32 PASSED [1.2226s] [ 20%] 2025-12-04T13:20:27.8432003Z test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32 PASSED [1.2826s] [ 20%] 2025-12-04T13:20:27.8432091Z test_ops.py::TestCommonCUDA::test_out_short_cuda_float32 PASSED [1.2421s] [ 20%] 2025-12-04T13:20:27.8432205Z test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32 PASSED [0.0034s] [ 20%] 2025-12-04T13:20:27.8432328Z test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32 PASSED [0.0027s] [ 20%] 2025-12-04T13:20:27.8432438Z test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32 PASSED [0.0026s] [ 20%] 2025-12-04T13:20:27.8432530Z test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32 PASSED [0.8141s] [ 20%] 2025-12-04T13:20:27.8432619Z 
test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32 PASSED [0.7832s] [ 20%] 2025-12-04T13:20:27.8432715Z test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32 PASSED [0.7876s] [ 20%] 2025-12-04T13:20:27.8432804Z test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32 PASSED [0.8165s] [ 20%] 2025-12-04T13:20:27.8432930Z test_ops.py::TestCommonCUDA::test_out_sparse_mm_reduce_cuda_float32 SKIPPED [0.0010s] (Only runs on cpu) [ 20%] 2025-12-04T13:20:27.8433034Z test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32 PASSED [0.7831s] [ 20%] 2025-12-04T13:20:27.8433155Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32 PASSED [0.0115s] [ 20%] 2025-12-04T13:20:27.8433291Z test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32 PASSED [0.9356s] [ 20%] 2025-12-04T13:20:27.8433387Z test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32 PASSED [0.9732s] [ 20%] 2025-12-04T13:20:27.8433482Z test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32 PASSED [0.7735s] [ 20%] 2025-12-04T13:20:27.8433577Z test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32 PASSED [0.0112s] [ 20%] 2025-12-04T13:20:27.8433685Z test_ops.py::TestCommonCUDA::test_out_square_cuda_float32 PASSED [0.7747s] [ 20%] 2025-12-04T13:20:27.8433775Z test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32 PASSED [0.7736s] [ 20%] 2025-12-04T13:20:27.8433862Z test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32 PASSED [0.7792s] [ 20%] 2025-12-04T13:20:27.8433946Z test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32 PASSED [1.1176s] [ 20%] 2025-12-04T13:20:27.8434035Z test_ops.py::TestCommonCUDA::test_out_t_copy_cuda_float32 PASSED [0.7697s] [ 20%] 2025-12-04T13:20:27.8434121Z test_ops.py::TestCommonCUDA::test_out_take_cuda_float32 PASSED [0.7751s] [ 20%] 2025-12-04T13:20:27.8434265Z test_ops.py::TestCommonCUDA::test_out_torch__scaled_mm_v2_cuda_float8_e4m3fn SKIPPED [0.0002s] (Skipped!) 
[ 20%] 2025-12-04T13:20:27.8434366Z test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32 XFAIL [0.0097s] [ 20%] 2025-12-04T13:20:27.8434453Z test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32 PASSED [0.7717s] [ 20%] 2025-12-04T13:20:27.8434543Z test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32 PASSED [0.0033s] [ 20%] 2025-12-04T13:20:27.8434645Z test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32 PASSED [0.0028s] [ 20%] 2025-12-04T13:20:27.8434734Z test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32 PASSED [0.7780s] [ 20%] 2025-12-04T13:20:27.8434833Z test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64 PASSED [0.7848s] [ 20%] 2025-12-04T13:20:27.8434925Z test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda PASSED [0.7719s] [ 20%] 2025-12-04T13:20:27.8435016Z test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda PASSED [0.7694s] [ 20%] 2025-12-04T13:20:27.8435106Z test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda PASSED [0.7689s] [ 20%] 2025-12-04T13:20:27.8435219Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda PASSED [0.7711s] [ 20%] 2025-12-04T13:20:27.8435334Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda PASSED [0.7798s] [ 20%] 2025-12-04T13:20:27.8435460Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda PASSED [0.7755s] [ 20%] 2025-12-04T13:20:27.8435555Z test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda PASSED [0.7970s] [ 20%] 2025-12-04T13:20:27.8435648Z test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda PASSED [0.8039s] [ 20%] 2025-12-04T13:20:27.8435742Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda PASSED [0.7805s] [ 20%] 2025-12-04T13:20:27.8435858Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda PASSED [0.0174s] [ 20%] 2025-12-04T13:20:27.8435950Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda PASSED [0.7811s] 
[ 20%] 2025-12-04T13:20:27.8436045Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda PASSED [0.7756s] [ 20%] 2025-12-04T13:20:27.8436149Z test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda PASSED [0.7897s] [ 20%] 2025-12-04T13:20:27.8436244Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda PASSED [0.7859s] [ 20%] 2025-12-04T13:20:27.8436353Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda PASSED [0.8137s] [ 20%] 2025-12-04T13:20:27.8436462Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda PASSED [0.8107s] [ 20%] 2025-12-04T13:20:27.8436572Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda PASSED [0.0499s] [ 20%] 2025-12-04T13:20:27.8436682Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda PASSED [0.0137s] [ 20%] 2025-12-04T13:20:27.8436834Z test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda SKIPPED [0.0001s] (Expected: empty is not comparable) [ 20%] 2025-12-04T13:20:27.8436927Z test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda PASSED [0.7796s] [ 20%] 2025-12-04T13:20:27.8437023Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda PASSED [0.7846s] [ 20%] 2025-12-04T13:20:27.8437121Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda PASSED [0.7996s] [ 20%] 2025-12-04T13:20:27.8437234Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda PASSED [0.8039s] [ 20%] 2025-12-04T13:20:27.8437333Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda PASSED [0.7970s] [ 20%] 2025-12-04T13:20:27.8437438Z test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda PASSED [0.0503s] [ 20%] 2025-12-04T13:20:27.8437531Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda PASSED [0.0122s] [ 20%] 2025-12-04T13:20:27.8437627Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda PASSED [0.0125s] [ 20%] 2025-12-04T13:20:27.8437718Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda PASSED [0.0113s] [ 20%] 2025-12-04T13:20:27.8437825Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda PASSED [0.0027s] [ 21%] 2025-12-04T13:20:27.8437923Z test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda PASSED [0.0121s] [ 21%] 2025-12-04T13:20:27.8438021Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda PASSED [0.0111s] [ 21%] 2025-12-04T13:20:27.8438118Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda PASSED [0.0033s] [ 21%] 2025-12-04T13:20:27.8438212Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda PASSED [0.7806s] [ 21%] 2025-12-04T13:20:27.8438408Z test_ops.py::TestCommonCUDA::test_out_warning__refs_item_cuda SKIPPED [0.0039s] (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 21%] 2025-12-04T13:20:27.8438515Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_cross_cuda PASSED [0.7866s] [ 21%] 2025-12-04T13:20:27.8438618Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda PASSED [0.8930s] [ 21%] 2025-12-04T13:20:27.8438739Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_tensor_overload_cuda PASSED [0.1787s] [ 21%] 2025-12-04T13:20:27.8438833Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda PASSED [0.7846s] [ 21%] 2025-12-04T13:20:27.8438982Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda SKIPPED [0.0002s] (Expected: empty is not comparable) [ 21%] 2025-12-04T13:20:27.8439106Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda PASSED [0.7760s] [ 21%] 2025-12-04T13:20:27.8439203Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda PASSED [0.7795s] [ 21%] 2025-12-04T13:20:27.8439302Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda PASSED [0.7808s] [ 21%] 2025-12-04T13:20:27.8439429Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda PASSED [0.7963s] 
[ 21%] 2025-12-04T13:20:27.8439549Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda PASSED [0.8002s] [ 21%] 2025-12-04T13:20:27.8439670Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda PASSED [0.7898s] [ 21%] 2025-12-04T13:20:27.8439802Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda PASSED [0.7903s] [ 21%] 2025-12-04T13:20:27.8439927Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pixel_shuffle_cuda PASSED [0.7751s] [ 21%] 2025-12-04T13:20:27.8440054Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda PASSED [0.7803s] [ 21%] 2025-12-04T13:20:27.8440164Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda PASSED [0.7774s] [ 21%] 2025-12-04T13:20:27.8440281Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda PASSED [0.7763s] [ 21%] 2025-12-04T13:20:27.8440429Z test_ops.py::TestCommonCUDA::test_out_warning__refs_normal_cuda SKIPPED [0.0002s] (Expected: normal is not comparable) [ 21%] 2025-12-04T13:20:27.8440525Z test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda PASSED [0.8149s] [ 21%] 2025-12-04T13:20:27.8440622Z test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda PASSED [0.7763s] [ 21%] 2025-12-04T13:20:27.8440715Z test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda PASSED [0.7671s] [ 21%] 2025-12-04T13:20:27.8440825Z test_ops.py::TestCommonCUDA::test_out_warning__refs_renorm_cuda PASSED [0.7782s] [ 21%] 2025-12-04T13:20:27.8440921Z test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda PASSED [0.7668s] [ 21%] 2025-12-04T13:20:27.8441017Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda PASSED [0.7787s] [ 21%] 2025-12-04T13:20:27.8441109Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda PASSED [0.7727s] [ 21%] 2025-12-04T13:20:27.8441204Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda PASSED [0.7771s] [ 21%] 2025-12-04T13:20:27.8441310Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda PASSED [0.7780s] [ 21%] 2025-12-04T13:20:27.8441449Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda PASSED [0.7804s] [ 21%] 2025-12-04T13:20:27.8441541Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda PASSED [0.7688s] [ 21%] 2025-12-04T13:20:27.8441634Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda PASSED [0.7923s] [ 21%] 2025-12-04T13:20:27.8441738Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda PASSED [0.7780s] [ 21%] 2025-12-04T13:20:27.8441833Z test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda PASSED [0.7733s] [ 21%] 2025-12-04T13:20:27.8441935Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda PASSED [0.0037s] [ 21%] 2025-12-04T13:20:27.8442029Z test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda PASSED [0.7818s] [ 21%] 2025-12-04T13:20:27.8442127Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda PASSED [0.7686s] [ 21%] 2025-12-04T13:20:27.8442257Z test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_put_accumulate_cuda PASSED [0.7817s] [ 21%] 2025-12-04T13:20:27.8442345Z test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda PASSED [0.7801s] [ 21%] 2025-12-04T13:20:27.8442439Z test_ops.py::TestCommonCUDA::test_out_warning_alias_copy_cuda PASSED [0.7765s] [ 21%] 2025-12-04T13:20:27.8442536Z test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda PASSED [0.7894s] [ 21%] 2025-12-04T13:20:27.8442627Z test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda PASSED [0.0139s] [ 21%] 2025-12-04T13:20:27.8442714Z test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda PASSED [0.7892s] [ 21%] 2025-12-04T13:20:27.8442803Z test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda PASSED [0.7889s] [ 21%] 
2025-12-04T13:20:27.8442908Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda PASSED [0.7716s] [ 21%] 2025-12-04T13:20:27.8442995Z test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda PASSED [0.7727s] [ 21%] 2025-12-04T13:20:27.8443089Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda PASSED [0.7709s] [ 21%] 2025-12-04T13:20:27.8443178Z test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda PASSED [0.7892s] [ 21%] 2025-12-04T13:20:27.8443303Z test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda XFAIL [0.0083s] [ 21%] 2025-12-04T13:20:27.8443399Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda PASSED [0.7837s] [ 21%] 2025-12-04T13:20:27.8443495Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda PASSED [0.7757s] [ 21%] 2025-12-04T13:20:27.8443586Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda PASSED [0.0147s] [ 21%] 2025-12-04T13:20:27.8443690Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda PASSED [0.7764s] [ 21%] 2025-12-04T13:20:27.8443779Z test_ops.py::TestCommonCUDA::test_out_warning_cauchy_cuda PASSED [0.7931s] [ 21%] 2025-12-04T13:20:27.8443870Z test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda PASSED [0.7705s] [ 21%] 2025-12-04T13:20:27.8443968Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda PASSED [0.7829s] [ 21%] 2025-12-04T13:20:27.8444057Z test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda PASSED [0.7703s] [ 21%] 2025-12-04T13:20:27.8444151Z test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda PASSED [0.7802s] [ 21%] 2025-12-04T13:20:27.8444254Z test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda PASSED [0.7744s] [ 21%] 2025-12-04T13:20:27.8444342Z test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda PASSED [0.7908s] [ 21%] 2025-12-04T13:20:27.8444431Z test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda PASSED [0.7751s] [ 21%] 2025-12-04T13:20:27.8444521Z 
test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda PASSED [0.7649s] [ 21%] 2025-12-04T13:20:27.8444609Z test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda PASSED [0.7823s] [ 21%] 2025-12-04T13:20:27.8444697Z test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda PASSED [0.7790s] [ 21%] 2025-12-04T13:20:27.8444815Z test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda PASSED [0.0159s] [ 22%] 2025-12-04T13:20:27.8444912Z test_ops.py::TestCommonCUDA::test_out_warning_empty_strided_cuda PASSED [0.7646s] [ 22%] 2025-12-04T13:20:27.8445095Z test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda SKIPPED [0.0036s] (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 22%] 2025-12-04T13:20:27.8445182Z test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda PASSED [0.7803s] [ 22%] 2025-12-04T13:20:27.8445280Z test_ops.py::TestCommonCUDA::test_out_warning_expand_copy_cuda PASSED [0.0118s] [ 22%] 2025-12-04T13:20:27.8445370Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda PASSED [0.0088s] [ 22%] 2025-12-04T13:20:27.8445460Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda PASSED [0.7764s] [ 22%] 2025-12-04T13:20:27.8445551Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda PASSED [0.7815s] [ 22%] 2025-12-04T13:20:27.8445644Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda PASSED [0.7809s] [ 22%] 2025-12-04T13:20:27.8445735Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda PASSED [0.7816s] [ 22%] 2025-12-04T13:20:27.8445822Z test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda PASSED [0.7744s] [ 22%] 2025-12-04T13:20:27.8445907Z test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda PASSED [0.7796s] [ 22%] 2025-12-04T13:20:27.8446017Z test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda PASSED [0.0160s] [ 22%] 2025-12-04T13:20:27.8446106Z test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda PASSED [0.7900s] [ 22%] 2025-12-04T13:20:27.8446191Z 
test_ops.py::TestCommonCUDA::test_out_warning_half_cuda PASSED [0.7770s] [ 22%] 2025-12-04T13:20:27.8446285Z test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda PASSED [0.7929s] [ 22%] 2025-12-04T13:20:27.8446386Z test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda PASSED [0.7739s] [ 22%] 2025-12-04T13:20:27.8446476Z test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda PASSED [0.0152s] [ 22%] 2025-12-04T13:20:27.8446562Z test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda PASSED [0.7699s] [ 22%] 2025-12-04T13:20:27.8446656Z test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda PASSED [0.7751s] [ 22%] 2025-12-04T13:20:27.8446759Z test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_amin_cuda PASSED [0.7812s] [ 22%] 2025-12-04T13:20:27.8446865Z test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_mean_cuda PASSED [0.7860s] [ 22%] 2025-12-04T13:20:27.8446952Z test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda PASSED [0.7704s] [ 22%] 2025-12-04T13:20:27.8447131Z test_ops.py::TestCommonCUDA::test_out_warning_item_cuda SKIPPED [0.0038s] (Skipped! Only supports single tensor or iterable of tensor outputs.) 
[ 22%] 2025-12-04T13:20:27.8447223Z test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda PASSED [0.7834s] [ 22%] 2025-12-04T13:20:27.8447309Z test_ops.py::TestCommonCUDA::test_out_warning_le_cuda PASSED [0.0094s] [ 22%] 2025-12-04T13:20:27.8447397Z test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda PASSED [0.0143s] [ 22%] 2025-12-04T13:20:27.8447493Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda PASSED [0.0053s] [ 22%] 2025-12-04T13:20:27.8447594Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda PASSED [0.0068s] [ 22%] 2025-12-04T13:20:27.8447709Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda PASSED [0.0513s] [ 22%] 2025-12-04T13:20:27.8447816Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda PASSED [0.0421s] [ 22%] 2025-12-04T13:20:27.8447911Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda PASSED [0.8784s] [ 22%] 2025-12-04T13:20:27.8448023Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda PASSED [0.0732s] [ 22%] 2025-12-04T13:20:27.8448120Z test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda PASSED [0.0092s] [ 22%] 2025-12-04T13:20:27.8448213Z test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda PASSED [0.0045s] [ 22%] 2025-12-04T13:20:27.8448318Z test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda PASSED [0.0090s] [ 22%] 2025-12-04T13:20:27.8448408Z test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda PASSED [0.0231s] [ 22%] 2025-12-04T13:20:27.8448505Z test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda PASSED [0.0028s] [ 22%] 2025-12-04T13:20:27.8448604Z test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda PASSED [0.7720s] [ 22%] 2025-12-04T13:20:27.8448710Z test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda PASSED [0.7684s] [ 22%] 2025-12-04T13:20:27.8448807Z test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda PASSED [0.7750s] [ 
22%] 2025-12-04T13:20:27.8448903Z test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda PASSED [0.7673s] [ 22%] 2025-12-04T13:20:27.8448998Z test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda PASSED [0.7623s] [ 22%] 2025-12-04T13:20:27.8449113Z test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda PASSED [0.7765s] [ 22%] 2025-12-04T13:20:27.8449201Z test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda PASSED [0.7740s] [ 22%] 2025-12-04T13:20:27.8449286Z test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda XFAIL [0.0111s] [ 22%] 2025-12-04T13:20:27.8449380Z test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda XFAIL [0.7729s] [ 22%] 2025-12-04T13:20:27.8449477Z test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda PASSED [0.7818s] [ 22%] 2025-12-04T13:20:27.8449569Z test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda PASSED [0.0033s] [ 22%] 2025-12-04T13:20:27.8449701Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda PASSED [0.0031s] [ 22%] 2025-12-04T13:20:27.8449823Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda PASSED [0.7715s] [ 22%] 2025-12-04T13:20:27.8449930Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda PASSED [0.7769s] [ 22%] 2025-12-04T13:20:27.8450039Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda PASSED [0.7912s] [ 22%] 2025-12-04T13:20:27.8450160Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda PASSED [0.7679s] [ 22%] 2025-12-04T13:20:27.8450284Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda PASSED [0.7749s] [ 22%] 2025-12-04T13:20:27.8450399Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda PASSED [0.7816s] [ 22%] 2025-12-04T13:20:27.8450509Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda PASSED [0.7765s] [ 22%] 2025-12-04T13:20:27.8450611Z 
test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda PASSED [0.7734s] [ 22%] 2025-12-04T13:20:27.8450717Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda PASSED [0.7781s] [ 22%] 2025-12-04T13:20:27.8450830Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda PASSED [0.7689s] [ 22%] 2025-12-04T13:20:27.8450942Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda PASSED [0.7816s] [ 22%] 2025-12-04T13:20:27.8451053Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda PASSED [0.7756s] [ 22%] 2025-12-04T13:20:27.8451191Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda PASSED [0.7717s] [ 22%] 2025-12-04T13:20:27.8451314Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda PASSED [0.7694s] [ 22%] 2025-12-04T13:20:27.8451437Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda PASSED [0.7764s] [ 22%] 2025-12-04T13:20:27.8451563Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda PASSED [0.7796s] [ 22%] 2025-12-04T13:20:27.8451668Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda PASSED [0.7865s] [ 22%] 2025-12-04T13:20:27.8451819Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_head_attention_forward_cuda PASSED [0.7797s] [ 23%] 2025-12-04T13:20:27.8451930Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda PASSED [0.0034s] [ 23%] 2025-12-04T13:20:27.8452042Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda PASSED [0.7774s] [ 23%] 2025-12-04T13:20:27.8452149Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda PASSED [0.7766s] [ 23%] 2025-12-04T13:20:27.8452272Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda PASSED [0.7824s] [ 23%] 2025-12-04T13:20:27.8452383Z 
test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda PASSED [0.7655s] [ 23%] 2025-12-04T13:20:27.8452474Z test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda XFAIL [0.0055s] [ 23%] 2025-12-04T13:20:27.8452595Z test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda SKIPPED [0.0001s] (Skipped!) [ 23%] 2025-12-04T13:20:27.8452690Z test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda PASSED [0.7696s] [ 23%] 2025-12-04T13:20:27.8452782Z test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda PASSED [0.0032s] [ 23%] 2025-12-04T13:20:27.8452877Z test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda PASSED [0.0044s] [ 23%] 2025-12-04T13:20:27.8452967Z test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda PASSED [0.8437s] [ 23%] 2025-12-04T13:20:27.8453084Z test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda PASSED [0.7773s] [ 23%] 2025-12-04T13:20:27.8453177Z test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda PASSED [0.7715s] [ 23%] 2025-12-04T13:20:27.8453301Z test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda PASSED [0.7707s] [ 23%] 2025-12-04T13:20:27.8453388Z test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda PASSED [0.7869s] [ 23%] 2025-12-04T13:20:27.8453492Z test_ops.py::TestCommonCUDA::test_out_warning_round_cuda PASSED [0.7803s] [ 23%] 2025-12-04T13:20:27.8453578Z test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda PASSED [0.7695s] [ 23%] 2025-12-04T13:20:27.8453667Z test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda PASSED [0.7650s] [ 23%] 2025-12-04T13:20:27.8453765Z test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda PASSED [0.7685s] [ 23%] 2025-12-04T13:20:27.8453852Z test_ops.py::TestCommonCUDA::test_out_warning_short_cuda PASSED [0.7762s] [ 23%] 2025-12-04T13:20:27.8453944Z test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda PASSED [0.7766s] [ 23%] 2025-12-04T13:20:27.8454059Z 
test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda PASSED [0.0033s] [ 23%] 2025-12-04T13:20:27.8454175Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda PASSED [0.0027s] [ 23%] 2025-12-04T13:20:27.8454284Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda PASSED [0.0026s] [ 23%] 2025-12-04T13:20:27.8454378Z test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda PASSED [0.7871s] [ 23%] 2025-12-04T13:20:27.8454465Z test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda PASSED [0.7850s] [ 23%] 2025-12-04T13:20:27.8454571Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda PASSED [0.7886s] [ 23%] 2025-12-04T13:20:27.8454694Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda PASSED [0.0145s] [ 23%] 2025-12-04T13:20:27.8454818Z test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda PASSED [0.7745s] [ 23%] 2025-12-04T13:20:27.8454912Z test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda PASSED [0.7849s] [ 23%] 2025-12-04T13:20:27.8455030Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda PASSED [0.9178s] [ 23%] 2025-12-04T13:20:27.8455144Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda PASSED [0.7822s] [ 23%] 2025-12-04T13:20:27.8455248Z test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda PASSED [0.7713s] [ 23%] 2025-12-04T13:20:27.8455351Z test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda PASSED [0.7739s] [ 23%] 2025-12-04T13:20:27.8455464Z test_ops.py::TestCommonCUDA::test_out_warning_squeeze_copy_cuda PASSED [0.7820s] [ 23%] 2025-12-04T13:20:27.8455558Z test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda PASSED [0.7797s] [ 23%] 2025-12-04T13:20:27.8455657Z test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda PASSED [0.7828s] [ 23%] 2025-12-04T13:20:27.8455753Z 
test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda PASSED [0.7760s] [ 23%] 2025-12-04T13:20:27.8455841Z test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda PASSED [0.7812s] [ 23%] 2025-12-04T13:20:27.8455930Z test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda PASSED [0.7735s] [ 23%] 2025-12-04T13:20:27.8456034Z test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda PASSED [0.7810s] [ 23%] 2025-12-04T13:20:27.8456136Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda PASSED [0.7813s] [ 23%] 2025-12-04T13:20:27.8456228Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda PASSED [0.7774s] [ 23%] 2025-12-04T13:20:27.8456324Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda PASSED [0.7767s] [ 23%] 2025-12-04T13:20:27.8456420Z test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda PASSED [0.7859s] [ 23%] 2025-12-04T13:20:27.8456508Z test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32 PASSED [0.0110s] [ 23%] 2025-12-04T13:20:27.8456612Z test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32 PASSED [0.7764s] [ 23%] 2025-12-04T13:20:27.8456704Z test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32 PASSED [0.7576s] [ 23%] 2025-12-04T13:20:27.8456800Z test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda PASSED [0.0040s] [ 23%] 2025-12-04T13:20:27.8456912Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_bool PASSED [0.0054s] [ 23%] 2025-12-04T13:20:27.8457034Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int16 PASSED [0.0045s] [ 23%] 2025-12-04T13:20:27.8457146Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int64 PASSED [0.0043s] [ 23%] 2025-12-04T13:20:27.8457256Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_int8 PASSED [0.0043s] [ 23%] 2025-12-04T13:20:27.8457363Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_bool PASSED [0.7775s] [ 23%] 
2025-12-04T13:20:27.8457472Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_uint8 PASSED [0.7790s] [ 23%] 2025-12-04T13:20:27.8457581Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int16 PASSED [0.7818s] [ 23%] 2025-12-04T13:20:27.8457688Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acosh_cuda_int64 PASSED [0.7741s] [ 23%] 2025-12-04T13:20:27.8457795Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8 PASSED [0.7711s] [ 23%] 2025-12-04T13:20:27.8457901Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int8 PASSED [0.7815s] [ 23%] 2025-12-04T13:20:27.8458010Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int64 PASSED [0.0049s] [ 23%] 2025-12-04T13:20:27.8458115Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_uint8 PASSED [0.0041s] [ 23%] 2025-12-04T13:20:27.8458229Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int8 PASSED [0.0040s] [ 23%] 2025-12-04T13:20:27.8458348Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int8 PASSED [0.0031s] [ 23%] 2025-12-04T13:20:27.8458459Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_deg2rad_cuda_bool PASSED [0.7773s] [ 23%] 2025-12-04T13:20:27.8458564Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int16 PASSED [0.7761s] [ 23%] 2025-12-04T13:20:27.8458669Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erf_cuda_int64 PASSED [0.7677s] [ 23%] 2025-12-04T13:20:27.8458774Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_bool PASSED [0.9221s] [ 24%] 2025-12-04T13:20:27.8458880Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_uint8 PASSED [0.7687s] [ 24%] 2025-12-04T13:20:27.8459000Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_int16 PASSED [0.9557s] [ 24%] 2025-12-04T13:20:27.8459110Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfinv_cuda_uint8 PASSED 
[0.7731s] [ 24%] 2025-12-04T13:20:27.8459216Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_bool PASSED [0.7851s] [ 24%] 2025-12-04T13:20:27.8459324Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int16 PASSED [0.7690s] [ 24%] 2025-12-04T13:20:27.8459430Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int8 PASSED [0.7756s] [ 24%] 2025-12-04T13:20:27.8459533Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_int32 PASSED [0.7851s] [ 24%] 2025-12-04T13:20:27.8459641Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_bool PASSED [0.7781s] [ 24%] 2025-12-04T13:20:27.8459748Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16 PASSED [0.7735s] [ 24%] 2025-12-04T13:20:27.8459856Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int64 PASSED [0.7749s] [ 24%] 2025-12-04T13:20:27.8459970Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_int16 PASSED [0.0051s] [ 24%] 2025-12-04T13:20:27.8460076Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_i0_cuda_bool PASSED [0.7753s] [ 24%] 2025-12-04T13:20:27.8460193Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int16 PASSED [0.0051s] [ 24%] 2025-12-04T13:20:27.8460300Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int64 PASSED [0.0043s] [ 24%] 2025-12-04T13:20:27.8460407Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_uint8 PASSED [0.0041s] [ 24%] 2025-12-04T13:20:27.8460525Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool PASSED [0.0028s] [ 24%] 2025-12-04T13:20:27.8460630Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int32 PASSED [0.7670s] [ 24%] 2025-12-04T13:20:27.8460745Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int16 PASSED [0.7912s] [ 24%] 2025-12-04T13:20:27.8460857Z 
test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_uint8 PASSED [0.7986s] [ 24%] 2025-12-04T13:20:27.8460988Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_int64 PASSED [0.7732s] [ 24%] 2025-12-04T13:20:27.8461117Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_uint8 PASSED [0.7802s] [ 24%] 2025-12-04T13:20:27.8461245Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int16 PASSED [0.7750s] [ 24%] 2025-12-04T13:20:27.8461371Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_5_cuda_int64 PASSED [0.7699s] [ 24%] 2025-12-04T13:20:27.8461503Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_bool PASSED [0.7788s] [ 24%] 2025-12-04T13:20:27.8461653Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int64 SKIPPED [0.0002s] (Skipped!) [ 24%] 2025-12-04T13:20:27.8461800Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_2_cuda_int64 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T13:20:27.8461959Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int8 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T13:20:27.8462106Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_bool SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T13:20:27.8462257Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int64 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T13:20:27.8462403Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_uint8 SKIPPED [0.0001s] (Skipped!) 
[ 24%] 2025-12-04T13:20:27.8462518Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_uint8 PASSED [0.7754s] [ 24%] 2025-12-04T13:20:27.8462640Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rsqrt_cuda_int16 PASSED [0.7756s] [ 24%] 2025-12-04T13:20:27.8462752Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_int32 PASSED [0.7730s] [ 24%] 2025-12-04T13:20:27.8462857Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_bool PASSED [0.7768s] [ 24%] 2025-12-04T13:20:27.8462967Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int32 PASSED [0.7742s] [ 24%] 2025-12-04T13:20:27.8463072Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int64 PASSED [0.7749s] [ 24%] 2025-12-04T13:20:27.8463181Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinh_cuda_int32 PASSED [0.7732s] [ 24%] 2025-12-04T13:20:27.8463354Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_t_cuda_int64 PASSED [0.0051s] [ 24%] 2025-12-04T13:20:27.8463497Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_bool PASSED [0.0042s] [ 24%] 2025-12-04T13:20:27.8463637Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_u_cuda_int8 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8463777Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int16 PASSED [0.3884s] [ 24%] 2025-12-04T13:20:27.8463937Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_int64 PASSED [0.0043s] [ 24%] 2025-12-04T13:20:27.8464078Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_bool PASSED [0.3913s] [ 24%] 2025-12-04T13:20:27.8464217Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int16 PASSED [0.0043s] [ 24%] 2025-12-04T13:20:27.8464368Z 
test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_laguerre_polynomial_l_cuda_int64 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8464505Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int8 PASSED [0.3941s] [ 24%] 2025-12-04T13:20:27.8464659Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_uint8 PASSED [0.0062s] [ 24%] 2025-12-04T13:20:27.8464812Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int32 PASSED [0.0057s] [ 24%] 2025-12-04T13:20:27.8464964Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int32 PASSED [0.0059s] [ 24%] 2025-12-04T13:20:27.8465116Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_v_cuda_int64 PASSED [0.0039s] [ 24%] 2025-12-04T13:20:27.8465266Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool PASSED [0.0038s] [ 24%] 2025-12-04T13:20:27.8465418Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_uint8 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8465541Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int16 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8465664Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int32 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8465781Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int64 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8465912Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_zeta_cuda_int8 PASSED [0.0040s] [ 24%] 2025-12-04T13:20:27.8466019Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_bool PASSED [0.7789s] [ 24%] 2025-12-04T13:20:27.8466126Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int8 PASSED 
[0.7782s] [ 24%] 2025-12-04T13:20:27.8466232Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_bool PASSED [0.7778s] [ 24%] 2025-12-04T13:20:27.8466338Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tan_cuda_uint8 PASSED [0.7809s] [ 24%] 2025-12-04T13:20:27.8466465Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int16 PASSED [0.7754s] [ 24%] 2025-12-04T13:20:27.8466571Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_int64 PASSED [0.7732s] [ 24%] 2025-12-04T13:20:27.8466685Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int32 PASSED [0.0057s] [ 24%] 2025-12-04T13:20:27.8466793Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_bool PASSED [0.0042s] [ 24%] 2025-12-04T13:20:27.8466903Z test_ops.py::TestCommonCUDA::test_promotes_int_to_float_xlogy_cuda_uint8 PASSED [0.0041s] [ 24%] 2025-12-04T13:20:27.8467005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128 PASSED [0.0056s] [ 24%] 2025-12-04T13:20:27.8467102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32 PASSED [0.0035s] [ 25%] 2025-12-04T13:20:27.8467198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64 PASSED [0.0034s] [ 25%] 2025-12-04T13:20:27.8467323Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16 PASSED [0.0201s] [ 25%] 2025-12-04T13:20:27.8467443Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8 PASSED [0.0184s] [ 25%] 2025-12-04T13:20:27.8467565Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64 PASSED [0.0181s] [ 25%] 2025-12-04T13:20:27.8467694Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16 PASSED [0.0162s] [ 25%] 2025-12-04T13:20:27.8467810Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8 PASSED [0.0114s] [ 25%] 2025-12-04T13:20:27.8467932Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16 PASSED [0.0220s] [ 25%] 2025-12-04T13:20:27.8468068Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64 PASSED [0.0221s] [ 25%] 2025-12-04T13:20:27.8468192Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32 PASSED [0.0801s] [ 25%] 2025-12-04T13:20:27.8468318Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64 PASSED [0.0280s] [ 25%] 2025-12-04T13:20:27.8468439Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64 PASSED [0.0222s] [ 25%] 2025-12-04T13:20:27.8468559Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32 PASSED [0.0200s] [ 25%] 2025-12-04T13:20:27.8468676Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8 PASSED [0.0189s] [ 25%] 2025-12-04T13:20:27.8468801Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128 PASSED [0.0334s] [ 25%] 2025-12-04T13:20:27.8468924Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32 PASSED [0.0303s] [ 25%] 2025-12-04T13:20:27.8469042Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8 PASSED [0.0189s] [ 25%] 2025-12-04T13:20:27.8469164Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32 PASSED [0.0292s] [ 25%] 2025-12-04T13:20:27.8469290Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128 PASSED [0.0326s] [ 25%] 2025-12-04T13:20:27.8469411Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16 PASSED [0.0212s] [ 25%] 2025-12-04T13:20:27.8469543Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16 PASSED [0.0213s] [ 25%] 2025-12-04T13:20:27.8469663Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64 PASSED [0.0325s] [ 25%] 2025-12-04T13:20:27.8469781Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16 PASSED [0.0187s] [ 25%] 2025-12-04T13:20:27.8469900Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16 PASSED [0.0182s] [ 25%] 2025-12-04T13:20:27.8470016Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32 PASSED [0.0180s] [ 25%] 2025-12-04T13:20:27.8470147Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8 PASSED [0.0156s] [ 25%] 2025-12-04T13:20:27.8470267Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_polar_cuda_float32 PASSED [0.0900s] [ 25%] 2025-12-04T13:20:27.8470389Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32 PASSED [0.8202s] [ 25%] 2025-12-04T13:20:27.8470489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool PASSED [0.7921s] [ 25%] 2025-12-04T13:20:27.8470595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32 PASSED [0.9714s] [ 25%] 2025-12-04T13:20:27.8470697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16 PASSED [0.0250s] [ 25%] 2025-12-04T13:20:27.8470802Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32 PASSED [0.7951s] [ 25%] 2025-12-04T13:20:27.8470904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32 PASSED [0.7954s] [ 25%] 2025-12-04T13:20:27.8471005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32 PASSED [0.8060s] [ 25%] 2025-12-04T13:20:27.8471111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16 PASSED [0.8026s] [ 25%] 2025-12-04T13:20:27.8471212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16 PASSED [0.0993s] [ 25%] 2025-12-04T13:20:27.8473063Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128 PASSED 
[0.0755s] [ 25%] 2025-12-04T13:20:27.8473173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32 PASSED [0.1110s] [ 25%] 2025-12-04T13:20:27.8473309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16 PASSED [0.0969s] [ 25%] 2025-12-04T13:20:27.8473419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64 PASSED [1.7380s] [ 25%] 2025-12-04T13:20:27.8473554Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64 PASSED [0.8256s] [ 25%] 2025-12-04T13:20:27.8473659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16 XFAIL [0.0089s] [ 25%] 2025-12-04T13:20:27.8473758Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8 XFAIL [0.7828s] [ 25%] 2025-12-04T13:20:27.8473865Z test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int64 PASSED [0.7850s] [ 25%] 2025-12-04T13:20:27.8473979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8 PASSED [0.7830s] [ 25%] 2025-12-04T13:20:27.8474085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_uint8 PASSED [0.7767s] [ 25%] 2025-12-04T13:20:27.8474190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32 PASSED [0.7905s] [ 25%] 2025-12-04T13:20:27.8474290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8 PASSED [0.7930s] [ 25%] 2025-12-04T13:20:27.8474405Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16 PASSED [0.8200s] [ 25%] 2025-12-04T13:20:27.8474507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16 PASSED [0.7879s] [ 25%] 2025-12-04T13:20:27.8474611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32 PASSED [0.7789s] [ 25%] 2025-12-04T13:20:27.8474711Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8 PASSED [0.7805s] [ 25%] 2025-12-04T13:20:27.8474838Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64 PASSED [0.7802s] [ 25%] 
2025-12-04T13:20:27.8474942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64 PASSED [0.7811s] [ 25%] 2025-12-04T13:20:27.8475045Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8 PASSED [0.7810s] [ 25%] 2025-12-04T13:20:27.8475150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128 PASSED [0.7870s] [ 25%] 2025-12-04T13:20:27.8475258Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64 PASSED [0.7908s] [ 25%] 2025-12-04T13:20:27.8475366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16 PASSED [0.0188s] [ 25%] 2025-12-04T13:20:27.8475491Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32 PASSED [0.0163s] [ 25%] 2025-12-04T13:20:27.8475595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32 PASSED [0.0088s] [ 25%] 2025-12-04T13:20:27.8475716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_complex64 XFAIL [0.0034s] [ 25%] 2025-12-04T13:20:27.8475832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int16 XFAIL [0.7776s] [ 25%] 2025-12-04T13:20:27.8475943Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_copy_cuda_int8 XFAIL [0.7712s] [ 25%] 2025-12-04T13:20:27.8476052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool PASSED [0.7752s] [ 25%] 2025-12-04T13:20:27.8476163Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16 PASSED [0.7699s] [ 25%] 2025-12-04T13:20:27.8476295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16 PASSED [0.7809s] [ 25%] 2025-12-04T13:20:27.8476426Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64 PASSED [0.7818s] [ 25%] 2025-12-04T13:20:27.8476555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32 PASSED [0.7803s] [ 26%] 2025-12-04T13:20:27.8476674Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32 PASSED [0.7814s] [ 26%] 2025-12-04T13:20:27.8476807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16 PASSED [0.7844s] [ 26%] 2025-12-04T13:20:27.8476921Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64 PASSED [0.7838s] [ 26%] 2025-12-04T13:20:27.8477024Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool PASSED [0.8049s] [ 26%] 2025-12-04T13:20:27.8477139Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64 PASSED [0.8014s] [ 26%] 2025-12-04T13:20:27.8477246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16 PASSED [0.0875s] [ 26%] 2025-12-04T13:20:27.8477347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool PASSED [0.8533s] [ 26%] 2025-12-04T13:20:27.8477449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8 PASSED [0.7925s] [ 26%] 2025-12-04T13:20:27.8477556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16 PASSED [0.7990s] [ 26%] 2025-12-04T13:20:27.8477667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64 PASSED [0.8010s] [ 26%] 2025-12-04T13:20:27.8477771Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32 PASSED [0.7987s] [ 26%] 2025-12-04T13:20:27.8477876Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64 PASSED [0.7958s] [ 26%] 2025-12-04T13:20:27.8477982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool PASSED [0.7915s] [ 26%] 2025-12-04T13:20:27.8478097Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128 PASSED [0.7895s] [ 26%] 2025-12-04T13:20:27.8478214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32 PASSED [0.7785s] [ 26%] 2025-12-04T13:20:27.8478325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64 PASSED [0.7822s] [ 26%] 
2025-12-04T13:20:27.8478438Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64 PASSED [0.7854s] [ 26%] 2025-12-04T13:20:27.8478557Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16 PASSED [0.7781s] [ 26%] 2025-12-04T13:20:27.8478665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8 PASSED [0.7787s] [ 26%] 2025-12-04T13:20:27.8478770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool PASSED [0.7791s] [ 26%] 2025-12-04T13:20:27.8478883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64 PASSED [0.7801s] [ 26%] 2025-12-04T13:20:27.8478987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32 PASSED [0.7772s] [ 26%] 2025-12-04T13:20:27.8479109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8 PASSED [0.7771s] [ 26%] 2025-12-04T13:20:27.8479226Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16 PASSED [0.8161s] [ 26%] 2025-12-04T13:20:27.8479344Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32 PASSED [0.8186s] [ 26%] 2025-12-04T13:20:27.8479454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64 PASSED [0.7924s] [ 26%] 2025-12-04T13:20:27.8479562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8 PASSED [0.7862s] [ 26%] 2025-12-04T13:20:27.8479668Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool PASSED [0.8190s] [ 26%] 2025-12-04T13:20:27.8479777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool PASSED [0.7804s] [ 26%] 2025-12-04T13:20:27.8479890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex32 PASSED [0.7976s] [ 26%] 2025-12-04T13:20:27.8480007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex64 PASSED [0.7915s] [ 26%] 2025-12-04T13:20:27.8480125Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64 PASSED [0.7883s] [ 26%] 2025-12-04T13:20:27.8480242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64 PASSED [0.7760s] [ 26%] 2025-12-04T13:20:27.8480363Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32 PASSED [0.7743s] [ 26%] 2025-12-04T13:20:27.8480474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64 PASSED [0.2932s] [ 26%] 2025-12-04T13:20:27.8480575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16 PASSED [0.0090s] [ 26%] 2025-12-04T13:20:27.8480685Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16 PASSED [0.0077s] [ 26%] 2025-12-04T13:20:27.8480784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64 PASSED [0.0076s] [ 26%] 2025-12-04T13:20:27.8480970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float16 SKIPPED [0.0001s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 26%] 2025-12-04T13:20:27.8481150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float64 SKIPPED [0.0001s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 26%] 2025-12-04T13:20:27.8481255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64 PASSED [0.7878s] [ 26%] 2025-12-04T13:20:27.8481355Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool PASSED [0.7962s] [ 26%] 2025-12-04T13:20:27.8481462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128 PASSED [0.8113s] [ 26%] 2025-12-04T13:20:27.8481564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8 PASSED [0.7998s] [ 26%] 2025-12-04T13:20:27.8481670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16 PASSED [0.8294s] [ 26%] 2025-12-04T13:20:27.8481776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool PASSED [0.8578s] [ 26%] 2025-12-04T13:20:27.8481885Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16 PASSED [0.1205s] [ 26%] 2025-12-04T13:20:27.8481994Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64 PASSED [0.0858s] [ 26%] 2025-12-04T13:20:27.8482120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64 PASSED [0.0735s] [ 26%] 2025-12-04T13:20:27.8482228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64 PASSED [0.0750s] [ 26%] 2025-12-04T13:20:27.8482331Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8 PASSED [0.0722s] [ 26%] 2025-12-04T13:20:27.8482437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16 PASSED [0.0267s] [ 26%] 2025-12-04T13:20:27.8482541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16 PASSED [0.0214s] [ 26%] 2025-12-04T13:20:27.8482660Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32 PASSED [0.0048s] [ 26%] 2025-12-04T13:20:27.8482783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8 PASSED [0.0043s] [ 26%] 2025-12-04T13:20:27.8482891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16 PASSED [0.0140s] [ 26%] 2025-12-04T13:20:27.8482999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64 PASSED [0.0296s] [ 26%] 2025-12-04T13:20:27.8483104Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64 PASSED [0.0140s] [ 26%] 2025-12-04T13:20:27.8483207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16 PASSED [0.0103s] [ 26%] 2025-12-04T13:20:27.8483378Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32 PASSED [0.0270s] [ 26%] 2025-12-04T13:20:27.8483489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32 PASSED [0.0177s] [ 26%] 2025-12-04T13:20:27.8483595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64 PASSED [0.0177s] [ 26%] 
2025-12-04T13:20:27.8483703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16 PASSED [0.1077s] [ 26%] 2025-12-04T13:20:27.8483807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32 PASSED [0.1062s] [ 26%] 2025-12-04T13:20:27.8483915Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128 PASSED [0.3915s] [ 26%] 2025-12-04T13:20:27.8484036Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32 PASSED [0.4226s] [ 26%] 2025-12-04T13:20:27.8484141Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16 PASSED [0.8183s] [ 27%] 2025-12-04T13:20:27.8484242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32 PASSED [0.7888s] [ 27%] 2025-12-04T13:20:27.8484361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32 PASSED [0.7992s] [ 27%] 2025-12-04T13:20:27.8484460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8 PASSED [0.8074s] [ 27%] 2025-12-04T13:20:27.8484582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex64 PASSED [0.7828s] [ 27%] 2025-12-04T13:20:27.8484695Z test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float16 PASSED [0.7925s] [ 27%] 2025-12-04T13:20:27.8484805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_bfloat16 PASSED [0.7892s] [ 27%] 2025-12-04T13:20:27.8484917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_complex128 PASSED [0.7898s] [ 27%] 2025-12-04T13:20:27.8485026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16 PASSED [0.7807s] [ 27%] 2025-12-04T13:20:27.8485127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32 PASSED [0.7782s] [ 27%] 2025-12-04T13:20:27.8485234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64 PASSED [0.7900s] [ 27%] 2025-12-04T13:20:27.8485336Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64 PASSED [0.7820s] [ 27%] 
2025-12-04T13:20:27.8485445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool PASSED [0.8073s] [ 27%] 2025-12-04T13:20:27.8485558Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128 PASSED [0.8156s] [ 27%] 2025-12-04T13:20:27.8485672Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64 PASSED [0.8101s] [ 27%] 2025-12-04T13:20:27.8485811Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128 PASSED [0.7915s] [ 27%] 2025-12-04T13:20:27.8485924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32 PASSED [0.7850s] [ 27%] 2025-12-04T13:20:27.8486039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16 PASSED [0.7865s] [ 27%] 2025-12-04T13:20:27.8486151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64 PASSED [0.7896s] [ 27%] 2025-12-04T13:20:27.8486265Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8 PASSED [0.7833s] [ 27%] 2025-12-04T13:20:27.8486392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8 PASSED [0.7772s] [ 27%] 2025-12-04T13:20:27.8486506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128 PASSED [0.7846s] [ 27%] 2025-12-04T13:20:27.8486611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32 PASSED [0.7869s] [ 27%] 2025-12-04T13:20:27.8486720Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8 PASSED [0.7848s] [ 27%] 2025-12-04T13:20:27.8486834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool PASSED [0.7730s] [ 27%] 2025-12-04T13:20:27.8486958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128 PASSED [0.7885s] [ 27%] 2025-12-04T13:20:27.8487080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64 PASSED [0.7850s] [ 27%] 2025-12-04T13:20:27.8487200Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64 PASSED [0.7789s] [ 27%] 2025-12-04T13:20:27.8487305Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool PASSED [1.2028s] [ 27%] 2025-12-04T13:20:27.8487410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32 PASSED [0.0213s] [ 27%] 2025-12-04T13:20:27.8487529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16 PASSED [0.3089s] [ 27%] 2025-12-04T13:20:27.8487659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32 PASSED [0.1279s] [ 27%] 2025-12-04T13:20:27.8487780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32 PASSED [0.0644s] [ 27%] 2025-12-04T13:20:27.8487899Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64 PASSED [0.0857s] [ 27%] 2025-12-04T13:20:27.8488027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8 PASSED [0.0842s] [ 27%] 2025-12-04T13:20:27.8488144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32 PASSED [0.8355s] [ 27%] 2025-12-04T13:20:27.8488261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8 PASSED [0.8345s] [ 27%] 2025-12-04T13:20:27.8488364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex64 XFAIL [0.0044s] [ 27%] 2025-12-04T13:20:27.8488473Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16 PASSED [0.7853s] [ 27%] 2025-12-04T13:20:27.8488578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool PASSED [0.7824s] [ 27%] 2025-12-04T13:20:27.8488689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128 PASSED [0.7789s] [ 27%] 2025-12-04T13:20:27.8488794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16 PASSED [0.7656s] [ 27%] 2025-12-04T13:20:27.8488898Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16 PASSED [0.7498s] [ 27%] 2025-12-04T13:20:27.8489003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32 PASSED [0.7441s] [ 27%] 2025-12-04T13:20:27.8489112Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64 PASSED [0.7494s] [ 27%] 2025-12-04T13:20:27.8489214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16 PASSED [0.7470s] [ 27%] 2025-12-04T13:20:27.8489384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32 SKIPPED [0.0002s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8489538Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8489691Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8489838Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8489992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8490162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8490321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 27%] 2025-12-04T13:20:27.8490500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bfloat16 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 27%] 2025-12-04T13:20:27.8490666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bool SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 27%] 2025-12-04T13:20:27.8490838Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float32 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 27%] 2025-12-04T13:20:27.8491008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float64 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 27%] 2025-12-04T13:20:27.8491177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int16 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 27%] 2025-12-04T13:20:27.8491280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32 PASSED [0.0898s] [ 27%] 2025-12-04T13:20:27.8491401Z test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_bfloat16 PASSED [0.7572s] [ 27%] 2025-12-04T13:20:27.8491507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex64 PASSED [0.7488s] [ 27%] 2025-12-04T13:20:27.8491612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int32 PASSED [0.7529s] [ 27%] 2025-12-04T13:20:27.8491711Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64 PASSED [0.7697s] [ 27%] 2025-12-04T13:20:27.8491823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool PASSED [0.9646s] [ 27%] 2025-12-04T13:20:27.8491925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32 PASSED [0.2905s] [ 27%] 2025-12-04T13:20:27.8492027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32 PASSED [0.0201s] [ 27%] 2025-12-04T13:20:27.8492129Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32 PASSED [0.2429s] [ 27%] 2025-12-04T13:20:27.8492237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_complex64 PASSED [0.4085s] [ 28%] 2025-12-04T13:20:27.8492337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16 PASSED [0.2136s] [ 28%] 2025-12-04T13:20:27.8492444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128 PASSED [1.1321s] [ 28%] 2025-12-04T13:20:27.8492546Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32 PASSED [0.7563s] [ 28%] 2025-12-04T13:20:27.8492648Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8 PASSED [0.7630s] [ 28%] 2025-12-04T13:20:27.8492763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128 PASSED [0.7385s] [ 28%] 2025-12-04T13:20:27.8492870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32 PASSED [0.7463s] [ 28%] 2025-12-04T13:20:27.8492977Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64 PASSED [0.7460s] [ 28%] 2025-12-04T13:20:27.8493102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_complex128 PASSED [0.7485s] [ 28%] 2025-12-04T13:20:27.8493217Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_float64 PASSED [0.7497s] [ 28%] 2025-12-04T13:20:27.8493357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_int16 PASSED [0.7454s] [ 28%] 2025-12-04T13:20:27.8493466Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16 PASSED [0.7500s] [ 28%] 2025-12-04T13:20:27.8493571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64 PASSED [0.7473s] [ 28%] 2025-12-04T13:20:27.8493681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex128 PASSED [0.7701s] [ 28%] 2025-12-04T13:20:27.8493806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_complex64 PASSED [0.7685s] [ 28%] 2025-12-04T13:20:27.8493913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64 PASSED [0.7743s] [ 28%] 2025-12-04T13:20:27.8494105Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exponential_cuda_float32 SKIPPED [0.0002s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 28%] 2025-12-04T13:20:27.8494210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16 PASSED [0.0800s] [ 28%] 2025-12-04T13:20:27.8494315Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128 PASSED [0.0799s] [ 28%] 2025-12-04T13:20:27.8494421Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64 PASSED [0.0794s] [ 28%] 2025-12-04T13:20:27.8494525Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16 PASSED [0.0768s] [ 28%] 2025-12-04T13:20:27.8494630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32 PASSED [0.0764s] [ 28%] 2025-12-04T13:20:27.8494730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16 PASSED [0.0695s] [ 28%] 2025-12-04T13:20:27.8494837Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool PASSED [0.7680s] [ 28%] 2025-12-04T13:20:27.8494948Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32 PASSED [1.4008s] [ 28%] 2025-12-04T13:20:27.8495068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64 PASSED [0.0108s] [ 28%] 2025-12-04T13:20:27.8495172Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8 PASSED [0.7423s] [ 28%] 2025-12-04T13:20:27.8495283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32 PASSED [0.7530s] [ 28%] 2025-12-04T13:20:27.8495412Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32 PASSED [0.7553s] [ 28%] 2025-12-04T13:20:27.8495515Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64 PASSED [0.7538s] [ 28%] 2025-12-04T13:20:27.8495621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8 PASSED [0.7505s] [ 28%] 2025-12-04T13:20:27.8495735Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32 PASSED [0.7528s] [ 28%] 2025-12-04T13:20:27.8495846Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64 PASSED [0.7560s] [ 28%] 2025-12-04T13:20:27.8495953Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16 PASSED [0.7614s] [ 28%] 2025-12-04T13:20:27.8496059Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64 PASSED [0.7483s] [ 28%] 2025-12-04T13:20:27.8496168Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64 PASSED [0.7463s] [ 28%] 2025-12-04T13:20:27.8496277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64 PASSED [0.8976s] [ 28%] 2025-12-04T13:20:27.8496382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32 PASSED [0.0105s] [ 28%] 2025-12-04T13:20:27.8496489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8 PASSED [0.7558s] [ 28%] 2025-12-04T13:20:27.8496600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64 PASSED [0.7559s] [ 28%] 2025-12-04T13:20:27.8496712Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64 PASSED [0.7553s] [ 28%] 2025-12-04T13:20:27.8496835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64 PASSED [0.7540s] [ 28%] 2025-12-04T13:20:27.8496943Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8 PASSED [0.7518s] [ 28%] 2025-12-04T13:20:27.8497046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32 PASSED [0.7494s] [ 28%] 2025-12-04T13:20:27.8497158Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32 PASSED [0.7600s] [ 28%] 2025-12-04T13:20:27.8497263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8 PASSED [0.7530s] [ 28%] 2025-12-04T13:20:27.8497394Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128 PASSED [0.7439s] [ 28%] 2025-12-04T13:20:27.8497500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64 PASSED [0.7551s] [ 28%] 2025-12-04T13:20:27.8497605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool PASSED [0.7565s] [ 28%] 2025-12-04T13:20:27.8497716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64 PASSED [0.9076s] [ 28%] 
2025-12-04T13:20:27.8497827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32 PASSED [0.3448s] [ 28%] 2025-12-04T13:20:27.8497936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32 PASSED [0.7503s] [ 28%] 2025-12-04T13:20:27.8498041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16 PASSED [0.7483s] [ 28%] 2025-12-04T13:20:27.8498149Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8 PASSED [0.7504s] [ 28%] 2025-12-04T13:20:27.8498261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32 PASSED [0.7745s] [ 28%] 2025-12-04T13:20:27.8498372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16 PASSED [0.7545s] [ 28%] 2025-12-04T13:20:27.8498479Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16 PASSED [0.7454s] [ 28%] 2025-12-04T13:20:27.8498599Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32 PASSED [0.7471s] [ 28%] 2025-12-04T13:20:27.8498704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64 PASSED [0.7543s] [ 28%] 2025-12-04T13:20:27.8498814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64 PASSED [0.9101s] [ 28%] 2025-12-04T13:20:27.8498922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32 PASSED [0.0222s] [ 28%] 2025-12-04T13:20:27.8499042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64 PASSED [0.0202s] [ 28%] 2025-12-04T13:20:27.8499147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16 PASSED [0.0192s] [ 28%] 2025-12-04T13:20:27.8499252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32 PASSED [0.0189s] [ 28%] 2025-12-04T13:20:27.8499355Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool PASSED [0.0034s] [ 28%] 2025-12-04T13:20:27.8499464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16 
PASSED [0.0034s] [ 28%] 2025-12-04T13:20:27.8499568Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32 PASSED [0.0034s] [ 28%] 2025-12-04T13:20:27.8499675Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64 PASSED [0.0036s] [ 29%] 2025-12-04T13:20:27.8499775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64 PASSED [0.0031s] [ 29%] 2025-12-04T13:20:27.8499881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8 PASSED [0.0032s] [ 29%] 2025-12-04T13:20:27.8499987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool PASSED [0.0839s] [ 29%] 2025-12-04T13:20:27.8500103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64 PASSED [0.8528s] [ 29%] 2025-12-04T13:20:27.8500207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16 PASSED [0.7674s] [ 29%] 2025-12-04T13:20:27.8500335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32 PASSED [0.7674s] [ 29%] 2025-12-04T13:20:27.8500439Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64 PASSED [0.8073s] [ 29%] 2025-12-04T13:20:27.8500540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16 PASSED [0.7952s] [ 29%] 2025-12-04T13:20:27.8500641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool PASSED [0.7807s] [ 29%] 2025-12-04T13:20:27.8500742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16 PASSED [0.7901s] [ 29%] 2025-12-04T13:20:27.8500845Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64 PASSED [0.8080s] [ 29%] 2025-12-04T13:20:27.8500958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32 PASSED [0.7919s] [ 29%] 2025-12-04T13:20:27.8501060Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8 PASSED [0.8057s] [ 29%] 2025-12-04T13:20:27.8501161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16 PASSED [0.7767s] [ 29%] 
2025-12-04T13:20:27.8501266Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64 PASSED [0.7707s] [ 29%] 2025-12-04T13:20:27.8501363Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16 PASSED [1.2407s] [ 29%] 2025-12-04T13:20:27.8501462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8 PASSED [0.4965s] [ 29%] 2025-12-04T13:20:27.8501561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8 PASSED [1.2009s] [ 29%] 2025-12-04T13:20:27.8501665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16 PASSED [0.8167s] [ 29%] 2025-12-04T13:20:27.8501762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool PASSED [0.7811s] [ 29%] 2025-12-04T13:20:27.8501864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16 PASSED [0.8098s] [ 29%] 2025-12-04T13:20:27.8501961Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32 PASSED [0.7971s] [ 29%] 2025-12-04T13:20:27.8502061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64 PASSED [0.7920s] [ 29%] 2025-12-04T13:20:27.8502168Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8 PASSED [0.7932s] [ 29%] 2025-12-04T13:20:27.8502267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8 PASSED [0.7900s] [ 29%] 2025-12-04T13:20:27.8502449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int16 SKIPPED [0.0002s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 29%] 2025-12-04T13:20:27.8502568Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16 PASSED [0.8176s] [ 29%] 2025-12-04T13:20:27.8502665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8 PASSED [0.7972s] [ 29%] 2025-12-04T13:20:27.8502780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16 PASSED [0.1485s] [ 29%] 2025-12-04T13:20:27.8502889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64 PASSED [0.1190s] [ 29%] 
2025-12-04T13:20:27.8502997Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8 PASSED [0.1122s] [ 29%] 2025-12-04T13:20:27.8503108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8 PASSED [0.1119s] [ 29%] 2025-12-04T13:20:27.8503216Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64 PASSED [0.7524s] [ 29%] 2025-12-04T13:20:27.8503358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16 PASSED [0.7481s] [ 29%] 2025-12-04T13:20:27.8503460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64 PASSED [0.7476s] [ 29%] 2025-12-04T13:20:27.8503565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool PASSED [0.7509s] [ 29%] 2025-12-04T13:20:27.8503673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32 PASSED [0.7535s] [ 29%] 2025-12-04T13:20:27.8503776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64 PASSED [0.7566s] [ 29%] 2025-12-04T13:20:27.8503878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8 PASSED [0.7579s] [ 29%] 2025-12-04T13:20:27.8503993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32 PASSED [0.9681s] [ 29%] 2025-12-04T13:20:27.8504089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8 PASSED [0.0200s] [ 29%] 2025-12-04T13:20:27.8504198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128 PASSED [0.0301s] [ 29%] 2025-12-04T13:20:27.8504303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32 PASSED [0.0311s] [ 29%] 2025-12-04T13:20:27.8504413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64 PASSED [0.7805s] [ 29%] 2025-12-04T13:20:27.8504533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16 XFAIL [0.0044s] [ 29%] 2025-12-04T13:20:27.8504641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16 XFAIL [0.7392s] [ 29%] 2025-12-04T13:20:27.8504745Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32 XFAIL [0.7498s] [ 29%] 2025-12-04T13:20:27.8504857Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32 XFAIL [0.7386s] [ 29%] 2025-12-04T13:20:27.8504966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16 XFAIL [0.7408s] [ 29%] 2025-12-04T13:20:27.8505083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex32 XFAIL [0.7385s] [ 29%] 2025-12-04T13:20:27.8505190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64 XFAIL [0.7405s] [ 29%] 2025-12-04T13:20:27.8505297Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool PASSED [0.8924s] [ 29%] 2025-12-04T13:20:27.8505409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128 PASSED [0.1590s] [ 29%] 2025-12-04T13:20:27.8505517Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32 PASSED [0.1410s] [ 29%] 2025-12-04T13:20:27.8505630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128 PASSED [0.0287s] [ 29%] 2025-12-04T13:20:27.8505752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32 PASSED [0.0162s] [ 29%] 2025-12-04T13:20:27.8505859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64 PASSED [0.0160s] [ 29%] 2025-12-04T13:20:27.8505964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8 PASSED [0.0152s] [ 29%] 2025-12-04T13:20:27.8506071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16 PASSED [0.0218s] [ 29%] 2025-12-04T13:20:27.8506189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32 PASSED [0.0174s] [ 29%] 2025-12-04T13:20:27.8506295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16 PASSED [0.0146s] [ 29%] 2025-12-04T13:20:27.8506396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8 PASSED [0.0140s] [ 29%] 2025-12-04T13:20:27.8506507Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128 PASSED [0.0245s] [ 29%] 2025-12-04T13:20:27.8506614Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32 PASSED [0.0133s] [ 29%] 2025-12-04T13:20:27.8506721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8 PASSED [0.0113s] [ 29%] 2025-12-04T13:20:27.8506830Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16 PASSED [0.0199s] [ 29%] 2025-12-04T13:20:27.8506936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool PASSED [0.0174s] [ 30%] 2025-12-04T13:20:27.8507042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32 PASSED [0.0147s] [ 30%] 2025-12-04T13:20:27.8507148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8 PASSED [0.0140s] [ 30%] 2025-12-04T13:20:27.8507252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16 PASSED [0.0220s] [ 30%] 2025-12-04T13:20:27.8507356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16 PASSED [0.0160s] [ 30%] 2025-12-04T13:20:27.8507458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float32 PASSED [0.0043s] [ 30%] 2025-12-04T13:20:27.8507574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64 PASSED [0.0042s] [ 30%] 2025-12-04T13:20:27.8507674Z test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int8 PASSED [0.7559s] [ 30%] 2025-12-04T13:20:27.8507774Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool PASSED [0.7836s] [ 30%] 2025-12-04T13:20:27.8507874Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64 PASSED [0.7987s] [ 30%] 2025-12-04T13:20:27.8507973Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32 PASSED [0.7886s] [ 30%] 2025-12-04T13:20:27.8508070Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64 PASSED [0.8053s] [ 30%] 2025-12-04T13:20:27.8508179Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8 PASSED [0.7955s] [ 30%] 2025-12-04T13:20:27.8508285Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64 PASSED [0.7911s] [ 30%] 2025-12-04T13:20:27.8508399Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int32 PASSED [0.7632s] [ 30%] 2025-12-04T13:20:27.8508514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int64 PASSED [0.7424s] [ 30%] 2025-12-04T13:20:27.8508622Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_cross_cuda_int8 PASSED [0.7586s] [ 30%] 2025-12-04T13:20:27.8508738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bool PASSED [0.7548s] [ 30%] 2025-12-04T13:20:27.8508854Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int16 PASSED [0.7507s] [ 30%] 2025-12-04T13:20:27.8508970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_int64 PASSED [0.7543s] [ 30%] 2025-12-04T13:20:27.8509082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64 PASSED [0.8378s] [ 30%] 2025-12-04T13:20:27.8509196Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16 PASSED [0.8184s] [ 30%] 2025-12-04T13:20:27.8509326Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32 PASSED [0.7693s] [ 30%] 2025-12-04T13:20:27.8509454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128 PASSED [0.8599s] [ 30%] 2025-12-04T13:20:27.8509574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32 PASSED [0.8514s] [ 30%] 2025-12-04T13:20:27.8509689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64 PASSED [0.0425s] [ 30%] 2025-12-04T13:20:27.8509809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16 XFAIL [0.0040s] [ 30%] 2025-12-04T13:20:27.8509917Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32 XFAIL [0.7507s] [ 30%] 2025-12-04T13:20:27.8510045Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32 XFAIL [0.7519s] [ 30%] 2025-12-04T13:20:27.8510148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool PASSED [0.7693s] [ 30%] 2025-12-04T13:20:27.8510258Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128 PASSED [1.2562s] [ 30%] 2025-12-04T13:20:27.8510367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32 PASSED [0.7720s] [ 30%] 2025-12-04T13:20:27.8510469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64 PASSED [0.7624s] [ 30%] 2025-12-04T13:20:27.8510580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64 PASSED [0.7733s] [ 30%] 2025-12-04T13:20:27.8510684Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16 PASSED [0.7583s] [ 30%] 2025-12-04T13:20:27.8510787Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8 PASSED [0.7629s] [ 30%] 2025-12-04T13:20:27.8510892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16 PASSED [0.7761s] [ 30%] 2025-12-04T13:20:27.8510990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64 PASSED [0.7650s] [ 30%] 2025-12-04T13:20:27.8511103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8 PASSED [0.7666s] [ 30%] 2025-12-04T13:20:27.8511202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8 PASSED [0.7631s] [ 30%] 2025-12-04T13:20:27.8511323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool PASSED [0.7559s] [ 30%] 2025-12-04T13:20:27.8511449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32 PASSED [0.7553s] [ 30%] 2025-12-04T13:20:27.8511577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64 PASSED [0.7557s] [ 30%] 
2025-12-04T13:20:27.8511710Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64 PASSED [0.7530s] [ 30%] 2025-12-04T13:20:27.8511831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64 PASSED [0.7603s] [ 30%] 2025-12-04T13:20:27.8511941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp2_cuda_float64 PASSED [0.7472s] [ 30%] 2025-12-04T13:20:27.8512052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_bfloat16 PASSED [0.1817s] [ 30%] 2025-12-04T13:20:27.8512159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16 PASSED [0.0627s] [ 30%] 2025-12-04T13:20:27.8512267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64 PASSED [0.8086s] [ 30%] 2025-12-04T13:20:27.8512372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool PASSED [0.7558s] [ 30%] 2025-12-04T13:20:27.8512487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128 PASSED [0.7773s] [ 30%] 2025-12-04T13:20:27.8512599Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64 PASSED [0.8175s] [ 30%] 2025-12-04T13:20:27.8512709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64 PASSED [0.8199s] [ 30%] 2025-12-04T13:20:27.8512816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32 PASSED [0.7942s] [ 30%] 2025-12-04T13:20:27.8512935Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool PASSED [0.7888s] [ 30%] 2025-12-04T13:20:27.8513041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16 PASSED [0.8036s] [ 30%] 2025-12-04T13:20:27.8513145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32 XFAIL [0.0425s] [ 30%] 2025-12-04T13:20:27.8513282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8 PASSED [0.8535s] [ 30%] 2025-12-04T13:20:27.8513424Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_bfloat16 XFAIL [0.0130s] [ 30%] 2025-12-04T13:20:27.8513549Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_int16 XFAIL [0.7457s] [ 30%] 2025-12-04T13:20:27.8513655Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64 PASSED [0.7527s] [ 30%] 2025-12-04T13:20:27.8513766Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16 PASSED [0.7462s] [ 30%] 2025-12-04T13:20:27.8513878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64 PASSED [0.7419s] [ 30%] 2025-12-04T13:20:27.8513984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8 PASSED [0.7452s] [ 30%] 2025-12-04T13:20:27.8514089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64 PASSED [0.7961s] [ 30%] 2025-12-04T13:20:27.8514193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64 PASSED [0.7832s] [ 30%] 2025-12-04T13:20:27.8514297Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64 PASSED [0.7391s] [ 30%] 2025-12-04T13:20:27.8514425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [0.7390s] [ 31%] 2025-12-04T13:20:27.8514548Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [0.7469s] [ 31%] 2025-12-04T13:20:27.8514672Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [0.7463s] [ 31%] 2025-12-04T13:20:27.8514817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [0.7501s] [ 31%] 2025-12-04T13:20:27.8514945Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [0.7459s] [ 31%] 2025-12-04T13:20:27.8515050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32 PASSED [0.7894s] [ 31%] 2025-12-04T13:20:27.8515157Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64 PASSED [0.7872s] [ 31%] 2025-12-04T13:20:27.8515259Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32 PASSED [0.7741s] [ 31%] 2025-12-04T13:20:27.8515376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16 PASSED [0.7390s] [ 31%] 2025-12-04T13:20:27.8515481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool PASSED [0.7638s] [ 31%] 2025-12-04T13:20:27.8515585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8 PASSED [0.7471s] [ 31%] 2025-12-04T13:20:27.8515700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128 XFAIL [0.0042s] [ 31%] 2025-12-04T13:20:27.8515810Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32 XFAIL [0.7326s] [ 31%] 2025-12-04T13:20:27.8515919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64 XFAIL [0.7514s] [ 31%] 2025-12-04T13:20:27.8516025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16 XFAIL [0.7578s] [ 31%] 2025-12-04T13:20:27.8516130Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8 XFAIL [0.7515s] [ 31%] 2025-12-04T13:20:27.8516236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool PASSED [0.7658s] [ 31%] 2025-12-04T13:20:27.8516344Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128 PASSED [0.0224s] [ 31%] 2025-12-04T13:20:27.8516447Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32 PASSED [0.7641s] [ 31%] 2025-12-04T13:20:27.8516571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64 PASSED [0.7649s] [ 31%] 2025-12-04T13:20:27.8516672Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64 PASSED [0.7550s] [ 31%] 2025-12-04T13:20:27.8516795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16 PASSED [0.0493s] [ 31%] 
2025-12-04T13:20:27.8516891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8 PASSED [0.7981s] [ 31%] 2025-12-04T13:20:27.8517007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128 PASSED [0.9324s] [ 31%] 2025-12-04T13:20:27.8517106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16 PASSED [0.0143s] [ 31%] 2025-12-04T13:20:27.8517204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32 PASSED [0.0122s] [ 31%] 2025-12-04T13:20:27.8517364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 31%] 2025-12-04T13:20:27.8517522Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 31%] 2025-12-04T13:20:27.8517676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 31%] 2025-12-04T13:20:27.8517857Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 31%] 2025-12-04T13:20:27.8518031Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16 SKIPPED [0.0002s] (Expected: empty_strided is not comparable) [ 31%] 2025-12-04T13:20:27.8518144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32 PASSED [0.7504s] [ 31%] 2025-12-04T13:20:27.8518255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64 PASSED [0.7514s] [ 31%] 2025-12-04T13:20:27.8518372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32 PASSED [0.7533s] [ 31%] 2025-12-04T13:20:27.8518483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16 PASSED [0.7990s] [ 31%] 2025-12-04T13:20:27.8518663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16 SKIPPED [0.0002s] (Expected: 
dropout is not comparable) [ 31%] 2025-12-04T13:20:27.8518783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32 PASSED [0.7787s] [ 31%] 2025-12-04T13:20:27.8518915Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_bool PASSED [0.7517s] [ 31%] 2025-12-04T13:20:27.8519066Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex128 PASSED [0.7497s] [ 31%] 2025-12-04T13:20:27.8519201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64 PASSED [0.7540s] [ 31%] 2025-12-04T13:20:27.8519333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int32 PASSED [0.7399s] [ 31%] 2025-12-04T13:20:27.8519464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_int64 PASSED [0.7445s] [ 31%] 2025-12-04T13:20:27.8519582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16 PASSED [0.7940s] [ 31%] 2025-12-04T13:20:27.8519711Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [0.8480s] [ 31%] 2025-12-04T13:20:27.8519839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32 PASSED [0.0722s] [ 31%] 2025-12-04T13:20:27.8519959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16 PASSED [0.7718s] [ 31%] 2025-12-04T13:20:27.8520085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32 PASSED [0.7523s] [ 31%] 2025-12-04T13:20:27.8520209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [0.7473s] [ 31%] 2025-12-04T13:20:27.8520346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16 PASSED [0.7515s] [ 31%] 2025-12-04T13:20:27.8520472Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32 PASSED [0.7478s] [ 31%] 2025-12-04T13:20:27.8520596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64 PASSED [0.7509s] [ 31%] 2025-12-04T13:20:27.8520755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [0.7484s] [ 31%] 2025-12-04T13:20:27.8520895Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [0.7615s] [ 31%] 2025-12-04T13:20:27.8521032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [0.7756s] [ 31%] 2025-12-04T13:20:27.8521151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16 PASSED [0.7801s] [ 31%] 2025-12-04T13:20:27.8521270Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32 PASSED [0.7787s] [ 31%] 2025-12-04T13:20:27.8521393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16 PASSED [0.7556s] [ 31%] 2025-12-04T13:20:27.8521536Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [0.7471s] [ 31%] 2025-12-04T13:20:27.8521663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_bool PASSED [0.0069s] [ 31%] 2025-12-04T13:20:27.8521795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64 PASSED [0.0061s] [ 31%] 2025-12-04T13:20:27.8521921Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int8 PASSED [0.0056s] [ 31%] 2025-12-04T13:20:27.8522050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int16 PASSED [0.0055s] [ 31%] 2025-12-04T13:20:27.8522191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_int64 PASSED [0.0055s] [ 31%] 
2025-12-04T13:20:27.8522328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [0.0817s] [ 31%] 2025-12-04T13:20:27.8522449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16 PASSED [0.8609s] [ 31%] 2025-12-04T13:20:27.8522569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32 PASSED [0.8290s] [ 31%] 2025-12-04T13:20:27.8522687Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16 PASSED [0.7753s] [ 31%] 2025-12-04T13:20:27.8522820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64 PASSED [0.7832s] [ 32%] 2025-12-04T13:20:27.8522940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16 PASSED [0.7864s] [ 32%] 2025-12-04T13:20:27.8523056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8 PASSED [0.7665s] [ 32%] 2025-12-04T13:20:27.8523174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32 PASSED [0.7809s] [ 32%] 2025-12-04T13:20:27.8523340Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [0.7553s] [ 32%] 2025-12-04T13:20:27.8523478Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [0.7531s] [ 32%] 2025-12-04T13:20:27.8523603Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32 PASSED [0.7841s] [ 32%] 2025-12-04T13:20:27.8523735Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16 PASSED [0.0770s] [ 32%] 2025-12-04T13:20:27.8523860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64 PASSED [0.0394s] [ 32%] 2025-12-04T13:20:27.8523993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [0.7891s] [ 32%] 2025-12-04T13:20:27.8524135Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64 PASSED [0.7712s] [ 32%] 2025-12-04T13:20:27.8524256Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8 PASSED [0.7690s] [ 32%] 2025-12-04T13:20:27.8524377Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8 PASSED [0.7650s] [ 32%] 2025-12-04T13:20:27.8524531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [0.7650s] [ 32%] 2025-12-04T13:20:27.8524668Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [0.7632s] [ 32%] 2025-12-04T13:20:27.8524772Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64 PASSED [0.7650s] [ 32%] 2025-12-04T13:20:27.8524969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_bfloat16 SKIPPED [0.0002s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 32%] 2025-12-04T13:20:27.8525163Z test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_float64 SKIPPED [0.0001s] (TODO: RuntimeError: no _refs support for torch.rand_like) [ 32%] 2025-12-04T13:20:27.8525271Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32 PASSED [0.7533s] [ 32%] 2025-12-04T13:20:27.8525374Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16 PASSED [0.7460s] [ 32%] 2025-12-04T13:20:27.8525475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32 PASSED [0.7477s] [ 32%] 2025-12-04T13:20:27.8525575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8 PASSED [0.7464s] [ 32%] 2025-12-04T13:20:27.8525689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_float64 PASSED [0.7737s] [ 32%] 2025-12-04T13:20:27.8525799Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_int16 PASSED [0.7660s] [ 32%] 2025-12-04T13:20:27.8525923Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_copy_cuda_uint8 PASSED [0.7622s] [ 32%] 2025-12-04T13:20:27.8526032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128 PASSED [0.7680s] [ 32%] 2025-12-04T13:20:27.8526137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16 PASSED [0.7475s] [ 32%] 2025-12-04T13:20:27.8526235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32 PASSED [0.7974s] [ 32%] 2025-12-04T13:20:27.8526335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8 PASSED [0.8001s] [ 32%] 2025-12-04T13:20:27.8526453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64 PASSED [0.7614s] [ 32%] 2025-12-04T13:20:27.8526555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64 PASSED [0.7601s] [ 32%] 2025-12-04T13:20:27.8526653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16 PASSED [1.0111s] [ 32%] 2025-12-04T13:20:27.8526754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32 PASSED [0.2753s] [ 32%] 2025-12-04T13:20:27.8526859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float32 PASSED [0.7619s] [ 32%] 2025-12-04T13:20:27.8526965Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float64 PASSED [0.7565s] [ 32%] 2025-12-04T13:20:27.8527072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128 PASSED [0.7452s] [ 32%] 2025-12-04T13:20:27.8527177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32 PASSED [0.7431s] [ 32%] 2025-12-04T13:20:27.8527282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64 PASSED [0.7451s] [ 32%] 2025-12-04T13:20:27.8527384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32 PASSED [0.7563s] [ 32%] 2025-12-04T13:20:27.8527484Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8 PASSED [0.7508s] [ 32%] 2025-12-04T13:20:27.8527588Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16 PASSED [0.7729s] [ 32%] 2025-12-04T13:20:27.8527697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool PASSED [0.7550s] [ 32%] 2025-12-04T13:20:27.8527798Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16 PASSED [0.7590s] [ 32%] 2025-12-04T13:20:27.8527910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16 PASSED [0.7684s] [ 32%] 2025-12-04T13:20:27.8528026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool PASSED [0.7778s] [ 32%] 2025-12-04T13:20:27.8528135Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16 PASSED [0.7639s] [ 32%] 2025-12-04T13:20:27.8528242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16 PASSED [0.7608s] [ 32%] 2025-12-04T13:20:27.8528347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8 PASSED [0.7672s] [ 32%] 2025-12-04T13:20:27.8528450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64 PASSED [0.7969s] [ 32%] 2025-12-04T13:20:27.8528556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8 PASSED [0.7959s] [ 32%] 2025-12-04T13:20:27.8528664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_complex128 PASSED [0.7498s] [ 32%] 2025-12-04T13:20:27.8528768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32 PASSED [0.0278s] [ 32%] 2025-12-04T13:20:27.8528871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64 PASSED [0.0254s] [ 32%] 2025-12-04T13:20:27.8528984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16 PASSED [0.0165s] [ 32%] 2025-12-04T13:20:27.8529088Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8 PASSED [0.7646s] [ 32%] 2025-12-04T13:20:27.8529194Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16 PASSED [0.7741s] [ 32%] 
2025-12-04T13:20:27.8529295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8 PASSED [0.7579s] [ 32%] 2025-12-04T13:20:27.8529409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16 PASSED [0.7487s] [ 32%] 2025-12-04T13:20:27.8529509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64 PASSED [0.7461s] [ 32%] 2025-12-04T13:20:27.8529614Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128 PASSED [0.7619s] [ 32%] 2025-12-04T13:20:27.8529718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32 PASSED [0.7506s] [ 32%] 2025-12-04T13:20:27.8529823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64 PASSED [0.7651s] [ 32%] 2025-12-04T13:20:27.8529923Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8 PASSED [0.7675s] [ 32%] 2025-12-04T13:20:27.8530034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8 PASSED [0.7550s] [ 32%] 2025-12-04T13:20:27.8530140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32 PASSED [1.1907s] [ 32%] 2025-12-04T13:20:27.8530242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64 PASSED [0.0223s] [ 32%] 2025-12-04T13:20:27.8530346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128 PASSED [0.0771s] [ 32%] 2025-12-04T13:20:27.8530446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64 PASSED [0.0621s] [ 33%] 2025-12-04T13:20:27.8530545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8 PASSED [0.0480s] [ 33%] 2025-12-04T13:20:27.8530658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_float16 PASSED [0.0077s] [ 33%] 2025-12-04T13:20:27.8530770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_select_scatter_cuda_uint8 PASSED [0.7584s] [ 33%] 2025-12-04T13:20:27.8530874Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32 PASSED [1.0778s] [ 33%] 
2025-12-04T13:20:27.8530975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16 PASSED [0.0255s] [ 33%] 2025-12-04T13:20:27.8531083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16 PASSED [0.0330s] [ 33%] 2025-12-04T13:20:27.8531196Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool PASSED [0.7817s] [ 33%] 2025-12-04T13:20:27.8531299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16 PASSED [0.7770s] [ 33%] 2025-12-04T13:20:27.8531402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32 PASSED [0.7777s] [ 33%] 2025-12-04T13:20:27.8531513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8 PASSED [0.7598s] [ 33%] 2025-12-04T13:20:27.8531614Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32 PASSED [0.7547s] [ 33%] 2025-12-04T13:20:27.8531714Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64 PASSED [0.7487s] [ 33%] 2025-12-04T13:20:27.8531813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8 PASSED [0.7507s] [ 33%] 2025-12-04T13:20:27.8531913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool PASSED [0.7562s] [ 33%] 2025-12-04T13:20:27.8532018Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16 PASSED [0.7570s] [ 33%] 2025-12-04T13:20:27.8532122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32 PASSED [0.7497s] [ 33%] 2025-12-04T13:20:27.8532222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8 PASSED [0.7565s] [ 33%] 2025-12-04T13:20:27.8532324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32 PASSED [0.9964s] [ 33%] 2025-12-04T13:20:27.8532421Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8 PASSED [0.0194s] [ 33%] 2025-12-04T13:20:27.8532523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32 PASSED [0.3420s] [ 33%] 2025-12-04T13:20:27.8532622Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64 PASSED [0.2467s] [ 33%] 2025-12-04T13:20:27.8532726Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32 PASSED [0.9916s] [ 33%] 2025-12-04T13:20:27.8532836Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32 PASSED [0.7587s] [ 33%] 2025-12-04T13:20:27.8532955Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32 PASSED [0.7506s] [ 33%] 2025-12-04T13:20:27.8533074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32 PASSED [1.3308s] [ 33%] 2025-12-04T13:20:27.8533188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8 PASSED [0.2678s] [ 33%] 2025-12-04T13:20:27.8533332Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32 PASSED [0.0176s] [ 33%] 2025-12-04T13:20:27.8533462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16 PASSED [0.3720s] [ 33%] 2025-12-04T13:20:27.8533572Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64 PASSED [0.3199s] [ 33%] 2025-12-04T13:20:27.8533682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16 PASSED [0.2311s] [ 33%] 2025-12-04T13:20:27.8533792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64 PASSED [0.0392s] [ 33%] 2025-12-04T13:20:27.8533901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8 PASSED [0.7865s] [ 33%] 2025-12-04T13:20:27.8534011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16 PASSED [1.0152s] [ 33%] 2025-12-04T13:20:27.8534122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16 PASSED [0.5376s] [ 33%] 2025-12-04T13:20:27.8534232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32 PASSED [0.7647s] [ 33%] 2025-12-04T13:20:27.8534342Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64 PASSED [1.1884s] [ 33%] 2025-12-04T13:20:27.8534450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64 PASSED [0.3878s] [ 33%] 2025-12-04T13:20:27.8534557Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32 PASSED [0.0189s] [ 33%] 2025-12-04T13:20:27.8534678Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8 PASSED [0.7623s] [ 33%] 2025-12-04T13:20:27.8534783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool PASSED [0.9666s] [ 33%] 2025-12-04T13:20:27.8534893Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64 PASSED [0.4447s] [ 33%] 2025-12-04T13:20:27.8534999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16 PASSED [0.0192s] [ 33%] 2025-12-04T13:20:27.8535128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32 PASSED [0.0187s] [ 33%] 2025-12-04T13:20:27.8535244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64 PASSED [1.2950s] [ 33%] 2025-12-04T13:20:27.8535357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64 PASSED [1.0219s] [ 33%] 2025-12-04T13:20:27.8535497Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [0.0681s] [ 33%] 2025-12-04T13:20:27.8535637Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [0.0636s] [ 33%] 2025-12-04T13:20:27.8535774Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [0.0657s] [ 33%] 2025-12-04T13:20:27.8535911Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [0.8231s] [ 33%] 2025-12-04T13:20:27.8536050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [0.8075s] [ 33%] 
2025-12-04T13:20:27.8536161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8 PASSED [0.9745s] [ 33%] 2025-12-04T13:20:27.8536294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [0.0122s] [ 33%] 2025-12-04T13:20:27.8536425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16 PASSED [0.0101s] [ 33%] 2025-12-04T13:20:27.8536577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32 PASSED [0.0099s] [ 33%] 2025-12-04T13:20:27.8536707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16 PASSED [0.0099s] [ 33%] 2025-12-04T13:20:27.8536834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool PASSED [0.2429s] [ 33%] 2025-12-04T13:20:27.8536947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16 PASSED [0.1340s] [ 33%] 2025-12-04T13:20:27.8537059Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64 PASSED [0.1333s] [ 33%] 2025-12-04T13:20:27.8537179Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16 PASSED [1.4126s] [ 33%] 2025-12-04T13:20:27.8537295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_float32 PASSED [0.0060s] [ 33%] 2025-12-04T13:20:27.8537402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64 PASSED [0.9955s] [ 33%] 2025-12-04T13:20:27.8537508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32 PASSED [0.7663s] [ 33%] 2025-12-04T13:20:27.8537609Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64 PASSED [0.7640s] [ 33%] 2025-12-04T13:20:27.8537718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int16 PASSED [0.7580s] [ 33%] 2025-12-04T13:20:27.8537828Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int64 PASSED [0.7495s] [ 33%] 
2025-12-04T13:20:27.8537935Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16 PASSED [0.7472s] [ 33%] 2025-12-04T13:20:27.8538041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16 PASSED [0.7549s] [ 34%] 2025-12-04T13:20:27.8538162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_complex64 PASSED [0.7458s] [ 34%] 2025-12-04T13:20:27.8538279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float16 PASSED [0.7489s] [ 34%] 2025-12-04T13:20:27.8538406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_float32 PASSED [0.7526s] [ 34%] 2025-12-04T13:20:27.8538520Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int16 PASSED [0.7428s] [ 34%] 2025-12-04T13:20:27.8538632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_uint8 PASSED [0.7393s] [ 34%] 2025-12-04T13:20:27.8538749Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64 PASSED [0.0101s] [ 34%] 2025-12-04T13:20:27.8538853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32 PASSED [0.0074s] [ 34%] 2025-12-04T13:20:27.8538955Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float32 XFAIL [0.0036s] [ 34%] 2025-12-04T13:20:27.8539056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16 PASSED [0.8385s] [ 34%] 2025-12-04T13:20:27.8539156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64 PASSED [0.0617s] [ 34%] 2025-12-04T13:20:27.8539256Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32 PASSED [0.7859s] [ 34%] 2025-12-04T13:20:27.8539354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8 PASSED [0.7910s] [ 34%] 2025-12-04T13:20:27.8539453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16 PASSED [0.7468s] [ 34%] 2025-12-04T13:20:27.8539552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64 PASSED [0.7529s] 
[ 34%] 2025-12-04T13:20:27.8539650Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16 PASSED [0.7665s] [ 34%] 2025-12-04T13:20:27.8539748Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32 PASSED [0.7626s] [ 34%] 2025-12-04T13:20:27.8539859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64 PASSED [0.7780s] [ 34%] 2025-12-04T13:20:27.8539968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16 PASSED [0.7765s] [ 34%] 2025-12-04T13:20:27.8540086Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32 PASSED [0.7619s] [ 34%] 2025-12-04T13:20:27.8540192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_bfloat16 PASSED [0.7440s] [ 34%] 2025-12-04T13:20:27.8540298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex64 PASSED [0.7523s] [ 34%] 2025-12-04T13:20:27.8540399Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int64 PASSED [0.7459s] [ 34%] 2025-12-04T13:20:27.8540496Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32 PASSED [0.7605s] [ 34%] 2025-12-04T13:20:27.8540591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8 PASSED [0.7382s] [ 34%] 2025-12-04T13:20:27.8540721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex64 XFAIL [0.0052s] [ 34%] 2025-12-04T13:20:27.8540831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16 XFAIL [0.7624s] [ 34%] 2025-12-04T13:20:27.8540940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int64 XFAIL [0.7632s] [ 34%] 2025-12-04T13:20:27.8541050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_uint8 XFAIL [0.7536s] [ 34%] 2025-12-04T13:20:27.8541150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32 PASSED [0.7615s] [ 34%] 2025-12-04T13:20:27.8541247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32 PASSED [0.7651s] [ 34%] 
2025-12-04T13:20:27.8541344Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64 PASSED [0.7629s] [ 34%] 2025-12-04T13:20:27.8541446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32 PASSED [0.7618s] [ 34%] 2025-12-04T13:20:27.8541546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64 PASSED [0.7561s] [ 34%] 2025-12-04T13:20:27.8541643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8 PASSED [0.7580s] [ 34%] 2025-12-04T13:20:27.8541752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32 XFAIL [0.0043s] [ 34%] 2025-12-04T13:20:27.8541865Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64 PASSED [0.7503s] [ 34%] 2025-12-04T13:20:27.8541960Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8 PASSED [0.0148s] [ 34%] 2025-12-04T13:20:27.8542075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float16 PASSED [0.7480s] [ 34%] 2025-12-04T13:20:27.8542189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float32 PASSED [0.7483s] [ 34%] 2025-12-04T13:20:27.8542315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float64 PASSED [0.7440s] [ 34%] 2025-12-04T13:20:27.8542421Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool PASSED [0.7458s] [ 34%] 2025-12-04T13:20:27.8542533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128 PASSED [0.7472s] [ 34%] 2025-12-04T13:20:27.8542642Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32 PASSED [0.7404s] [ 34%] 2025-12-04T13:20:27.8542750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64 PASSED [0.7492s] [ 34%] 2025-12-04T13:20:27.8542856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32 PASSED [0.7500s] [ 34%] 2025-12-04T13:20:27.8542959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bfloat16 PASSED [0.7644s] [ 34%] 
2025-12-04T13:20:27.8543057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool PASSED [0.7543s] [ 34%] 2025-12-04T13:20:27.8543159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32 PASSED [0.7570s] [ 34%] 2025-12-04T13:20:27.8543284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64 PASSED [0.7575s] [ 34%] 2025-12-04T13:20:27.8543383Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8 PASSED [0.7515s] [ 34%] 2025-12-04T13:20:27.8543487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bfloat16 PASSED [0.7557s] [ 34%] 2025-12-04T13:20:27.8543611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64 PASSED [0.7519s] [ 34%] 2025-12-04T13:20:27.8543719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool PASSED [0.8357s] [ 34%] 2025-12-04T13:20:27.8543832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128 PASSED [0.8327s] [ 34%] 2025-12-04T13:20:27.8543943Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32 XFAIL [0.2081s] [ 34%] 2025-12-04T13:20:27.8544057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64 PASSED [0.8295s] [ 34%] 2025-12-04T13:20:27.8544179Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8 PASSED [0.0859s] [ 34%] 2025-12-04T13:20:27.8544290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64 PASSED [0.0109s] [ 34%] 2025-12-04T13:20:27.8544475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int8 PASSED [0.7546s] [ 34%] 2025-12-04T13:20:27.8544583Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16 PASSED [0.7564s] [ 34%] 2025-12-04T13:20:27.8544688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8 PASSED [0.7570s] [ 34%] 2025-12-04T13:20:27.8544797Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128 PASSED 
[0.7714s] [ 34%] 2025-12-04T13:20:27.8544896Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32 PASSED [0.7635s] [ 34%] 2025-12-04T13:20:27.8545003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32 PASSED [0.7593s] [ 34%] 2025-12-04T13:20:27.8545108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128 PASSED [0.7508s] [ 34%] 2025-12-04T13:20:27.8545218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128 PASSED [0.7627s] [ 34%] 2025-12-04T13:20:27.8545322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64 PASSED [0.7483s] [ 35%] 2025-12-04T13:20:27.8545443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vdot_cuda_float16 PASSED [0.7585s] [ 35%] 2025-12-04T13:20:27.8545552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128 PASSED [0.7695s] [ 35%] 2025-12-04T13:20:27.8545656Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16 PASSED [0.7616s] [ 35%] 2025-12-04T13:20:27.8545759Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64 PASSED [0.7648s] [ 35%] 2025-12-04T13:20:27.8545881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_bool PASSED [0.7481s] [ 35%] 2025-12-04T13:20:27.8545992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_complex128 PASSED [0.7457s] [ 35%] 2025-12-04T13:20:27.8546098Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int32 PASSED [0.7661s] [ 35%] 2025-12-04T13:20:27.8546199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32 PASSED [0.7795s] [ 35%] 2025-12-04T13:20:27.8546301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16 PASSED [0.7675s] [ 35%] 2025-12-04T13:20:27.8546399Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8 PASSED [0.7553s] [ 35%] 2025-12-04T13:20:27.8546502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16 PASSED [0.7487s] 
[ 35%] 2025-12-04T13:20:27.8546605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64 PASSED [0.7384s] [ 35%] 2025-12-04T13:20:27.8546707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64 PASSED [0.7485s] [ 35%] 2025-12-04T13:20:27.8546807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8 PASSED [0.7559s] [ 35%] 2025-12-04T13:20:27.8546908Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool PASSED [0.1322s] [ 35%] 2025-12-04T13:20:27.8547009Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32 PASSED [0.1351s] [ 35%] 2025-12-04T13:20:27.8547109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8 PASSED [0.1322s] [ 35%] 2025-12-04T13:20:27.8547231Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda PASSED [0.0033s] [ 35%] 2025-12-04T13:20:27.8547353Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda PASSED [0.0029s] [ 35%] 2025-12-04T13:20:27.8547459Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda PASSED [0.7486s] [ 35%] 2025-12-04T13:20:27.8547560Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda PASSED [0.7529s] [ 35%] 2025-12-04T13:20:27.8547663Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda XFAIL [0.0067s] [ 35%] 2025-12-04T13:20:27.8547778Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda PASSED [0.7571s] [ 35%] 2025-12-04T13:20:27.8547884Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda PASSED [0.7529s] [ 35%] 2025-12-04T13:20:27.8547988Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda PASSED [0.7539s] [ 35%] 2025-12-04T13:20:27.8548098Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda PASSED [0.7531s] [ 35%] 2025-12-04T13:20:27.8548202Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda PASSED [0.7488s] [ 35%] 
2025-12-04T13:20:27.8548308Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda PASSED [0.7417s] [ 35%] 2025-12-04T13:20:27.8548408Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda PASSED [0.7557s] [ 35%] 2025-12-04T13:20:27.8548512Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda PASSED [0.7488s] [ 35%] 2025-12-04T13:20:27.8548616Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda PASSED [0.7569s] [ 35%] 2025-12-04T13:20:27.8548717Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda PASSED [0.7535s] [ 35%] 2025-12-04T13:20:27.8548815Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda PASSED [0.7497s] [ 35%] 2025-12-04T13:20:27.8549103Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linalg_diagonal_cuda E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for aten.diagonal.default 2025-12-04T13:20:27.8549254Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8549522Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8549670Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8549792Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8550009Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8550146Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 
2025-12-04T13:20:27.8550276Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8550448Z E1204 12:31:46.344000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] RuntimeError: diagonal dimensions cannot be identical 1, 1 2025-12-04T13:20:27.8550623Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for aten.diagonal.default 2025-12-04T13:20:27.8550758Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8551024Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8551149Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8551271Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8551482Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8551617Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 2025-12-04T13:20:27.8551756Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8551957Z E1204 12:31:46.346000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 10000) 2025-12-04T13:20:27.8552130Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for 
aten.diagonal.default 2025-12-04T13:20:27.8552261Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8552512Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8552636Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8552757Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8552967Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8553112Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 2025-12-04T13:20:27.8553239Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8553559Z E1204 12:31:46.348000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 10000) 2025-12-04T13:20:27.8553746Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for aten.diagonal.default 2025-12-04T13:20:27.8553878Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8554128Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8554251Z E1204 12:31:46.350000 1505791 
site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8554372Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8554582Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8554715Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 2025-12-04T13:20:27.8554843Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8555011Z E1204 12:31:46.350000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] RuntimeError: diagonal dimensions cannot be identical 1, 1 2025-12-04T13:20:27.8555195Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for aten.diagonal.default 2025-12-04T13:20:27.8555327Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8555577Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8555699Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8555833Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8556042Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8556175Z E1204 12:31:46.351000 1505791 
site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 2025-12-04T13:20:27.8556301Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8556498Z E1204 12:31:46.351000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 10000) 2025-12-04T13:20:27.8556667Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] failed while attempting to run meta for aten.diagonal.default 2025-12-04T13:20:27.8556800Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] Traceback (most recent call last): 2025-12-04T13:20:27.8557050Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_subclasses/fake_tensor.py", line 2823, in _dispatch_impl 2025-12-04T13:20:27.8557188Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] r = func(*args, **kwargs) 2025-12-04T13:20:27.8557309Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8557517Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/_ops.py", line 836, in __call__ 2025-12-04T13:20:27.8557660Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] return self._op(*args, **kwargs) 2025-12-04T13:20:27.8557787Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] ^^^^^^^^^^^^^^^^^^^^^^^^^ 2025-12-04T13:20:27.8557984Z E1204 12:31:46.353000 1505791 site-packages/torch/_subclasses/fake_tensor.py:2827] IndexError: Dimension out of range (expected to be in range of [-3, 2], but got 10000) 2025-12-04T13:20:27.8558027Z PASSED [0.7600s] [ 35%] 
2025-12-04T13:20:27.8558142Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda PASSED [0.7553s] [ 35%] 2025-12-04T13:20:27.8558253Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_log_normal_cuda PASSED [0.7465s] [ 35%] 2025-12-04T13:20:27.8558361Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda PASSED [0.7592s] [ 35%] 2025-12-04T13:20:27.8558463Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda PASSED [0.0027s] [ 35%] 2025-12-04T13:20:27.8558568Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda PASSED [0.7691s] [ 35%] 2025-12-04T13:20:27.8558669Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda PASSED [0.7446s] [ 35%] 2025-12-04T13:20:27.8558798Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda PASSED [0.7538s] [ 35%] 2025-12-04T13:20:27.8558922Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda PASSED [0.7533s] [ 35%] 2025-12-04T13:20:27.8559027Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda PASSED [0.7537s] [ 35%] 2025-12-04T13:20:27.8559134Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_copy_cuda PASSED [0.7501s] [ 35%] 2025-12-04T13:20:27.8559237Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda PASSED [0.7622s] [ 35%] 2025-12-04T13:20:27.8559341Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda PASSED [0.0035s] [ 35%] 2025-12-04T13:20:27.8559469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16 PASSED [0.7587s] [ 35%] 2025-12-04T13:20:27.8559605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8 PASSED [0.7587s] [ 35%] 2025-12-04T13:20:27.8559759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool PASSED [0.1118s] [ 35%] 2025-12-04T13:20:27.8559923Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128 PASSED [0.1113s] [ 35%] 2025-12-04T13:20:27.8560077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8 PASSED [0.0861s] [ 35%] 2025-12-04T13:20:27.8560230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32 PASSED [0.1090s] [ 35%] 2025-12-04T13:20:27.8560377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool PASSED [0.0928s] [ 35%] 2025-12-04T13:20:27.8560533Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128 PASSED [0.0974s] [ 35%] 2025-12-04T13:20:27.8560682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16 PASSED [0.0957s] [ 35%] 2025-12-04T13:20:27.8560831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64 PASSED [0.0875s] [ 35%] 2025-12-04T13:20:27.8560989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8 PASSED [0.0860s] [ 35%] 2025-12-04T13:20:27.8561150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128 PASSED [0.0783s] [ 35%] 2025-12-04T13:20:27.8561316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16 PASSED [0.0920s] [ 35%] 2025-12-04T13:20:27.8561471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64 PASSED [0.1008s] [ 35%] 2025-12-04T13:20:27.8561623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64 PASSED [0.0816s] [ 35%] 2025-12-04T13:20:27.8561774Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8 PASSED [0.0775s] [ 35%] 2025-12-04T13:20:27.8561933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64 PASSED [0.0906s] [ 35%] 2025-12-04T13:20:27.8562085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16 PASSED [0.0935s] [ 35%] 2025-12-04T13:20:27.8562236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32 PASSED [0.1007s] [ 35%] 2025-12-04T13:20:27.8562388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16 PASSED [0.0912s] [ 35%] 2025-12-04T13:20:27.8562539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32 PASSED [0.0911s] [ 36%] 2025-12-04T13:20:27.8562688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8 PASSED [0.0779s] [ 36%] 2025-12-04T13:20:27.8562858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128 PASSED [0.1049s] [ 36%] 2025-12-04T13:20:27.8563008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16 PASSED [0.0824s] [ 36%] 2025-12-04T13:20:27.8563158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8 PASSED [0.0775s] [ 36%] 2025-12-04T13:20:27.8563356Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128 PASSED [0.0986s] [ 36%] 2025-12-04T13:20:27.8563524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32 PASSED [0.0969s] [ 36%] 2025-12-04T13:20:27.8563673Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32 PASSED [0.0887s] [ 36%] 2025-12-04T13:20:27.8563822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64 PASSED [0.0862s] [ 36%] 2025-12-04T13:20:27.8563971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16 PASSED [0.0776s] [ 36%] 2025-12-04T13:20:27.8564118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32 PASSED [0.0774s] [ 36%] 2025-12-04T13:20:27.8564272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16 PASSED [0.0995s] [ 36%] 2025-12-04T13:20:27.8564426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32 PASSED [0.1011s] [ 36%] 2025-12-04T13:20:27.8564578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16 PASSED [0.0893s] [ 36%] 2025-12-04T13:20:27.8564727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16 PASSED [0.0818s] [ 36%] 2025-12-04T13:20:27.8564896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8 PASSED [0.0779s] [ 36%] 2025-12-04T13:20:27.8565050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128 PASSED [0.1056s] [ 36%] 2025-12-04T13:20:27.8565202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32 PASSED [0.0657s] [ 36%] 2025-12-04T13:20:27.8565365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32 PASSED [0.0880s] [ 36%] 2025-12-04T13:20:27.8565513Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32 PASSED [0.0559s] [ 36%] 2025-12-04T13:20:27.8565657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8 PASSED [0.0745s] [ 36%] 2025-12-04T13:20:27.8565804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8 PASSED [0.0746s] [ 36%] 2025-12-04T13:20:27.8565956Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32 PASSED [0.0995s] [ 36%] 2025-12-04T13:20:27.8566105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16 PASSED [0.0859s] [ 36%] 2025-12-04T13:20:27.8566234Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16 PASSED [0.0616s] [ 36%] 2025-12-04T13:20:27.8566361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32 PASSED [0.0588s] [ 36%] 2025-12-04T13:20:27.8566489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64 PASSED [0.0589s] [ 36%] 2025-12-04T13:20:27.8566623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16 PASSED [0.1168s] [ 36%] 2025-12-04T13:20:27.8566776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32 PASSED [0.1287s] [ 36%] 2025-12-04T13:20:27.8566907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16 PASSED [0.0881s] [ 36%] 2025-12-04T13:20:27.8567037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool PASSED [0.1086s] [ 36%] 2025-12-04T13:20:27.8567170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64 PASSED [0.0785s] [ 36%] 2025-12-04T13:20:27.8567301Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64 PASSED [0.0883s] [ 36%] 2025-12-04T13:20:27.8567438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8 PASSED [0.3073s] [ 36%] 2025-12-04T13:20:27.8567576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16 PASSED [0.6381s] [ 36%] 2025-12-04T13:20:27.8567709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16 PASSED [0.0468s] [ 36%] 2025-12-04T13:20:27.8567841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32 PASSED [0.0324s] [ 36%] 2025-12-04T13:20:27.8567968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8 PASSED [0.0247s] [ 36%] 2025-12-04T13:20:27.8568108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bfloat16 PASSED [0.8388s] [ 36%] 2025-12-04T13:20:27.8568245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bool PASSED [0.7665s] [ 36%] 2025-12-04T13:20:27.8568390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex128 PASSED [0.7700s] [ 36%] 2025-12-04T13:20:27.8568530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_float32 PASSED [0.7688s] [ 36%] 2025-12-04T13:20:27.8568667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64 PASSED [0.7695s] [ 36%] 2025-12-04T13:20:27.8568806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool PASSED [0.0931s] [ 36%] 2025-12-04T13:20:27.8568931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64 PASSED [0.8491s] [ 36%] 2025-12-04T13:20:27.8569056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8 PASSED 
[0.0892s] [ 36%] 2025-12-04T13:20:27.8569195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool PASSED [0.0417s] [ 36%] 2025-12-04T13:20:27.8569327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32 PASSED [0.0445s] [ 36%] 2025-12-04T13:20:27.8569458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16 PASSED [0.0646s] [ 36%] 2025-12-04T13:20:27.8569587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16 PASSED [0.0434s] [ 36%] 2025-12-04T13:20:27.8569715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64 PASSED [0.0413s] [ 36%] 2025-12-04T13:20:27.8569842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8 PASSED [0.0423s] [ 36%] 2025-12-04T13:20:27.8569971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16 PASSED [0.0712s] [ 36%] 2025-12-04T13:20:27.8570097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16 PASSED [0.0718s] [ 36%] 2025-12-04T13:20:27.8570233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16 PASSED [0.0834s] [ 36%] 2025-12-04T13:20:27.8570364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64 PASSED [0.0415s] [ 36%] 2025-12-04T13:20:27.8570495Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8 PASSED [0.0396s] [ 36%] 2025-12-04T13:20:27.8570649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_bool PASSED [0.7873s] [ 36%] 2025-12-04T13:20:27.8570796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float64 PASSED [0.7795s] [ 36%] 2025-12-04T13:20:27.8570939Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int16 PASSED [0.7777s] [ 36%] 2025-12-04T13:20:27.8571081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int8 PASSED [0.7745s] [ 36%] 2025-12-04T13:20:27.8571230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64 PASSED [0.7789s] [ 36%] 2025-12-04T13:20:27.8571389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16 PASSED [0.7695s] [ 36%] 2025-12-04T13:20:27.8571550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128 PASSED [0.7724s] [ 36%] 2025-12-04T13:20:27.8571707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8 PASSED [0.7634s] [ 36%] 2025-12-04T13:20:27.8571861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64 PASSED [0.7864s] [ 36%] 2025-12-04T13:20:27.8571998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32 PASSED [0.1214s] [ 36%] 2025-12-04T13:20:27.8572130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32 PASSED [0.0678s] [ 37%] 2025-12-04T13:20:27.8572259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16 PASSED [0.0769s] [ 37%] 2025-12-04T13:20:27.8572393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32 PASSED [0.1171s] [ 37%] 2025-12-04T13:20:27.8572542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64 PASSED [0.0793s] [ 37%] 2025-12-04T13:20:27.8572675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32 PASSED [0.0671s] [ 37%] 2025-12-04T13:20:27.8572803Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32 PASSED [0.3848s] [ 37%] 2025-12-04T13:20:27.8572948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64 PASSED [0.0799s] [ 37%] 2025-12-04T13:20:27.8573080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64 PASSED [0.0683s] [ 37%] 2025-12-04T13:20:27.8573208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8 PASSED [0.0737s] [ 37%] 2025-12-04T13:20:27.8573375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool PASSED [0.0134s] [ 37%] 2025-12-04T13:20:27.8573520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64 PASSED [0.8022s] [ 37%] 2025-12-04T13:20:27.8573658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16 PASSED [0.7819s] [ 37%] 2025-12-04T13:20:27.8573795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32 PASSED [0.7795s] [ 37%] 2025-12-04T13:20:27.8573939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128 PASSED [0.7874s] [ 37%] 2025-12-04T13:20:27.8574081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16 PASSED [0.7727s] [ 37%] 2025-12-04T13:20:27.8574219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16 PASSED [0.7846s] [ 37%] 2025-12-04T13:20:27.8574353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8 PASSED [0.7755s] [ 37%] 2025-12-04T13:20:27.8574511Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16 PASSED [0.7898s] [ 37%] 2025-12-04T13:20:27.8574645Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool PASSED [0.7854s] [ 37%] 2025-12-04T13:20:27.8574788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128 PASSED [0.7856s] [ 37%] 2025-12-04T13:20:27.8574926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32 PASSED [0.7845s] [ 37%] 2025-12-04T13:20:27.8575076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8 PASSED [0.7893s] [ 37%] 2025-12-04T13:20:27.8575213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16 PASSED [0.2979s] [ 37%] 2025-12-04T13:20:27.8575364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16 PASSED [0.2711s] [ 37%] 2025-12-04T13:20:27.8575512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8 PASSED [0.2679s] [ 37%] 2025-12-04T13:20:27.8575651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool PASSED [0.0793s] [ 37%] 2025-12-04T13:20:27.8575787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16 PASSED [0.2826s] [ 37%] 2025-12-04T13:20:27.8575939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16 PASSED [0.2741s] [ 37%] 2025-12-04T13:20:27.8576078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32 PASSED [0.2828s] [ 37%] 2025-12-04T13:20:27.8576222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex64 PASSED [0.1296s] [ 37%] 2025-12-04T13:20:27.8576373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32 PASSED [0.0117s] [ 37%] 2025-12-04T13:20:27.8576540Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16 PASSED [0.0330s] [ 37%] 2025-12-04T13:20:27.8576680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool PASSED [0.0180s] [ 37%] 2025-12-04T13:20:27.8576833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16 XFAIL [0.0132s] [ 37%] 2025-12-04T13:20:27.8576969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32 XFAIL [1.4073s] [ 37%] 2025-12-04T13:20:27.8577103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8 XFAIL [1.3639s] [ 37%] 2025-12-04T13:20:27.8577238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128 PASSED [1.4160s] [ 37%] 2025-12-04T13:20:27.8577369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64 PASSED [0.0385s] [ 37%] 2025-12-04T13:20:27.8577502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16 PASSED [0.1053s] [ 37%] 2025-12-04T13:20:27.8577634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16 PASSED [0.1080s] [ 37%] 2025-12-04T13:20:27.8577762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8 PASSED [0.0533s] [ 37%] 2025-12-04T13:20:27.8577892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32 PASSED [0.0683s] [ 37%] 2025-12-04T13:20:27.8578021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8 PASSED [0.0678s] [ 37%] 2025-12-04T13:20:27.8578158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16 PASSED [0.7066s] [ 37%] 2025-12-04T13:20:27.8578308Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32 PASSED [0.4893s] [ 37%] 2025-12-04T13:20:27.8578444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8 PASSED [0.4822s] [ 37%] 2025-12-04T13:20:27.8578582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16 PASSED [0.6974s] [ 37%] 2025-12-04T13:20:27.8578718Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64 PASSED [0.5071s] [ 37%] 2025-12-04T13:20:27.8578850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8 PASSED [1.9262s] [ 37%] 2025-12-04T13:20:27.8579000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16 PASSED [0.1344s] [ 37%] 2025-12-04T13:20:27.8579135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64 PASSED [0.1374s] [ 37%] 2025-12-04T13:20:27.8579267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8 PASSED [0.1260s] [ 37%] 2025-12-04T13:20:27.8579396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8 PASSED [0.1259s] [ 37%] 2025-12-04T13:20:27.8579534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool PASSED [0.0138s] [ 37%] 2025-12-04T13:20:27.8579668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128 PASSED [0.0890s] [ 37%] 2025-12-04T13:20:27.8579799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32 PASSED [0.0555s] [ 37%] 2025-12-04T13:20:27.8579944Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16 PASSED [0.0498s] [ 37%] 2025-12-04T13:20:27.8580085Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16 PASSED [0.0410s] [ 37%] 2025-12-04T13:20:27.8580242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32 PASSED [0.2741s] [ 37%] 2025-12-04T13:20:27.8580386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8 PASSED [0.2506s] [ 37%] 2025-12-04T13:20:27.8580524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool PASSED [0.1072s] [ 37%] 2025-12-04T13:20:27.8580671Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32 PASSED [0.7155s] [ 37%] 2025-12-04T13:20:27.8580805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128 PASSED [0.0840s] [ 37%] 2025-12-04T13:20:27.8580935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64 PASSED [0.0829s] [ 37%] 2025-12-04T13:20:27.8581062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64 PASSED [1.4205s] [ 37%] 2025-12-04T13:20:27.8581189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8 PASSED [0.0867s] [ 37%] 2025-12-04T13:20:27.8581315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8 PASSED [0.0813s] [ 37%] 2025-12-04T13:20:27.8581445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16 PASSED [0.1126s] [ 38%] 2025-12-04T13:20:27.8581574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8 PASSED [0.0804s] [ 38%] 2025-12-04T13:20:27.8581721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bfloat16 PASSED [0.0716s] [ 38%] 2025-12-04T13:20:27.8581854Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_uint8 PASSED [0.0788s] [ 38%] 2025-12-04T13:20:27.8581987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16 PASSED [1.3592s] [ 38%] 2025-12-04T13:20:27.8582139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128 PASSED [0.0334s] [ 38%] 2025-12-04T13:20:27.8582273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16 PASSED [1.3331s] [ 38%] 2025-12-04T13:20:27.8582406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16 PASSED [0.0361s] [ 38%] 2025-12-04T13:20:27.8582538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64 PASSED [0.0290s] [ 38%] 2025-12-04T13:20:27.8582684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_deg2rad_executor_aten_cuda_float16 PASSED [0.1303s] [ 38%] 2025-12-04T13:20:27.8582826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16 PASSED [0.2946s] [ 38%] 2025-12-04T13:20:27.8582968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128 PASSED [0.2772s] [ 38%] 2025-12-04T13:20:27.8583104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128 PASSED [1.3890s] [ 38%] 2025-12-04T13:20:27.8583231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16 PASSED [0.0527s] [ 38%] 2025-12-04T13:20:27.8583397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64 PASSED [0.0494s] [ 38%] 2025-12-04T13:20:27.8583548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128 PASSED [0.0641s] [ 38%] 2025-12-04T13:20:27.8583693Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32 PASSED [0.0648s] [ 38%] 2025-12-04T13:20:27.8583833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16 PASSED [0.0617s] [ 38%] 2025-12-04T13:20:27.8583973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8 PASSED [0.0628s] [ 38%] 2025-12-04T13:20:27.8584144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64 PASSED [0.0666s] [ 38%] 2025-12-04T13:20:27.8584277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool PASSED [0.1037s] [ 38%] 2025-12-04T13:20:27.8584412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16 PASSED [0.5281s] [ 38%] 2025-12-04T13:20:27.8584562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64 PASSED [0.5937s] [ 38%] 2025-12-04T13:20:27.8584697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64 PASSED [0.0869s] [ 38%] 2025-12-04T13:20:27.8584848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16 PASSED [2.0918s] [ 38%] 2025-12-04T13:20:27.8585023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128 SKIPPED [0.0002s] (Skipped!) 
[ 38%] 2025-12-04T13:20:27.8585177Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32 PASSED [0.3224s] [ 38%] 2025-12-04T13:20:27.8585309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dot_executor_aten_cuda_complex64 PASSED [1.5537s] [ 38%] 2025-12-04T13:20:27.8585444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16 PASSED [1.3179s] [ 38%] 2025-12-04T13:20:27.8585582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128 PASSED [1.3182s] [ 38%] 2025-12-04T13:20:27.8585719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64 PASSED [1.3317s] [ 38%] 2025-12-04T13:20:27.8585852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8 PASSED [1.3158s] [ 38%] 2025-12-04T13:20:27.8586000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16 PASSED [1.3276s] [ 38%] 2025-12-04T13:20:27.8586172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool SKIPPED [0.0002s] (Can't check result for empty) [ 38%] 2025-12-04T13:20:27.8586343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16 SKIPPED [0.0001s] (Can't check result for empty) [ 38%] 2025-12-04T13:20:27.8586513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64 SKIPPED [0.0001s] (Can't check result for empty) [ 38%] 2025-12-04T13:20:27.8586713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16 SKIPPED [0.0001s] (Can't check result for empty_like) [ 38%] 2025-12-04T13:20:27.8586896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16 SKIPPED [0.0001s] (Can't check result for empty_like) [ 38%] 2025-12-04T13:20:27.8587095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_bool SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 38%] 2025-12-04T13:20:27.8587292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int16 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 38%] 2025-12-04T13:20:27.8587489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int8 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 38%] 2025-12-04T13:20:27.8587621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16 PASSED [0.4001s] [ 38%] 2025-12-04T13:20:27.8587749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16 PASSED [0.4051s] [ 38%] 2025-12-04T13:20:27.8587875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32 PASSED [0.2815s] [ 38%] 2025-12-04T13:20:27.8588013Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16 PASSED [0.2713s] [ 38%] 2025-12-04T13:20:27.8588146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_bfloat16 PASSED [0.0337s] [ 38%] 2025-12-04T13:20:27.8588279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float16 PASSED [0.0334s] [ 38%] 2025-12-04T13:20:27.8588422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float64 PASSED [0.0298s] [ 38%] 2025-12-04T13:20:27.8588552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int32 PASSED [0.0297s] [ 38%] 2025-12-04T13:20:27.8588682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int64 PASSED [0.0298s] [ 38%] 2025-12-04T13:20:27.8588809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_int8 PASSED [0.0295s] [ 
38%] 2025-12-04T13:20:27.8588938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32 PASSED [0.0773s] [ 38%] 2025-12-04T13:20:27.8589068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16 PASSED [0.4314s] [ 38%] 2025-12-04T13:20:27.8589196Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16 PASSED [0.0861s] [ 38%] 2025-12-04T13:20:27.8589323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8 PASSED [0.0805s] [ 38%] 2025-12-04T13:20:27.8589451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8 PASSED [0.0874s] [ 38%] 2025-12-04T13:20:27.8589584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64 PASSED [0.0778s] [ 38%] 2025-12-04T13:20:27.8589718Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_complex64 PASSED [0.0876s] [ 38%] 2025-12-04T13:20:27.8589860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16 PASSED [0.1137s] [ 38%] 2025-12-04T13:20:27.8589995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128 PASSED [0.0875s] [ 38%] 2025-12-04T13:20:27.8590124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64 PASSED [0.4633s] [ 38%] 2025-12-04T13:20:27.8590263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64 PASSED [1.3840s] [ 38%] 2025-12-04T13:20:27.8590398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16 PASSED [1.3302s] [ 38%] 2025-12-04T13:20:27.8590545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32 PASSED [1.3215s] [ 38%] 2025-12-04T13:20:27.8590686Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_float16 PASSED [1.3564s] [ 38%] 2025-12-04T13:20:27.8590826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int8 PASSED [1.3287s] [ 38%] 2025-12-04T13:20:27.8590958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16 PASSED [1.3318s] [ 38%] 2025-12-04T13:20:27.8591087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16 PASSED [0.0842s] [ 39%] 2025-12-04T13:20:27.8591218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32 PASSED [0.0768s] [ 39%] 2025-12-04T13:20:27.8591359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_bfloat16 XFAIL [0.0163s] [ 39%] 2025-12-04T13:20:27.8591501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exponential_executor_aten_cuda_float16 XFAIL [1.3510s] [ 39%] 2025-12-04T13:20:27.8591632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64 PASSED [1.8728s] [ 39%] 2025-12-04T13:20:27.8591774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16 PASSED [0.4845s] [ 39%] 2025-12-04T13:20:27.8591902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64 PASSED [0.4852s] [ 39%] 2025-12-04T13:20:27.8592041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e4m3fnuz PASSED [0.4847s] [ 39%] 2025-12-04T13:20:27.8592190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128 PASSED [0.3535s] [ 39%] 2025-12-04T13:20:27.8592320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool PASSED [0.0330s] [ 39%] 2025-12-04T13:20:27.8592448Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8 PASSED [0.0298s] [ 39%] 2025-12-04T13:20:27.8592586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128 PASSED [0.0321s] [ 39%] 2025-12-04T13:20:27.8592722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32 PASSED [1.4245s] [ 39%] 2025-12-04T13:20:27.8592855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16 PASSED [0.0411s] [ 39%] 2025-12-04T13:20:27.8593000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32 PASSED [0.0304s] [ 39%] 2025-12-04T13:20:27.8593144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64 PASSED [1.3749s] [ 39%] 2025-12-04T13:20:27.8593324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16 PASSED [1.3889s] [ 39%] 2025-12-04T13:20:27.8593465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128 PASSED [1.9674s] [ 39%] 2025-12-04T13:20:27.8593623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64 PASSED [0.0358s] [ 39%] 2025-12-04T13:20:27.8593759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64 PASSED [1.3711s] [ 39%] 2025-12-04T13:20:27.8593891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool PASSED [0.0342s] [ 39%] 2025-12-04T13:20:27.8594027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16 PASSED [0.0357s] [ 39%] 2025-12-04T13:20:27.8594166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32 PASSED [0.6536s] [ 39%] 2025-12-04T13:20:27.8594314Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16 PASSED [0.0437s] [ 39%] 2025-12-04T13:20:27.8594450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64 PASSED [0.0335s] [ 39%] 2025-12-04T13:20:27.8594583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16 PASSED [0.0328s] [ 39%] 2025-12-04T13:20:27.8594718Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64 PASSED [0.0328s] [ 39%] 2025-12-04T13:20:27.8594850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8 PASSED [0.0336s] [ 39%] 2025-12-04T13:20:27.8594984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32 PASSED [0.0305s] [ 39%] 2025-12-04T13:20:27.8595116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8 PASSED [0.0349s] [ 39%] 2025-12-04T13:20:27.8595248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8 PASSED [0.0356s] [ 39%] 2025-12-04T13:20:27.8595384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16 PASSED [0.0476s] [ 39%] 2025-12-04T13:20:27.8595516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8 PASSED [0.0409s] [ 39%] 2025-12-04T13:20:27.8595675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool PASSED [0.0289s] [ 39%] 2025-12-04T13:20:27.8595815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16 PASSED [1.3813s] [ 39%] 2025-12-04T13:20:27.8595969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64 SKIPPED [0.0002s] (Skipped!) 
[ 39%] 2025-12-04T13:20:27.8596116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8 PASSED [0.0395s] [ 39%] 2025-12-04T13:20:27.8596250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8 PASSED [0.0464s] [ 39%] 2025-12-04T13:20:27.8596390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64 PASSED [0.0245s] [ 39%] 2025-12-04T13:20:27.8596527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8 PASSED [1.3669s] [ 39%] 2025-12-04T13:20:27.8596659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8 PASSED [1.3630s] [ 39%] 2025-12-04T13:20:27.8596800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32 PASSED [0.0475s] [ 39%] 2025-12-04T13:20:27.8596938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64 PASSED [0.0311s] [ 39%] 2025-12-04T13:20:27.8597073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8 PASSED [0.0307s] [ 39%] 2025-12-04T13:20:27.8597210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64 PASSED [0.1973s] [ 39%] 2025-12-04T13:20:27.8597341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8 PASSED [1.3699s] [ 39%] 2025-12-04T13:20:27.8597487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16 PASSED [1.3721s] [ 39%] 2025-12-04T13:20:27.8597619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64 PASSED [1.3768s] [ 39%] 2025-12-04T13:20:27.8597753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8 PASSED [0.0421s] [ 39%] 2025-12-04T13:20:27.8597888Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128 PASSED [0.0888s] [ 39%] 2025-12-04T13:20:27.8598023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32 PASSED [0.0883s] [ 39%] 2025-12-04T13:20:27.8598173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64 PASSED [0.0956s] [ 39%] 2025-12-04T13:20:27.8598309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16 PASSED [0.0945s] [ 39%] 2025-12-04T13:20:27.8598443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16 PASSED [0.0906s] [ 39%] 2025-12-04T13:20:27.8598577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32 PASSED [0.0211s] [ 39%] 2025-12-04T13:20:27.8598707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool PASSED [0.0064s] [ 39%] 2025-12-04T13:20:27.8598843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64 PASSED [0.0068s] [ 39%] 2025-12-04T13:20:27.8598972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8 PASSED [0.0069s] [ 39%] 2025-12-04T13:20:27.8599106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8 PASSED [0.0064s] [ 39%] 2025-12-04T13:20:27.8599245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128 PASSED [0.0068s] [ 39%] 2025-12-04T13:20:27.8599388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32 PASSED [0.0064s] [ 39%] 2025-12-04T13:20:27.8599521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8 PASSED [0.0061s] [ 39%] 2025-12-04T13:20:27.8599661Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8 PASSED [0.3038s] [ 39%] 2025-12-04T13:20:27.8599807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16 PASSED [0.1034s] [ 39%] 2025-12-04T13:20:27.8599935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8 PASSED [0.0535s] [ 39%] 2025-12-04T13:20:27.8600067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32 PASSED [0.2659s] [ 39%] 2025-12-04T13:20:27.8600197Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64 PASSED [0.2826s] [ 39%] 2025-12-04T13:20:27.8600327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool PASSED [0.2570s] [ 40%] 2025-12-04T13:20:27.8600457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16 PASSED [0.4270s] [ 40%] 2025-12-04T13:20:27.8600584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8 PASSED [0.2442s] [ 40%] 2025-12-04T13:20:27.8600712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16 PASSED [0.2547s] [ 40%] 2025-12-04T13:20:27.8600839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64 PASSED [0.2812s] [ 40%] 2025-12-04T13:20:27.8601031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int64 SKIPPED [0.0002s] (Expected: geometric is not comparable) [ 40%] 2025-12-04T13:20:27.8601232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_int8 SKIPPED [0.0001s] (Expected: geometric is not comparable) [ 40%] 2025-12-04T13:20:27.8601423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_geometric_executor_aten_cuda_uint8 SKIPPED [0.0001s] (Expected: geometric is not comparable) [ 40%] 
2025-12-04T13:20:27.8601562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64 PASSED [0.8098s] [ 40%] 2025-12-04T13:20:27.8601699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32 PASSED [0.8330s] [ 40%] 2025-12-04T13:20:27.8601835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64 PASSED [0.6710s] [ 40%] 2025-12-04T13:20:27.8601980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16 PASSED [0.0109s] [ 40%] 2025-12-04T13:20:27.8602118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32 PASSED [1.5160s] [ 40%] 2025-12-04T13:20:27.8602254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32 PASSED [1.3938s] [ 40%] 2025-12-04T13:20:27.8602384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8 PASSED [1.4012s] [ 40%] 2025-12-04T13:20:27.8602519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16 PASSED [1.4090s] [ 40%] 2025-12-04T13:20:27.8602652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64 PASSED [1.4179s] [ 40%] 2025-12-04T13:20:27.8602784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16 PASSED [1.4253s] [ 40%] 2025-12-04T13:20:27.8602915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32 PASSED [1.4290s] [ 40%] 2025-12-04T13:20:27.8603048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16 PASSED [0.4494s] [ 40%] 2025-12-04T13:20:27.8603188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16 PASSED [0.6547s] [ 40%] 2025-12-04T13:20:27.8603349Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16 PASSED [0.6262s] [ 40%] 2025-12-04T13:20:27.8603474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8 PASSED [0.0768s] [ 40%] 2025-12-04T13:20:27.8603608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64 PASSED [0.2913s] [ 40%] 2025-12-04T13:20:27.8603756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128 PASSED [0.0925s] [ 40%] 2025-12-04T13:20:27.8603890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32 PASSED [0.0921s] [ 40%] 2025-12-04T13:20:27.8604022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64 PASSED [0.0916s] [ 40%] 2025-12-04T13:20:27.8604162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64 PASSED [0.0344s] [ 40%] 2025-12-04T13:20:27.8604304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16 PASSED [0.0135s] [ 40%] 2025-12-04T13:20:27.8604447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128 PASSED [0.0120s] [ 40%] 2025-12-04T13:20:27.8604590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64 PASSED [0.0119s] [ 40%] 2025-12-04T13:20:27.8604724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8 PASSED [0.0121s] [ 40%] 2025-12-04T13:20:27.8604869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128 PASSED [0.0275s] [ 40%] 2025-12-04T13:20:27.8605006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16 PASSED [0.0271s] [ 40%] 2025-12-04T13:20:27.8605154Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8 PASSED [1.4620s] [ 40%] 2025-12-04T13:20:27.8605291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8 PASSED [0.0292s] [ 40%] 2025-12-04T13:20:27.8605434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16 PASSED [0.0112s] [ 40%] 2025-12-04T13:20:27.8605580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64 PASSED [1.4079s] [ 40%] 2025-12-04T13:20:27.8605737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16 PASSED [1.4106s] [ 40%] 2025-12-04T13:20:27.8605878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32 PASSED [1.4140s] [ 40%] 2025-12-04T13:20:27.8606017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16 PASSED [1.4006s] [ 40%] 2025-12-04T13:20:27.8606158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64 PASSED [1.4165s] [ 40%] 2025-12-04T13:20:27.8606296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8 PASSED [1.4196s] [ 40%] 2025-12-04T13:20:27.8606434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64 PASSED [1.0501s] [ 40%] 2025-12-04T13:20:27.8606564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool PASSED [0.1035s] [ 40%] 2025-12-04T13:20:27.8606700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32 PASSED [0.1246s] [ 40%] 2025-12-04T13:20:27.8606830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16 PASSED [0.0871s] [ 40%] 2025-12-04T13:20:27.8606960Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32 PASSED [0.0898s] [ 40%] 2025-12-04T13:20:27.8607105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8 PASSED [0.0803s] [ 40%] 2025-12-04T13:20:27.8607239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16 PASSED [0.0872s] [ 40%] 2025-12-04T13:20:27.8607367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool PASSED [0.0718s] [ 40%] 2025-12-04T13:20:27.8607521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16 PASSED [0.1109s] [ 40%] 2025-12-04T13:20:27.8607656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool PASSED [0.1074s] [ 40%] 2025-12-04T13:20:27.8607792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16 PASSED [0.1112s] [ 40%] 2025-12-04T13:20:27.8607926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8 PASSED [0.0776s] [ 40%] 2025-12-04T13:20:27.8608066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16 PASSED [0.1097s] [ 40%] 2025-12-04T13:20:27.8608200Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool PASSED [0.1024s] [ 40%] 2025-12-04T13:20:27.8608335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32 PASSED [0.0889s] [ 40%] 2025-12-04T13:20:27.8608470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16 PASSED [0.0846s] [ 40%] 2025-12-04T13:20:27.8608604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64 PASSED [0.0832s] [ 40%] 2025-12-04T13:20:27.8608739Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8 PASSED [0.0793s] [ 40%] 2025-12-04T13:20:27.8608868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool PASSED [0.1124s] [ 40%] 2025-12-04T13:20:27.8609020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128 PASSED [0.1320s] [ 40%] 2025-12-04T13:20:27.8609152Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16 PASSED [0.0928s] [ 40%] 2025-12-04T13:20:27.8609392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex128 SKIPPED [0.0002s] (Expected: unfold_backward() got an unexpected keyword argument 'input_sizes') [ 40%] 2025-12-04T13:20:27.8609526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex64 PASSED [0.0131s] [ 40%] 2025-12-04T13:20:27.8609669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_float16 PASSED [0.0108s] [ 40%] 2025-12-04T13:20:27.8609796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16 PASSED [0.2746s] [ 41%] 2025-12-04T13:20:27.8609928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64 PASSED [0.1396s] [ 41%] 2025-12-04T13:20:27.8610062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32 PASSED [0.0774s] [ 41%] 2025-12-04T13:20:27.8610194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64 PASSED [0.3126s] [ 41%] 2025-12-04T13:20:27.8610340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex64 PASSED [0.0334s] [ 41%] 2025-12-04T13:20:27.8610482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_float16 PASSED [1.5236s] [ 41%] 2025-12-04T13:20:27.8610622Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int16 PASSED [0.0361s] [ 41%] 2025-12-04T13:20:27.8610768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float32 PASSED [0.0325s] [ 41%] 2025-12-04T13:20:27.8610931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16 PASSED [0.2741s] [ 41%] 2025-12-04T13:20:27.8611075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128 PASSED [0.5169s] [ 41%] 2025-12-04T13:20:27.8611215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16 PASSED [0.4407s] [ 41%] 2025-12-04T13:20:27.8611367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64 PASSED [0.7901s] [ 41%] 2025-12-04T13:20:27.8611507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32 PASSED [0.7591s] [ 41%] 2025-12-04T13:20:27.8611652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float64 PASSED [1.5858s] [ 41%] 2025-12-04T13:20:27.8611804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16 PASSED [0.8225s] [ 41%] 2025-12-04T13:20:27.8611946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128 PASSED [0.2280s] [ 41%] 2025-12-04T13:20:27.8612084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64 PASSED [0.2265s] [ 41%] 2025-12-04T13:20:27.8612218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32 XFAIL [0.0066s] [ 41%] 2025-12-04T13:20:27.8612381Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_complex128 PASSED [2.6195s] [ 41%] 
2025-12-04T13:20:27.8612520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128 PASSED [0.0899s] [ 41%] 2025-12-04T13:20:27.8612652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64 PASSED [0.0788s] [ 41%] 2025-12-04T13:20:27.8612796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64 PASSED [0.0876s] [ 41%] 2025-12-04T13:20:27.8612928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64 PASSED [0.0773s] [ 41%] 2025-12-04T13:20:27.8613061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64 PASSED [0.5392s] [ 41%] 2025-12-04T13:20:27.8613186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool PASSED [0.1046s] [ 41%] 2025-12-04T13:20:27.8613349Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32 PASSED [0.0869s] [ 41%] 2025-12-04T13:20:27.8613556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float16 SKIPPED [0.0002s] (Expected: log_normal is not comparable) [ 41%] 2025-12-04T13:20:27.8613713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [0.0802s] [ 41%] 2025-12-04T13:20:27.8613871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [0.0723s] [ 41%] 2025-12-04T13:20:27.8614023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [0.0719s] [ 41%] 2025-12-04T13:20:27.8614175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [0.0715s] [ 41%] 2025-12-04T13:20:27.8614326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED 
[0.0714s] [ 41%] 2025-12-04T13:20:27.8614468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float64 PASSED [1.5066s] [ 41%] 2025-12-04T13:20:27.8614608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_bfloat16 PASSED [1.3532s] [ 41%] 2025-12-04T13:20:27.8614747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float32 PASSED [1.1747s] [ 41%] 2025-12-04T13:20:27.8614898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool PASSED [0.2695s] [ 41%] 2025-12-04T13:20:27.8615035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16 PASSED [0.4084s] [ 41%] 2025-12-04T13:20:27.8615170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8 PASSED [0.3961s] [ 41%] 2025-12-04T13:20:27.8615330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128 PASSED [0.1115s] [ 41%] 2025-12-04T13:20:27.8615471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32 PASSED [0.1067s] [ 41%] 2025-12-04T13:20:27.8615614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64 PASSED [0.4432s] [ 41%] 2025-12-04T13:20:27.8615752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool PASSED [0.2690s] [ 41%] 2025-12-04T13:20:27.8615890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64 PASSED [2.3588s] [ 41%] 2025-12-04T13:20:27.8616057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_complex128 PASSED [14.1291s] [ 41%] 2025-12-04T13:20:27.8616213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_tensor_overload_executor_aten_cuda_int8 PASSED [4.2668s] 
[ 41%] 2025-12-04T13:20:27.8616352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32 PASSED [0.1408s] [ 41%] 2025-12-04T13:20:27.8616488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8 PASSED [0.0639s] [ 41%] 2025-12-04T13:20:27.8616626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32 PASSED [0.0295s] [ 41%] 2025-12-04T13:20:27.8616783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64 PASSED [0.0286s] [ 41%] 2025-12-04T13:20:27.8616921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8 PASSED [0.0291s] [ 41%] 2025-12-04T13:20:27.8617054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool PASSED [0.2439s] [ 41%] 2025-12-04T13:20:27.8617190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128 PASSED [0.0562s] [ 41%] 2025-12-04T13:20:27.8617333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64 PASSED [0.0551s] [ 41%] 2025-12-04T13:20:27.8617494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128 PASSED [0.0400s] [ 41%] 2025-12-04T13:20:27.8617653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16 PASSED [0.0389s] [ 41%] 2025-12-04T13:20:27.8617810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64 PASSED [0.0387s] [ 41%] 2025-12-04T13:20:27.8617964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64 PASSED [0.0372s] [ 41%] 2025-12-04T13:20:27.8618116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8 PASSED 
[1.7256s] [ 41%] 2025-12-04T13:20:27.8618280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64 PASSED [0.0434s] [ 41%] 2025-12-04T13:20:27.8618441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64 PASSED [0.0369s] [ 41%] 2025-12-04T13:20:27.8618598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16 PASSED [0.0366s] [ 41%] 2025-12-04T13:20:27.8618764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32 PASSED [0.0374s] [ 41%] 2025-12-04T13:20:27.8618903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16 PASSED [0.4300s] [ 41%] 2025-12-04T13:20:27.8619039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16 PASSED [0.4291s] [ 41%] 2025-12-04T13:20:27.8619184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8 PASSED [0.2457s] [ 41%] 2025-12-04T13:20:27.8619322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16 PASSED [1.4096s] [ 41%] 2025-12-04T13:20:27.8619460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32 PASSED [1.3778s] [ 41%] 2025-12-04T13:20:27.8619593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64 PASSED [1.3907s] [ 42%] 2025-12-04T13:20:27.8619726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8 PASSED [1.3751s] [ 42%] 2025-12-04T13:20:27.8619858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128 PASSED [0.3336s] [ 42%] 2025-12-04T13:20:27.8619994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32 PASSED 
[0.1824s] [ 42%] 2025-12-04T13:20:27.8620130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8 PASSED [0.0630s] [ 42%] 2025-12-04T13:20:27.8620273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16 PASSED [0.0610s] [ 42%] 2025-12-04T13:20:27.8620414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16 PASSED [0.0598s] [ 42%] 2025-12-04T13:20:27.8620568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32 PASSED [0.0605s] [ 42%] 2025-12-04T13:20:27.8620708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32 PASSED [0.0598s] [ 42%] 2025-12-04T13:20:27.8620845Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64 PASSED [0.0596s] [ 42%] 2025-12-04T13:20:27.8620981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8 PASSED [0.0587s] [ 42%] 2025-12-04T13:20:27.8621115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool PASSED [0.1057s] [ 42%] 2025-12-04T13:20:27.8621266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128 PASSED [0.1102s] [ 42%] 2025-12-04T13:20:27.8621402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32 PASSED [0.1121s] [ 42%] 2025-12-04T13:20:27.8621536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16 PASSED [0.1057s] [ 42%] 2025-12-04T13:20:27.8621687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16 PASSED [0.2105s] [ 42%] 2025-12-04T13:20:27.8621820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64 PASSED [0.2893s] [ 42%] 2025-12-04T13:20:27.8621946Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64 PASSED [0.2821s] [ 42%] 2025-12-04T13:20:27.8622077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8 PASSED [0.0546s] [ 42%] 2025-12-04T13:20:27.8622268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128 SKIPPED [0.0002s] (Can't check result for new_empty) [ 42%] 2025-12-04T13:20:27.8622454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32 SKIPPED [0.0002s] (Can't check result for new_empty) [ 42%] 2025-12-04T13:20:27.8622669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64 SKIPPED [0.0001s] (Can't check result for new_empty) [ 42%] 2025-12-04T13:20:27.8622850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32 SKIPPED [0.0001s] (Can't check result for new_empty) [ 42%] 2025-12-04T13:20:27.8623031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8 SKIPPED [0.0001s] (Can't check result for new_empty) [ 42%] 2025-12-04T13:20:27.8623293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 42%] 2025-12-04T13:20:27.8623434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16 PASSED [0.0243s] [ 42%] 2025-12-04T13:20:27.8623566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8 PASSED [0.0226s] [ 42%] 2025-12-04T13:20:27.8623709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128 PASSED [0.0238s] [ 42%] 2025-12-04T13:20:27.8623847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32 PASSED [0.0243s] [ 42%] 
2025-12-04T13:20:27.8623983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64 PASSED [0.0223s] [ 42%] 2025-12-04T13:20:27.8624120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64 PASSED [0.0243s] [ 42%] 2025-12-04T13:20:27.8624251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8 PASSED [0.0219s] [ 42%] 2025-12-04T13:20:27.8624393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64 PASSED [0.0226s] [ 42%] 2025-12-04T13:20:27.8624548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16 PASSED [0.2772s] [ 42%] 2025-12-04T13:20:27.8624702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16 PASSED [0.2485s] [ 42%] 2025-12-04T13:20:27.8624854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16 PASSED [0.2555s] [ 42%] 2025-12-04T13:20:27.8625021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_float64 PASSED [0.0161s] [ 42%] 2025-12-04T13:20:27.8625197Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int32 PASSED [1.4207s] [ 42%] 2025-12-04T13:20:27.8625364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_uint8 PASSED [1.3681s] [ 42%] 2025-12-04T13:20:27.8625515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64 PASSED [0.0660s] [ 42%] 2025-12-04T13:20:27.8625678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64 PASSED [0.5239s] [ 42%] 2025-12-04T13:20:27.8625836Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16 PASSED [0.2115s] [ 42%] 2025-12-04T13:20:27.8625997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64 PASSED [0.1324s] [ 42%] 2025-12-04T13:20:27.8626155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16 PASSED [0.2906s] [ 42%] 2025-12-04T13:20:27.8626310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16 PASSED [0.2948s] [ 42%] 2025-12-04T13:20:27.8626463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32 PASSED [0.2295s] [ 42%] 2025-12-04T13:20:27.8626633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16 PASSED [0.0782s] [ 42%] 2025-12-04T13:20:27.8626789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32 PASSED [0.0631s] [ 42%] 2025-12-04T13:20:27.8626942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128 PASSED [0.0332s] [ 42%] 2025-12-04T13:20:27.8627107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32 PASSED [0.0241s] [ 42%] 2025-12-04T13:20:27.8627261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64 PASSED [0.0246s] [ 42%] 2025-12-04T13:20:27.8627417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16 PASSED [0.0708s] [ 42%] 2025-12-04T13:20:27.8627592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [0.0689s] [ 42%] 2025-12-04T13:20:27.8627765Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [0.0679s] [ 42%] 2025-12-04T13:20:27.8627932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [0.0680s] [ 42%] 2025-12-04T13:20:27.8628101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16 PASSED [0.4425s] [ 42%] 2025-12-04T13:20:27.8628269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16 PASSED [0.2016s] [ 42%] 2025-12-04T13:20:27.8628433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8 PASSED [0.0404s] [ 42%] 2025-12-04T13:20:27.8628607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_complex64 PASSED [0.0253s] [ 42%] 2025-12-04T13:20:27.8628768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_float64 PASSED [0.0248s] [ 42%] 2025-12-04T13:20:27.8628934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_bfloat16 PASSED [0.0233s] [ 42%] 2025-12-04T13:20:27.8629097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_int32 PASSED [0.0237s] [ 42%] 2025-12-04T13:20:27.8629270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16 PASSED [0.6316s] [ 42%] 2025-12-04T13:20:27.8629422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16 PASSED [0.5832s] [ 42%] 2025-12-04T13:20:27.8629574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32 PASSED [0.2462s] [ 
42%] 2025-12-04T13:20:27.8629722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64 PASSED [0.2141s] [ 42%] 2025-12-04T13:20:27.8629874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8 PASSED [0.2011s] [ 42%] 2025-12-04T13:20:27.8630023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16 PASSED [0.1051s] [ 43%] 2025-12-04T13:20:27.8630174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64 PASSED [0.2026s] [ 43%] 2025-12-04T13:20:27.8630336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float64 PASSED [0.0552s] [ 43%] 2025-12-04T13:20:27.8630508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [0.0486s] [ 43%] 2025-12-04T13:20:27.8630689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32 PASSED [0.0475s] [ 43%] 2025-12-04T13:20:27.8630853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16 PASSED [0.0478s] [ 43%] 2025-12-04T13:20:27.8631028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64 PASSED [0.0482s] [ 43%] 2025-12-04T13:20:27.8631192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [0.0490s] [ 43%] 2025-12-04T13:20:27.8631363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64 PASSED [0.0518s] [ 43%] 2025-12-04T13:20:27.8631530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64 PASSED 
[0.0471s] [ 43%] 2025-12-04T13:20:27.8631693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16 PASSED [0.0522s] [ 43%] 2025-12-04T13:20:27.8631856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64 PASSED [0.0510s] [ 43%] 2025-12-04T13:20:27.8632020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8 PASSED [0.0503s] [ 43%] 2025-12-04T13:20:27.8632179Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64 PASSED [0.1998s] [ 43%] 2025-12-04T13:20:27.8632334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32 PASSED [0.1223s] [ 43%] 2025-12-04T13:20:27.8632513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64 PASSED [0.0911s] [ 43%] 2025-12-04T13:20:27.8632678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16 PASSED [0.1097s] [ 43%] 2025-12-04T13:20:27.8632813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16 PASSED [0.2157s] [ 43%] 2025-12-04T13:20:27.8633003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float16 SKIPPED [0.0001s] (make_traced() doesn't set seed properly!) [ 43%] 2025-12-04T13:20:27.8633206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_executor_aten_cuda_float32 SKIPPED [0.0001s] (make_traced() doesn't set seed properly!) 
[ 43%] 2025-12-04T13:20:27.8633365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32 PASSED [0.0070s] [ 43%] 2025-12-04T13:20:27.8633512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bfloat16 PASSED [0.1649s] [ 43%] 2025-12-04T13:20:27.8633658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_complex64 PASSED [0.1665s] [ 43%] 2025-12-04T13:20:27.8633804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_float64 PASSED [0.1643s] [ 43%] 2025-12-04T13:20:27.8633945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int16 PASSED [0.1588s] [ 43%] 2025-12-04T13:20:27.8634085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_int64 PASSED [0.1592s] [ 43%] 2025-12-04T13:20:27.8634224Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32 PASSED [0.1365s] [ 43%] 2025-12-04T13:20:27.8634362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32 PASSED [0.1343s] [ 43%] 2025-12-04T13:20:27.8634497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32 PASSED [0.1282s] [ 43%] 2025-12-04T13:20:27.8634645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64 PASSED [0.1210s] [ 43%] 2025-12-04T13:20:27.8634778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8 PASSED [0.1282s] [ 43%] 2025-12-04T13:20:27.8634918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128 PASSED [0.0652s] [ 43%] 2025-12-04T13:20:27.8635069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16 PASSED [0.0523s] [ 43%] 2025-12-04T13:20:27.8635208Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32 PASSED [0.0548s] [ 43%] 2025-12-04T13:20:27.8635342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128 PASSED [0.3254s] [ 43%] 2025-12-04T13:20:27.8635473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64 PASSED [0.3145s] [ 43%] 2025-12-04T13:20:27.8635602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64 PASSED [0.2794s] [ 43%] 2025-12-04T13:20:27.8635728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8 PASSED [0.2739s] [ 43%] 2025-12-04T13:20:27.8635860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64 PASSED [0.0861s] [ 43%] 2025-12-04T13:20:27.8635992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32 PASSED [0.0097s] [ 43%] 2025-12-04T13:20:27.8636123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64 PASSED [0.0488s] [ 43%] 2025-12-04T13:20:27.8636251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8 PASSED [0.0461s] [ 43%] 2025-12-04T13:20:27.8636405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32 PASSED [0.0776s] [ 43%] 2025-12-04T13:20:27.8636547Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16 PASSED [0.0884s] [ 43%] 2025-12-04T13:20:27.8636684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8 PASSED [0.0831s] [ 43%] 2025-12-04T13:20:27.8636824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16 PASSED [0.4784s] [ 43%] 2025-12-04T13:20:27.8636960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32 
PASSED [0.2871s] [ 43%] 2025-12-04T13:20:27.8637118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64 PASSED [0.2860s] [ 43%] 2025-12-04T13:20:27.8637254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8 PASSED [2.0801s] [ 43%] 2025-12-04T13:20:27.8637388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float16 PASSED [0.0514s] [ 43%] 2025-12-04T13:20:27.8637521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_renorm_executor_aten_cuda_float32 PASSED [0.0348s] [ 43%] 2025-12-04T13:20:27.8637657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16 PASSED [0.1360s] [ 43%] 2025-12-04T13:20:27.8637794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128 PASSED [0.1363s] [ 43%] 2025-12-04T13:20:27.8637932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64 PASSED [0.1334s] [ 43%] 2025-12-04T13:20:27.8638066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32 PASSED [0.1336s] [ 43%] 2025-12-04T13:20:27.8638208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16 PASSED [0.0924s] [ 43%] 2025-12-04T13:20:27.8638355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool PASSED [0.0887s] [ 43%] 2025-12-04T13:20:27.8638499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64 PASSED [0.0909s] [ 43%] 2025-12-04T13:20:27.8638637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32 PASSED [0.0888s] [ 43%] 2025-12-04T13:20:27.8638786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64 PASSED [0.0887s] [ 43%] 2025-12-04T13:20:27.8638924Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16 PASSED [0.1089s] [ 43%] 2025-12-04T13:20:27.8639058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16 PASSED [0.1053s] [ 43%] 2025-12-04T13:20:27.8639187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64 PASSED [1.5248s] [ 43%] 2025-12-04T13:20:27.8639317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool PASSED [0.0752s] [ 43%] 2025-12-04T13:20:27.8639446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32 PASSED [0.0727s] [ 43%] 2025-12-04T13:20:27.8639574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64 PASSED [0.0726s] [ 43%] 2025-12-04T13:20:27.8639709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16 PASSED [0.1037s] [ 43%] 2025-12-04T13:20:27.8639838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32 PASSED [0.0567s] [ 44%] 2025-12-04T13:20:27.8639971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32 PASSED [0.0741s] [ 44%] 2025-12-04T13:20:27.8640099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64 PASSED [0.0849s] [ 44%] 2025-12-04T13:20:27.8640240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16 PASSED [0.2925s] [ 44%] 2025-12-04T13:20:27.8640388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_bfloat16 PASSED [0.0316s] [ 44%] 2025-12-04T13:20:27.8640514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool PASSED [0.0946s] [ 44%] 2025-12-04T13:20:27.8640648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128 PASSED [0.1720s] 
[ 44%] 2025-12-04T13:20:27.8640773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16 PASSED [1.5187s] [ 44%] 2025-12-04T13:20:27.8640910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8 PASSED [0.0796s] [ 44%] 2025-12-04T13:20:27.8641043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32 PASSED [0.2046s] [ 44%] 2025-12-04T13:20:27.8641180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16 PASSED [0.1040s] [ 44%] 2025-12-04T13:20:27.8641318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16 PASSED [0.0841s] [ 44%] 2025-12-04T13:20:27.8641452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64 PASSED [0.0567s] [ 44%] 2025-12-04T13:20:27.8641585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128 PASSED [0.2886s] [ 44%] 2025-12-04T13:20:27.8641712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64 PASSED [0.0781s] [ 44%] 2025-12-04T13:20:27.8641838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8 PASSED [0.0721s] [ 44%] 2025-12-04T13:20:27.8641966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool PASSED [0.2743s] [ 44%] 2025-12-04T13:20:27.8642164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32 PASSED [0.2241s] [ 44%] 2025-12-04T13:20:27.8642306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64 PASSED [0.2238s] [ 44%] 2025-12-04T13:20:27.8642431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool PASSED [0.0953s] [ 44%] 2025-12-04T13:20:27.8642564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16 
PASSED [0.1038s] [ 44%] 2025-12-04T13:20:27.8642706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64 PASSED [0.0677s] [ 44%] 2025-12-04T13:20:27.8642833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64 PASSED [0.0772s] [ 44%] 2025-12-04T13:20:27.8642989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [0.0457s] [ 44%] 2025-12-04T13:20:27.8643138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32 PASSED [0.0510s] [ 44%] 2025-12-04T13:20:27.8643333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64 PASSED [0.0469s] [ 44%] 2025-12-04T13:20:27.8643480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8 PASSED [0.0453s] [ 44%] 2025-12-04T13:20:27.8643625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool PASSED [0.1131s] [ 44%] 2025-12-04T13:20:27.8643772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32 PASSED [0.0752s] [ 44%] 2025-12-04T13:20:27.8643919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32 PASSED [0.0852s] [ 44%] 2025-12-04T13:20:27.8644063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32 PASSED [0.3077s] [ 44%] 2025-12-04T13:20:27.8644222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16 PASSED [0.2779s] [ 44%] 2025-12-04T13:20:27.8644360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool PASSED [0.2858s] [ 44%] 2025-12-04T13:20:27.8644500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16 
PASSED [0.2303s] [ 44%] 2025-12-04T13:20:27.8644646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32 PASSED [1.7551s] [ 44%] 2025-12-04T13:20:27.8644803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16 PASSED [0.0875s] [ 44%] 2025-12-04T13:20:27.8644946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64 PASSED [0.0851s] [ 44%] 2025-12-04T13:20:27.8645085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16 PASSED [0.4576s] [ 44%] 2025-12-04T13:20:27.8645225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32 PASSED [0.0751s] [ 44%] 2025-12-04T13:20:27.8645362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32 PASSED [0.0718s] [ 44%] 2025-12-04T13:20:27.8645499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8 PASSED [0.0769s] [ 44%] 2025-12-04T13:20:27.8645643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bfloat16 PASSED [0.6134s] [ 44%] 2025-12-04T13:20:27.8645784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64 PASSED [0.0714s] [ 44%] 2025-12-04T13:20:27.8647267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8 PASSED [0.0762s] [ 44%] 2025-12-04T13:20:27.8647443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [0.0690s] [ 44%] 2025-12-04T13:20:27.8647632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [0.0688s] [ 44%] 2025-12-04T13:20:27.8647799Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [0.0674s] [ 44%] 2025-12-04T13:20:27.8647980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [0.0677s] [ 44%] 2025-12-04T13:20:27.8648149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [0.0637s] [ 44%] 2025-12-04T13:20:27.8648293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8 PASSED [0.2086s] [ 44%] 2025-12-04T13:20:27.8648462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32 PASSED [0.4080s] [ 44%] 2025-12-04T13:20:27.8648632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32 PASSED [0.4261s] [ 44%] 2025-12-04T13:20:27.8648797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64 PASSED [0.4095s] [ 44%] 2025-12-04T13:20:27.8648971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_bfloat16 PASSED [2.3854s] [ 44%] 2025-12-04T13:20:27.8649140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64 PASSED [0.4423s] [ 44%] 2025-12-04T13:20:27.8649310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64 PASSED [0.4099s] [ 44%] 2025-12-04T13:20:27.8649491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8 PASSED [0.4009s] [ 44%] 2025-12-04T13:20:27.8649639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16 PASSED [1.6864s] [ 44%] 
2025-12-04T13:20:27.8649782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32 PASSED [0.1506s] [ 44%] 2025-12-04T13:20:27.8649924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16 PASSED [0.1504s] [ 44%] 2025-12-04T13:20:27.8650064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8 PASSED [1.5383s] [ 44%] 2025-12-04T13:20:27.8650243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [0.0540s] [ 44%] 2025-12-04T13:20:27.8650401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool PASSED [0.0559s] [ 44%] 2025-12-04T13:20:27.8650564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64 PASSED [0.5458s] [ 44%] 2025-12-04T13:20:27.8650725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32 PASSED [0.0877s] [ 44%] 2025-12-04T13:20:27.8650870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8 PASSED [0.7856s] [ 44%] 2025-12-04T13:20:27.8651024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex128 PASSED [0.0183s] [ 44%] 2025-12-04T13:20:27.8651174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex64 PASSED [0.0167s] [ 45%] 2025-12-04T13:20:27.8651309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16 PASSED [0.1087s] [ 45%] 2025-12-04T13:20:27.8651449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool PASSED [0.1025s] [ 45%] 2025-12-04T13:20:27.8651584Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32 PASSED [0.3562s] [ 45%] 2025-12-04T13:20:27.8651712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32 PASSED [0.0791s] [ 45%] 2025-12-04T13:20:27.8651842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64 PASSED [0.0793s] [ 45%] 2025-12-04T13:20:27.8651992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128 PASSED [0.1088s] [ 45%] 2025-12-04T13:20:27.8652149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 45%] 2025-12-04T13:20:27.8652285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16 PASSED [0.1353s] [ 45%] 2025-12-04T13:20:27.8652423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32 PASSED [0.0946s] [ 45%] 2025-12-04T13:20:27.8652558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16 PASSED [0.0818s] [ 45%] 2025-12-04T13:20:27.8652701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_float16 PASSED [0.0244s] [ 45%] 2025-12-04T13:20:27.8652841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16 PASSED [0.0188s] [ 45%] 2025-12-04T13:20:27.8652973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool PASSED [0.0174s] [ 45%] 2025-12-04T13:20:27.8653115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64 PASSED [0.0193s] [ 45%] 2025-12-04T13:20:27.8653295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16 PASSED [1.5291s] [ 45%] 2025-12-04T13:20:27.8653454Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32 PASSED [1.4606s] [ 45%] 2025-12-04T13:20:27.8653587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16 PASSED [1.4567s] [ 45%] 2025-12-04T13:20:27.8653740Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_complex64 PASSED [1.4774s] [ 45%] 2025-12-04T13:20:27.8653891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float16 PASSED [1.4771s] [ 45%] 2025-12-04T13:20:27.8654039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32 PASSED [0.0433s] [ 45%] 2025-12-04T13:20:27.8654171Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64 PASSED [0.0378s] [ 45%] 2025-12-04T13:20:27.8654304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64 PASSED [0.0412s] [ 45%] 2025-12-04T13:20:27.8654437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16 PASSED [0.4967s] [ 45%] 2025-12-04T13:20:27.8654567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64 PASSED [0.3122s] [ 45%] 2025-12-04T13:20:27.8654695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8 PASSED [0.3019s] [ 45%] 2025-12-04T13:20:27.8654822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16 PASSED [0.0540s] [ 45%] 2025-12-04T13:20:27.8654950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32 PASSED [1.5153s] [ 45%] 2025-12-04T13:20:27.8655076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64 PASSED [0.0441s] [ 45%] 2025-12-04T13:20:27.8655203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8 PASSED 
[0.0547s] [ 45%] 2025-12-04T13:20:27.8655352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8 PASSED [0.0460s] [ 45%] 2025-12-04T13:20:27.8655491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_complex128 PASSED [1.4541s] [ 45%] 2025-12-04T13:20:27.8655623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_copy_executor_aten_cuda_float32 PASSED [1.4542s] [ 45%] 2025-12-04T13:20:27.8655764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32 PASSED [1.4414s] [ 45%] 2025-12-04T13:20:27.8655889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8 PASSED [1.4569s] [ 45%] 2025-12-04T13:20:27.8656035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_float32 PASSED [1.4780s] [ 45%] 2025-12-04T13:20:27.8656178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_uint8 PASSED [1.4739s] [ 45%] 2025-12-04T13:20:27.8656307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool PASSED [0.1015s] [ 45%] 2025-12-04T13:20:27.8656435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16 PASSED [0.1137s] [ 45%] 2025-12-04T13:20:27.8656562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64 PASSED [0.0793s] [ 45%] 2025-12-04T13:20:27.8656693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64 PASSED [0.0800s] [ 45%] 2025-12-04T13:20:27.8656823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8 PASSED [0.0742s] [ 45%] 2025-12-04T13:20:27.8656965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16 PASSED [0.0360s] [ 45%] 2025-12-04T13:20:27.8657089Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool PASSED [0.0642s] [ 45%] 2025-12-04T13:20:27.8657233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128 PASSED [0.0638s] [ 45%] 2025-12-04T13:20:27.8657360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32 PASSED [0.0636s] [ 45%] 2025-12-04T13:20:27.8657496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64 PASSED [0.0065s] [ 45%] 2025-12-04T13:20:27.8657627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32 PASSED [0.0067s] [ 45%] 2025-12-04T13:20:27.8657756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8 PASSED [1.4650s] [ 45%] 2025-12-04T13:20:27.8657915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex128 PASSED [1.4130s] [ 45%] 2025-12-04T13:20:27.8658067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex32 PASSED [1.4349s] [ 45%] 2025-12-04T13:20:27.8658217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_complex64 PASSED [1.4252s] [ 45%] 2025-12-04T13:20:27.8658365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_float16 PASSED [1.4049s] [ 45%] 2025-12-04T13:20:27.8658508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int16 PASSED [1.4112s] [ 45%] 2025-12-04T13:20:27.8658653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int32 PASSED [1.4075s] [ 45%] 2025-12-04T13:20:27.8658797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int8 PASSED [1.4163s] [ 45%] 2025-12-04T13:20:27.8658941Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_uint8 PASSED [1.4163s] [ 45%] 2025-12-04T13:20:27.8659078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool PASSED [1.4025s] [ 45%] 2025-12-04T13:20:27.8659229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64 PASSED [1.3910s] [ 45%] 2025-12-04T13:20:27.8659367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16 PASSED [1.4206s] [ 45%] 2025-12-04T13:20:27.8659494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool PASSED [0.0675s] [ 45%] 2025-12-04T13:20:27.8659640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32 PASSED [0.0576s] [ 45%] 2025-12-04T13:20:27.8659774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64 PASSED [0.0570s] [ 45%] 2025-12-04T13:20:27.8659907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16 PASSED [0.0566s] [ 45%] 2025-12-04T13:20:27.8660038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64 PASSED [0.0566s] [ 45%] 2025-12-04T13:20:27.8660169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8 PASSED [0.0547s] [ 45%] 2025-12-04T13:20:27.8660300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32 PASSED [0.0609s] [ 45%] 2025-12-04T13:20:27.8660430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32 PASSED [0.0573s] [ 46%] 2025-12-04T13:20:27.8660557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64 PASSED [0.0562s] [ 46%] 2025-12-04T13:20:27.8660704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128 
PASSED [0.3209s] [ 46%] 2025-12-04T13:20:27.8660846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64 PASSED [0.3155s] [ 46%] 2025-12-04T13:20:27.8660980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16 PASSED [0.1065s] [ 46%] 2025-12-04T13:20:27.8661127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32 PASSED [0.0683s] [ 46%] 2025-12-04T13:20:27.8661258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32 PASSED [0.0582s] [ 46%] 2025-12-04T13:20:27.8661388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64 PASSED [0.0577s] [ 46%] 2025-12-04T13:20:27.8661526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_bool PASSED [0.0517s] [ 46%] 2025-12-04T13:20:27.8661681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32 PASSED [0.0540s] [ 46%] 2025-12-04T13:20:27.8661822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float16 PASSED [1.5190s] [ 46%] 2025-12-04T13:20:27.8661962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int16 PASSED [0.0541s] [ 46%] 2025-12-04T13:20:27.8662101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8 PASSED [0.0516s] [ 46%] 2025-12-04T13:20:27.8662238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16 PASSED [0.0400s] [ 46%] 2025-12-04T13:20:27.8662370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16 PASSED [0.0375s] [ 46%] 2025-12-04T13:20:27.8662502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8 PASSED [0.0379s] [ 46%] 2025-12-04T13:20:27.8662635Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8 PASSED [0.0376s] [ 46%] 2025-12-04T13:20:27.8662777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32 PASSED [0.0268s] [ 46%] 2025-12-04T13:20:27.8662914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8 PASSED [0.0245s] [ 46%] 2025-12-04T13:20:27.8663071Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32 PASSED [0.0719s] [ 46%] 2025-12-04T13:20:27.8663213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16 PASSED [0.0661s] [ 46%] 2025-12-04T13:20:27.8663386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8 PASSED [0.0638s] [ 46%] 2025-12-04T13:20:27.8663542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64 PASSED [0.0505s] [ 46%] 2025-12-04T13:20:27.8663674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64 PASSED [0.0482s] [ 46%] 2025-12-04T13:20:27.8663823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_bfloat16 PASSED [0.0252s] [ 46%] 2025-12-04T13:20:27.8663967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int32 PASSED [0.0247s] [ 46%] 2025-12-04T13:20:27.8664114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128 PASSED [0.0216s] [ 46%] 2025-12-04T13:20:27.8664254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64 PASSED [0.0207s] [ 46%] 2025-12-04T13:20:27.8664392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16 PASSED [0.0204s] [ 46%] 2025-12-04T13:20:27.8664528Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64 PASSED [0.0198s] [ 46%] 2025-12-04T13:20:27.8664662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16 PASSED [0.0491s] [ 46%] 2025-12-04T13:20:27.8664790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32 PASSED [0.0327s] [ 46%] 2025-12-04T13:20:27.8664940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16 PASSED [0.0879s] [ 46%] 2025-12-04T13:20:27.8665075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32 PASSED [0.0510s] [ 46%] 2025-12-04T13:20:27.8665211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64 PASSED [0.0521s] [ 46%] 2025-12-04T13:20:27.8665347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_bfloat16 PASSED [1.4189s] [ 46%] 2025-12-04T13:20:27.8665482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vdot_executor_aten_cuda_complex128 PASSED [1.4159s] [ 46%] 2025-12-04T13:20:27.8665643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_complex_executor_aten_cuda_float32 PASSED [1.3914s] [ 46%] 2025-12-04T13:20:27.8665778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16 PASSED [0.0915s] [ 46%] 2025-12-04T13:20:27.8665922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_complex128 PASSED [0.0224s] [ 46%] 2025-12-04T13:20:27.8666059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float32 PASSED [0.0217s] [ 46%] 2025-12-04T13:20:27.8666193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int32 PASSED [0.0196s] [ 46%] 2025-12-04T13:20:27.8666327Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16 PASSED [0.1025s] [ 46%] 2025-12-04T13:20:27.8666462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128 PASSED [0.1122s] [ 46%] 2025-12-04T13:20:27.8666593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32 PASSED [0.1109s] [ 46%] 2025-12-04T13:20:27.8666722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64 PASSED [1.4965s] [ 46%] 2025-12-04T13:20:27.8666865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8 PASSED [0.1179s] [ 46%] 2025-12-04T13:20:27.8667003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64 PASSED [1.4089s] [ 46%] 2025-12-04T13:20:27.8667137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16 PASSED [1.4069s] [ 46%] 2025-12-04T13:20:27.8667291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16 PASSED [1.4050s] [ 46%] 2025-12-04T13:20:27.8667424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32 PASSED [1.4132s] [ 46%] 2025-12-04T13:20:27.8667557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16 PASSED [0.0758s] [ 46%] 2025-12-04T13:20:27.8667688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16 PASSED [0.0531s] [ 46%] 2025-12-04T13:20:27.8667824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16 PASSED [0.8835s] [ 46%] 2025-12-04T13:20:27.8667955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64 PASSED [0.7819s] [ 46%] 2025-12-04T13:20:27.8668085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8 PASSED 
[0.7822s] [ 46%] 2025-12-04T13:20:27.8668217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool PASSED [0.0080s] [ 46%] 2025-12-04T13:20:27.8668353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128 PASSED [1.4710s] [ 46%] 2025-12-04T13:20:27.8668487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16 PASSED [1.3776s] [ 46%] 2025-12-04T13:20:27.8668595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16 PASSED [1.4019s] [ 46%] 2025-12-04T13:20:27.8668718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64 PASSED [1.4079s] [ 46%] 2025-12-04T13:20:27.8668825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16 PASSED [1.3924s] [ 46%] 2025-12-04T13:20:27.8668928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64 PASSED [1.3997s] [ 46%] 2025-12-04T13:20:27.8669061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16 PASSED [1.4106s] [ 46%] 2025-12-04T13:20:27.8669193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64 PASSED [1.4449s] [ 46%] 2025-12-04T13:20:27.8669332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16 PASSED [1.4325s] [ 46%] 2025-12-04T13:20:27.8669460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32 PASSED [1.4495s] [ 46%] 2025-12-04T13:20:27.8669586Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64 PASSED [1.4340s] [ 47%] 2025-12-04T13:20:27.8669709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool PASSED [1.4393s] [ 47%] 2025-12-04T13:20:27.8669835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64 PASSED [1.4503s] [ 47%] 2025-12-04T13:20:27.8669963Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128 PASSED [1.4508s] [ 47%] 2025-12-04T13:20:27.8670087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32 PASSED [1.4352s] [ 47%] 2025-12-04T13:20:27.8670216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16 PASSED [1.4557s] [ 47%] 2025-12-04T13:20:27.8670342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool PASSED [1.4309s] [ 47%] 2025-12-04T13:20:27.8670473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64 PASSED [1.4604s] [ 47%] 2025-12-04T13:20:27.8670615Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32 PASSED [1.4583s] [ 47%] 2025-12-04T13:20:27.8670738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8 PASSED [1.4356s] [ 47%] 2025-12-04T13:20:27.8670869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32 PASSED [1.4406s] [ 47%] 2025-12-04T13:20:27.8671009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64 PASSED [1.4342s] [ 47%] 2025-12-04T13:20:27.8671134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32 PASSED [1.4445s] [ 47%] 2025-12-04T13:20:27.8671262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64 PASSED [1.4520s] [ 47%] 2025-12-04T13:20:27.8671390Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64 PASSED [1.4422s] [ 47%] 2025-12-04T13:20:27.8671519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32 PASSED [0.1324s] [ 47%] 2025-12-04T13:20:27.8671649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64 PASSED [0.1281s] [ 47%] 2025-12-04T13:20:27.8671780Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64 PASSED [1.4422s] [ 47%] 2025-12-04T13:20:27.8671906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16 PASSED [1.4338s] [ 47%] 2025-12-04T13:20:27.8672032Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool PASSED [1.4503s] [ 47%] 2025-12-04T13:20:27.8672155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64 PASSED [1.4386s] [ 47%] 2025-12-04T13:20:27.8672280Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8 PASSED [1.4419s] [ 47%] 2025-12-04T13:20:27.8672413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8 PASSED [1.4355s] [ 47%] 2025-12-04T13:20:27.8672541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64 PASSED [1.4589s] [ 47%] 2025-12-04T13:20:27.8672663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32 PASSED [1.4250s] [ 47%] 2025-12-04T13:20:27.8672785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64 PASSED [1.4527s] [ 47%] 2025-12-04T13:20:27.8672905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8 PASSED [1.4390s] [ 47%] 2025-12-04T13:20:27.8673045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128 PASSED [1.4737s] [ 47%] 2025-12-04T13:20:27.8673172Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64 PASSED [1.4714s] [ 47%] 2025-12-04T13:20:27.8673330Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16 PASSED [1.4499s] [ 47%] 2025-12-04T13:20:27.8673455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64 PASSED [1.4630s] [ 47%] 2025-12-04T13:20:27.8673581Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16 PASSED [1.4598s] [ 47%] 2025-12-04T13:20:27.8673694Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64 PASSED [1.4485s] [ 47%] 2025-12-04T13:20:27.8673806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32 PASSED [1.4680s] [ 47%] 2025-12-04T13:20:27.8673914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64 PASSED [1.4536s] [ 47%] 2025-12-04T13:20:27.8674021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8 PASSED [1.4395s] [ 47%] 2025-12-04T13:20:27.8674132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32 PASSED [1.4432s] [ 47%] 2025-12-04T13:20:27.8674239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16 PASSED [1.4557s] [ 47%] 2025-12-04T13:20:27.8674363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8 PASSED [1.4433s] [ 47%] 2025-12-04T13:20:27.8674470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16 PASSED [0.1558s] [ 47%] 2025-12-04T13:20:27.8674576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32 PASSED [0.1098s] [ 47%] 2025-12-04T13:20:27.8674700Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64 PASSED [0.0544s] [ 47%] 2025-12-04T13:20:27.8674812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64 PASSED [0.0449s] [ 47%] 2025-12-04T13:20:27.8674923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8 PASSED [1.4741s] [ 47%] 2025-12-04T13:20:27.8675033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16 PASSED [1.4433s] [ 47%] 2025-12-04T13:20:27.8675142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128 PASSED [1.4383s] [ 47%] 2025-12-04T13:20:27.8675262Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16 PASSED [1.4240s] [ 47%] 2025-12-04T13:20:27.8675380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex64 PASSED [1.4279s] [ 47%] 2025-12-04T13:20:27.8675495Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_float32 PASSED [1.4178s] [ 47%] 2025-12-04T13:20:27.8675601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32 PASSED [1.4510s] [ 47%] 2025-12-04T13:20:27.8675707Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8 PASSED [1.4379s] [ 47%] 2025-12-04T13:20:27.8675821Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16 PASSED [1.4473s] [ 47%] 2025-12-04T13:20:27.8675928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32 PASSED [1.4481s] [ 47%] 2025-12-04T13:20:27.8676048Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool PASSED [1.4200s] [ 47%] 2025-12-04T13:20:27.8676155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16 PASSED [1.4378s] [ 47%] 2025-12-04T13:20:27.8676260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64 PASSED [1.4444s] [ 47%] 2025-12-04T13:20:27.8676372Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16 PASSED [0.0253s] [ 47%] 2025-12-04T13:20:27.8676483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32 PASSED [0.0105s] [ 47%] 2025-12-04T13:20:27.8676606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex64 PASSED [1.4264s] [ 47%] 2025-12-04T13:20:27.8676742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float64 PASSED [1.4151s] [ 47%] 2025-12-04T13:20:27.8676870Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool PASSED [1.4299s] [ 47%] 2025-12-04T13:20:27.8677008Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64 PASSED [1.4154s] [ 47%] 2025-12-04T13:20:27.8677142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32 PASSED [1.4354s] [ 47%] 2025-12-04T13:20:27.8677275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64 PASSED [1.4258s] [ 47%] 2025-12-04T13:20:27.8677405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16 PASSED [1.4538s] [ 47%] 2025-12-04T13:20:27.8677531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16 PASSED [1.4266s] [ 47%] 2025-12-04T13:20:27.8677654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32 PASSED [1.4227s] [ 47%] 2025-12-04T13:20:27.8677777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32 PASSED [1.4281s] [ 48%] 2025-12-04T13:20:27.8677876Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8 2025-12-04T13:20:27.8677892Z 2025-12-04T13:20:27.8678064Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_ops/test_ops-7371450e9d0a79a0.xml - 2025-12-04T13:20:27.8678128Z !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:20:27.8678286Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py:2653: KeyboardInterrupt 2025-12-04T13:20:27.8678376Z (to show a full traceback on KeyboardInterrupt use --full-trace) 2025-12-04T13:20:27.8678457Z ========== 2655 passed, 511 skipped, 51 xfailed in 1793.25s (0:29:53) ========== 2025-12-04T13:20:27.8678506Z Command took >30min, returning 124 2025-12-04T13:20:27.8678545Z Got exit code 124 2025-12-04T13:20:27.8678587Z Retrying single test... 
2025-12-04T13:20:27.8678710Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-7f39d969beb20c01.xml 2025-12-04T13:20:27.8678769Z ============================= test session starts ============================== 2025-12-04T13:20:27.8678883Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:20:27.8678924Z cachedir: .pytest_cache 2025-12-04T13:20:27.8679088Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:20:27.8679137Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:20:27.8679178Z configfile: pytest.ini 2025-12-04T13:20:27.8679342Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:20:27.8679428Z collecting ... collected 33666 items / 6701 deselected / 26965 selected 2025-12-04T13:20:27.8679614Z stepcurrent: skipping 3217 already run items. 
Running only test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8 2025-12-04T13:20:27.8679659Z Running 1 items in this shard 2025-12-04T13:20:27.8679663Z 2025-12-04T13:20:27.8679804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8 PASSED [0.1239s] [100%] 2025-12-04T13:20:27.8679806Z 2025-12-04T13:20:27.8679968Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_ops/test_ops-7f39d969beb20c01.xml - 2025-12-04T13:20:27.8680035Z ====================== 1 passed, 6701 deselected in 1.38s ====================== 2025-12-04T13:20:27.8680073Z Got exit code 0 2025-12-04T13:20:27.8680158Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T13:20:27.8680273Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-a2907f10ae1cea5a.xml 2025-12-04T13:20:27.8680342Z ============================= test session starts ============================== 2025-12-04T13:20:27.8680452Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:20:27.8680494Z cachedir: .pytest_cache 2025-12-04T13:20:27.8680653Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:20:27.8680699Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:20:27.8680739Z configfile: pytest.ini 2025-12-04T13:20:27.8680902Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:20:27.8680984Z collecting ... collected 33666 items / 3218 deselected / 30448 selected 2025-12-04T13:20:27.8681041Z stepcurrent: skipping 3218 already run items. 
2025-12-04T13:20:27.8681086Z Running 3484 items in this shard 2025-12-04T13:20:27.8681088Z 2025-12-04T13:20:27.8681207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128 PASSED [0.2230s] [ 0%] 2025-12-04T13:20:27.8681322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128 PASSED [0.9685s] [ 0%] 2025-12-04T13:20:27.8681432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16 PASSED [0.8520s] [ 0%] 2025-12-04T13:20:27.8681564Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16 PASSED [0.1620s] [ 0%] 2025-12-04T13:20:27.8681676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32 PASSED [0.0669s] [ 0%] 2025-12-04T13:20:27.8681785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16 PASSED [0.0271s] [ 0%] 2025-12-04T13:20:27.8681901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32 PASSED [0.0322s] [ 0%] 2025-12-04T13:20:27.8682008Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64 PASSED [0.0316s] [ 0%] 2025-12-04T13:20:27.8682119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32 PASSED [0.0317s] [ 0%] 2025-12-04T13:20:27.8682229Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64 PASSED [0.0307s] [ 0%] 2025-12-04T13:20:27.8682337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64 PASSED [0.7941s] [ 0%] 2025-12-04T13:20:27.8682444Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8 PASSED [0.8010s] [ 0%] 2025-12-04T13:20:27.8682562Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16 PASSED [0.0085s] [ 0%] 2025-12-04T13:20:27.8682682Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128 PASSED [0.7795s] [ 0%] 2025-12-04T13:20:27.8682800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32 PASSED [0.7729s] 
[ 0%] 2025-12-04T13:20:27.8682914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32 PASSED [0.0079s] [ 0%] 2025-12-04T13:20:27.8683027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8 PASSED [0.0068s] [ 0%] 2025-12-04T13:20:27.8683139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool PASSED [0.7744s] [ 0%] 2025-12-04T13:20:27.8683303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64 PASSED [0.7793s] [ 0%] 2025-12-04T13:20:27.8683417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16 PASSED [0.0090s] [ 0%] 2025-12-04T13:20:27.8683534Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16 PASSED [0.0090s] [ 0%] 2025-12-04T13:20:27.8683646Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool PASSED [0.0079s] [ 0%] 2025-12-04T13:20:27.8683765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128 PASSED [0.0088s] [ 0%] 2025-12-04T13:20:27.8683892Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16 PASSED [0.0481s] [ 0%] 2025-12-04T13:20:27.8684004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32 PASSED [0.1188s] [ 0%] 2025-12-04T13:20:27.8684116Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8 PASSED [0.1042s] [ 0%] 2025-12-04T13:20:27.8684230Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool PASSED [0.1016s] [ 0%] 2025-12-04T13:20:27.8684343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32 PASSED [0.0915s] [ 0%] 2025-12-04T13:20:27.8684454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8 PASSED [0.0897s] [ 0%] 2025-12-04T13:20:27.8684566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8 PASSED [0.1052s] [ 0%] 2025-12-04T13:20:27.8684678Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int8 PASSED [0.0197s] [ 0%] 2025-12-04T13:20:27.8684804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32 PASSED [0.0108s] [ 0%] 2025-12-04T13:20:27.8684916Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8 PASSED [0.2160s] [ 0%] 2025-12-04T13:20:27.8685025Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8 PASSED [0.1247s] [ 0%] 2025-12-04T13:20:27.8685152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32 PASSED [0.0147s] [ 1%] 2025-12-04T13:20:27.8685261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16 PASSED [0.0584s] [ 1%] 2025-12-04T13:20:27.8685369Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16 PASSED [0.8192s] [ 1%] 2025-12-04T13:20:27.8685474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool PASSED [0.7960s] [ 1%] 2025-12-04T13:20:27.8685595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16 PASSED [0.8424s] [ 1%] 2025-12-04T13:20:27.8685706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool PASSED [0.1040s] [ 1%] 2025-12-04T13:20:27.8685820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32 PASSED [0.8641s] [ 1%] 2025-12-04T13:20:27.8685931Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16 PASSED [0.0893s] [ 1%] 2025-12-04T13:20:27.8686044Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64 PASSED [0.0879s] [ 1%] 2025-12-04T13:20:27.8686155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32 PASSED [0.8755s] [ 1%] 2025-12-04T13:20:27.8686267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16 PASSED [0.8350s] [ 1%] 2025-12-04T13:20:27.8686382Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64 PASSED [0.8380s] [ 1%] 2025-12-04T13:20:27.8686494Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16 PASSED [0.8437s] [ 1%] 2025-12-04T13:20:27.8686601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16 PASSED [0.8458s] [ 1%] 2025-12-04T13:20:27.8686709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8 PASSED [0.8490s] [ 1%] 2025-12-04T13:20:27.8686828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16 PASSED [0.7962s] [ 1%] 2025-12-04T13:20:27.8686960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128 PASSED [0.7945s] [ 1%] 2025-12-04T13:20:27.8687067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8 PASSED [0.8026s] [ 1%] 2025-12-04T13:20:27.8687188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16 PASSED [0.7992s] [ 1%] 2025-12-04T13:20:27.8687304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8 PASSED [0.7960s] [ 1%] 2025-12-04T13:20:27.8687423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32 PASSED [0.8238s] [ 1%] 2025-12-04T13:20:27.8687553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64 PASSED [0.8195s] [ 1%] 2025-12-04T13:20:27.8687663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8 PASSED [0.8138s] [ 1%] 2025-12-04T13:20:27.8687776Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16 PASSED [0.2411s] [ 1%] 2025-12-04T13:20:27.8687888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32 PASSED [0.1577s] [ 1%] 2025-12-04T13:20:27.8687997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16 PASSED [0.8243s] [ 1%] 2025-12-04T13:20:27.8688103Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64 PASSED [0.8162s] [ 1%] 2025-12-04T13:20:27.8688209Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8 PASSED [0.8089s] [ 1%] 2025-12-04T13:20:27.8688317Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16 PASSED [0.8281s] [ 1%] 2025-12-04T13:20:27.8688424Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32 PASSED [0.0246s] [ 1%] 2025-12-04T13:20:27.8688539Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int8 PASSED [0.8161s] [ 1%] 2025-12-04T13:20:27.8688656Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_uint8 PASSED [0.7998s] [ 1%] 2025-12-04T13:20:27.8688776Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16 PASSED [0.8157s] [ 1%] 2025-12-04T13:20:27.8688887Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32 PASSED [0.8045s] [ 1%] 2025-12-04T13:20:27.8688995Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_bool PASSED [0.8466s] [ 1%] 2025-12-04T13:20:27.8689126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64 PASSED [0.0486s] [ 2%] 2025-12-04T13:20:27.8689241Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16 PASSED [0.0336s] [ 2%] 2025-12-04T13:20:27.8689355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32 PASSED [0.8326s] [ 2%] 2025-12-04T13:20:27.8689473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16 PASSED [0.8069s] [ 2%] 2025-12-04T13:20:27.8689590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64 PASSED [0.8070s] [ 2%] 2025-12-04T13:20:27.8689700Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32 PASSED [0.8150s] [ 2%] 2025-12-04T13:20:27.8689826Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64 PASSED [0.8074s] [ 2%] 2025-12-04T13:20:27.8689950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32 PASSED [0.8082s] [ 2%] 2025-12-04T13:20:27.8690070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64 PASSED [0.8026s] [ 2%] 2025-12-04T13:20:27.8690182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32 PASSED [0.8231s] [ 2%] 2025-12-04T13:20:27.8690291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64 PASSED [0.0320s] [ 2%] 2025-12-04T13:20:27.8690401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16 PASSED [0.0253s] [ 2%] 2025-12-04T13:20:27.8690539Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16 PASSED [0.6236s] [ 2%] 2025-12-04T13:20:27.8690665Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16 PASSED [0.1063s] [ 2%] 2025-12-04T13:20:27.8690787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64 PASSED [0.1161s] [ 2%] 2025-12-04T13:20:27.8690911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8 PASSED [0.1186s] [ 2%] 2025-12-04T13:20:27.8691033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64 PASSED [0.1288s] [ 2%] 2025-12-04T13:20:27.8691165Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64 PASSED [0.1247s] [ 2%] 2025-12-04T13:20:27.8691275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_complex64 PASSED [0.8226s] [ 2%] 2025-12-04T13:20:27.8691389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64 PASSED [0.8001s] [ 2%] 2025-12-04T13:20:27.8691499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32 PASSED [0.8031s] [ 2%] 
2025-12-04T13:20:27.8691610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32 PASSED [0.7978s] [ 2%] 2025-12-04T13:20:27.8691719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64 PASSED [0.8240s] [ 2%] 2025-12-04T13:20:27.8691826Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32 PASSED [0.8127s] [ 2%] 2025-12-04T13:20:27.8691933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64 PASSED [0.8041s] [ 2%] 2025-12-04T13:20:27.8692039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool PASSED [0.0060s] [ 2%] 2025-12-04T13:20:27.8692147Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16 PASSED [0.0048s] [ 2%] 2025-12-04T13:20:27.8692279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128 PASSED [0.0188s] [ 2%] 2025-12-04T13:20:27.8692390Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8 PASSED [0.0178s] [ 2%] 2025-12-04T13:20:27.8692503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_bool PASSED [0.8129s] [ 2%] 2025-12-04T13:20:27.8692627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex128 PASSED [0.8058s] [ 2%] 2025-12-04T13:20:27.8692740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8 PASSED [0.0926s] [ 2%] 2025-12-04T13:20:27.8692846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8 PASSED [0.0906s] [ 2%] 2025-12-04T13:20:27.8692951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int16 XFAIL [0.0056s] [ 2%] 2025-12-04T13:20:27.8693055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int64 XFAIL [0.8046s] [ 2%] 2025-12-04T13:20:27.8693164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16 PASSED [0.8415s] [ 3%] 2025-12-04T13:20:27.8693310Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64 PASSED [0.0248s] [ 3%] 2025-12-04T13:20:27.8693414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64 PASSED [0.0199s] [ 3%] 2025-12-04T13:20:27.8693520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16 PASSED [0.0305s] [ 3%] 2025-12-04T13:20:27.8693625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool PASSED [0.0273s] [ 3%] 2025-12-04T13:20:27.8693732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16 PASSED [0.8211s] [ 3%] 2025-12-04T13:20:27.8693837Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8 PASSED [0.8240s] [ 3%] 2025-12-04T13:20:27.8693947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32 PASSED [0.8441s] [ 3%] 2025-12-04T13:20:27.8694073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64 PASSED [0.0428s] [ 3%] 2025-12-04T13:20:27.8694178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16 PASSED [0.0297s] [ 3%] 2025-12-04T13:20:27.8694285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64 PASSED [0.0270s] [ 3%] 2025-12-04T13:20:27.8694389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16 PASSED [0.0215s] [ 3%] 2025-12-04T13:20:27.8694494Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8 PASSED [0.8169s] [ 3%] 2025-12-04T13:20:27.8694598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8 PASSED [0.8204s] [ 3%] 2025-12-04T13:20:27.8694731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16 PASSED [0.8185s] [ 3%] 2025-12-04T13:20:27.8694844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32 PASSED [0.8243s] [ 3%] 2025-12-04T13:20:27.8694963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bfloat16 PASSED [0.8065s] [ 3%] 
2025-12-04T13:20:27.8695075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bool PASSED [0.8044s] [ 3%] 2025-12-04T13:20:27.8695194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_complex64 PASSED [0.8123s] [ 3%] 2025-12-04T13:20:27.8695309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float32 PASSED [0.8170s] [ 3%] 2025-12-04T13:20:27.8695425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_float64 PASSED [0.8174s] [ 3%] 2025-12-04T13:20:27.8695532Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32 PASSED [0.8171s] [ 3%] 2025-12-04T13:20:27.8695643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_complex64 PASSED [0.8375s] [ 3%] 2025-12-04T13:20:27.8695751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64 PASSED [0.0263s] [ 3%] 2025-12-04T13:20:27.8695872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool PASSED [0.8543s] [ 3%] 2025-12-04T13:20:27.8695976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16 PASSED [0.0689s] [ 3%] 2025-12-04T13:20:27.8696092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float8_e4m3fnuz PASSED [0.0679s] [ 3%] 2025-12-04T13:20:27.8696196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64 PASSED [0.0687s] [ 3%] 2025-12-04T13:20:27.8696324Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32 PASSED [2.4929s] [ 3%] 2025-12-04T13:20:27.8696435Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8 PASSED [0.8281s] [ 3%] 2025-12-04T13:20:27.8696552Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16 PASSED [0.8243s] [ 3%] 2025-12-04T13:20:27.8696664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool PASSED [0.0082s] [ 3%] 2025-12-04T13:20:27.8696788Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128 PASSED [0.8096s] [ 3%] 2025-12-04T13:20:27.8696908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32 PASSED [0.8130s] [ 3%] 2025-12-04T13:20:27.8697021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64 PASSED [0.8159s] [ 3%] 2025-12-04T13:20:27.8697133Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8 PASSED [0.8333s] [ 4%] 2025-12-04T13:20:27.8697246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8 PASSED [0.8227s] [ 4%] 2025-12-04T13:20:27.8697357Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8 PASSED [2.9441s] [ 4%] 2025-12-04T13:20:27.8697465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool PASSED [0.6326s] [ 4%] 2025-12-04T13:20:27.8697579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64 PASSED [0.0070s] [ 4%] 2025-12-04T13:20:27.8697699Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8 PASSED [0.8163s] [ 4%] 2025-12-04T13:20:27.8697816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128 PASSED [3.5878s] [ 4%] 2025-12-04T13:20:27.8697931Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32 PASSED [2.6425s] [ 4%] 2025-12-04T13:20:27.8698043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64 PASSED [0.0093s] [ 4%] 2025-12-04T13:20:27.8698152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8 PASSED [0.8082s] [ 4%] 2025-12-04T13:20:27.8698274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8 PASSED [0.8281s] [ 4%] 2025-12-04T13:20:27.8698385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16 PASSED [3.1102s] [ 4%] 2025-12-04T13:20:27.8698497Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32 PASSED [1.0758s] [ 4%] 2025-12-04T13:20:27.8698607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16 PASSED [0.7897s] [ 4%] 2025-12-04T13:20:27.8698715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32 PASSED [0.7926s] [ 4%] 2025-12-04T13:20:27.8698823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64 PASSED [0.7949s] [ 4%] 2025-12-04T13:20:27.8698939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64 PASSED [2.3066s] [ 4%] 2025-12-04T13:20:27.8699051Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32 PASSED [0.0093s] [ 4%] 2025-12-04T13:20:27.8699160Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64 PASSED [0.7944s] [ 4%] 2025-12-04T13:20:27.8699280Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16 PASSED [0.8794s] [ 4%] 2025-12-04T13:20:27.8699403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128 PASSED [1.2961s] [ 4%] 2025-12-04T13:20:27.8699535Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32 PASSED [1.2896s] [ 4%] 2025-12-04T13:20:27.8699651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16 PASSED [1.2906s] [ 4%] 2025-12-04T13:20:27.8699768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32 PASSED [1.2667s] [ 4%] 2025-12-04T13:20:27.8699895Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8 PASSED [1.2774s] [ 4%] 2025-12-04T13:20:27.8700009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16 PASSED [3.1220s] [ 4%] 2025-12-04T13:20:27.8700119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32 PASSED [0.0327s] [ 4%] 2025-12-04T13:20:27.8700230Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool PASSED [0.3309s] [ 4%] 2025-12-04T13:20:27.8700344Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16 PASSED [1.6247s] [ 4%] 2025-12-04T13:20:27.8700456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32 PASSED [0.0116s] [ 4%] 2025-12-04T13:20:27.8700564Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8 PASSED [1.2654s] [ 4%] 2025-12-04T13:20:27.8700674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8 PASSED [1.2803s] [ 4%] 2025-12-04T13:20:27.8700783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16 PASSED [1.7256s] [ 4%] 2025-12-04T13:20:27.8700902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128 PASSED [0.2780s] [ 4%] 2025-12-04T13:20:27.8701017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64 PASSED [1.2794s] [ 4%] 2025-12-04T13:20:27.8701128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32 PASSED [1.2725s] [ 5%] 2025-12-04T13:20:27.8701251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64 PASSED [1.2861s] [ 5%] 2025-12-04T13:20:27.8701361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32 PASSED [1.2841s] [ 5%] 2025-12-04T13:20:27.8701469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16 PASSED [1.3579s] [ 5%] 2025-12-04T13:20:27.8701579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32 PASSED [0.0070s] [ 5%] 2025-12-04T13:20:27.8701688Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16 PASSED [0.0068s] [ 5%] 2025-12-04T13:20:27.8701811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32 PASSED [0.7905s] [ 5%] 2025-12-04T13:20:27.8701920Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8 PASSED [0.7897s] [ 5%] 2025-12-04T13:20:27.8702030Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8 PASSED [0.8017s] [ 5%] 2025-12-04T13:20:27.8702140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32 PASSED [0.8253s] [ 5%] 2025-12-04T13:20:27.8702249Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16 PASSED [0.0347s] [ 5%] 2025-12-04T13:20:27.8702358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16 PASSED [0.0081s] [ 5%] 2025-12-04T13:20:27.8702466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64 PASSED [0.0081s] [ 5%] 2025-12-04T13:20:27.8702580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128 PASSED [0.0046s] [ 5%] 2025-12-04T13:20:27.8702687Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64 PASSED [0.0046s] [ 5%] 2025-12-04T13:20:27.8702800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128 PASSED [0.0038s] [ 5%] 2025-12-04T13:20:27.8702909Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64 PASSED [0.0035s] [ 5%] 2025-12-04T13:20:27.8703031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32 PASSED [0.0040s] [ 5%] 2025-12-04T13:20:27.8703137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64 PASSED [0.0038s] [ 5%] 2025-12-04T13:20:27.8703282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128 PASSED [0.1496s] [ 5%] 2025-12-04T13:20:27.8703410Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32 PASSED [0.1329s] [ 5%] 2025-12-04T13:20:27.8703521Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8 PASSED [0.1142s] [ 5%] 2025-12-04T13:20:27.8703629Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16 PASSED [0.0218s] [ 5%] 2025-12-04T13:20:27.8703734Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8 PASSED [0.0213s] [ 5%] 2025-12-04T13:20:27.8703853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16 PASSED [0.4489s] [ 5%] 2025-12-04T13:20:27.8703964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8 PASSED [0.2304s] [ 5%] 2025-12-04T13:20:27.8704071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32 PASSED [0.0881s] [ 5%] 2025-12-04T13:20:27.8704176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8 PASSED [0.0799s] [ 5%] 2025-12-04T13:20:27.8704284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64 PASSED [0.0982s] [ 5%] 2025-12-04T13:20:27.8704386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8 PASSED [0.0817s] [ 5%] 2025-12-04T13:20:27.8704492Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16 PASSED [0.0968s] [ 5%] 2025-12-04T13:20:27.8704596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16 PASSED [0.0940s] [ 5%] 2025-12-04T13:20:27.8704731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float16 PASSED [0.8306s] [ 5%] 2025-12-04T13:20:27.8704844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float64 PASSED [0.7989s] [ 5%] 2025-12-04T13:20:27.8704954Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int32 XFAIL [0.0044s] [ 5%] 2025-12-04T13:20:27.8705062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_int64 XFAIL [0.8000s] [ 6%] 2025-12-04T13:20:27.8705169Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16 PASSED [0.9034s] [ 6%] 2025-12-04T13:20:27.8705274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32 PASSED [0.0882s] [ 6%] 
2025-12-04T13:20:27.8705391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64 PASSED [0.0932s] [ 6%] 2025-12-04T13:20:27.8705494Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16 PASSED [0.0886s] [ 6%] 2025-12-04T13:20:27.8705611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16 PASSED [0.1588s] [ 6%] 2025-12-04T13:20:27.8705724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32 PASSED [0.1292s] [ 6%] 2025-12-04T13:20:27.8705831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool PASSED [0.0041s] [ 6%] 2025-12-04T13:20:27.8705939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8 PASSED [0.0038s] [ 6%] 2025-12-04T13:20:27.8706048Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64 PASSED [0.0049s] [ 6%] 2025-12-04T13:20:27.8706154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8 PASSED [0.8075s] [ 6%] 2025-12-04T13:20:27.8706260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8 PASSED [0.8033s] [ 6%] 2025-12-04T13:20:27.8706365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32 PASSED [0.8310s] [ 6%] 2025-12-04T13:20:27.8706469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64 PASSED [0.3186s] [ 6%] 2025-12-04T13:20:27.8706586Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32 PASSED [0.8305s] [ 6%] 2025-12-04T13:20:27.8706688Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64 PASSED [0.8275s] [ 6%] 2025-12-04T13:20:27.8706798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64 PASSED [0.8437s] [ 6%] 2025-12-04T13:20:27.8706925Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16 PASSED [0.8119s] [ 6%] 2025-12-04T13:20:27.8707038Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64 
PASSED [0.8100s] [ 6%] 2025-12-04T13:20:27.8707155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex32 PASSED [0.7974s] [ 6%] 2025-12-04T13:20:27.8707267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8 PASSED [0.8012s] [ 6%] 2025-12-04T13:20:27.8707376Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool PASSED [0.8055s] [ 6%] 2025-12-04T13:20:27.8707496Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128 PASSED [0.8027s] [ 6%] 2025-12-04T13:20:27.8707609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64 PASSED [0.8095s] [ 6%] 2025-12-04T13:20:27.8707723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32 PASSED [0.8052s] [ 6%] 2025-12-04T13:20:27.8707837Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64 PASSED [0.8042s] [ 6%] 2025-12-04T13:20:27.8707949Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool PASSED [0.8300s] [ 6%] 2025-12-04T13:20:27.8708064Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64 PASSED [0.8410s] [ 6%] 2025-12-04T13:20:27.8708174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128 PASSED [0.0792s] [ 6%] 2025-12-04T13:20:27.8708304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64 PASSED [0.0763s] [ 6%] 2025-12-04T13:20:27.8708411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32 PASSED [0.0235s] [ 6%] 2025-12-04T13:20:27.8708521Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16 PASSED [0.0267s] [ 6%] 2025-12-04T13:20:27.8708626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8 PASSED [0.8319s] [ 6%] 2025-12-04T13:20:27.8708739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64 PASSED [0.8265s] [ 6%] 2025-12-04T13:20:27.8708859Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16 PASSED [0.8235s] [ 7%] 2025-12-04T13:20:27.8708969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32 PASSED [0.8181s] [ 7%] 2025-12-04T13:20:27.8709076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8 PASSED [0.8263s] [ 7%] 2025-12-04T13:20:27.8709188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8 PASSED [0.8210s] [ 7%] 2025-12-04T13:20:27.8709294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool PASSED [0.8298s] [ 7%] 2025-12-04T13:20:27.8709405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64 PASSED [0.8489s] [ 7%] 2025-12-04T13:20:27.8709510Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8 PASSED [0.8224s] [ 7%] 2025-12-04T13:20:27.8709620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8 PASSED [0.8297s] [ 7%] 2025-12-04T13:20:27.8709724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int32 XFAIL [0.0051s] [ 7%] 2025-12-04T13:20:27.8709829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int64 XFAIL [0.8065s] [ 7%] 2025-12-04T13:20:27.8709932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64 PASSED [0.9546s] [ 7%] 2025-12-04T13:20:27.8710052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16 PASSED [0.0975s] [ 7%] 2025-12-04T13:20:27.8710156Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8 PASSED [0.0862s] [ 7%] 2025-12-04T13:20:27.8710267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bfloat16 PASSED [0.8412s] [ 7%] 2025-12-04T13:20:27.8710376Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32 PASSED [0.8369s] [ 7%] 2025-12-04T13:20:27.8710497Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64 PASSED [0.4289s] [ 7%] 
2025-12-04T13:20:27.8710605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16 PASSED [0.0222s] [ 7%] 2025-12-04T13:20:27.8710711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64 PASSED [0.0215s] [ 7%] 2025-12-04T13:20:27.8710834Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex64 PASSED [0.8212s] [ 7%] 2025-12-04T13:20:27.8710948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_int8 PASSED [0.8158s] [ 7%] 2025-12-04T13:20:27.8711068Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_bool PASSED [0.8213s] [ 7%] 2025-12-04T13:20:27.8711194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex128 PASSED [0.8289s] [ 7%] 2025-12-04T13:20:27.8711319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex64 PASSED [0.8242s] [ 7%] 2025-12-04T13:20:27.8711439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int16 PASSED [0.8159s] [ 7%] 2025-12-04T13:20:27.8711558Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int32 PASSED [0.8272s] [ 7%] 2025-12-04T13:20:27.8711674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int64 PASSED [0.8174s] [ 7%] 2025-12-04T13:20:27.8711802Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [0.8639s] [ 7%] 2025-12-04T13:20:27.8711932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128 PASSED [0.3221s] [ 7%] 2025-12-04T13:20:27.8712054Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32 PASSED [0.1178s] [ 7%] 2025-12-04T13:20:27.8712180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16 PASSED [0.1153s] [ 7%] 2025-12-04T13:20:27.8712307Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16 PASSED [0.0971s] [ 7%] 2025-12-04T13:20:27.8712424Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128 PASSED [0.0309s] [ 7%] 2025-12-04T13:20:27.8712551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32 PASSED [0.0290s] [ 7%] 2025-12-04T13:20:27.8712661Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32 PASSED [0.0267s] [ 7%] 2025-12-04T13:20:27.8712795Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int32 PASSED [0.1043s] [ 7%] 2025-12-04T13:20:27.8712905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64 PASSED [0.4588s] [ 8%] 2025-12-04T13:20:27.8713013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32 PASSED [0.0272s] [ 8%] 2025-12-04T13:20:27.8713120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32 PASSED [0.8434s] [ 8%] 2025-12-04T13:20:27.8713226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool PASSED [0.8360s] [ 8%] 2025-12-04T13:20:27.8713364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128 PASSED [0.8487s] [ 8%] 2025-12-04T13:20:27.8713472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16 PASSED [0.8463s] [ 8%] 2025-12-04T13:20:27.8713578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64 PASSED [0.8333s] [ 8%] 2025-12-04T13:20:27.8713708Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16 PASSED [0.8383s] [ 8%] 2025-12-04T13:20:27.8713812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8 PASSED [0.8281s] [ 8%] 2025-12-04T13:20:27.8713938Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8 PASSED [0.8295s] [ 8%] 2025-12-04T13:20:27.8714053Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32 PASSED [0.8123s] [ 8%] 2025-12-04T13:20:27.8714181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp_cuda_float16 PASSED [0.2260s] [ 8%] 2025-12-04T13:20:27.8714293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool PASSED [0.0769s] [ 8%] 2025-12-04T13:20:27.8714403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8 PASSED [0.0893s] [ 8%] 2025-12-04T13:20:27.8714520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64 PASSED [0.8484s] [ 8%] 2025-12-04T13:20:27.8714633Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64 PASSED [0.8490s] [ 8%] 2025-12-04T13:20:27.8714752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64 PASSED [0.1335s] [ 8%] 2025-12-04T13:20:27.8714866Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64 PASSED [0.1090s] [ 8%] 2025-12-04T13:20:27.8714978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32 PASSED [0.1016s] [ 8%] 2025-12-04T13:20:27.8715088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8 PASSED [0.0946s] [ 8%] 2025-12-04T13:20:27.8715201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16 PASSED [0.1220s] [ 8%] 2025-12-04T13:20:27.8715312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64 PASSED [0.0883s] [ 8%] 2025-12-04T13:20:27.8715420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32 PASSED [0.1037s] [ 8%] 2025-12-04T13:20:27.8715544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8 PASSED [0.0438s] [ 8%] 2025-12-04T13:20:27.8715654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8 PASSED [0.0348s] [ 8%] 2025-12-04T13:20:27.8715793Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_complex64 PASSED [0.5383s] [ 8%] 2025-12-04T13:20:27.8715928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float16 PASSED [0.5205s] [ 8%] 2025-12-04T13:20:27.8716039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool PASSED [0.0170s] [ 8%] 2025-12-04T13:20:27.8716162Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8 PASSED [0.0090s] [ 8%] 2025-12-04T13:20:27.8716267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool PASSED [0.0848s] [ 8%] 2025-12-04T13:20:27.8716372Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32 PASSED [0.0888s] [ 8%] 2025-12-04T13:20:27.8716492Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128 PASSED [0.0086s] [ 8%] 2025-12-04T13:20:27.8716608Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32 PASSED [0.0083s] [ 8%] 2025-12-04T13:20:27.8716718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16 PASSED [0.0838s] [ 8%] 2025-12-04T13:20:27.8716827Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32 PASSED [0.9046s] [ 8%] 2025-12-04T13:20:27.8716959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool PASSED [0.8372s] [ 9%] 2025-12-04T13:20:27.8717097Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [0.8343s] [ 9%] 2025-12-04T13:20:27.8717231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [0.8271s] [ 9%] 2025-12-04T13:20:27.8717374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [0.8408s] [ 9%] 2025-12-04T13:20:27.8717505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED 
[0.8313s] [ 9%] 2025-12-04T13:20:27.8717637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool PASSED [0.8306s] [ 9%] 2025-12-04T13:20:27.8717786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [0.8453s] [ 9%] 2025-12-04T13:20:27.8717920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [0.8358s] [ 9%] 2025-12-04T13:20:27.8718052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [0.8302s] [ 9%] 2025-12-04T13:20:27.8718183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [0.8332s] [ 9%] 2025-12-04T13:20:27.8718297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16 PASSED [0.0981s] [ 9%] 2025-12-04T13:20:27.8718409Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64 PASSED [0.0856s] [ 9%] 2025-12-04T13:20:27.8718518Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64 PASSED [0.0841s] [ 9%] 2025-12-04T13:20:27.8718627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8 PASSED [0.0819s] [ 9%] 2025-12-04T13:20:27.8718735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool PASSED [0.0075s] [ 9%] 2025-12-04T13:20:27.8718853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128 PASSED [0.0073s] [ 9%] 2025-12-04T13:20:27.8718961Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16 PASSED [0.8388s] [ 9%] 2025-12-04T13:20:27.8719075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16 PASSED [0.8832s] [ 9%] 2025-12-04T13:20:27.8719198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16 PASSED [0.8460s] [ 9%] 2025-12-04T13:20:27.8719309Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64 PASSED [0.8372s] [ 9%] 2025-12-04T13:20:27.8719418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8 PASSED [0.8432s] [ 9%] 2025-12-04T13:20:27.8719537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16 PASSED [0.8441s] [ 9%] 2025-12-04T13:20:27.8719647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool PASSED [0.8426s] [ 9%] 2025-12-04T13:20:27.8719779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32 PASSED [0.8404s] [ 9%] 2025-12-04T13:20:27.8719895Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64 PASSED [0.8366s] [ 9%] 2025-12-04T13:20:27.8720007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32 PASSED [0.8351s] [ 9%] 2025-12-04T13:20:27.8720118Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16 PASSED [0.8315s] [ 9%] 2025-12-04T13:20:27.8720226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16 PASSED [0.8471s] [ 9%] 2025-12-04T13:20:27.8720333Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32 PASSED [0.8509s] [ 9%] 2025-12-04T13:20:27.8720442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64 PASSED [0.8507s] [ 9%] 2025-12-04T13:20:27.8720569Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32 PASSED [0.0309s] [ 9%] 2025-12-04T13:20:27.8720673Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8 PASSED [0.0858s] [ 9%] 2025-12-04T13:20:27.8720777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8 PASSED [0.0867s] [ 9%] 2025-12-04T13:20:27.8720887Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128 PASSED [0.8630s] [ 9%] 2025-12-04T13:20:27.8721005Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32 PASSED [0.8443s] [ 9%] 2025-12-04T13:20:27.8721109Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64 PASSED [0.8390s] [ 10%] 2025-12-04T13:20:27.8721227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128 PASSED [0.8228s] [ 10%] 2025-12-04T13:20:27.8721353Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16 PASSED [0.8379s] [ 10%] 2025-12-04T13:20:27.8721463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8 PASSED [0.8283s] [ 10%] 2025-12-04T13:20:27.8721589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64 PASSED [0.8360s] [ 10%] 2025-12-04T13:20:27.8721705Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128 PASSED [0.8339s] [ 10%] 2025-12-04T13:20:27.8721816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16 PASSED [0.8319s] [ 10%] 2025-12-04T13:20:27.8721926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32 PASSED [0.8350s] [ 10%] 2025-12-04T13:20:27.8722036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32 PASSED [0.8337s] [ 10%] 2025-12-04T13:20:27.8722147Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16 PASSED [0.8449s] [ 10%] 2025-12-04T13:20:27.8722263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128 PASSED [0.8219s] [ 10%] 2025-12-04T13:20:27.8722374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16 PASSED [0.8276s] [ 10%] 2025-12-04T13:20:27.8722483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8 PASSED [0.8276s] [ 10%] 2025-12-04T13:20:27.8722598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64 PASSED [0.8390s] [ 10%] 2025-12-04T13:20:27.8722729Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16 PASSED [0.1145s] [ 10%] 2025-12-04T13:20:27.8722842Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float16 PASSED [0.0938s] [ 10%] 2025-12-04T13:20:27.8722980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32 PASSED [0.0121s] [ 10%] 2025-12-04T13:20:27.8723104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64 PASSED [0.0536s] [ 10%] 2025-12-04T13:20:27.8723242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_uint8 PASSED [0.8515s] [ 10%] 2025-12-04T13:20:27.8723423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64 PASSED [0.8814s] [ 10%] 2025-12-04T13:20:27.8723549Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16 PASSED [0.8435s] [ 10%] 2025-12-04T13:20:27.8723672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32 PASSED [0.0596s] [ 10%] 2025-12-04T13:20:27.8723807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32 PASSED [0.8772s] [ 10%] 2025-12-04T13:20:27.8723933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64 PASSED [0.8787s] [ 10%] 2025-12-04T13:20:27.8724058Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8 PASSED [0.8736s] [ 10%] 2025-12-04T13:20:27.8724208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [0.8657s] [ 10%] 2025-12-04T13:20:27.8724338Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64 PASSED [0.8382s] [ 10%] 2025-12-04T13:20:27.8724488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [0.8608s] [ 10%] 
2025-12-04T13:20:27.8724653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [0.0074s] [ 10%] 2025-12-04T13:20:27.8724797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [0.0271s] [ 10%] 2025-12-04T13:20:27.8724928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [0.0809s] [ 10%] 2025-12-04T13:20:27.8725089Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [0.0226s] [ 10%] 2025-12-04T13:20:27.8725228Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [0.0092s] [ 10%] 2025-12-04T13:20:27.8725355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64 PASSED [0.9842s] [ 10%] 2025-12-04T13:20:27.8725491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bfloat16 PASSED [0.0133s] [ 10%] 2025-12-04T13:20:27.8725627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_bool PASSED [0.0117s] [ 11%] 2025-12-04T13:20:27.8725763Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_bool PASSED [0.0096s] [ 11%] 2025-12-04T13:20:27.8725903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float16 PASSED [0.0109s] [ 11%] 2025-12-04T13:20:27.8726039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int16 PASSED [0.0110s] [ 11%] 2025-12-04T13:20:27.8726176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int8 PASSED [0.0108s] [ 11%] 2025-12-04T13:20:27.8726317Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [0.0718s] [ 11%] 2025-12-04T13:20:27.8726459Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED [0.0640s] [ 11%] 2025-12-04T13:20:27.8726617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [0.0613s] [ 11%] 2025-12-04T13:20:27.8726754Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [0.8885s] [ 11%] 2025-12-04T13:20:27.8726879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16 PASSED [0.8748s] [ 11%] 2025-12-04T13:20:27.8727002Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32 PASSED [0.8645s] [ 11%] 2025-12-04T13:20:27.8727138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16 PASSED [0.8758s] [ 11%] 2025-12-04T13:20:27.8727262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32 PASSED [0.8767s] [ 11%] 2025-12-04T13:20:27.8727384Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64 PASSED [0.8701s] [ 11%] 2025-12-04T13:20:27.8727505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8 PASSED [0.8713s] [ 11%] 2025-12-04T13:20:27.8727627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8 PASSED [0.8738s] [ 11%] 2025-12-04T13:20:27.8727749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16 PASSED [0.8961s] [ 11%] 2025-12-04T13:20:27.8727887Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_smooth_l1_loss_cuda_float16 PASSED [0.8511s] [ 11%] 2025-12-04T13:20:27.8728025Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [0.8399s] [ 11%] 2025-12-04T13:20:27.8728175Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED 
[0.8532s] [ 11%] 2025-12-04T13:20:27.8728315Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [0.8365s] [ 11%] 2025-12-04T13:20:27.8728463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16 PASSED [0.8840s] [ 11%] 2025-12-04T13:20:27.8728601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [0.8596s] [ 11%] 2025-12-04T13:20:27.8728735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32 PASSED [0.8223s] [ 11%] 2025-12-04T13:20:27.8728875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8 PASSED [0.8188s] [ 11%] 2025-12-04T13:20:27.8729007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32 PASSED [0.8387s] [ 11%] 2025-12-04T13:20:27.8729138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32 PASSED [0.8315s] [ 11%] 2025-12-04T13:20:27.8729283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [0.8102s] [ 11%] 2025-12-04T13:20:27.8729429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [0.8026s] [ 11%] 2025-12-04T13:20:27.8729570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [0.7981s] [ 11%] 2025-12-04T13:20:27.8729681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32 PASSED [0.8228s] [ 11%] 2025-12-04T13:20:27.8729789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64 PASSED [0.8309s] [ 11%] 2025-12-04T13:20:27.8729917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_complex128 PASSED [0.8018s] [ 11%] 2025-12-04T13:20:27.8730040Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal__in_place_cuda_float32 PASSED [0.7984s] [ 11%] 2025-12-04T13:20:27.8730151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float32 PASSED [0.8004s] [ 11%] 2025-12-04T13:20:27.8730274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float64 PASSED [0.8143s] [ 12%] 2025-12-04T13:20:27.8730401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float64 PASSED [0.8101s] [ 12%] 2025-12-04T13:20:27.8730507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool PASSED [0.8023s] [ 12%] 2025-12-04T13:20:27.8730626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float32 PASSED [0.8434s] [ 12%] 2025-12-04T13:20:27.8730739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_int8 PASSED [0.8407s] [ 12%] 2025-12-04T13:20:27.8730860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool PASSED [0.8479s] [ 12%] 2025-12-04T13:20:27.8730974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64 PASSED [0.8480s] [ 12%] 2025-12-04T13:20:27.8731085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16 PASSED [0.8555s] [ 12%] 2025-12-04T13:20:27.8731196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8 PASSED [0.8429s] [ 12%] 2025-12-04T13:20:27.8731308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16 PASSED [0.8158s] [ 12%] 2025-12-04T13:20:27.8731420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16 PASSED [0.8093s] [ 12%] 2025-12-04T13:20:27.8731529Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16 PASSED [0.8119s] [ 12%] 2025-12-04T13:20:27.8731640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8 PASSED [0.8075s] [ 12%] 2025-12-04T13:20:27.8731749Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128 PASSED [0.8904s] [ 12%] 2025-12-04T13:20:27.8731855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16 PASSED [0.0973s] [ 12%] 2025-12-04T13:20:27.8731958Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool PASSED [0.8192s] [ 12%] 2025-12-04T13:20:27.8732077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8 PASSED [0.8174s] [ 12%] 2025-12-04T13:20:27.8732184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8 PASSED [0.8312s] [ 12%] 2025-12-04T13:20:27.8732295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_float64 PASSED [0.8273s] [ 12%] 2025-12-04T13:20:27.8732405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rad2deg_cuda_int16 PASSED [0.8270s] [ 12%] 2025-12-04T13:20:27.8732527Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32 PASSED [0.8147s] [ 12%] 2025-12-04T13:20:27.8732636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16 PASSED [0.8075s] [ 12%] 2025-12-04T13:20:27.8732745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32 PASSED [0.8046s] [ 12%] 2025-12-04T13:20:27.8732853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64 PASSED [0.7957s] [ 12%] 2025-12-04T13:20:27.8732963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16 PASSED [0.7988s] [ 12%] 2025-12-04T13:20:27.8733070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64 PASSED [0.8041s] [ 12%] 2025-12-04T13:20:27.8733182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128 PASSED [0.8379s] [ 12%] 2025-12-04T13:20:27.8733321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32 PASSED [0.8273s] [ 12%] 2025-12-04T13:20:27.8733427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8 PASSED [0.8260s] [ 
12%] 2025-12-04T13:20:27.8733545Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16 PASSED [0.8358s] [ 12%] 2025-12-04T13:20:27.8733655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool PASSED [0.8385s] [ 12%] 2025-12-04T13:20:27.8733769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32 PASSED [0.8315s] [ 12%] 2025-12-04T13:20:27.8733896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16 PASSED [0.8266s] [ 12%] 2025-12-04T13:20:27.8734010Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16 PASSED [0.1124s] [ 12%] 2025-12-04T13:20:27.8734122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16 PASSED [0.1029s] [ 13%] 2025-12-04T13:20:27.8734234Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32 PASSED [0.0954s] [ 13%] 2025-12-04T13:20:27.8734345Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16 PASSED [0.0959s] [ 13%] 2025-12-04T13:20:27.8734469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_complex64 PASSED [0.0098s] [ 13%] 2025-12-04T13:20:27.8734578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64 PASSED [0.0412s] [ 13%] 2025-12-04T13:20:27.8734686Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8 PASSED [0.0418s] [ 13%] 2025-12-04T13:20:27.8734799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32 PASSED [0.0322s] [ 13%] 2025-12-04T13:20:27.8734911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64 PASSED [0.0404s] [ 13%] 2025-12-04T13:20:27.8735019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64 PASSED [0.0148s] [ 13%] 2025-12-04T13:20:27.8735127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8 PASSED [0.0217s] [ 13%] 2025-12-04T13:20:27.8735237Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128 PASSED [0.0957s] [ 13%] 2025-12-04T13:20:27.8735348Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64 PASSED [0.0957s] [ 13%] 2025-12-04T13:20:27.8735455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16 PASSED [0.0847s] [ 13%] 2025-12-04T13:20:27.8735571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bool PASSED [0.0094s] [ 13%] 2025-12-04T13:20:27.8735714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float16 PASSED [0.0072s] [ 13%] 2025-12-04T13:20:27.8735832Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_float64 PASSED [0.0070s] [ 13%] 2025-12-04T13:20:27.8735937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64 PASSED [0.8454s] [ 13%] 2025-12-04T13:20:27.8736055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8 PASSED [0.8300s] [ 13%] 2025-12-04T13:20:27.8736163Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool PASSED [0.8558s] [ 13%] 2025-12-04T13:20:27.8736278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128 PASSED [0.9824s] [ 13%] 2025-12-04T13:20:27.8736389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32 PASSED [0.0380s] [ 13%] 2025-12-04T13:20:27.8736498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16 PASSED [0.0284s] [ 13%] 2025-12-04T13:20:27.8736602Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool PASSED [0.0254s] [ 13%] 2025-12-04T13:20:27.8736709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32 PASSED [0.0228s] [ 13%] 2025-12-04T13:20:27.8736814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16 PASSED [0.8253s] [ 13%] 2025-12-04T13:20:27.8736920Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64 PASSED [0.8361s] [ 13%] 2025-12-04T13:20:27.8737030Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32 PASSED [0.8472s] [ 13%] 2025-12-04T13:20:27.8737138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8 PASSED [0.8392s] [ 13%] 2025-12-04T13:20:27.8737245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16 PASSED [0.8369s] [ 13%] 2025-12-04T13:20:27.8737354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16 PASSED [0.0482s] [ 13%] 2025-12-04T13:20:27.8737474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool PASSED [0.0344s] [ 13%] 2025-12-04T13:20:27.8737580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64 PASSED [0.8592s] [ 13%] 2025-12-04T13:20:27.8737686Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32 PASSED [0.8441s] [ 13%] 2025-12-04T13:20:27.8737814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16 PASSED [0.0176s] [ 13%] 2025-12-04T13:20:27.8737940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64 PASSED [0.8167s] [ 13%] 2025-12-04T13:20:27.8738073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8 PASSED [0.8219s] [ 14%] 2025-12-04T13:20:27.8738197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32 PASSED [0.8470s] [ 14%] 2025-12-04T13:20:27.8738319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32 PASSED [0.0287s] [ 14%] 2025-12-04T13:20:27.8738439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8 PASSED [0.8260s] [ 14%] 2025-12-04T13:20:27.8738558Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64 PASSED [0.0670s] [ 14%] 2025-12-04T13:20:27.8738672Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64 PASSED [0.8637s] [ 14%] 2025-12-04T13:20:27.8738789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32 PASSED [0.8389s] [ 14%] 2025-12-04T13:20:27.8738905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64 PASSED [0.0229s] [ 14%] 2025-12-04T13:20:27.8739019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8 PASSED [0.8220s] [ 14%] 2025-12-04T13:20:27.8739136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16 PASSED [0.8358s] [ 14%] 2025-12-04T13:20:27.8739260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool PASSED [0.0263s] [ 14%] 2025-12-04T13:20:27.8739374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bfloat16 PASSED [0.0297s] [ 14%] 2025-12-04T13:20:27.8739488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32 PASSED [0.8434s] [ 14%] 2025-12-04T13:20:27.8739599Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64 PASSED [0.8336s] [ 14%] 2025-12-04T13:20:27.8739728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bfloat16 PASSED [0.8432s] [ 14%] 2025-12-04T13:20:27.8739840Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16 PASSED [0.0225s] [ 14%] 2025-12-04T13:20:27.8739982Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [0.8082s] [ 14%] 2025-12-04T13:20:27.8740120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [0.8086s] [ 14%] 2025-12-04T13:20:27.8740237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16 PASSED [0.8693s] [ 14%] 2025-12-04T13:20:27.8740352Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32 PASSED [0.8422s] 
[ 14%] 2025-12-04T13:20:27.8740466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8 PASSED [0.8369s] [ 14%] 2025-12-04T13:20:27.8740616Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [0.8900s] [ 14%] 2025-12-04T13:20:27.8740764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [0.8466s] [ 14%] 2025-12-04T13:20:27.8740908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [0.8637s] [ 14%] 2025-12-04T13:20:27.8741076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [0.0724s] [ 14%] 2025-12-04T13:20:27.8741222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [0.8622s] [ 14%] 2025-12-04T13:20:27.8741366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [0.8675s] [ 14%] 2025-12-04T13:20:27.8741481Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8 PASSED [0.8452s] [ 14%] 2025-12-04T13:20:27.8741620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32 PASSED [0.0066s] [ 14%] 2025-12-04T13:20:27.8741771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64 PASSED [0.0051s] [ 14%] 2025-12-04T13:20:27.8741906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16 PASSED [0.0053s] [ 14%] 2025-12-04T13:20:27.8742041Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64 PASSED [0.8336s] [ 14%] 2025-12-04T13:20:27.8742174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8 PASSED [0.8366s] [ 14%] 2025-12-04T13:20:27.8742311Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64 PASSED [0.8489s] [ 14%] 2025-12-04T13:20:27.8742429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool PASSED [0.1575s] [ 14%] 2025-12-04T13:20:27.8742551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16 PASSED [0.1406s] [ 15%] 2025-12-04T13:20:27.8742671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32 PASSED [0.9435s] [ 15%] 2025-12-04T13:20:27.8742788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64 PASSED [7.3771s] [ 15%] 2025-12-04T13:20:27.8742901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8 PASSED [0.1136s] [ 15%] 2025-12-04T13:20:27.8743031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_bool PASSED [0.0056s] [ 15%] 2025-12-04T13:20:27.8743154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_float64 PASSED [0.0057s] [ 15%] 2025-12-04T13:20:27.8743306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_int64 PASSED [0.8356s] [ 15%] 2025-12-04T13:20:27.8743432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16 PASSED [0.8460s] [ 15%] 2025-12-04T13:20:27.8743542Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32 PASSED [0.8728s] [ 15%] 2025-12-04T13:20:27.8743654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64 PASSED [0.0389s] [ 15%] 2025-12-04T13:20:27.8743763Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16 PASSED [0.0235s] [ 15%] 2025-12-04T13:20:27.8743871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8 PASSED [0.0185s] [ 15%] 2025-12-04T13:20:27.8743983Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16 PASSED [0.0336s] [ 15%] 
2025-12-04T13:20:27.8744092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64 PASSED [0.8623s] [ 15%] 2025-12-04T13:20:27.8744197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8 PASSED [0.8529s] [ 15%] 2025-12-04T13:20:27.8744310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int8 PASSED [0.8280s] [ 15%] 2025-12-04T13:20:27.8744421Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32 PASSED [0.8320s] [ 15%] 2025-12-04T13:20:27.8744550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_complex128 PASSED [0.8356s] [ 15%] 2025-12-04T13:20:27.8744668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int16 PASSED [0.8354s] [ 15%] 2025-12-04T13:20:27.8744803Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_multiple_cuda_int64 PASSED [0.8384s] [ 15%] 2025-12-04T13:20:27.8744915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128 PASSED [0.0120s] [ 15%] 2025-12-04T13:20:27.8745024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32 PASSED [0.0105s] [ 15%] 2025-12-04T13:20:27.8745134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64 PASSED [0.0104s] [ 15%] 2025-12-04T13:20:27.8745243Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16 PASSED [0.0104s] [ 15%] 2025-12-04T13:20:27.8745366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32 PASSED [0.0103s] [ 15%] 2025-12-04T13:20:27.8745472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16 PASSED [0.0105s] [ 15%] 2025-12-04T13:20:27.8745578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8 PASSED [0.0103s] [ 15%] 2025-12-04T13:20:27.8745689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128 PASSED [0.8443s] [ 15%] 2025-12-04T13:20:27.8745794Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16 PASSED [0.8351s] [ 15%] 2025-12-04T13:20:27.8745900Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16 PASSED [0.1074s] [ 15%] 2025-12-04T13:20:27.8746009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32 PASSED [0.1565s] [ 15%] 2025-12-04T13:20:27.8746113Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64 PASSED [0.0949s] [ 15%] 2025-12-04T13:20:27.8746218Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8 PASSED [0.0931s] [ 15%] 2025-12-04T13:20:27.8746323Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8 PASSED [0.0924s] [ 15%] 2025-12-04T13:20:27.8746427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8 PASSED [0.0161s] [ 15%] 2025-12-04T13:20:27.8746543Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16 PASSED [0.0097s] [ 16%] 2025-12-04T13:20:27.8746670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16 PASSED [0.8377s] [ 16%] 2025-12-04T13:20:27.8746783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_complex128 PASSED [0.8307s] [ 16%] 2025-12-04T13:20:27.8746894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float32 PASSED [0.8244s] [ 16%] 2025-12-04T13:20:27.8747016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8 PASSED [0.8314s] [ 16%] 2025-12-04T13:20:27.8747123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16 PASSED [0.8345s] [ 16%] 2025-12-04T13:20:27.8747240Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int16 PASSED [0.8889s] [ 16%] 2025-12-04T13:20:27.8747356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int8 PASSED [0.0103s] [ 16%] 2025-12-04T13:20:27.8747468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64 
PASSED [0.0434s] [ 16%] 2025-12-04T13:20:27.8747572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32 PASSED [0.8466s] [ 16%] 2025-12-04T13:20:27.8747676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64 PASSED [0.8430s] [ 16%] 2025-12-04T13:20:27.8747781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32 PASSED [0.8512s] [ 16%] 2025-12-04T13:20:27.8747888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32 PASSED [0.8378s] [ 16%] 2025-12-04T13:20:27.8748010Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128 PASSED [0.8295s] [ 16%] 2025-12-04T13:20:27.8748128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16 PASSED [0.8331s] [ 16%] 2025-12-04T13:20:27.8748242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16 PASSED [0.8307s] [ 16%] 2025-12-04T13:20:27.8748367Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64 PASSED [0.8215s] [ 16%] 2025-12-04T13:20:27.8748474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64 PASSED [0.8353s] [ 16%] 2025-12-04T13:20:27.8748578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64 PASSED [0.8338s] [ 16%] 2025-12-04T13:20:27.8748686Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16 PASSED [0.8146s] [ 16%] 2025-12-04T13:20:27.8748798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32 PASSED [0.8208s] [ 16%] 2025-12-04T13:20:27.8748940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex64 PASSED [0.0087s] [ 16%] 2025-12-04T13:20:27.8749062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float16 PASSED [0.8187s] [ 16%] 2025-12-04T13:20:27.8749179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int64 PASSED [0.8265s] [ 16%] 
2025-12-04T13:20:27.8749297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_int8 PASSED [0.8251s] [ 16%] 2025-12-04T13:20:27.8749414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128 PASSED [0.8304s] [ 16%] 2025-12-04T13:20:27.8749531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32 PASSED [0.8169s] [ 16%] 2025-12-04T13:20:27.8749648Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64 PASSED [0.8343s] [ 16%] 2025-12-04T13:20:27.8749761Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64 PASSED [0.8345s] [ 16%] 2025-12-04T13:20:27.8749869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16 PASSED [0.8329s] [ 16%] 2025-12-04T13:20:27.8749975Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8 PASSED [0.8304s] [ 16%] 2025-12-04T13:20:27.8750087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32 PASSED [0.8408s] [ 16%] 2025-12-04T13:20:27.8750203Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16 PASSED [0.8453s] [ 16%] 2025-12-04T13:20:27.8750309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32 PASSED [0.8371s] [ 16%] 2025-12-04T13:20:27.8750413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64 PASSED [0.8284s] [ 16%] 2025-12-04T13:20:27.8750530Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8 PASSED [0.8343s] [ 17%] 2025-12-04T13:20:27.8750648Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16 PASSED [0.9263s] [ 17%] 2025-12-04T13:20:27.8750762Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool PASSED [0.9114s] [ 17%] 2025-12-04T13:20:27.8750870Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64 PASSED [0.8388s] [ 17%] 2025-12-04T13:20:27.8750976Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16 PASSED [0.8401s] [ 17%] 2025-12-04T13:20:27.8751084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64 PASSED [0.8668s] [ 17%] 2025-12-04T13:20:27.8751189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8 PASSED [0.8604s] [ 17%] 2025-12-04T13:20:27.8751307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bfloat16 PASSED [0.8420s] [ 17%] 2025-12-04T13:20:27.8751421Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bool PASSED [0.8268s] [ 17%] 2025-12-04T13:20:27.8751538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex64 PASSED [0.8351s] [ 17%] 2025-12-04T13:20:27.8751652Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int16 PASSED [0.8341s] [ 17%] 2025-12-04T13:20:27.8751763Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_uint8 PASSED [0.8339s] [ 17%] 2025-12-04T13:20:27.8751882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool PASSED [0.8432s] [ 17%] 2025-12-04T13:20:27.8751994Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32 PASSED [0.8408s] [ 17%] 2025-12-04T13:20:27.8752102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32 PASSED [0.8444s] [ 17%] 2025-12-04T13:20:27.8752208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64 PASSED [0.8466s] [ 17%] 2025-12-04T13:20:27.8752318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool PASSED [0.0107s] [ 17%] 2025-12-04T13:20:27.8752431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16 PASSED [0.0094s] [ 17%] 2025-12-04T13:20:27.8752553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64 PASSED [0.0093s] [ 17%] 2025-12-04T13:20:27.8752663Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8 PASSED [0.0092s] [ 17%] 2025-12-04T13:20:27.8752769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool PASSED [0.8529s] [ 17%] 2025-12-04T13:20:27.8752878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16 PASSED [0.8312s] [ 17%] 2025-12-04T13:20:27.8753002Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_complex32 PASSED [0.8382s] [ 17%] 2025-12-04T13:20:27.8753115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64 PASSED [0.8344s] [ 17%] 2025-12-04T13:20:27.8753227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8 PASSED [0.8369s] [ 17%] 2025-12-04T13:20:27.8753365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16 PASSED [0.8432s] [ 17%] 2025-12-04T13:20:27.8753476Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex128 PASSED [0.8335s] [ 17%] 2025-12-04T13:20:27.8753585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex64 PASSED [0.8225s] [ 17%] 2025-12-04T13:20:27.8753723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float16 PASSED [0.8235s] [ 17%] 2025-12-04T13:20:27.8753838Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bfloat16 PASSED [0.8465s] [ 17%] 2025-12-04T13:20:27.8753947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_uint8 PASSED [0.8341s] [ 17%] 2025-12-04T13:20:27.8754053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8 PASSED [0.8778s] [ 17%] 2025-12-04T13:20:27.8754178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16 PASSED [0.8467s] [ 17%] 2025-12-04T13:20:27.8754285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool PASSED [0.8416s] [ 17%] 2025-12-04T13:20:27.8754396Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64 PASSED [0.8494s] [ 17%] 2025-12-04T13:20:27.8754505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32 PASSED [0.8351s] [ 18%] 2025-12-04T13:20:27.8754614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8 PASSED [0.8396s] [ 18%] 2025-12-04T13:20:27.8754722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16 PASSED [0.8323s] [ 18%] 2025-12-04T13:20:27.8754831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32 PASSED [0.8395s] [ 18%] 2025-12-04T13:20:27.8754938Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32 PASSED [0.8370s] [ 18%] 2025-12-04T13:20:27.8755045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16 PASSED [0.1297s] [ 18%] 2025-12-04T13:20:27.8755151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32 PASSED [0.1235s] [ 18%] 2025-12-04T13:20:27.8755260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16 PASSED [0.8523s] [ 18%] 2025-12-04T13:20:27.8755368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32 PASSED [0.8354s] [ 18%] 2025-12-04T13:20:27.8755487Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64 PASSED [0.8324s] [ 18%] 2025-12-04T13:20:27.8755599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool PASSED [0.8340s] [ 18%] 2025-12-04T13:20:27.8755720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32 PASSED [0.8348s] [ 18%] 2025-12-04T13:20:27.8755840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64 PASSED [0.8359s] [ 18%] 2025-12-04T13:20:27.8755954Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64 PASSED [0.8405s] [ 18%] 2025-12-04T13:20:27.8756111Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool PASSED [0.8563s] [ 18%] 2025-12-04T13:20:27.8756258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32 PASSED [0.8449s] [ 18%] 2025-12-04T13:20:27.8756395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool PASSED [0.8470s] [ 18%] 2025-12-04T13:20:27.8756535Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16 PASSED [0.8425s] [ 18%] 2025-12-04T13:20:27.8756670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64 PASSED [0.8438s] [ 18%] 2025-12-04T13:20:27.8756813Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64 PASSED [0.8598s] [ 18%] 2025-12-04T13:20:27.8756959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16 PASSED [0.8590s] [ 18%] 2025-12-04T13:20:27.8757104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64 PASSED [0.8495s] [ 18%] 2025-12-04T13:20:27.8757244Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64 PASSED [0.8536s] [ 18%] 2025-12-04T13:20:27.8757382Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8 PASSED [0.8616s] [ 18%] 2025-12-04T13:20:27.8757533Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32 PASSED [0.8778s] [ 18%] 2025-12-04T13:20:27.8757667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16 PASSED [0.8403s] [ 18%] 2025-12-04T13:20:27.8757811Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64 PASSED [0.8672s] [ 18%] 2025-12-04T13:20:27.8757967Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64 PASSED [0.8589s] [ 18%] 2025-12-04T13:20:27.8758108Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32 PASSED [0.8430s] [ 18%] 2025-12-04T13:20:27.8758245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32 PASSED [0.8335s] [ 18%] 2025-12-04T13:20:27.8758385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8 PASSED [0.8368s] [ 18%] 2025-12-04T13:20:27.8758529Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128 PASSED [0.8466s] [ 18%] 2025-12-04T13:20:27.8758668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32 PASSED [0.8404s] [ 18%] 2025-12-04T13:20:27.8758808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64 PASSED [0.8418s] [ 18%] 2025-12-04T13:20:27.8758947Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16 PASSED [0.8384s] [ 19%] 2025-12-04T13:20:27.8759085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64 PASSED [0.8459s] [ 19%] 2025-12-04T13:20:27.8759220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16 PASSED [0.8381s] [ 19%] 2025-12-04T13:20:27.8759370Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32 PASSED [0.8385s] [ 19%] 2025-12-04T13:20:27.8759506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8 PASSED [0.8471s] [ 19%] 2025-12-04T13:20:27.8759642Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8 PASSED [0.8434s] [ 19%] 2025-12-04T13:20:27.8759779Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16 PASSED [0.8372s] [ 19%] 2025-12-04T13:20:27.8759931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32 PASSED [0.8334s] [ 19%] 2025-12-04T13:20:27.8760068Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64 PASSED [0.8407s] [ 19%] 2025-12-04T13:20:27.8760202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16 PASSED [0.8381s] [ 19%] 2025-12-04T13:20:27.8760337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8 PASSED [0.8367s] [ 19%] 2025-12-04T13:20:27.8760475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32 PASSED [0.8397s] [ 19%] 2025-12-04T13:20:27.8760611Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64 PASSED [0.8317s] [ 19%] 2025-12-04T13:20:27.8760747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16 PASSED [0.8342s] [ 19%] 2025-12-04T13:20:27.8760882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64 PASSED [0.8463s] [ 19%] 2025-12-04T13:20:27.8761022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float32 PASSED [0.8698s] [ 19%] 2025-12-04T13:20:27.8761164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16 PASSED [0.8360s] [ 19%] 2025-12-04T13:20:27.8761317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64 PASSED [0.8415s] [ 19%] 2025-12-04T13:20:27.8761454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16 PASSED [0.8386s] [ 19%] 2025-12-04T13:20:27.8761579Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64 PASSED [0.8617s] [ 19%] 2025-12-04T13:20:27.8761708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8 PASSED [0.8373s] [ 19%] 2025-12-04T13:20:27.8761826Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8 PASSED [0.8393s] [ 19%] 2025-12-04T13:20:27.8761950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32 PASSED [0.8573s] [ 19%] 2025-12-04T13:20:27.8762071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64 PASSED [0.0178s] [ 19%] 2025-12-04T13:20:27.8762193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16 PASSED [0.0213s] [ 19%] 2025-12-04T13:20:27.8762311Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32 PASSED [0.0152s] [ 19%] 2025-12-04T13:20:27.8762436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16 PASSED [0.8718s] [ 19%] 2025-12-04T13:20:27.8762559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32 PASSED [0.8387s] [ 19%] 2025-12-04T13:20:27.8762681Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64 PASSED [0.8502s] [ 19%] 2025-12-04T13:20:27.8762802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8 PASSED [0.8521s] [ 19%] 2025-12-04T13:20:27.8762923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8 PASSED [0.8476s] [ 19%] 2025-12-04T13:20:27.8763060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16 PASSED [0.8459s] [ 19%] 2025-12-04T13:20:27.8763182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64 PASSED [0.8306s] [ 19%] 2025-12-04T13:20:27.8763343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16 
PASSED [0.8327s] [ 19%] 2025-12-04T13:20:27.8763473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bfloat16 PASSED [0.8285s] [ 19%] 2025-12-04T13:20:27.8763608Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32 PASSED [0.8278s] [ 20%] 2025-12-04T13:20:27.8763751Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_float32 PASSED [0.8276s] [ 20%] 2025-12-04T13:20:27.8763879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_uint8 PASSED [0.8264s] [ 20%] 2025-12-04T13:20:27.8763998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32 PASSED [0.8278s] [ 20%] 2025-12-04T13:20:27.8764129Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16 PASSED [0.8347s] [ 20%] 2025-12-04T13:20:27.8764246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8 PASSED [0.8372s] [ 20%] 2025-12-04T13:20:27.8764364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8 PASSED [0.8319s] [ 20%] 2025-12-04T13:20:27.8764483Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16 PASSED [0.8610s] [ 20%] 2025-12-04T13:20:27.8764600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32 PASSED [0.8361s] [ 20%] 2025-12-04T13:20:27.8764717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8 PASSED [0.8359s] [ 20%] 2025-12-04T13:20:27.8764836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16 PASSED [0.8448s] [ 20%] 2025-12-04T13:20:27.8764966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8 PASSED [0.8460s] [ 20%] 2025-12-04T13:20:27.8765091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16 PASSED [0.0119s] [ 20%] 2025-12-04T13:20:27.8765211Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64 PASSED [0.0083s] [ 20%] 2025-12-04T13:20:27.8765347Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_float64 PASSED [0.0044s] [ 20%] 2025-12-04T13:20:27.8765494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_int64 PASSED [0.0039s] [ 20%] 2025-12-04T13:20:27.8765628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_copy_cuda_uint8 PASSED [0.0038s] [ 20%] 2025-12-04T13:20:27.8765758Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16 PASSED [0.0049s] [ 20%] 2025-12-04T13:20:27.8765891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128 PASSED [0.0048s] [ 20%] 2025-12-04T13:20:27.8766021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64 PASSED [0.0046s] [ 20%] 2025-12-04T13:20:27.8766146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32 PASSED [0.0041s] [ 20%] 2025-12-04T13:20:27.8766270Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8 PASSED [0.0041s] [ 20%] 2025-12-04T13:20:27.8766395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8 PASSED [0.0041s] [ 20%] 2025-12-04T13:20:27.8766546Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32 PASSED [0.0040s] [ 20%] 2025-12-04T13:20:27.8766692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64 PASSED [0.0039s] [ 20%] 2025-12-04T13:20:27.8766849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32 PASSED [0.0055s] [ 20%] 2025-12-04T13:20:27.8766972Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16 PASSED 
[0.0158s] [ 20%] 2025-12-04T13:20:27.8767098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128 PASSED [0.0270s] [ 20%] 2025-12-04T13:20:27.8767218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64 PASSED [0.8495s] [ 20%] 2025-12-04T13:20:27.8767337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8 PASSED [0.8477s] [ 20%] 2025-12-04T13:20:27.8767467Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool PASSED [0.8400s] [ 20%] 2025-12-04T13:20:27.8767594Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128 PASSED [0.8630s] [ 20%] 2025-12-04T13:20:27.8767717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32 PASSED [0.8487s] [ 20%] 2025-12-04T13:20:27.8767837Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16 PASSED [0.8449s] [ 20%] 2025-12-04T13:20:27.8767959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64 PASSED [0.8685s] [ 20%] 2025-12-04T13:20:27.8768077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16 PASSED [0.8492s] [ 21%] 2025-12-04T13:20:27.8768196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8 PASSED [0.8443s] [ 21%] 2025-12-04T13:20:27.8768316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32 PASSED [0.8457s] [ 21%] 2025-12-04T13:20:27.8768438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64 PASSED [0.8498s] [ 21%] 2025-12-04T13:20:27.8768555Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16 PASSED [0.8409s] [ 21%] 2025-12-04T13:20:27.8768696Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16 PASSED [0.8418s] [ 21%] 2025-12-04T13:20:27.8768829Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64 PASSED [0.8453s] [ 21%] 2025-12-04T13:20:27.8768959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16 PASSED [0.8457s] [ 21%] 2025-12-04T13:20:27.8769087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16 PASSED [0.8390s] [ 21%] 2025-12-04T13:20:27.8769227Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32 PASSED [0.8436s] [ 21%] 2025-12-04T13:20:27.8769355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16 PASSED [0.8823s] [ 21%] 2025-12-04T13:20:27.8769492Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8 PASSED [0.8842s] [ 21%] 2025-12-04T13:20:27.8769619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool PASSED [0.0156s] [ 21%] 2025-12-04T13:20:27.8769746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64 PASSED [0.0123s] [ 21%] 2025-12-04T13:20:27.8769871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8 PASSED [0.0116s] [ 21%] 2025-12-04T13:20:27.8770010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64 PASSED [0.0369s] [ 21%] 2025-12-04T13:20:27.8770138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool PASSED [0.8712s] [ 21%] 2025-12-04T13:20:27.8770268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32 PASSED [0.8650s] [ 21%] 2025-12-04T13:20:27.8770393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_bool PASSED [0.8322s] [ 21%] 2025-12-04T13:20:27.8770528Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int8 PASSED [0.8404s] [ 21%] 
2025-12-04T13:20:27.8770655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_uint8 PASSED [0.8545s] [ 21%] 2025-12-04T13:20:27.8770792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32 PASSED [1.3232s] [ 21%] 2025-12-04T13:20:27.8770928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16 PASSED [1.4184s] [ 21%] 2025-12-04T13:20:27.8771064Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32 PASSED [1.4174s] [ 21%] 2025-12-04T13:20:27.8771202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64 PASSED [1.4355s] [ 21%] 2025-12-04T13:20:27.8771325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8 PASSED [1.4380s] [ 21%] 2025-12-04T13:20:27.8771447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16 PASSED [0.0094s] [ 21%] 2025-12-04T13:20:27.8771565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool PASSED [0.0066s] [ 21%] 2025-12-04T13:20:27.8771687Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32 PASSED [0.0077s] [ 21%] 2025-12-04T13:20:27.8771806Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32 PASSED [0.0073s] [ 21%] 2025-12-04T13:20:27.8771924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16 PASSED [0.0064s] [ 21%] 2025-12-04T13:20:27.8772040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8 PASSED [0.0063s] [ 21%] 2025-12-04T13:20:27.8772158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32 PASSED [0.0113s] [ 21%] 2025-12-04T13:20:27.8772278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8 PASSED [0.0106s] [ 21%] 2025-12-04T13:20:27.8772397Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool PASSED [0.0089s] [ 21%] 2025-12-04T13:20:27.8772528Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32 PASSED [0.0090s] [ 22%] 2025-12-04T13:20:27.8772652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16 PASSED [0.0127s] [ 22%] 2025-12-04T13:20:27.8772781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16 PASSED [0.0341s] [ 22%] 2025-12-04T13:20:27.8772916Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32 PASSED [1.4337s] [ 22%] 2025-12-04T13:20:27.8773041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8 PASSED [1.4304s] [ 22%] 2025-12-04T13:20:27.8773165Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32 PASSED [1.4551s] [ 22%] 2025-12-04T13:20:27.8773321Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16 PASSED [1.4260s] [ 22%] 2025-12-04T13:20:27.8773441Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16 PASSED [1.4340s] [ 22%] 2025-12-04T13:20:27.8773560Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64 PASSED [1.4387s] [ 22%] 2025-12-04T13:20:27.8773678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8 PASSED [1.4463s] [ 22%] 2025-12-04T13:20:27.8773814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64 PASSED [1.4217s] [ 22%] 2025-12-04T13:20:27.8773948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32 PASSED [1.4206s] [ 22%] 2025-12-04T13:20:27.8774080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64 PASSED [1.4260s] [ 22%] 2025-12-04T13:20:27.8774202Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16 PASSED [1.4369s] [ 22%] 2025-12-04T13:20:27.8774335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool PASSED [1.4535s] [ 22%] 2025-12-04T13:20:27.8774455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8 PASSED [1.4221s] [ 22%] 2025-12-04T13:20:27.8774588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16 PASSED [1.4240s] [ 22%] 2025-12-04T13:20:27.8774726Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16 PASSED [1.4371s] [ 22%] 2025-12-04T13:20:27.8774858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16 PASSED [1.4312s] [ 22%] 2025-12-04T13:20:27.8775005Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64 PASSED [1.4543s] [ 22%] 2025-12-04T13:20:27.8775130Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool PASSED [1.4274s] [ 22%] 2025-12-04T13:20:27.8775266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32 PASSED [1.4354s] [ 22%] 2025-12-04T13:20:27.8775398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64 PASSED [1.4353s] [ 22%] 2025-12-04T13:20:27.8775525Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32 PASSED [1.4406s] [ 22%] 2025-12-04T13:20:27.8775649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8 PASSED [1.4353s] [ 22%] 2025-12-04T13:20:27.8775777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8 PASSED [1.4271s] [ 22%] 2025-12-04T13:20:27.8775902Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8 PASSED [1.4937s] [ 22%] 2025-12-04T13:20:27.8776019Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool PASSED [0.0193s] [ 22%] 2025-12-04T13:20:27.8776141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32 PASSED [0.4211s] [ 22%] 2025-12-04T13:20:27.8776274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32 PASSED [0.0155s] [ 22%] 2025-12-04T13:20:27.8776393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64 PASSED [1.4333s] [ 22%] 2025-12-04T13:20:27.8776516Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32 PASSED [1.4540s] [ 22%] 2025-12-04T13:20:27.8776659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32 PASSED [0.0183s] [ 22%] 2025-12-04T13:20:27.8776777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16 PASSED [0.0149s] [ 22%] 2025-12-04T13:20:27.8776895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32 PASSED [0.0146s] [ 22%] 2025-12-04T13:20:27.8777011Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8 PASSED [1.4600s] [ 23%] 2025-12-04T13:20:27.8777147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_bfloat16 PASSED [1.4233s] [ 23%] 2025-12-04T13:20:27.8777276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_int64 PASSED [1.4130s] [ 23%] 2025-12-04T13:20:27.8777406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_uint8 PASSED [1.4214s] [ 23%] 2025-12-04T13:20:27.8777530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int64 PASSED [1.4504s] [ 23%] 2025-12-04T13:20:27.8777652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_uint8 PASSED [1.4206s] [ 23%] 2025-12-04T13:20:27.8777775Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64 PASSED [1.4220s] [ 23%] 2025-12-04T13:20:27.8777901Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_bfloat16 PASSED [0.0160s] [ 23%] 2025-12-04T13:20:27.8778034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int32 PASSED [0.0124s] [ 23%] 2025-12-04T13:20:27.8778154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64 PASSED [0.0060s] [ 23%] 2025-12-04T13:20:27.8778286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32 PASSED [0.0117s] [ 23%] 2025-12-04T13:20:27.8778414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32 PASSED [0.0111s] [ 23%] 2025-12-04T13:20:27.8778542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64 PASSED [0.0112s] [ 23%] 2025-12-04T13:20:27.8778699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32 PASSED [0.0097s] [ 23%] 2025-12-04T13:20:27.8778833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16 PASSED [0.0093s] [ 23%] 2025-12-04T13:20:27.8778965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16 PASSED [0.0082s] [ 23%] 2025-12-04T13:20:27.8779098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64 PASSED [0.0091s] [ 23%] 2025-12-04T13:20:27.8779226Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64 PASSED [0.0098s] [ 23%] 2025-12-04T13:20:27.8779353Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16 PASSED [0.0090s] [ 23%] 2025-12-04T13:20:27.8779477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16 PASSED [0.0074s] [ 23%] 2025-12-04T13:20:27.8779615Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16 PASSED [0.0081s] [ 23%] 2025-12-04T13:20:27.8779752Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64 PASSED [1.4260s] [ 23%] 2025-12-04T13:20:27.8779887Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16 PASSED [1.4302s] [ 23%] 2025-12-04T13:20:27.8780031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64 PASSED [1.4169s] [ 23%] 2025-12-04T13:20:27.8780166Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16 PASSED [1.4663s] [ 23%] 2025-12-04T13:20:27.8780301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8 PASSED [1.4540s] [ 23%] 2025-12-04T13:20:27.8780453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8 PASSED [1.4747s] [ 23%] 2025-12-04T13:20:27.8780588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16 PASSED [1.4662s] [ 23%] 2025-12-04T13:20:27.8780710Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool PASSED [1.4167s] [ 23%] 2025-12-04T13:20:27.8780834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16 PASSED [1.4051s] [ 23%] 2025-12-04T13:20:27.8780958Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16 PASSED [1.4392s] [ 23%] 2025-12-04T13:20:27.8781079Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8 PASSED [1.4309s] [ 23%] 2025-12-04T13:20:27.8781204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64 PASSED [1.4139s] [ 23%] 2025-12-04T13:20:27.8781385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128 SKIPPED [0.0002s] (Expected: 
empty is not comparable) [ 23%] 2025-12-04T13:20:27.8781557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 23%] 2025-12-04T13:20:27.8781747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_int32 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 24%] 2025-12-04T13:20:27.8781879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64 PASSED [0.0507s] [ 24%] 2025-12-04T13:20:27.8781999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16 PASSED [1.4456s] [ 24%] 2025-12-04T13:20:27.8782117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_bool PASSED [1.4214s] [ 24%] 2025-12-04T13:20:27.8782243Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_complex128 PASSED [1.4212s] [ 24%] 2025-12-04T13:20:27.8782366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_float64 PASSED [1.4167s] [ 24%] 2025-12-04T13:20:27.8782497Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_equal_cuda_int64 PASSED [1.4271s] [ 24%] 2025-12-04T13:20:27.8782613Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32 PASSED [1.4388s] [ 24%] 2025-12-04T13:20:27.8782731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8 PASSED [1.4363s] [ 24%] 2025-12-04T13:20:27.8782850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool PASSED [0.0212s] [ 24%] 2025-12-04T13:20:27.8782975Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16 PASSED [0.2575s] [ 24%] 2025-12-04T13:20:27.8783099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64 PASSED [0.1930s] [ 24%] 2025-12-04T13:20:27.8783223Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64 PASSED [1.4321s] [ 24%] 2025-12-04T13:20:27.8783377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8 PASSED [1.4318s] [ 24%] 2025-12-04T13:20:27.8783498Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64 PASSED [1.7361s] [ 24%] 2025-12-04T13:20:27.8783627Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16 PASSED [1.4306s] [ 24%] 2025-12-04T13:20:27.8783767Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64 PASSED [1.4312s] [ 24%] 2025-12-04T13:20:27.8783890Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8 PASSED [1.4198s] [ 24%] 2025-12-04T13:20:27.8784024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_complex64 PASSED [1.4226s] [ 24%] 2025-12-04T13:20:27.8784167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_copy_cuda_int16 PASSED [1.4224s] [ 24%] 2025-12-04T13:20:27.8784294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128 PASSED [1.4304s] [ 24%] 2025-12-04T13:20:27.8784418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64 PASSED [1.4215s] [ 24%] 2025-12-04T13:20:27.8784540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16 PASSED [1.4414s] [ 24%] 2025-12-04T13:20:27.8784661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool PASSED [1.4347s] [ 24%] 2025-12-04T13:20:27.8784779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64 PASSED [1.4262s] [ 24%] 2025-12-04T13:20:27.8784969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exponential_cuda_float16 SKIPPED [0.0002s] (Expected: exponential is not comparable) [ 24%] 2025-12-04T13:20:27.8785086Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool PASSED [1.4433s] [ 24%] 2025-12-04T13:20:27.8785210Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128 PASSED [1.4501s] [ 24%] 2025-12-04T13:20:27.8785341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float8_e4m3fnuz PASSED [1.4501s] [ 24%] 2025-12-04T13:20:27.8785464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool PASSED [1.4444s] [ 24%] 2025-12-04T13:20:27.8785607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32 PASSED [1.4421s] [ 24%] 2025-12-04T13:20:27.8785729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32 PASSED [1.4456s] [ 24%] 2025-12-04T13:20:27.8785851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8 PASSED [1.4203s] [ 24%] 2025-12-04T13:20:27.8785974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16 PASSED [1.4337s] [ 24%] 2025-12-04T13:20:27.8786098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16 PASSED [1.4163s] [ 25%] 2025-12-04T13:20:27.8786235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32 PASSED [1.4069s] [ 25%] 2025-12-04T13:20:27.8786358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64 PASSED [1.4211s] [ 25%] 2025-12-04T13:20:27.8786491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16 PASSED [1.4251s] [ 25%] 2025-12-04T13:20:27.8786619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool PASSED [1.4504s] [ 25%] 2025-12-04T13:20:27.8786754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128 PASSED [1.4090s] [ 25%] 2025-12-04T13:20:27.8786883Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8 PASSED [1.4166s] [ 25%] 2025-12-04T13:20:27.8787004Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool PASSED [1.4283s] [ 25%] 2025-12-04T13:20:27.8787132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64 PASSED [2.0212s] [ 25%] 2025-12-04T13:20:27.8787255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64 PASSED [1.4215s] [ 25%] 2025-12-04T13:20:27.8787387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128 PASSED [1.4203s] [ 25%] 2025-12-04T13:20:27.8787525Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16 PASSED [1.7343s] [ 25%] 2025-12-04T13:20:27.8787647Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16 PASSED [1.4284s] [ 25%] 2025-12-04T13:20:27.8787778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128 PASSED [1.4213s] [ 25%] 2025-12-04T13:20:27.8787915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16 PASSED [1.4261s] [ 25%] 2025-12-04T13:20:27.8788041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32 PASSED [1.4372s] [ 25%] 2025-12-04T13:20:27.8788164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16 PASSED [1.4814s] [ 25%] 2025-12-04T13:20:27.8788295Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128 PASSED [3.0047s] [ 25%] 2025-12-04T13:20:27.8788421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8 PASSED [1.4398s] [ 25%] 2025-12-04T13:20:27.8788543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool PASSED [1.4294s] [ 25%] 2025-12-04T13:20:27.8788671Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128 PASSED [2.9678s] [ 25%] 2025-12-04T13:20:27.8788797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16 PASSED [1.4387s] [ 25%] 2025-12-04T13:20:27.8788918Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16 PASSED [1.4188s] [ 25%] 2025-12-04T13:20:27.8789045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64 PASSED [2.0044s] [ 25%] 2025-12-04T13:20:27.8789168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32 PASSED [1.4168s] [ 25%] 2025-12-04T13:20:27.8789313Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16 PASSED [1.4156s] [ 25%] 2025-12-04T13:20:27.8789445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16 PASSED [1.4165s] [ 25%] 2025-12-04T13:20:27.8789574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8 PASSED [1.4178s] [ 25%] 2025-12-04T13:20:27.8789702Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32 PASSED [1.4316s] [ 25%] 2025-12-04T13:20:27.8789829Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16 PASSED [0.0097s] [ 25%] 2025-12-04T13:20:27.8789964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool PASSED [0.0052s] [ 25%] 2025-12-04T13:20:27.8790089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64 PASSED [1.4223s] [ 25%] 2025-12-04T13:20:27.8790212Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool PASSED [1.4267s] [ 25%] 2025-12-04T13:20:27.8790342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32 PASSED [1.4352s] [ 25%] 2025-12-04T13:20:27.8790468Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64 PASSED [1.4358s] [ 25%] 2025-12-04T13:20:27.8790591Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64 PASSED [1.4181s] [ 26%] 2025-12-04T13:20:27.8790724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128 PASSED [1.7349s] [ 26%] 2025-12-04T13:20:27.8790848Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8 PASSED [1.4301s] [ 26%] 2025-12-04T13:20:27.8790974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16 PASSED [2.2316s] [ 26%] 2025-12-04T13:20:27.8791100Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32 PASSED [1.4281s] [ 26%] 2025-12-04T13:20:27.8791242Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32 PASSED [1.4395s] [ 26%] 2025-12-04T13:20:27.8791364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8 PASSED [1.4289s] [ 26%] 2025-12-04T13:20:27.8791487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32 PASSED [1.4188s] [ 26%] 2025-12-04T13:20:27.8791622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8 PASSED [1.4253s] [ 26%] 2025-12-04T13:20:27.8791748Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16 PASSED [1.4394s] [ 26%] 2025-12-04T13:20:27.8791866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16 PASSED [1.4591s] [ 26%] 2025-12-04T13:20:27.8791985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64 PASSED [1.4335s] [ 26%] 2025-12-04T13:20:27.8792113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16 PASSED [0.0190s] [ 26%] 2025-12-04T13:20:27.8792235Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16 PASSED [0.0142s] [ 26%] 2025-12-04T13:20:27.8792352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool PASSED [0.0051s] [ 26%] 2025-12-04T13:20:27.8792476Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128 PASSED [0.0058s] [ 26%] 2025-12-04T13:20:27.8792598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16 PASSED [0.0055s] [ 26%] 2025-12-04T13:20:27.8792719Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64 PASSED [0.0055s] [ 26%] 2025-12-04T13:20:27.8792838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8 PASSED [0.0050s] [ 26%] 2025-12-04T13:20:27.8792969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool PASSED [0.0032s] [ 26%] 2025-12-04T13:20:27.8793097Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64 PASSED [0.0034s] [ 26%] 2025-12-04T13:20:27.8793220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32 PASSED [0.0034s] [ 26%] 2025-12-04T13:20:27.8793377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16 PASSED [0.0032s] [ 26%] 2025-12-04T13:20:27.8793501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64 PASSED [0.0032s] [ 26%] 2025-12-04T13:20:27.8793642Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16 PASSED [0.0032s] [ 26%] 2025-12-04T13:20:27.8793772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32 PASSED [0.0523s] [ 26%] 2025-12-04T13:20:27.8793891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8 PASSED [1.4602s] [ 26%] 2025-12-04T13:20:27.8794013Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64 PASSED [1.4965s] [ 26%] 2025-12-04T13:20:27.8794133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16 PASSED [1.4591s] [ 26%] 2025-12-04T13:20:27.8794251Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16 PASSED [1.4587s] [ 26%] 2025-12-04T13:20:27.8794369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64 PASSED [1.4497s] [ 26%] 2025-12-04T13:20:27.8794491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16 PASSED [1.4307s] [ 26%] 2025-12-04T13:20:27.8794614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frexp_cuda_float16 PASSED [1.4238s] [ 26%] 2025-12-04T13:20:27.8794731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32 PASSED [1.4743s] [ 26%] 2025-12-04T13:20:27.8794847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64 PASSED [1.4634s] [ 26%] 2025-12-04T13:20:27.8795042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int16 SKIPPED [0.0002s] (Expected: geometric is not comparable) [ 27%] 2025-12-04T13:20:27.8795157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16 PASSED [1.4798s] [ 27%] 2025-12-04T13:20:27.8795272Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32 PASSED [1.4848s] [ 27%] 2025-12-04T13:20:27.8795414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32 PASSED [1.4896s] [ 27%] 2025-12-04T13:20:27.8795541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32 PASSED [1.4770s] [ 27%] 2025-12-04T13:20:27.8795665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8 PASSED [1.4488s] [ 27%] 2025-12-04T13:20:27.8795792Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64 PASSED [1.4335s] [ 27%] 2025-12-04T13:20:27.8795911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool PASSED [1.4202s] [ 27%] 2025-12-04T13:20:27.8796034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16 PASSED [1.4185s] [ 27%] 2025-12-04T13:20:27.8796156Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16 PASSED [1.4745s] [ 27%] 2025-12-04T13:20:27.8796279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16 PASSED [1.4694s] [ 27%] 2025-12-04T13:20:27.8796402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32 PASSED [1.4967s] [ 27%] 2025-12-04T13:20:27.8796524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool PASSED [1.4348s] [ 27%] 2025-12-04T13:20:27.8796654Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32 PASSED [1.4201s] [ 27%] 2025-12-04T13:20:27.8796794Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16 PASSED [1.4184s] [ 27%] 2025-12-04T13:20:27.8796922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32 PASSED [1.4244s] [ 27%] 2025-12-04T13:20:27.8797045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32 PASSED [1.4238s] [ 27%] 2025-12-04T13:20:27.8797169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8 PASSED [1.4216s] [ 27%] 2025-12-04T13:20:27.8797298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32 PASSED [1.4144s] [ 27%] 2025-12-04T13:20:27.8797437Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64 PASSED [1.4106s] [ 27%] 2025-12-04T13:20:27.8797563Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16 PASSED [1.4256s] [ 27%] 2025-12-04T13:20:27.8797693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32 PASSED [1.4367s] [ 27%] 2025-12-04T13:20:27.8797820Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32 PASSED [1.4538s] [ 27%] 2025-12-04T13:20:27.8797946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8 PASSED [1.4281s] [ 27%] 2025-12-04T13:20:27.8798079Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16 PASSED [1.4251s] [ 27%] 2025-12-04T13:20:27.8798214Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32 PASSED [1.4239s] [ 27%] 2025-12-04T13:20:27.8798346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32 PASSED [1.4260s] [ 27%] 2025-12-04T13:20:27.8798474Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64 PASSED [1.4133s] [ 27%] 2025-12-04T13:20:27.8798617Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128 PASSED [1.4486s] [ 27%] 2025-12-04T13:20:27.8798740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8 PASSED [1.4350s] [ 27%] 2025-12-04T13:20:27.8798861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32 PASSED [1.4370s] [ 27%] 2025-12-04T13:20:27.8798981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16 PASSED [1.4333s] [ 27%] 2025-12-04T13:20:27.8799113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32 PASSED [1.4277s] [ 27%] 2025-12-04T13:20:27.8799240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32 PASSED [1.4347s] [ 27%] 2025-12-04T13:20:27.8799362Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8 PASSED [1.4431s] [ 27%] 2025-12-04T13:20:27.8799485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64 PASSED [1.4333s] [ 28%] 2025-12-04T13:20:27.8799610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16 PASSED [1.4355s] [ 28%] 2025-12-04T13:20:27.8799729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool PASSED [1.4471s] [ 28%] 2025-12-04T13:20:27.8799853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16 PASSED [1.4536s] [ 28%] 2025-12-04T13:20:27.8799974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64 PASSED [1.4237s] [ 28%] 2025-12-04T13:20:27.8800094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int32 PASSED [1.4311s] [ 28%] 2025-12-04T13:20:27.8800211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32 PASSED [1.4776s] [ 28%] 2025-12-04T13:20:27.8800327Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32 PASSED [1.4448s] [ 28%] 2025-12-04T13:20:27.8800453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8 PASSED [1.4582s] [ 28%] 2025-12-04T13:20:27.8800576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16 PASSED [1.4347s] [ 28%] 2025-12-04T13:20:27.8800698Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16 PASSED [0.4160s] [ 28%] 2025-12-04T13:20:27.8800837Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex128 PASSED [0.0095s] [ 28%] 2025-12-04T13:20:27.8800971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float16 PASSED [1.4196s] [ 28%] 2025-12-04T13:20:27.8801111Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float64 PASSED [1.4141s] [ 28%] 2025-12-04T13:20:27.8801252Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_complex32 PASSED [1.5832s] [ 28%] 2025-12-04T13:20:27.8801388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_float32 PASSED [1.4244s] [ 28%] 2025-12-04T13:20:27.8801521Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_int64 PASSED [1.4306s] [ 28%] 2025-12-04T13:20:27.8801661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [1.4295s] [ 28%] 2025-12-04T13:20:27.8801804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64 PASSED [1.5393s] [ 28%] 2025-12-04T13:20:27.8801938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float16 PASSED [0.0202s] [ 28%] 2025-12-04T13:20:27.8802080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16 PASSED [0.0547s] [ 28%] 2025-12-04T13:20:27.8802223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128 PASSED [0.0546s] [ 28%] 2025-12-04T13:20:27.8802364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16 PASSED [0.0218s] [ 28%] 2025-12-04T13:20:27.8802489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64 PASSED [0.0189s] [ 28%] 2025-12-04T13:20:27.8802612Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16 XFAIL [0.0103s] [ 28%] 2025-12-04T13:20:27.8802730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool PASSED [1.4142s] [ 28%] 2025-12-04T13:20:27.8802861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32 PASSED 
[1.4381s] [ 28%] 2025-12-04T13:20:27.8802980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool PASSED [1.4323s] [ 28%] 2025-12-04T13:20:27.8803100Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32 PASSED [1.4364s] [ 28%] 2025-12-04T13:20:27.8803220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8 PASSED [1.4291s] [ 28%] 2025-12-04T13:20:27.8803389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool PASSED [1.4273s] [ 28%] 2025-12-04T13:20:27.8803507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32 PASSED [1.4350s] [ 28%] 2025-12-04T13:20:27.8803623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32 PASSED [1.4461s] [ 28%] 2025-12-04T13:20:27.8803739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64 PASSED [1.4261s] [ 28%] 2025-12-04T13:20:27.8803853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8 PASSED [1.4347s] [ 28%] 2025-12-04T13:20:27.8803971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8 PASSED [1.4290s] [ 29%] 2025-12-04T13:20:27.8804158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float16 SKIPPED [0.0002s] (Expected: log_normal is not comparable) [ 29%] 2025-12-04T13:20:27.8804360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float64 SKIPPED [0.0001s] (Expected: log_normal is not comparable) [ 29%] 2025-12-04T13:20:27.8804508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32 PASSED [1.4108s] [ 29%] 2025-12-04T13:20:27.8804656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64 PASSED [1.4242s] [ 29%] 2025-12-04T13:20:27.8804788Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float64 PASSED [1.4412s] [ 29%] 2025-12-04T13:20:27.8804950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64 PASSED [0.4258s] [ 29%] 2025-12-04T13:20:27.8805084Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64 PASSED [1.4520s] [ 29%] 2025-12-04T13:20:27.8805215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64 PASSED [1.4277s] [ 29%] 2025-12-04T13:20:27.8805344Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16 PASSED [1.4737s] [ 29%] 2025-12-04T13:20:27.8805471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64 PASSED [1.4998s] [ 29%] 2025-12-04T13:20:27.8805602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16 PASSED [1.5307s] [ 29%] 2025-12-04T13:20:27.8805730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32 PASSED [1.5349s] [ 29%] 2025-12-04T13:20:27.8805858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64 PASSED [1.5376s] [ 29%] 2025-12-04T13:20:27.8805988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64 PASSED [1.5412s] [ 29%] 2025-12-04T13:20:27.8806116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16 PASSED [1.6209s] [ 29%] 2025-12-04T13:20:27.8806281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_float64 PASSED [0.3899s] [ 29%] 2025-12-04T13:20:27.8806426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int16 XFAIL [0.0473s] [ 29%] 2025-12-04T13:20:27.8806568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int64 XFAIL 
[1.5439s] [ 29%] 2025-12-04T13:20:27.8806727Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_uint8 PASSED [1.5906s] [ 29%] 2025-12-04T13:20:27.8806860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_complex128 PASSED [0.0125s] [ 29%] 2025-12-04T13:20:27.8806989Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16 PASSED [0.0060s] [ 29%] 2025-12-04T13:20:27.8807116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64 PASSED [0.0093s] [ 29%] 2025-12-04T13:20:27.8807242Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64 PASSED [0.0058s] [ 29%] 2025-12-04T13:20:27.8807361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16 PASSED [0.0387s] [ 29%] 2025-12-04T13:20:27.8807490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32 PASSED [1.5057s] [ 29%] 2025-12-04T13:20:27.8807619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16 PASSED [1.5148s] [ 29%] 2025-12-04T13:20:27.8807745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32 PASSED [1.5129s] [ 29%] 2025-12-04T13:20:27.8807869Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64 PASSED [1.5203s] [ 29%] 2025-12-04T13:20:27.8808026Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool PASSED [1.4941s] [ 29%] 2025-12-04T13:20:27.8808181Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [1.5074s] [ 29%] 2025-12-04T13:20:27.8808329Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [1.5068s] [ 29%] 2025-12-04T13:20:27.8808473Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [1.4986s] [ 29%] 2025-12-04T13:20:27.8808623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [1.5176s] [ 29%] 2025-12-04T13:20:27.8808780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [1.4927s] [ 29%] 2025-12-04T13:20:27.8808925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [1.5147s] [ 30%] 2025-12-04T13:20:27.8809050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool PASSED [1.5257s] [ 30%] 2025-12-04T13:20:27.8809173Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16 PASSED [1.5082s] [ 30%] 2025-12-04T13:20:27.8809296Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8 PASSED [1.5329s] [ 30%] 2025-12-04T13:20:27.8809423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64 PASSED [1.4964s] [ 30%] 2025-12-04T13:20:27.8809549Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32 PASSED [1.5084s] [ 30%] 2025-12-04T13:20:27.8809674Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64 PASSED [1.4984s] [ 30%] 2025-12-04T13:20:27.8809795Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16 PASSED [1.4889s] [ 30%] 2025-12-04T13:20:27.8809917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8 PASSED [1.4867s] [ 30%] 2025-12-04T13:20:27.8810045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64 PASSED [1.5281s] [ 30%] 2025-12-04T13:20:27.8810160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8 PASSED [1.5337s] [ 30%] 
2025-12-04T13:20:27.8810286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool PASSED [1.5014s] [ 30%] 2025-12-04T13:20:27.8810430Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64 PASSED [1.5229s] [ 30%] 2025-12-04T13:20:27.8810562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16 PASSED [1.5050s] [ 30%] 2025-12-04T13:20:27.8810692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32 PASSED [1.4998s] [ 30%] 2025-12-04T13:20:27.8810821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64 PASSED [1.4945s] [ 30%] 2025-12-04T13:20:27.8810948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16 PASSED [1.5250s] [ 30%] 2025-12-04T13:20:27.8811073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64 PASSED [1.5003s] [ 30%] 2025-12-04T13:20:27.8811195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16 PASSED [1.5274s] [ 30%] 2025-12-04T13:20:27.8811317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32 PASSED [1.6687s] [ 30%] 2025-12-04T13:20:27.8811443Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64 PASSED [0.0308s] [ 30%] 2025-12-04T13:20:27.8811564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16 PASSED [0.0155s] [ 30%] 2025-12-04T13:20:27.8811683Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32 PASSED [0.0146s] [ 30%] 2025-12-04T13:20:27.8811813Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64 PASSED [1.5099s] [ 30%] 2025-12-04T13:20:27.8811988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool SKIPPED [0.0003s] (Expected: empty is not comparable) [ 30%] 
2025-12-04T13:20:27.8812169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 30%] 2025-12-04T13:20:27.8812343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 30%] 2025-12-04T13:20:27.8812552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 30%] 2025-12-04T13:20:27.8812746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 30%] 2025-12-04T13:20:27.8815729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8 SKIPPED [0.0001s] (Expected: empty_strided is not comparable) [ 30%] 2025-12-04T13:20:27.8815882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64 PASSED [0.0066s] [ 30%] 2025-12-04T13:20:27.8816019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32 PASSED [0.0051s] [ 30%] 2025-12-04T13:20:27.8816151Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64 PASSED [0.0049s] [ 30%] 2025-12-04T13:20:27.8816283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8 PASSED [1.4915s] [ 30%] 2025-12-04T13:20:27.8816409Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8 PASSED [1.4981s] [ 30%] 2025-12-04T13:20:27.8816542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16 PASSED [1.5254s] [ 31%] 2025-12-04T13:20:27.8816697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float16 PASSED [1.5391s] [ 31%] 2025-12-04T13:20:27.8816908Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16 SKIPPED [0.0002s] (Expected: dropout is not comparable) [ 31%] 2025-12-04T13:20:27.8817053Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16 PASSED [1.5240s] [ 31%] 2025-12-04T13:20:27.8817213Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32 PASSED [0.0179s] [ 31%] 2025-12-04T13:20:27.8817361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64 PASSED [0.0159s] [ 31%] 2025-12-04T13:20:27.8817522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_complex64 PASSED [0.0039s] [ 31%] 2025-12-04T13:20:27.8817679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int32 PASSED [0.0034s] [ 31%] 2025-12-04T13:20:27.8817833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int64 PASSED [1.5301s] [ 31%] 2025-12-04T13:20:27.8818032Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32 SKIPPED [0.0003s] (Expected: dropout is not comparable) [ 31%] 2025-12-04T13:20:27.8818176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16 PASSED [1.5056s] [ 31%] 2025-12-04T13:20:27.8818319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32 PASSED [1.5309s] [ 31%] 2025-12-04T13:20:27.8818460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64 PASSED [1.5169s] [ 31%] 2025-12-04T13:20:27.8818614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16 PASSED [1.5122s] [ 31%] 2025-12-04T13:20:27.8818778Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32 PASSED [0.0183s] [ 31%] 2025-12-04T13:20:27.8818925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32 PASSED [0.0121s] [ 31%] 2025-12-04T13:20:27.8819077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64 PASSED [0.0265s] [ 31%] 2025-12-04T13:20:27.8819225Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16 PASSED [0.0054s] [ 31%] 2025-12-04T13:20:27.8819393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64 PASSED [0.0110s] [ 31%] 2025-12-04T13:20:27.8819542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64 PASSED [1.4933s] [ 31%] 2025-12-04T13:20:27.8819712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [1.4998s] [ 31%] 2025-12-04T13:20:27.8819874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [1.5054s] [ 31%] 2025-12-04T13:20:27.8820036Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [1.5007s] [ 31%] 2025-12-04T13:20:27.8820180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16 PASSED [1.5329s] [ 31%] 2025-12-04T13:20:27.8820324Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32 PASSED [0.0171s] [ 31%] 2025-12-04T13:20:27.8820470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32 PASSED [0.0444s] [ 31%] 2025-12-04T13:20:27.8820631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_complex64 
PASSED [0.0046s] [ 31%] 2025-12-04T13:20:27.8820801Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_bfloat16 PASSED [0.0042s] [ 31%] 2025-12-04T13:20:27.8820965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_complex128 PASSED [0.0041s] [ 31%] 2025-12-04T13:20:27.8821121Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_unshuffle_cuda_int32 PASSED [0.0037s] [ 31%] 2025-12-04T13:20:27.8821276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16 PASSED [0.0581s] [ 31%] 2025-12-04T13:20:27.8821423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16 PASSED [0.0139s] [ 31%] 2025-12-04T13:20:27.8821564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32 PASSED [0.0156s] [ 31%] 2025-12-04T13:20:27.8821711Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16 PASSED [0.0116s] [ 31%] 2025-12-04T13:20:27.8821853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64 PASSED [0.0118s] [ 32%] 2025-12-04T13:20:27.8821995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64 PASSED [0.0153s] [ 32%] 2025-12-04T13:20:27.8822157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [0.0047s] [ 32%] 2025-12-04T13:20:27.8822317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [0.0046s] [ 32%] 2025-12-04T13:20:27.8822473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [1.4903s] [ 32%] 2025-12-04T13:20:27.8822653Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [1.4949s] [ 32%] 2025-12-04T13:20:27.8822809Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [1.4978s] [ 32%] 2025-12-04T13:20:27.8822959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16 PASSED [1.5064s] [ 32%] 2025-12-04T13:20:27.8823109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64 PASSED [1.5039s] [ 32%] 2025-12-04T13:20:27.8823314Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32 PASSED [1.5237s] [ 32%] 2025-12-04T13:20:27.8823489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [0.0300s] [ 32%] 2025-12-04T13:20:27.8823640Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16 PASSED [1.5370s] [ 32%] 2025-12-04T13:20:27.8823796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64 PASSED [1.5177s] [ 32%] 2025-12-04T13:20:27.8823941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64 PASSED [1.5076s] [ 32%] 2025-12-04T13:20:27.8824091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16 PASSED [1.5330s] [ 32%] 2025-12-04T13:20:27.8824240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64 PASSED [0.0191s] [ 32%] 2025-12-04T13:20:27.8824386Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8 PASSED [0.0121s] [ 32%] 2025-12-04T13:20:27.8824550Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [0.0057s] [ 32%] 2025-12-04T13:20:27.8824708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [0.0053s] [ 32%] 2025-12-04T13:20:27.8824851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32 PASSED [1.5073s] [ 32%] 2025-12-04T13:20:27.8825030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_cuda_float32 SKIPPED [0.0003s] (Expected: normal is not comparable) [ 32%] 2025-12-04T13:20:27.8825156Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64 PASSED [1.4893s] [ 32%] 2025-12-04T13:20:27.8825292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16 PASSED [1.4947s] [ 32%] 2025-12-04T13:20:27.8825417Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32 PASSED [1.4967s] [ 32%] 2025-12-04T13:20:27.8825538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8 PASSED [1.5081s] [ 32%] 2025-12-04T13:20:27.8825677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_complex32 PASSED [1.5207s] [ 32%] 2025-12-04T13:20:27.8825813Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_float16 PASSED [1.4995s] [ 32%] 2025-12-04T13:20:27.8825945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_copy_cuda_int8 PASSED [1.5038s] [ 32%] 2025-12-04T13:20:27.8826075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128 PASSED [1.5344s] [ 32%] 2025-12-04T13:20:27.8826204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16 PASSED [1.5296s] [ 32%] 2025-12-04T13:20:27.8826330Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64 PASSED 
[1.5059s] [ 32%] 2025-12-04T13:20:27.8826457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64 PASSED [1.5151s] [ 32%] 2025-12-04T13:20:27.8826580Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32 PASSED [1.5609s] [ 32%] 2025-12-04T13:20:27.8826716Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8 PASSED [1.5274s] [ 32%] 2025-12-04T13:20:27.8826842Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16 PASSED [1.7716s] [ 32%] 2025-12-04T13:20:27.8826970Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32 PASSED [0.6160s] [ 33%] 2025-12-04T13:20:27.8827096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float16 PASSED [1.4907s] [ 33%] 2025-12-04T13:20:27.8827223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int64 PASSED [1.5148s] [ 33%] 2025-12-04T13:20:27.8827360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16 PASSED [1.4984s] [ 33%] 2025-12-04T13:20:27.8827490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32 PASSED [1.4945s] [ 33%] 2025-12-04T13:20:27.8827615Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32 PASSED [1.5048s] [ 33%] 2025-12-04T13:20:27.8827740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16 PASSED [1.5141s] [ 33%] 2025-12-04T13:20:27.8827863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64 PASSED [1.5024s] [ 33%] 2025-12-04T13:20:27.8827982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8 PASSED [1.5129s] [ 33%] 2025-12-04T13:20:27.8828114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool PASSED [1.5311s] [ 33%] 2025-12-04T13:20:27.8828247Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16 PASSED [1.5121s] [ 33%] 2025-12-04T13:20:27.8828378Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64 PASSED [1.5252s] [ 33%] 2025-12-04T13:20:27.8828508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16 PASSED [1.5594s] [ 33%] 2025-12-04T13:20:27.8828652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8 PASSED [1.5320s] [ 33%] 2025-12-04T13:20:27.8828778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_renorm_cuda_float16 PASSED [1.5163s] [ 33%] 2025-12-04T13:20:27.8849500Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64 PASSED [0.0151s] [ 33%] 2025-12-04T13:20:27.8849656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64 PASSED [0.0112s] [ 33%] 2025-12-04T13:20:27.8849790Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16 PASSED [1.5058s] [ 33%] 2025-12-04T13:20:27.8849921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64 PASSED [1.5342s] [ 33%] 2025-12-04T13:20:27.8850049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8 PASSED [1.5160s] [ 33%] 2025-12-04T13:20:27.8850176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool PASSED [1.5183s] [ 33%] 2025-12-04T13:20:27.8850305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32 PASSED [1.5033s] [ 33%] 2025-12-04T13:20:27.8850434Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64 PASSED [1.5173s] [ 33%] 2025-12-04T13:20:27.8850556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32 PASSED [1.5144s] [ 33%] 2025-12-04T13:20:27.8850680Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64 PASSED [1.5140s] [ 33%] 2025-12-04T13:20:27.8850801Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32 PASSED [1.5191s] [ 33%] 2025-12-04T13:20:27.8850921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16 PASSED [1.5017s] [ 33%] 2025-12-04T13:20:27.8851052Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool PASSED [1.5148s] [ 33%] 2025-12-04T13:20:27.8851180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128 PASSED [1.5167s] [ 33%] 2025-12-04T13:20:27.8851301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16 PASSED [1.5220s] [ 33%] 2025-12-04T13:20:27.8851423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32 PASSED [1.4953s] [ 33%] 2025-12-04T13:20:27.8851542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64 PASSED [1.4998s] [ 33%] 2025-12-04T13:20:27.8851677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8 PASSED [1.5016s] [ 33%] 2025-12-04T13:20:27.8851799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64 PASSED [1.5131s] [ 33%] 2025-12-04T13:20:27.8851922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16 PASSED [1.5225s] [ 33%] 2025-12-04T13:20:27.8852045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64 PASSED [1.5175s] [ 34%] 2025-12-04T13:20:27.8852164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32 PASSED [1.4907s] [ 34%] 2025-12-04T13:20:27.8852282Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16 PASSED [1.5231s] [ 34%] 2025-12-04T13:20:27.8852413Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_bool PASSED [1.5139s] [ 34%] 2025-12-04T13:20:27.8852531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32 PASSED [1.5220s] [ 34%] 2025-12-04T13:20:27.8852650Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64 PASSED [1.5043s] [ 34%] 2025-12-04T13:20:27.8852766Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32 PASSED [1.5045s] [ 34%] 2025-12-04T13:20:27.8852888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16 PASSED [1.4989s] [ 34%] 2025-12-04T13:20:27.8853027Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32 PASSED [1.5060s] [ 34%] 2025-12-04T13:20:27.8853144Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64 PASSED [1.5003s] [ 34%] 2025-12-04T13:20:27.8853309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32 PASSED [1.5029s] [ 34%] 2025-12-04T13:20:27.8853533Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32 PASSED [1.5083s] [ 34%] 2025-12-04T13:20:27.8853679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64 PASSED [1.5060s] [ 34%] 2025-12-04T13:20:27.8853867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool PASSED [1.5157s] [ 34%] 2025-12-04T13:20:27.8854018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64 PASSED [0.4805s] [ 34%] 2025-12-04T13:20:27.8854188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32 PASSED [0.0204s] [ 34%] 2025-12-04T13:20:27.8854362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16 PASSED [0.0146s] [ 34%] 2025-12-04T13:20:27.8854479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64 
PASSED [0.0143s] [ 34%] 2025-12-04T13:20:27.8854598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64 PASSED [0.0149s] [ 34%] 2025-12-04T13:20:27.8854737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32 PASSED [1.5011s] [ 34%] 2025-12-04T13:20:27.8854872Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8 PASSED [1.5165s] [ 34%] 2025-12-04T13:20:27.8855016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32 PASSED [0.0202s] [ 34%] 2025-12-04T13:20:27.8855177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32 PASSED [0.0167s] [ 34%] 2025-12-04T13:20:27.8855308Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64 PASSED [0.0163s] [ 34%] 2025-12-04T13:20:27.8855479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8 PASSED [0.0134s] [ 34%] 2025-12-04T13:20:27.8855645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64 PASSED [0.0168s] [ 34%] 2025-12-04T13:20:27.8855832Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8 PASSED [1.5119s] [ 34%] 2025-12-04T13:20:27.8855998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32 PASSED [1.5210s] [ 34%] 2025-12-04T13:20:27.8856127Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64 PASSED [1.8180s] [ 34%] 2025-12-04T13:20:27.8856277Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bfloat16 PASSED [0.0208s] [ 34%] 2025-12-04T13:20:27.8856406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32 PASSED [0.0162s] [ 34%] 2025-12-04T13:20:27.8856548Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8 PASSED [0.0161s] [ 34%] 2025-12-04T13:20:27.8856704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [0.0048s] [ 34%] 2025-12-04T13:20:27.8856864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [0.0047s] [ 34%] 2025-12-04T13:20:27.8857022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [1.5118s] [ 34%] 2025-12-04T13:20:27.8857185Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [1.5266s] [ 35%] 2025-12-04T13:20:27.8857333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32 PASSED [1.5054s] [ 35%] 2025-12-04T13:20:27.8857493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [1.5092s] [ 35%] 2025-12-04T13:20:27.8857652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [1.5186s] [ 35%] 2025-12-04T13:20:27.8857836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_bfloat16 PASSED [1.5152s] [ 35%] 2025-12-04T13:20:27.8857996Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [0.0235s] [ 35%] 2025-12-04T13:20:27.8858154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [0.0196s] [ 35%] 2025-12-04T13:20:27.8858312Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [0.0193s] [ 35%] 2025-12-04T13:20:27.8858442Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16 PASSED [1.5023s] [ 35%] 2025-12-04T13:20:27.8858576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64 PASSED [1.9912s] [ 35%] 2025-12-04T13:20:27.8858728Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16 PASSED [1.4989s] [ 35%] 2025-12-04T13:20:27.8858876Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8 PASSED [1.4910s] [ 35%] 2025-12-04T13:20:27.8859010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32 PASSED [1.5421s] [ 35%] 2025-12-04T13:20:27.8859143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8 PASSED [1.5391s] [ 35%] 2025-12-04T13:20:27.8859297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8 PASSED [1.5300s] [ 35%] 2025-12-04T13:20:27.8859430Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32 PASSED [2.1262s] [ 35%] 2025-12-04T13:20:27.8859562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64 PASSED [12.4068s] [ 35%] 2025-12-04T13:20:27.8859691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64 PASSED [1.5421s] [ 35%] 2025-12-04T13:20:27.8859811Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8 PASSED [1.5132s] [ 35%] 2025-12-04T13:20:27.8859948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32 PASSED [1.5010s] [ 35%] 2025-12-04T13:20:27.8860078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_int8 PASSED [1.4859s] [ 35%] 2025-12-04T13:20:27.8860201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8 PASSED [1.5054s] [ 35%] 
2025-12-04T13:20:27.8860340Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32 PASSED [1.4997s] [ 35%] 2025-12-04T13:20:27.8860474Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_int32 PASSED [1.5025s] [ 35%] 2025-12-04T13:20:27.8860610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_uint8 PASSED [1.4986s] [ 35%] 2025-12-04T13:20:27.8860736Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128 PASSED [0.0069s] [ 35%] 2025-12-04T13:20:27.8860861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32 PASSED [0.0055s] [ 35%] 2025-12-04T13:20:27.8860981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16 PASSED [0.0046s] [ 35%] 2025-12-04T13:20:27.8861101Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8 PASSED [0.0046s] [ 35%] 2025-12-04T13:20:27.8861233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16 PASSED [0.0071s] [ 35%] 2025-12-04T13:20:27.8861366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64 PASSED [0.0084s] [ 35%] 2025-12-04T13:20:27.8861493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128 PASSED [0.0629s] [ 35%] 2025-12-04T13:20:27.8861628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32 PASSED [1.5482s] [ 35%] 2025-12-04T13:20:27.8861749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16 PASSED [1.5404s] [ 35%] 2025-12-04T13:20:27.8861866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8 PASSED [1.5094s] [ 35%] 2025-12-04T13:20:27.8861986Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32 PASSED [1.5060s] [ 36%] 2025-12-04T13:20:27.8862104Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64 PASSED [1.4948s] [ 36%] 2025-12-04T13:20:27.8862231Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_bool PASSED [1.4917s] [ 36%] 2025-12-04T13:20:27.8862358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_complex64 PASSED [1.5022s] [ 36%] 2025-12-04T13:20:27.8862488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float64 PASSED [1.5059s] [ 36%] 2025-12-04T13:20:27.8862610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int16 PASSED [1.4939s] [ 36%] 2025-12-04T13:20:27.8862731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32 PASSED [1.5075s] [ 36%] 2025-12-04T13:20:27.8862849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64 PASSED [1.4897s] [ 36%] 2025-12-04T13:20:27.8862978Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16 PASSED [1.4936s] [ 36%] 2025-12-04T13:20:27.8863094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8 PASSED [1.4999s] [ 36%] 2025-12-04T13:20:27.8863236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex128 PASSED [1.5066s] [ 36%] 2025-12-04T13:20:27.8863412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex64 PASSED [1.4920s] [ 36%] 2025-12-04T13:20:27.8863552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float64 PASSED [1.5014s] [ 36%] 2025-12-04T13:20:27.8863703Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int32 PASSED [1.5106s] [ 36%] 2025-12-04T13:20:27.8863834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_int8 PASSED [1.5027s] [ 36%] 2025-12-04T13:20:27.8863971Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_uint8 PASSED [1.4991s] [ 36%] 2025-12-04T13:20:27.8864089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8 PASSED [1.5015s] [ 36%] 2025-12-04T13:20:27.8864212Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16 PASSED [1.5054s] [ 36%] 2025-12-04T13:20:27.8864331Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32 PASSED [1.5168s] [ 36%] 2025-12-04T13:20:27.8864453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64 PASSED [1.5242s] [ 36%] 2025-12-04T13:20:27.8864573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8 PASSED [1.5244s] [ 36%] 2025-12-04T13:20:27.8864714Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64 PASSED [1.5036s] [ 36%] 2025-12-04T13:20:27.8864845Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8 PASSED [1.4913s] [ 36%] 2025-12-04T13:20:27.8864981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32 PASSED [1.4985s] [ 36%] 2025-12-04T13:20:27.8865101Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32 PASSED [1.5077s] [ 36%] 2025-12-04T13:20:27.8865245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_complex128 PASSED [1.4855s] [ 36%] 2025-12-04T13:20:27.8865396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int32 PASSED [1.5015s] [ 36%] 2025-12-04T13:20:27.8865531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_int64 PASSED [1.4876s] [ 36%] 2025-12-04T13:20:27.8865665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128 PASSED [1.4938s] [ 36%] 2025-12-04T13:20:27.8865794Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16 PASSED [1.4999s] [ 36%] 2025-12-04T13:20:27.8865922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bfloat16 PASSED [1.4982s] [ 36%] 2025-12-04T13:20:27.8866048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128 PASSED [1.5090s] [ 36%] 2025-12-04T13:20:27.8866174Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64 PASSED [1.5020s] [ 36%] 2025-12-04T13:20:27.8866304Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64 PASSED [1.5093s] [ 36%] 2025-12-04T13:20:27.8866432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bfloat16 PASSED [1.4997s] [ 36%] 2025-12-04T13:20:27.8866556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64 PASSED [1.4933s] [ 37%] 2025-12-04T13:20:27.8866681Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16 PASSED [1.4913s] [ 37%] 2025-12-04T13:20:27.8866816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16 PASSED [1.5149s] [ 37%] 2025-12-04T13:20:27.8866941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64 PASSED [1.5031s] [ 37%] 2025-12-04T13:20:27.8867069Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_bool PASSED [1.4977s] [ 37%] 2025-12-04T13:20:27.8867207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex128 PASSED [1.5009s] [ 37%] 2025-12-04T13:20:27.8867341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float16 PASSED [1.5093s] [ 37%] 2025-12-04T13:20:27.8867485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float32 PASSED [1.4939s] [ 37%] 2025-12-04T13:20:27.8867614Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128 PASSED [1.5067s] [ 37%] 2025-12-04T13:20:27.8867743Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32 PASSED [1.5038s] [ 37%] 2025-12-04T13:20:27.8867866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32 PASSED [1.5060s] [ 37%] 2025-12-04T13:20:27.8868002Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32 PASSED [0.0072s] [ 37%] 2025-12-04T13:20:27.8868131Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool PASSED [1.4977s] [ 37%] 2025-12-04T13:20:27.8868265Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64 PASSED [1.5038s] [ 37%] 2025-12-04T13:20:27.8868393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32 PASSED [1.4904s] [ 37%] 2025-12-04T13:20:27.8868530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16 PASSED [1.5045s] [ 37%] 2025-12-04T13:20:27.8868668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_float32 PASSED [1.5024s] [ 37%] 2025-12-04T13:20:27.8868812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16 PASSED [1.5049s] [ 37%] 2025-12-04T13:20:27.8868948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int32 PASSED [1.5053s] [ 37%] 2025-12-04T13:20:27.8869080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int64 PASSED [1.5058s] [ 37%] 2025-12-04T13:20:27.8869224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16 PASSED [1.4940s] [ 37%] 2025-12-04T13:20:27.8869349Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64 PASSED [1.4947s] [ 37%] 
2025-12-04T13:20:27.8869473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16 PASSED [1.5254s] [ 37%] 2025-12-04T13:20:27.8869602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32 PASSED [1.5057s] [ 37%] 2025-12-04T13:20:27.8869731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex64 PASSED [1.5099s] [ 37%] 2025-12-04T13:20:27.8869860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32 PASSED [1.5057s] [ 37%] 2025-12-04T13:20:27.8869985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16 PASSED [1.5143s] [ 37%] 2025-12-04T13:20:27.8870105Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32 PASSED [1.4957s] [ 37%] 2025-12-04T13:20:27.8870230Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64 PASSED [1.5144s] [ 37%] 2025-12-04T13:20:27.8870360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_int64 PASSED [1.4904s] [ 37%] 2025-12-04T13:20:27.8870482Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16 PASSED [1.5026s] [ 37%] 2025-12-04T13:20:27.8870613Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32 PASSED [1.5201s] [ 37%] 2025-12-04T13:20:27.8870733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64 PASSED [1.5063s] [ 37%] 2025-12-04T13:20:27.8870852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8 PASSED [1.5069s] [ 37%] 2025-12-04T13:20:27.8870982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128 PASSED [1.5222s] [ 38%] 2025-12-04T13:20:27.8871107Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8 PASSED [1.4926s] [ 38%] 2025-12-04T13:20:27.8871245Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32 PASSED [1.4952s] [ 38%] 2025-12-04T13:20:27.8871369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8 PASSED [1.4880s] [ 38%] 2025-12-04T13:20:27.8871497Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128 PASSED [1.4943s] [ 38%] 2025-12-04T13:20:27.8871624Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16 PASSED [1.5012s] [ 38%] 2025-12-04T13:20:27.8871744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool PASSED [1.5472s] [ 38%] 2025-12-04T13:20:27.8871870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32 PASSED [1.5347s] [ 38%] 2025-12-04T13:20:27.8871991Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16 PASSED [1.5437s] [ 38%] 2025-12-04T13:20:27.8872113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32 PASSED [1.5303s] [ 38%] 2025-12-04T13:20:27.8872233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8 PASSED [1.5547s] [ 38%] 2025-12-04T13:20:27.8872356Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16 PASSED [1.4907s] [ 38%] 2025-12-04T13:20:27.8872464Z test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_aminmax_cuda PASSED [0.0111s] [ 38%] 2025-12-04T13:20:27.8872588Z test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_any_cuda PASSED [1.5043s] [ 38%] 2025-12-04T13:20:27.8872692Z test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmax_cuda PASSED [1.5196s] [ 38%] 2025-12-04T13:20:27.8872796Z test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmin_cuda PASSED [1.5089s] [ 38%] 2025-12-04T13:20:27.8872932Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32 PASSED [1.5041s] [ 38%] 2025-12-04T13:20:27.8873057Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64 PASSED [1.5226s] [ 38%] 2025-12-04T13:20:27.8873184Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64 PASSED [1.4996s] [ 38%] 2025-12-04T13:20:27.8873384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager__unsafe_masked_index_put_accumulate_cuda_complex64 PASSED [1.5042s] [ 38%] 2025-12-04T13:20:27.8873509Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64 PASSED [1.5152s] [ 38%] 2025-12-04T13:20:27.8873628Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32 PASSED [1.5250s] [ 38%] 2025-12-04T13:20:27.8873753Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64 PASSED [0.0296s] [ 38%] 2025-12-04T13:20:27.8873874Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64 PASSED [0.9751s] [ 38%] 2025-12-04T13:20:27.8874001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32 PASSED [0.9163s] [ 38%] 2025-12-04T13:20:27.8874122Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32 PASSED [0.8970s] [ 38%] 2025-12-04T13:20:27.8874243Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32 PASSED [0.8899s] [ 38%] 2025-12-04T13:20:27.8874362Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32 PASSED [1.0505s] [ 38%] 2025-12-04T13:20:27.8874507Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64 PASSED [0.9747s] [ 38%] 2025-12-04T13:20:27.8874692Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64 SKIPPED [0.0002s] (Errors when storage_offset is included) [ 38%] 2025-12-04T13:20:27.8874872Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32 SKIPPED [0.0001s] (Errors when storage_offset is included) [ 38%] 
2025-12-04T13:20:27.8874991Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32 PASSED [0.8743s] [ 38%] 2025-12-04T13:20:27.8875129Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64 PASSED [0.8487s] [ 38%] 2025-12-04T13:20:27.8875259Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64 PASSED [0.8573s] [ 38%] 2025-12-04T13:20:27.8875383Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32 PASSED [0.8438s] [ 38%] 2025-12-04T13:20:27.8875514Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64 PASSED [0.8484s] [ 39%] 2025-12-04T13:20:27.8875638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64 PASSED [0.8535s] [ 39%] 2025-12-04T13:20:27.8875761Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32 PASSED [0.8465s] [ 39%] 2025-12-04T13:20:27.8875889Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64 PASSED [0.8470s] [ 39%] 2025-12-04T13:20:27.8876007Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32 PASSED [3.4129s] [ 39%] 2025-12-04T13:20:27.8876146Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64 PASSED [0.8448s] [ 39%] 2025-12-04T13:20:27.8876281Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64 PASSED [0.8485s] [ 39%] 2025-12-04T13:20:27.8876419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32 PASSED [0.8363s] [ 39%] 2025-12-04T13:20:27.8876545Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64 PASSED [0.8552s] [ 39%] 2025-12-04T13:20:27.8876667Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64 PASSED [0.8589s] [ 39%] 2025-12-04T13:20:27.8876788Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64 PASSED [0.8520s] [ 39%] 2025-12-04T13:20:27.8876928Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64 PASSED [0.9554s] [ 39%] 2025-12-04T13:20:27.8877068Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64 PASSED [0.9266s] [ 39%] 2025-12-04T13:20:27.8877200Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64 PASSED [0.8707s] [ 39%] 2025-12-04T13:20:27.8877325Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32 PASSED [0.8565s] [ 39%] 2025-12-04T13:20:27.8877455Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32 PASSED [0.8546s] [ 39%] 2025-12-04T13:20:27.8877584Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64 PASSED [0.8806s] [ 39%] 2025-12-04T13:20:27.8877705Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64 PASSED [0.8668s] [ 39%] 2025-12-04T13:20:27.8877827Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32 PASSED [0.8544s] [ 39%] 2025-12-04T13:20:27.8877951Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32 PASSED [0.8536s] [ 39%] 2025-12-04T13:20:27.8878069Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64 PASSED [0.8533s] [ 39%] 2025-12-04T13:20:27.8878190Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32 PASSED [0.8479s] [ 39%] 2025-12-04T13:20:27.8878323Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64 PASSED [0.8524s] [ 39%] 2025-12-04T13:20:27.8878445Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32 PASSED [0.8473s] [ 39%] 2025-12-04T13:20:27.8878566Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32 PASSED 
[0.8645s] [ 39%] 2025-12-04T13:20:27.8878686Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32 PASSED [0.8498s] [ 39%] 2025-12-04T13:20:27.8878831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64 PASSED [0.8543s] [ 39%] 2025-12-04T13:20:27.8878963Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32 PASSED [0.8517s] [ 39%] 2025-12-04T13:20:27.8879088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64 PASSED [0.8607s] [ 39%] 2025-12-04T13:20:27.8879215Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64 PASSED [0.8790s] [ 39%] 2025-12-04T13:20:27.8879334Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32 PASSED [0.9074s] [ 39%] 2025-12-04T13:20:27.8879455Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64 PASSED [0.9227s] [ 39%] 2025-12-04T13:20:27.8879571Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32 PASSED [0.8985s] [ 39%] 2025-12-04T13:20:27.8879709Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32 PASSED [0.8771s] [ 39%] 2025-12-04T13:20:27.8879825Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32 PASSED [0.8645s] [ 39%] 2025-12-04T13:20:27.8879952Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64 PASSED [0.8560s] [ 40%] 2025-12-04T13:20:27.8880078Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32 PASSED [0.8513s] [ 40%] 2025-12-04T13:20:27.8880211Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32 PASSED [0.8439s] [ 40%] 2025-12-04T13:20:27.8880334Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32 PASSED [0.8829s] [ 40%] 2025-12-04T13:20:27.8880458Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32 PASSED [1.4870s] [ 40%] 2025-12-04T13:20:27.8880583Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32 PASSED [1.4817s] [ 40%] 2025-12-04T13:20:27.8880720Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64 PASSED [1.0063s] [ 40%] 2025-12-04T13:20:27.8880844Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32 PASSED [0.8551s] [ 40%] 2025-12-04T13:20:27.8880960Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32 PASSED [0.8554s] [ 40%] 2025-12-04T13:20:27.8881093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32 PASSED [0.8585s] [ 40%] 2025-12-04T13:20:27.8881243Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_3d_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 40%] 2025-12-04T13:20:27.8881362Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32 PASSED [0.8456s] [ 40%] 2025-12-04T13:20:27.8881479Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32 PASSED [0.8608s] [ 40%] 2025-12-04T13:20:27.8881598Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32 PASSED [0.8564s] [ 40%] 2025-12-04T13:20:27.8881720Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32 PASSED [0.8627s] [ 40%] 2025-12-04T13:20:27.8881849Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64 PASSED [0.0307s] [ 40%] 2025-12-04T13:20:27.8881982Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amin_cuda_float32 PASSED [0.0120s] [ 40%] 2025-12-04T13:20:27.8882116Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64 PASSED [1.8385s] [ 40%] 2025-12-04T13:20:27.8882235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_item_cuda_float32 
PASSED [0.8463s] [ 40%] 2025-12-04T13:20:27.8882383Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32 PASSED [0.8588s] [ 40%] 2025-12-04T13:20:27.8882524Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64 PASSED [0.9972s] [ 40%] 2025-12-04T13:20:27.8882643Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64 XFAIL [0.0130s] [ 40%] 2025-12-04T13:20:27.8882793Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64 PASSED [1.7140s] [ 40%] 2025-12-04T13:20:27.8882922Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32 PASSED [0.8483s] [ 40%] 2025-12-04T13:20:27.8883057Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64 PASSED [0.8499s] [ 40%] 2025-12-04T13:20:27.8883187Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32 PASSED [1.5459s] [ 40%] 2025-12-04T13:20:27.8883358Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64 PASSED [1.4903s] [ 40%] 2025-12-04T13:20:27.8883491Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32 PASSED [1.4742s] [ 40%] 2025-12-04T13:20:27.8883719Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64 SKIPPED [0.0012s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 40%] 2025-12-04T13:20:27.8883936Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32 SKIPPED [0.0007s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 40%] 2025-12-04T13:20:27.8884069Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64 PASSED [1.8539s] [ 40%] 2025-12-04T13:20:27.8884225Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64 PASSED [1.4632s] [ 
40%] 2025-12-04T13:20:27.8884358Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64 PASSED [0.0632s] [ 40%] 2025-12-04T13:20:27.8884502Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64 PASSED [1.5024s] [ 40%] 2025-12-04T13:20:27.8884649Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32 PASSED [1.5036s] [ 40%] 2025-12-04T13:20:27.8884785Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64 PASSED [1.4879s] [ 41%] 2025-12-04T13:20:27.8884926Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64 PASSED [1.4589s] [ 41%] 2025-12-04T13:20:27.8885059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64 PASSED [1.4689s] [ 41%] 2025-12-04T13:20:27.8885189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32 PASSED [1.4785s] [ 41%] 2025-12-04T13:20:27.8885310Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32 PASSED [1.4581s] [ 41%] 2025-12-04T13:20:27.8885431Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64 PASSED [1.4748s] [ 41%] 2025-12-04T13:20:27.8885555Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_normal_cuda_float32 XFAIL [0.0118s] [ 41%] 2025-12-04T13:20:27.8885682Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32 PASSED [1.4712s] [ 41%] 2025-12-04T13:20:27.8885828Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32 PASSED [1.4513s] [ 41%] 2025-12-04T13:20:27.8885947Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32 PASSED [1.4434s] [ 41%] 2025-12-04T13:20:27.8886088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32 PASSED [1.4327s] [ 41%] 2025-12-04T13:20:27.8886237Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 41%] 2025-12-04T13:20:27.8886365Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32 PASSED [1.4333s] [ 41%] 2025-12-04T13:20:27.8886492Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64 PASSED [1.4467s] [ 41%] 2025-12-04T13:20:27.8886621Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32 PASSED [1.4571s] [ 41%] 2025-12-04T13:20:27.8886771Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_complex64 PASSED [1.4760s] [ 41%] 2025-12-04T13:20:27.8886895Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32 PASSED [1.4592s] [ 41%] 2025-12-04T13:20:27.8887015Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64 PASSED [1.5076s] [ 41%] 2025-12-04T13:20:27.8887137Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64 PASSED [1.4460s] [ 41%] 2025-12-04T13:20:27.8887272Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64 PASSED [1.4321s] [ 41%] 2025-12-04T13:20:27.8887400Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32 PASSED [1.4404s] [ 41%] 2025-12-04T13:20:27.8887531Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64 PASSED [1.4893s] [ 41%] 2025-12-04T13:20:27.8887655Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32 PASSED [1.4682s] [ 41%] 2025-12-04T13:20:27.8887785Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64 PASSED [0.0664s] [ 41%] 2025-12-04T13:20:27.8887917Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32 PASSED [0.0141s] [ 41%] 2025-12-04T13:20:27.8888069Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32 PASSED [0.0324s] [ 41%] 2025-12-04T13:20:27.8888189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32 PASSED [0.0233s] [ 41%] 2025-12-04T13:20:27.8888319Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64 PASSED [1.4900s] [ 41%] 2025-12-04T13:20:27.8888474Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32 PASSED [1.8092s] [ 41%] 2025-12-04T13:20:27.8888607Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64 PASSED [1.4937s] [ 41%] 2025-12-04T13:20:27.8888731Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64 PASSED [1.4652s] [ 41%] 2025-12-04T13:20:27.8888851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64 PASSED [1.4742s] [ 41%] 2025-12-04T13:20:27.8888966Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32 PASSED [1.4622s] [ 41%] 2025-12-04T13:20:27.8889112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [1.4642s] [ 41%] 2025-12-04T13:20:27.8889237Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32 PASSED [1.4669s] [ 41%] 2025-12-04T13:20:27.8889359Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_complex64 PASSED [2.5273s] [ 42%] 2025-12-04T13:20:27.8889491Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64 PASSED [1.4578s] [ 42%] 2025-12-04T13:20:27.8889615Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64 PASSED [1.4757s] [ 42%] 2025-12-04T13:20:27.8889752Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32 PASSED [1.4866s] [ 42%] 2025-12-04T13:20:27.8889965Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64 SKIPPED [0.0002s] (Expected: new_empty_strided is not comparable) [ 42%] 2025-12-04T13:20:27.8890091Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32 PASSED [1.4726s] [ 42%] 2025-12-04T13:20:27.8890248Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [1.4617s] [ 42%] 2025-12-04T13:20:27.8890405Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [1.4554s] [ 42%] 2025-12-04T13:20:27.8890549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64 PASSED [1.6269s] [ 42%] 2025-12-04T13:20:27.8890702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32 PASSED [1.4804s] [ 42%] 2025-12-04T13:20:27.8890855Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32 PASSED [1.8630s] [ 42%] 2025-12-04T13:20:27.8891013Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64 PASSED [1.4972s] [ 42%] 2025-12-04T13:20:27.8891169Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64 PASSED [1.6763s] [ 42%] 2025-12-04T13:20:27.8891313Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32 PASSED [1.4736s] [ 42%] 2025-12-04T13:20:27.8891473Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [0.0299s] [ 42%] 2025-12-04T13:20:27.8891610Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32 PASSED [0.0046s] [ 42%] 2025-12-04T13:20:27.8891758Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32 PASSED [0.0377s] [ 42%] 
2025-12-04T13:20:27.8891901Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32 PASSED [1.4494s] [ 42%] 2025-12-04T13:20:27.8892073Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32 PASSED [1.4611s] [ 42%] 2025-12-04T13:20:27.8892228Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32 PASSED [1.4570s] [ 42%] 2025-12-04T13:20:27.8892390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32 PASSED [1.4349s] [ 42%] 2025-12-04T13:20:27.8892544Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32 PASSED [1.4402s] [ 42%] 2025-12-04T13:20:27.8892701Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32 PASSED [1.4529s] [ 42%] 2025-12-04T13:20:27.8892843Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32 PASSED [1.5811s] [ 42%] 2025-12-04T13:20:27.8892997Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32 PASSED [1.4332s] [ 42%] 2025-12-04T13:20:27.8893133Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32 PASSED [1.4303s] [ 42%] 2025-12-04T13:20:27.8893308Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32 PASSED [1.4240s] [ 42%] 2025-12-04T13:20:27.8893465Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_complex_cuda_complex64 PASSED [1.4380s] [ 42%] 2025-12-04T13:20:27.8893605Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32 PASSED [1.4384s] [ 42%] 2025-12-04T13:20:27.8893748Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32 PASSED [1.4376s] [ 42%] 
2025-12-04T13:20:27.8893887Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32 PASSED [1.4219s] [ 42%] 2025-12-04T13:20:27.8894078Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [1.4341s] [ 42%] 2025-12-04T13:20:27.8894231Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32 PASSED [1.4481s] [ 42%] 2025-12-04T13:20:27.8894383Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32 PASSED [1.4600s] [ 42%] 2025-12-04T13:20:27.8894538Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_static_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 42%] 2025-12-04T13:20:27.8894659Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64 PASSED [1.4616s] [ 43%] 2025-12-04T13:20:27.8894789Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32 PASSED [1.4518s] [ 43%] 2025-12-04T13:20:27.8894912Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32 PASSED [1.4580s] [ 43%] 2025-12-04T13:20:27.8895040Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64 PASSED [1.4451s] [ 43%] 2025-12-04T13:20:27.8895160Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64 XFAIL [0.0060s] [ 43%] 2025-12-04T13:20:27.8895302Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32 PASSED [0.0149s] [ 43%] 2025-12-04T13:20:27.8895461Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 43%] 2025-12-04T13:20:27.8895579Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32 PASSED [0.0252s] [ 43%] 2025-12-04T13:20:27.8895694Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64 PASSED [1.4767s] [ 43%] 2025-12-04T13:20:27.8895812Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32 XFAIL [0.0060s] [ 43%] 2025-12-04T13:20:27.8895927Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32 XFAIL [0.0034s] [ 43%] 2025-12-04T13:20:27.8896059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32 PASSED [1.4526s] [ 43%] 2025-12-04T13:20:27.8896185Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64 PASSED [1.4386s] [ 43%] 2025-12-04T13:20:27.8896304Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32 PASSED [1.4487s] [ 43%] 2025-12-04T13:20:27.8896434Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32 PASSED [1.4534s] [ 43%] 2025-12-04T13:20:27.8896554Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32 PASSED [1.4776s] [ 43%] 2025-12-04T13:20:27.8896671Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64 PASSED [1.5109s] [ 43%] 2025-12-04T13:20:27.8896788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32 PASSED [1.4962s] [ 43%] 2025-12-04T13:20:27.8896941Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32 SKIPPED [0.0002s] (Skipped!) 
[ 43%] 2025-12-04T13:20:27.8897062Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64 PASSED [1.4996s] [ 43%] 2025-12-04T13:20:27.8897178Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32 PASSED [1.4923s] [ 43%] 2025-12-04T13:20:27.8897336Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 43%] 2025-12-04T13:20:27.8897492Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 43%] 2025-12-04T13:20:27.8897610Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64 PASSED [1.5019s] [ 43%] 2025-12-04T13:20:27.8897730Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64 PASSED [1.4763s] [ 43%] 2025-12-04T13:20:27.8897880Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64 PASSED [1.5111s] [ 43%] 2025-12-04T13:20:27.8898016Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32 PASSED [1.5241s] [ 43%] 2025-12-04T13:20:27.8898149Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32 PASSED [1.4948s] [ 43%] 2025-12-04T13:20:27.8898283Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32 PASSED [1.4952s] [ 43%] 2025-12-04T13:20:27.8898437Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32 PASSED [1.4957s] [ 43%] 2025-12-04T13:20:27.8898577Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32 PASSED [1.4939s] [ 43%] 2025-12-04T13:20:27.8898723Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32 PASSED [1.4953s] [ 43%] 2025-12-04T13:20:27.8898871Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32 PASSED [1.5039s] [ 43%] 2025-12-04T13:20:27.8899002Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32 PASSED [1.5078s] [ 43%] 2025-12-04T13:20:27.8899120Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32 PASSED [1.5353s] [ 44%] 2025-12-04T13:20:27.8899251Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64 PASSED [1.5121s] [ 44%] 2025-12-04T13:20:27.8899390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_copy_cuda_float32 PASSED [1.5034s] [ 44%] 2025-12-04T13:20:27.8899525Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64 PASSED [1.5407s] [ 44%] 2025-12-04T13:20:27.8899651Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_float32 PASSED [1.5074s] [ 44%] 2025-12-04T13:20:27.8899774Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64 PASSED [0.0219s] [ 44%] 2025-12-04T13:20:27.8899901Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32 PASSED [1.5038s] [ 44%] 2025-12-04T13:20:27.8900038Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64 PASSED [1.4969s] [ 44%] 2025-12-04T13:20:27.8900153Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32 PASSED [1.5220s] [ 44%] 2025-12-04T13:20:27.8900278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32 PASSED [1.6459s] [ 44%] 2025-12-04T13:20:27.8900399Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_copy_cuda_complex64 PASSED [1.5191s] [ 44%] 2025-12-04T13:20:27.8900518Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64 PASSED [1.5234s] [ 44%] 2025-12-04T13:20:27.8900640Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32 PASSED [1.4938s] [ 44%] 2025-12-04T13:20:27.8900759Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64 PASSED [1.5133s] [ 44%] 2025-12-04T13:20:27.8900914Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__safe_softmax_default_cuda_float32 PASSED [1.5135s] [ 44%] 2025-12-04T13:20:27.8901030Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32 PASSED [1.5180s] [ 44%] 2025-12-04T13:20:27.8901163Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_complex64 PASSED [1.4901s] [ 44%] 2025-12-04T13:20:27.8901289Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64 PASSED [1.5547s] [ 44%] 2025-12-04T13:20:27.8901411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32 PASSED [1.5205s] [ 44%] 2025-12-04T13:20:27.8901549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64 PASSED [1.5559s] [ 44%] 2025-12-04T13:20:27.8901693Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64 PASSED [1.5493s] [ 44%] 2025-12-04T13:20:27.8901817Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32 PASSED [1.5274s] [ 44%] 2025-12-04T13:20:27.8901943Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_copy_cuda_complex64 PASSED [1.5133s] [ 44%] 2025-12-04T13:20:27.8902070Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64 PASSED [1.5123s] [ 44%] 2025-12-04T13:20:27.8902198Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_complex64 PASSED [1.5049s] [ 44%] 2025-12-04T13:20:27.8902338Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64 PASSED [1.5116s] [ 44%] 2025-12-04T13:20:27.8902475Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64 PASSED [1.4995s] [ 44%] 2025-12-04T13:20:27.8902597Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64 PASSED [1.5097s] [ 44%] 2025-12-04T13:20:27.8902722Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32 PASSED [1.5028s] [ 44%] 2025-12-04T13:20:27.8902848Z test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32 PASSED [0.0479s] [ 44%] 2025-12-04T13:20:27.8902973Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32 PASSED [0.0983s] [ 44%] 2025-12-04T13:20:27.8903116Z test_ops.py::TestCompositeComplianceCUDA::test_backward__segment_reduce_offsets_cuda_float32 PASSED [0.4028s] [ 44%] 2025-12-04T13:20:27.8903238Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32 PASSED [0.1842s] [ 44%] 2025-12-04T13:20:27.8903398Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32 PASSED [0.0822s] [ 44%] 2025-12-04T13:20:27.8903516Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32 PASSED [0.0542s] [ 44%] 2025-12-04T13:20:27.8903647Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32 PASSED [0.0539s] [ 45%] 2025-12-04T13:20:27.8903766Z test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32 PASSED [1.5057s] [ 45%] 2025-12-04T13:20:27.8903883Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32 PASSED [0.0724s] [ 45%] 2025-12-04T13:20:27.8904008Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32 PASSED [1.5470s] [ 45%] 2025-12-04T13:20:27.8904146Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32 PASSED [1.5281s] [ 45%] 2025-12-04T13:20:27.8904268Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32 PASSED [1.4962s] [ 45%] 2025-12-04T13:20:27.8904384Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32 PASSED [1.5148s] [ 45%] 2025-12-04T13:20:27.8904501Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32 PASSED [1.6229s] [ 45%] 2025-12-04T13:20:27.8904635Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32 PASSED [0.0786s] [ 45%] 2025-12-04T13:20:27.8904763Z test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32 PASSED [0.0105s] [ 45%] 2025-12-04T13:20:27.8904892Z test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32 PASSED [0.0886s] [ 45%] 2025-12-04T13:20:27.8905019Z test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32 PASSED [1.5224s] [ 45%] 2025-12-04T13:20:27.8905140Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32 PASSED [1.4997s] [ 45%] 2025-12-04T13:20:27.8905258Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32 PASSED [1.5157s] [ 45%] 2025-12-04T13:20:27.8905379Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32 PASSED [0.0614s] [ 45%] 2025-12-04T13:20:27.8905507Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32 PASSED [1.5168s] [ 45%] 2025-12-04T13:20:27.8905656Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32 PASSED [0.0470s] [ 45%] 2025-12-04T13:20:27.8905772Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32 PASSED [1.5038s] [ 45%] 2025-12-04T13:20:27.8905890Z test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32 PASSED [1.5257s] [ 45%] 2025-12-04T13:20:27.8906005Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32 PASSED [1.5070s] [ 45%] 2025-12-04T13:20:27.8906120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32 PASSED [1.5141s] [ 45%] 2025-12-04T13:20:27.8906262Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_copy_cuda_float32 PASSED [1.5321s] [ 45%] 2025-12-04T13:20:27.8906389Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32 PASSED [1.5161s] [ 45%] 2025-12-04T13:20:27.8906510Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32 PASSED [1.5410s] [ 45%] 2025-12-04T13:20:27.8906633Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32 PASSED [0.0352s] [ 45%] 2025-12-04T13:20:27.8906756Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32 PASSED [1.5312s] [ 45%] 2025-12-04T13:20:27.8906872Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32 PASSED [0.0103s] [ 45%] 2025-12-04T13:20:27.8906988Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32 PASSED [0.0648s] [ 45%] 2025-12-04T13:20:27.8907138Z test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_3d_cuda_float32 SKIPPED [0.0002s] (Skipped!) 
[ 45%] 2025-12-04T13:20:27.8907257Z test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32 PASSED [1.4995s] [ 45%] 2025-12-04T13:20:27.8907389Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32 PASSED [0.3710s] [ 45%] 2025-12-04T13:20:27.8907515Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32 PASSED [0.2178s] [ 45%] 2025-12-04T13:20:27.8907657Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32 PASSED [0.1240s] [ 45%] 2025-12-04T13:20:27.8907801Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32 PASSED [2.2759s] [ 45%] 2025-12-04T13:20:27.8907915Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32 PASSED [1.4849s] [ 45%] 2025-12-04T13:20:27.8908060Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32 PASSED [0.5035s] [ 46%] 2025-12-04T13:20:27.8908191Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32 PASSED [0.0448s] [ 46%] 2025-12-04T13:20:27.8908319Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32 PASSED [0.0336s] [ 46%] 2025-12-04T13:20:27.8908448Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32 PASSED [0.1188s] [ 46%] 2025-12-04T13:20:27.8908569Z test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32 PASSED [0.0948s] [ 46%] 2025-12-04T13:20:27.8908710Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32 PASSED [1.4955s] [ 46%] 2025-12-04T13:20:27.8908830Z test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32 PASSED [0.0704s] [ 46%] 2025-12-04T13:20:27.8908946Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32 PASSED [0.0381s] [ 46%] 2025-12-04T13:20:27.8909087Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [0.0270s] [ 46%] 2025-12-04T13:20:27.8909205Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32 PASSED [0.0745s] [ 46%] 2025-12-04T13:20:27.8909320Z test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32 PASSED [1.4786s] [ 46%] 2025-12-04T13:20:27.8909490Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [1.5013s] [ 46%] 2025-12-04T13:20:27.8909650Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [1.5524s] [ 46%] 2025-12-04T13:20:27.8909795Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32 PASSED [1.5166s] [ 46%] 2025-12-04T13:20:27.8909958Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [0.4086s] [ 46%] 2025-12-04T13:20:27.8910113Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32 PASSED [0.1165s] [ 46%] 2025-12-04T13:20:27.8910283Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [0.2280s] [ 46%] 2025-12-04T13:20:27.8910432Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32 PASSED [0.1567s] [ 46%] 2025-12-04T13:20:27.8910576Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32 PASSED [0.3104s] [ 46%] 2025-12-04T13:20:27.8910715Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32 PASSED [0.0368s] [ 46%] 2025-12-04T13:20:27.8910852Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32 PASSED [0.2023s] [ 46%] 2025-12-04T13:20:27.8911101Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_head_attention_forward_cuda_float32 SKIPPED [0.0003s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 46%] 2025-12-04T13:20:27.8911270Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [0.0567s] [ 46%] 2025-12-04T13:20:27.8911413Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32 PASSED [0.0348s] [ 46%] 2025-12-04T13:20:27.8911551Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32 PASSED [0.1222s] [ 46%] 2025-12-04T13:20:27.8911699Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32 PASSED [0.0173s] [ 46%] 2025-12-04T13:20:27.8911842Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32 PASSED [0.0106s] [ 46%] 2025-12-04T13:20:27.8911984Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32 PASSED [0.0109s] [ 46%] 2025-12-04T13:20:27.8912123Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32 PASSED [0.0090s] [ 46%] 2025-12-04T13:20:27.8912243Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32 PASSED [1.5302s] [ 46%] 2025-12-04T13:20:27.8912360Z test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32 PASSED [1.5244s] [ 46%] 2025-12-04T13:20:27.8912493Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32 PASSED [1.5128s] [ 46%] 2025-12-04T13:20:27.8912633Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32 PASSED [1.4865s] [ 46%] 2025-12-04T13:20:27.8912768Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32 PASSED [0.3704s] [ 46%] 2025-12-04T13:20:27.8912885Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32 PASSED [1.4928s] [ 46%] 2025-12-04T13:20:27.8913014Z test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32 PASSED [1.5454s] [ 47%] 2025-12-04T13:20:27.8913144Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32 PASSED [1.5116s] [ 47%] 2025-12-04T13:20:27.8913319Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32 PASSED [0.0152s] [ 47%] 2025-12-04T13:20:27.8913486Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [1.5166s] [ 47%] 2025-12-04T13:20:27.8913638Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_copy_cuda_float32 PASSED [1.5194s] [ 47%] 2025-12-04T13:20:27.8913760Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32 PASSED [0.0882s] [ 47%] 2025-12-04T13:20:27.8913887Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32 PASSED [0.0090s] [ 47%] 2025-12-04T13:20:27.8914003Z test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32 PASSED [0.7372s] [ 47%] 2025-12-04T13:20:27.8914133Z test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32 PASSED [1.5071s] [ 47%] 2025-12-04T13:20:27.8914267Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32 PASSED [1.5096s] [ 47%] 2025-12-04T13:20:27.8914390Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32 PASSED [1.5324s] [ 47%] 2025-12-04T13:20:27.8914514Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32 PASSED [0.0741s] [ 47%] 2025-12-04T13:20:27.8914634Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32 PASSED [0.1143s] [ 47%] 2025-12-04T13:20:27.8914756Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32 PASSED [0.0175s] [ 47%] 
2025-12-04T13:20:27.8914882Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32 PASSED [0.0385s] [ 47%] 2025-12-04T13:20:27.8915013Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32 PASSED [0.0119s] [ 47%] 2025-12-04T13:20:27.8915136Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32 PASSED [0.0138s] [ 47%] 2025-12-04T13:20:27.8915255Z test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32 PASSED [0.0593s] [ 47%] 2025-12-04T13:20:27.8915373Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rmod___cuda_float32 PASSED [0.0074s] [ 47%] 2025-12-04T13:20:27.8915495Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input__chunk_cat_cuda_float32 PASSED [1.5087s] [ 47%] 2025-12-04T13:20:27.8915628Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addmv_cuda_float32 PASSED [1.5032s] [ 47%] 2025-12-04T13:20:27.8915752Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_alias_copy_cuda_float32 PASSED [1.4990s] [ 47%] 2025-12-04T13:20:27.8915887Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_as_strided_scatter_cuda_float32 PASSED [1.5029s] [ 47%] 2025-12-04T13:20:27.8916021Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_asin_cuda_float32 PASSED [1.5102s] [ 47%] 2025-12-04T13:20:27.8916145Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_bernoulli_cuda_float32 PASSED [1.5031s] [ 47%] 2025-12-04T13:20:27.8916262Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_chunk_cuda_float32 PASSED [1.5029s] [ 47%] 2025-12-04T13:20:27.8916387Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_contiguous_cuda_float32 PASSED [1.5069s] [ 47%] 2025-12-04T13:20:27.8916507Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_corrcoef_cuda_float32 PASSED [1.5072s] [ 47%] 2025-12-04T13:20:27.8916626Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cummin_cuda_float32 PASSED [1.5023s] [ 47%] 
2025-12-04T13:20:27.8916766Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32 PASSED [1.5034s] [ 47%] 2025-12-04T13:20:27.8916896Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_copy_cuda_float32 PASSED [1.4989s] [ 47%] 2025-12-04T13:20:27.8917030Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagonal_scatter_cuda_float32 PASSED [1.5152s] [ 47%] 2025-12-04T13:20:27.8917146Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dot_cuda_float32 PASSED [1.4975s] [ 47%] 2025-12-04T13:20:27.8917264Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_dstack_cuda_float32 PASSED [1.5009s] [ 47%] 2025-12-04T13:20:27.8917381Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_expm1_cuda_float32 PASSED [1.4982s] [ 47%] 2025-12-04T13:20:27.8917520Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_fftshift_cuda_float32 PASSED [1.5070s] [ 48%] 2025-12-04T13:20:27.8917639Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ifft_cuda_float32 PASSED [1.4974s] [ 48%] 2025-12-04T13:20:27.8917761Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_rfft2_cuda_float32 PASSED [1.5005s] [ 48%] 2025-12-04T13:20:27.8917881Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32 PASSED [1.5178s] [ 48%] 2025-12-04T13:20:27.8918002Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gradient_cuda_float32 PASSED [1.5067s] [ 48%] 2025-12-04T13:20:27.8918131Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_igammac_cuda_float32 PASSED [0.0065s] [ 48%] 2025-12-04T13:20:27.8918267Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_index_reduce_amax_cuda_float32 PASSED [1.5224s] [ 48%] 2025-12-04T13:20:27.8918385Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isclose_cuda_float32 PASSED [1.5123s] [ 48%] 2025-12-04T13:20:27.8918504Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isin_cuda_float32 PASSED [1.5050s] [ 
48%] 2025-12-04T13:20:27.8918622Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isneginf_cuda_float32 PASSED [1.5227s] [ 48%] 2025-12-04T13:20:27.8918755Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_binary_cuda_float32 PASSED [0.0083s] [ 48%] 2025-12-04T13:20:27.8918881Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cond_cuda_float32 PASSED [1.4977s] [ 48%] 2025-12-04T13:20:27.8919006Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_det_cuda_float32 PASSED [1.5083s] [ 48%] 2025-12-04T13:20:27.8919138Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_diagonal_cuda_float32 PASSED [1.4935s] [ 48%] 2025-12-04T13:20:27.8919267Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_eigvals_cuda_float32 PASSED [0.0475s] [ 48%] 2025-12-04T13:20:27.8919498Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_ldl_solve_cuda_float32 SKIPPED [0.0009s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 48%] 2025-12-04T13:20:27.8919626Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lstsq_cuda_float32 PASSED [1.0018s] [ 48%] 2025-12-04T13:20:27.8919758Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_solve_cuda_float32 PASSED [0.1862s] [ 48%] 2025-12-04T13:20:27.8919910Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_hermitian_cuda_float32 PASSED [0.0350s] [ 48%] 2025-12-04T13:20:27.8920037Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_cuda_float32 PASSED [0.0197s] [ 48%] 2025-12-04T13:20:27.8920164Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vander_cuda_float32 PASSED [0.8611s] [ 48%] 2025-12-04T13:20:27.8920299Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vector_norm_cuda_float32 PASSED [0.9338s] [ 48%] 2025-12-04T13:20:27.8920416Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_log_cuda_float32 PASSED [0.8488s] [ 48%] 
2025-12-04T13:20:27.8920538Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_cuda_float32 PASSED [0.0509s] [ 48%] 2025-12-04T13:20:27.8920652Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lu_cuda_float32 PASSED [0.0411s] [ 48%] 2025-12-04T13:20:27.8920777Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_amin_cuda_float32 PASSED [0.0408s] [ 48%] 2025-12-04T13:20:27.8920904Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmax_cuda_float32 PASSED [0.0174s] [ 48%] 2025-12-04T13:20:27.8921031Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_argmin_cuda_float32 PASSED [0.8623s] [ 48%] 2025-12-04T13:20:27.8921163Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_logsumexp_cuda_float32 PASSED [0.8941s] [ 48%] 2025-12-04T13:20:27.8921297Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_normalize_cuda_float32 PASSED [0.8667s] [ 48%] 2025-12-04T13:20:27.8921437Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_scatter_cuda_float32 PASSED [0.8529s] [ 48%] 2025-12-04T13:20:27.8921574Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_reduction_no_dim_cuda_float32 PASSED [0.8681s] [ 48%] 2025-12-04T13:20:27.8921692Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_movedim_cuda_float32 PASSED [0.8548s] [ 48%] 2025-12-04T13:20:27.8921835Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [0.8546s] [ 48%] 2025-12-04T13:20:27.8921954Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmean_cuda_float32 PASSED [0.8621s] [ 48%] 2025-12-04T13:20:27.8922089Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nanmedian_cuda_float32 PASSED [0.8618s] [ 49%] 2025-12-04T13:20:27.8922224Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_native_batch_norm_cuda_float32 PASSED [0.8639s] [ 49%] 2025-12-04T13:20:27.8922340Z 
test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ne_cuda_float32 PASSED [0.0061s] [ 49%] 2025-12-04T13:20:27.8922462Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_empty_cuda_float32 PASSED [0.8587s] [ 49%] 2025-12-04T13:20:27.8922580Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_new_full_cuda_float32 PASSED [0.8502s] [ 49%] 2025-12-04T13:20:27.8922738Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [0.8709s] [ 49%] 2025-12-04T13:20:27.8922895Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [0.0122s] [ 49%] 2025-12-04T13:20:27.8923041Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_cuda_float32 PASSED [0.0938s] [ 49%] 2025-12-04T13:20:27.8923403Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv2d_cuda_float32 MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x7ac6d3801c00 size: 1024 2025-12-04T13:20:27.8923603Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x7ac6d3801c00 size: 1024 2025-12-04T13:20:27.8923793Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x7ac6d3800e00 size: 1024 2025-12-04T13:20:27.8923993Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x7ac6d3800e00 size: 1024 2025-12-04T13:20:27.8924194Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x7ac6d3801000 size: 1024 2025-12-04T13:20:27.8924383Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x7ac6d3801000 size: 1024 2025-12-04T13:20:27.8924426Z PASSED [0.0407s] [ 49%] 
2025-12-04T13:20:27.8924585Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose3d_cuda_float32 PASSED [1.4705s] [ 49%] 2025-12-04T13:20:27.8924741Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cosine_similarity_cuda_float32 PASSED [1.4888s] [ 49%] 2025-12-04T13:20:27.8924889Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_cross_entropy_cuda_float32 PASSED [1.4917s] [ 49%] 2025-12-04T13:20:27.8925038Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_embedding_bag_cuda_float32 PASSED [1.5184s] [ 49%] 2025-12-04T13:20:27.8925219Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [1.4976s] [ 49%] 2025-12-04T13:20:27.8925373Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [1.7173s] [ 49%] 2025-12-04T13:20:27.8925523Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_gelu_cuda_float32 PASSED [1.4906s] [ 49%] 2025-12-04T13:20:27.8925666Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_hardtanh_cuda_float32 PASSED [1.4848s] [ 49%] 2025-12-04T13:20:27.8925823Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_margin_ranking_loss_cuda_float32 PASSED [1.4912s] [ 49%] 2025-12-04T13:20:27.8925967Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool1d_cuda_float32 PASSED [0.3134s] [ 49%] 2025-12-04T13:20:27.8926110Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_pool3d_cuda_float32 PASSED [0.2074s] [ 49%] 2025-12-04T13:20:27.8926270Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_cuda_float32 PASSED [0.0558s] [ 49%] 2025-12-04T13:20:27.8926411Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_mse_loss_cuda_float32 PASSED [0.0056s] [ 49%] 
2025-12-04T13:20:27.8926556Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_constant_cuda_float32 PASSED [0.0186s] [ 49%] 2025-12-04T13:20:27.8926703Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_reflect_cuda_float32 PASSED [1.5191s] [ 49%] 2025-12-04T13:20:27.8926838Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pdist_cuda_float32 PASSED [1.4901s] [ 49%] 2025-12-04T13:20:27.8926972Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_selu_cuda_float32 PASSED [1.4906s] [ 49%] 2025-12-04T13:20:27.8927106Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_silu_cuda_float32 PASSED [1.4883s] [ 49%] 2025-12-04T13:20:27.8927262Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softmin_with_dtype_cuda_float32 PASSED [1.4948s] [ 49%] 2025-12-04T13:20:27.8927404Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_threshold_cuda_float32 PASSED [1.4963s] [ 49%] 2025-12-04T13:20:27.8927548Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_1_cuda_float32 PASSED [1.5039s] [ 49%] 2025-12-04T13:20:27.8927701Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32 PASSED [1.5237s] [ 49%] 2025-12-04T13:20:27.8927824Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32 PASSED [1.4985s] [ 49%] 2025-12-04T13:20:27.8927940Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pow_cuda_float32 PASSED [0.0093s] [ 49%] 2025-12-04T13:20:27.8928068Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_qr_cuda_float32 PASSED [0.0399s] [ 49%] 2025-12-04T13:20:27.8928192Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reciprocal_cuda_float32 PASSED [0.0041s] [ 50%] 2025-12-04T13:20:27.8928316Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reshape_as_cuda_float32 PASSED [1.5009s] [ 50%] 
2025-12-04T13:20:27.8928438Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resize_as__cuda_float32 PASSED [1.4716s] [ 50%] 2025-12-04T13:20:27.8928557Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_cuda_float32 PASSED [1.4936s] [ 50%] 2025-12-04T13:20:27.8928691Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32 PASSED [1.4802s] [ 50%] 2025-12-04T13:20:27.8928808Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_rsub_cuda_float32 PASSED [1.4933s] [ 50%] 2025-12-04T13:20:27.8928936Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scalar_tensor_cuda_float32 PASSED [1.4879s] [ 50%] 2025-12-04T13:20:27.8929074Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_amax_cuda_float32 PASSED [1.5033s] [ 50%] 2025-12-04T13:20:27.8929211Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32 PASSED [1.4931s] [ 50%] 2025-12-04T13:20:27.8929354Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_bartlett_cuda_float32 PASSED [1.5208s] [ 50%] 2025-12-04T13:20:27.8929509Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_blackman_cuda_float32 PASSED [1.4981s] [ 50%] 2025-12-04T13:20:27.8929657Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_exponential_cuda_float32 PASSED [1.4915s] [ 50%] 2025-12-04T13:20:27.8929795Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hann_cuda_float32 PASSED [1.5029s] [ 50%] 2025-12-04T13:20:27.8929913Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signbit_cuda_float32 PASSED [1.4812s] [ 50%] 2025-12-04T13:20:27.8930049Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_softmax_with_dtype_cuda_float32 PASSED [1.4958s] [ 50%] 2025-12-04T13:20:27.8930193Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_bessel_j0_cuda_float32 PASSED [1.5059s] [ 50%] 2025-12-04T13:20:27.8930322Z 
test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_erfcx_cuda_float32 PASSED [1.4847s] [ 50%] 2025-12-04T13:20:27.8930448Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_i0e_cuda_float32 PASSED [1.4970s] [ 50%] 2025-12-04T13:20:27.8930597Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_i1_cuda_float32 PASSED [1.5053s] [ 50%] 2025-12-04T13:20:27.8930742Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_modified_bessel_k0_cuda_float32 PASSED [1.5061s] [ 50%] 2025-12-04T13:20:27.8930900Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_scaled_modified_bessel_k0_cuda_float32 PASSED [1.4690s] [ 50%] 2025-12-04T13:20:27.8931067Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_shifted_chebyshev_polynomial_u_cuda_float32 PASSED [0.0088s] [ 50%] 2025-12-04T13:20:27.8931198Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_split_list_args_cuda_float32 PASSED [1.4781s] [ 50%] 2025-12-04T13:20:27.8931330Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_split_with_sizes_cuda_float32 PASSED [1.4914s] [ 50%] 2025-12-04T13:20:27.8931462Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_squeeze_multiple_cuda_float32 PASSED [1.5003s] [ 50%] 2025-12-04T13:20:27.8931605Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_std_mean_cuda_float32 PASSED [1.4890s] [ 50%] 2025-12-04T13:20:27.8931731Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_std_unbiased_cuda_float32 PASSED [1.4847s] [ 50%] 2025-12-04T13:20:27.8931856Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sum_to_size_cuda_float32 PASSED [1.4874s] [ 50%] 2025-12-04T13:20:27.8931984Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_copy_cuda_float32 PASSED [1.4868s] [ 50%] 2025-12-04T13:20:27.8932100Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_t_cuda_float32 PASSED [1.4854s] [ 50%] 2025-12-04T13:20:27.8932217Z 
test_ops.py::TestCompositeComplianceCUDA::test_cow_input_take_cuda_float32 PASSED [1.4813s] [ 50%] 2025-12-04T13:20:27.8932480Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_torch_ops_aten__efficient_attention_forward_cuda_float32 SKIPPED [0.0011s] (Efficient attention on ROCM doesn't support custom_mask_type==2) [ 50%] 2025-12-04T13:20:27.8932613Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_triangular_solve_cuda_float32 PASSED [0.0162s] [ 50%] 2025-12-04T13:20:27.8932730Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_tril_cuda_float32 PASSED [0.0057s] [ 50%] 2025-12-04T13:20:27.8932855Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_copy_cuda_float32 PASSED [0.0059s] [ 50%] 2025-12-04T13:20:27.8932974Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unbind_cuda_float32 PASSED [0.0057s] [ 51%] 2025-12-04T13:20:27.8933090Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unfold_cuda_float32 PASSED [0.0099s] [ 51%] 2025-12-04T13:20:27.8933218Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsafe_split_cuda_float32 PASSED [0.0035s] [ 51%] 2025-12-04T13:20:27.8933370Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unsqueeze_cuda_float32 PASSED [0.0055s] [ 51%] 2025-12-04T13:20:27.8933502Z test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32 PASSED [0.0084s] [ 51%] 2025-12-04T13:20:27.8933618Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32 PASSED [0.0549s] [ 51%] 2025-12-04T13:20:27.8933804Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__segment_reduce_lengths_cuda_float32 SKIPPED [0.0014s] (Does not support forward_ad) [ 51%] 2025-12-04T13:20:27.8933987Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__segment_reduce_offsets_cuda_float32 SKIPPED [0.0013s] (Does not support forward_ad) [ 51%] 2025-12-04T13:20:27.8934147Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__unsafe_masked_index_put_accumulate_cuda_float32 PASSED [0.3822s] [ 51%] 2025-12-04T13:20:27.8934278Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32 PASSED [0.1044s] [ 51%] 2025-12-04T13:20:27.8934398Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32 PASSED [0.4012s] [ 51%] 2025-12-04T13:20:27.8934518Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32 PASSED [0.3515s] [ 51%] 2025-12-04T13:20:27.8934652Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32 PASSED [0.3257s] [ 51%] 2025-12-04T13:20:27.8934816Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32 SKIPPED [0.0016s] (Does not support autograd) [ 51%] 2025-12-04T13:20:27.8934941Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32 PASSED [0.0190s] [ 51%] 2025-12-04T13:20:27.8935069Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32 PASSED [0.0205s] [ 51%] 2025-12-04T13:20:27.8935194Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32 PASSED [0.0142s] [ 51%] 2025-12-04T13:20:27.8935359Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32 SKIPPED [0.0011s] (Does not support forward_ad) [ 51%] 2025-12-04T13:20:27.8935517Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32 SKIPPED [0.0013s] (Does not support autograd) [ 51%] 2025-12-04T13:20:27.8935693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32 SKIPPED [0.0011s] (Does not support forward_ad) [ 51%] 2025-12-04T13:20:27.8935856Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32 SKIPPED [0.0011s] (Does not support forward_ad) [ 51%] 2025-12-04T13:20:27.8936004Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32 PASSED 
[0.0890s] [ 51%] 2025-12-04T13:20:27.8936137Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32 PASSED [0.1973s] [ 51%] 2025-12-04T13:20:27.8936262Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32 PASSED [0.0959s] [ 51%] 2025-12-04T13:20:27.8936388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32 PASSED [0.0947s] [ 51%] 2025-12-04T13:20:27.8936507Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32 PASSED [1.5021s] [ 51%] 2025-12-04T13:20:27.8936628Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32 PASSED [0.0930s] [ 51%] 2025-12-04T13:20:27.8936759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32 PASSED [0.1110s] [ 51%] 2025-12-04T13:20:27.8936883Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32 PASSED [0.1057s] [ 51%] 2025-12-04T13:20:27.8937002Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32 PASSED [0.0108s] [ 51%] 2025-12-04T13:20:27.8937123Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32 PASSED [0.0112s] [ 51%] 2025-12-04T13:20:27.8937240Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32 PASSED [1.5184s] [ 51%] 2025-12-04T13:20:27.8937360Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32 PASSED [0.0609s] [ 51%] 2025-12-04T13:20:27.8937535Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32 SKIPPED [0.0013s] (Does not support autograd) [ 51%] 2025-12-04T13:20:27.8937654Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32 PASSED [0.0092s] [ 51%] 2025-12-04T13:20:27.8937776Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32 PASSED [1.4931s] [ 52%] 2025-12-04T13:20:27.8937904Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32 PASSED [0.1755s] [ 52%] 2025-12-04T13:20:27.8938072Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32 SKIPPED [0.0014s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8938202Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32 PASSED [0.0314s] [ 52%] 2025-12-04T13:20:27.8938362Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32 SKIPPED [0.0011s] (Does not support forward_ad) [ 52%] 2025-12-04T13:20:27.8938482Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32 PASSED [0.1105s] [ 52%] 2025-12-04T13:20:27.8938598Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32 PASSED [0.0105s] [ 52%] 2025-12-04T13:20:27.8938724Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32 PASSED [0.0886s] [ 52%] 2025-12-04T13:20:27.8938900Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_amax_cuda_float32 SKIPPED [0.0012s] (Does not support forward_ad) [ 52%] 2025-12-04T13:20:27.8939075Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_mean_cuda_float32 SKIPPED [0.0013s] (Does not support forward_ad) [ 52%] 2025-12-04T13:20:27.8939196Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32 PASSED [0.0288s] [ 52%] 2025-12-04T13:20:27.8939356Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8939523Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8939656Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32 PASSED [0.0758s] [ 52%] 2025-12-04T13:20:27.8939791Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_diagonal_cuda_float32 PASSED [0.0346s] [ 52%] 2025-12-04T13:20:27.8939931Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32 PASSED [0.2294s] [ 52%] 2025-12-04T13:20:27.8940169Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32 SKIPPED [0.0010s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 52%] 2025-12-04T13:20:27.8940345Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32 SKIPPED [0.0013s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8940478Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32 PASSED [0.3165s] [ 52%] 2025-12-04T13:20:27.8940617Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32 PASSED [0.2986s] [ 52%] 2025-12-04T13:20:27.8940747Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32 PASSED [0.2870s] [ 52%] 2025-12-04T13:20:27.8940911Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32 SKIPPED [0.0014s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8941029Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32 PASSED [0.0092s] [ 52%] 2025-12-04T13:20:27.8941157Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32 PASSED [0.0240s] [ 52%] 2025-12-04T13:20:27.8941320Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 52%] 2025-12-04T13:20:27.8941466Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32 PASSED [0.0517s] [ 52%] 2025-12-04T13:20:27.8941584Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32 PASSED [0.2606s] [ 52%] 2025-12-04T13:20:27.8941720Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32 PASSED [0.9021s] [ 52%] 2025-12-04T13:20:27.8941849Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32 PASSED [3.2329s] [ 52%] 2025-12-04T13:20:27.8941985Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32 PASSED [0.2064s] [ 52%] 2025-12-04T13:20:27.8942112Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32 PASSED [0.7596s] [ 52%] 2025-12-04T13:20:27.8942254Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32 PASSED [0.0775s] [ 52%] 2025-12-04T13:20:27.8942383Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32 PASSED [0.0365s] [ 52%] 2025-12-04T13:20:27.8942526Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32 PASSED [1.5116s] [ 52%] 2025-12-04T13:20:27.8942646Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32 PASSED [0.0486s] [ 52%] 2025-12-04T13:20:27.8942765Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32 PASSED [0.0372s] [ 53%] 2025-12-04T13:20:27.8942903Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32 PASSED [0.0080s] [ 53%] 2025-12-04T13:20:27.8943024Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32 PASSED [0.0070s] [ 53%] 2025-12-04T13:20:27.8943195Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 53%] 2025-12-04T13:20:27.8943362Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [0.0290s] [ 53%] 2025-12-04T13:20:27.8943490Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32 PASSED [1.5376s] [ 53%] 2025-12-04T13:20:27.8943625Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32 XFAIL [0.0087s] [ 53%] 2025-12-04T13:20:27.8943801Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32 SKIPPED [1.5048s] (Does not support autograd) [ 53%] 2025-12-04T13:20:27.8943966Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32 SKIPPED [0.0018s] (Does not support autograd) [ 53%] 2025-12-04T13:20:27.8944139Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [0.0608s] [ 53%] 2025-12-04T13:20:27.8944289Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32 PASSED [0.0780s] [ 53%] 2025-12-04T13:20:27.8944445Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32 PASSED [0.5198s] [ 53%] 2025-12-04T13:20:27.8944591Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32 PASSED [1.4950s] [ 53%] 2025-12-04T13:20:27.8944754Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [0.1594s] [ 53%] 2025-12-04T13:20:27.8944903Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32 PASSED [1.5200s] [ 53%] 2025-12-04T13:20:27.8945045Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32 PASSED [1.5126s] [ 53%] 2025-12-04T13:20:27.8945213Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest-exact_cuda_float32 PASSED [0.0545s] [ 53%] 2025-12-04T13:20:27.8945352Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32 PASSED [1.5997s] [ 53%] 2025-12-04T13:20:27.8945497Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32 PASSED [1.5209s] [ 53%] 2025-12-04T13:20:27.8945649Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32 PASSED [0.6983s] [ 53%] 2025-12-04T13:20:27.8945796Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32 PASSED [1.5060s] [ 53%] 2025-12-04T13:20:27.8945959Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32 SKIPPED [0.0003s] (Skipped!) [ 53%] 2025-12-04T13:20:27.8946115Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [0.1960s] [ 53%] 2025-12-04T13:20:27.8946331Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32 SKIPPED [0.0013s] (Does not support forward_ad) [ 53%] 2025-12-04T13:20:27.8946474Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32 PASSED [0.4508s] [ 53%] 2025-12-04T13:20:27.8946625Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32 PASSED [0.0123s] [ 53%] 2025-12-04T13:20:27.8946780Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32 PASSED [1.7642s] [ 53%] 2025-12-04T13:20:27.8946922Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rms_norm_cuda_float32 PASSED [0.1183s] [ 53%] 2025-12-04T13:20:27.8947060Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32 PASSED [1.5291s] [ 53%] 2025-12-04T13:20:27.8947195Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32 PASSED [1.5111s] [ 53%] 2025-12-04T13:20:27.8947342Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32 PASSED [1.5238s] [ 53%] 2025-12-04T13:20:27.8947496Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32 PASSED [0.0335s] [ 53%] 2025-12-04T13:20:27.8947667Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_static_cuda_float32 SKIPPED [0.0007s] (Only runs on cpu) [ 53%] 2025-12-04T13:20:27.8947793Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32 PASSED [0.0117s] [ 53%] 2025-12-04T13:20:27.8947913Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32 PASSED [0.0152s] [ 53%] 2025-12-04T13:20:27.8948035Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32 PASSED [0.0112s] [ 54%] 2025-12-04T13:20:27.8948192Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32 PASSED [0.0264s] [ 54%] 2025-12-04T13:20:27.8948317Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32 PASSED [1.5045s] [ 54%] 2025-12-04T13:20:27.8948436Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32 PASSED [0.1399s] [ 54%] 2025-12-04T13:20:27.8948560Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32 PASSED [1.4734s] [ 54%] 2025-12-04T13:20:27.8948679Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32 PASSED [1.4951s] [ 54%] 2025-12-04T13:20:27.8948806Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32 PASSED [1.5052s] [ 54%] 2025-12-04T13:20:27.8948930Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32 PASSED [0.1077s] [ 54%] 2025-12-04T13:20:27.8949052Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32 PASSED [0.0533s] [ 54%] 2025-12-04T13:20:27.8949169Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32 PASSED [0.0844s] [ 54%] 2025-12-04T13:20:27.8949308Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32 PASSED [0.4625s] [ 54%] 2025-12-04T13:20:27.8949478Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32 SKIPPED 
[0.0013s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8949672Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32 SKIPPED [0.0012s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8949849Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8949969Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32 PASSED [0.0115s] [ 54%] 2025-12-04T13:20:27.8950108Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32 PASSED [0.0273s] [ 54%] 2025-12-04T13:20:27.8950280Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_mm_reduce_cuda_float32 SKIPPED [0.0005s] (Only runs on cpu) [ 54%] 2025-12-04T13:20:27.8950472Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8950662Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32 SKIPPED [0.0012s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8950847Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8951030Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32 SKIPPED [0.0012s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8951164Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32 PASSED [0.0159s] [ 54%] 2025-12-04T13:20:27.8951283Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32 PASSED [1.5103s] [ 54%] 2025-12-04T13:20:27.8951418Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32 PASSED [1.5158s] [ 54%] 2025-12-04T13:20:27.8951536Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32 PASSED [0.0989s] [ 54%] 2025-12-04T13:20:27.8951752Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32 SKIPPED [0.0007s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 54%] 2025-12-04T13:20:27.8951886Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32 PASSED [0.0314s] [ 54%] 2025-12-04T13:20:27.8952052Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32 SKIPPED [0.0011s] (Does not support forward_ad) [ 54%] 2025-12-04T13:20:27.8952224Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_torch_ops_aten__safe_softmax_default_cuda_float32 PASSED [0.0349s] [ 54%] 2025-12-04T13:20:27.8952353Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32 PASSED [0.0575s] [ 54%] 2025-12-04T13:20:27.8952532Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32 SKIPPED [0.0011s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8952651Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32 PASSED [0.0135s] [ 54%] 2025-12-04T13:20:27.8952777Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32 PASSED [0.0179s] [ 54%] 2025-12-04T13:20:27.8952941Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32 SKIPPED [0.0012s] (Does not support autograd) [ 54%] 2025-12-04T13:20:27.8953058Z test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32 PASSED [0.0042s] [ 54%] 2025-12-04T13:20:27.8953201Z test_ops.py::TestCompositeComplianceCUDA::test_operator__segment_reduce_offsets_cuda_float32 PASSED [0.1193s] [ 55%] 2025-12-04T13:20:27.8953356Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32 PASSED [1.5278s] [ 55%] 2025-12-04T13:20:27.8953478Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32 PASSED [1.5369s] [ 55%] 2025-12-04T13:20:27.8953595Z test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32 PASSED [1.5213s] [ 55%] 2025-12-04T13:20:27.8953743Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_copy_cuda_float32 PASSED [1.5123s] [ 55%] 2025-12-04T13:20:27.8953867Z test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32 PASSED [1.5358s] [ 55%] 2025-12-04T13:20:27.8953986Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32 PASSED [1.5681s] [ 55%] 2025-12-04T13:20:27.8954102Z test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32 PASSED [0.0074s] [ 55%] 2025-12-04T13:20:27.8954224Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32 PASSED [0.0169s] [ 55%] 2025-12-04T13:20:27.8954370Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32 PASSED [0.0139s] [ 55%] 2025-12-04T13:20:27.8954504Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32 PASSED [0.0173s] [ 55%] 2025-12-04T13:20:27.8954625Z test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32 PASSED [1.5127s] [ 55%] 2025-12-04T13:20:27.8954743Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32 PASSED [0.2318s] [ 55%] 2025-12-04T13:20:27.8954862Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32 PASSED [1.5036s] [ 55%] 2025-12-04T13:20:27.8954996Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32 PASSED [1.5267s] [ 55%] 2025-12-04T13:20:27.8955134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32 PASSED [0.0152s] [ 55%] 2025-12-04T13:20:27.8955254Z test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32 PASSED [0.0071s] [ 55%] 2025-12-04T13:20:27.8955369Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32 PASSED [1.5072s] [ 55%] 2025-12-04T13:20:27.8955486Z test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32 PASSED [1.5145s] [ 55%] 2025-12-04T13:20:27.8955607Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32 PASSED [1.5266s] [ 55%] 2025-12-04T13:20:27.8955742Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32 PASSED [1.5229s] [ 55%] 2025-12-04T13:20:27.8955863Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32 PASSED [1.5418s] [ 55%] 2025-12-04T13:20:27.8955982Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32 PASSED [0.0061s] [ 55%] 2025-12-04T13:20:27.8956110Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32 PASSED [0.0128s] [ 55%] 2025-12-04T13:20:27.8956233Z test_ops.py::TestCompositeComplianceCUDA::test_operator_geometric_cuda_float32 PASSED [0.0150s] [ 55%] 2025-12-04T13:20:27.8956351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32 PASSED [0.0091s] [ 55%] 2025-12-04T13:20:27.8956468Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32 PASSED [1.5108s] [ 55%] 2025-12-04T13:20:27.8956584Z test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32 PASSED [1.5167s] [ 55%] 2025-12-04T13:20:27.8956719Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amax_cuda_float32 PASSED [1.5140s] [ 55%] 2025-12-04T13:20:27.8956855Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_amin_cuda_float32 PASSED [1.5200s] [ 55%] 2025-12-04T13:20:27.8956970Z test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32 PASSED [1.5182s] [ 55%] 2025-12-04T13:20:27.8957093Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32 PASSED [1.5061s] [ 55%] 2025-12-04T13:20:27.8957207Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_item_cuda_float32 XFAIL [0.0049s] [ 55%] 2025-12-04T13:20:27.8957327Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32 PASSED [1.5350s] [ 55%] 2025-12-04T13:20:27.8957458Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32 PASSED [0.0186s] [ 55%] 2025-12-04T13:20:27.8957598Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32 PASSED [0.0164s] [ 56%] 2025-12-04T13:20:27.8957730Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32 PASSED [0.0124s] [ 56%] 2025-12-04T13:20:27.8957964Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32 SKIPPED [0.0007s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 56%] 2025-12-04T13:20:27.8958093Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32 PASSED [0.0120s] [ 56%] 2025-12-04T13:20:27.8958243Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32 PASSED [0.0069s] [ 56%] 2025-12-04T13:20:27.8958365Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32 PASSED [0.0664s] [ 56%] 2025-12-04T13:20:27.8958583Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32 SKIPPED [0.0006s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 56%] 2025-12-04T13:20:27.8958713Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32 PASSED [0.0176s] [ 56%] 2025-12-04T13:20:27.8958831Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32 PASSED [1.4914s] [ 56%] 2025-12-04T13:20:27.8958955Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32 PASSED [0.0147s] [ 56%] 2025-12-04T13:20:27.8959074Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32 PASSED [0.0198s] [ 56%] 
2025-12-04T13:20:27.8959220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_tensor_overload_cuda_float32 PASSED [1.2072s] [ 56%] 2025-12-04T13:20:27.8959351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32 PASSED [0.0347s] [ 56%] 2025-12-04T13:20:27.8959478Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32 PASSED [0.0169s] [ 56%] 2025-12-04T13:20:27.8959617Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32 PASSED [0.8153s] [ 56%] 2025-12-04T13:20:27.8959743Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32 PASSED [0.1692s] [ 56%] 2025-12-04T13:20:27.8959870Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32 PASSED [0.0137s] [ 56%] 2025-12-04T13:20:27.8959995Z test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32 PASSED [1.5139s] [ 56%] 2025-12-04T13:20:27.8960148Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32 PASSED [1.5049s] [ 56%] 2025-12-04T13:20:27.8960274Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32 PASSED [0.0144s] [ 56%] 2025-12-04T13:20:27.8960394Z test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32 PASSED [0.0045s] [ 56%] 2025-12-04T13:20:27.8960512Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32 PASSED [0.0040s] [ 56%] 2025-12-04T13:20:27.8960655Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [1.5085s] [ 56%] 2025-12-04T13:20:27.8960781Z test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32 PASSED [1.5139s] [ 56%] 2025-12-04T13:20:27.8960924Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32 PASSED [1.5276s] [ 56%] 2025-12-04T13:20:27.8961041Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32 PASSED [0.0111s] [ 56%] 2025-12-04T13:20:27.8961159Z test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32 PASSED [1.4994s] [ 56%] 2025-12-04T13:20:27.8961339Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32 SKIPPED [0.0002s] (Expected: new_empty is not comparable) [ 56%] 2025-12-04T13:20:27.8961463Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32 PASSED [1.5066s] [ 56%] 2025-12-04T13:20:27.8961640Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [1.5032s] [ 56%] 2025-12-04T13:20:27.8961785Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32 PASSED [0.0192s] [ 56%] 2025-12-04T13:20:27.8961958Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [0.0613s] [ 56%] 2025-12-04T13:20:27.8962114Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32 PASSED [1.5437s] [ 56%] 2025-12-04T13:20:27.8962283Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [0.0530s] [ 56%] 2025-12-04T13:20:27.8962426Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32 PASSED [0.0293s] [ 57%] 2025-12-04T13:20:27.8962573Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32 PASSED [1.5189s] [ 57%] 2025-12-04T13:20:27.8962732Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32 PASSED [0.0224s] [ 57%] 2025-12-04T13:20:27.8962879Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32 PASSED [0.1405s] [ 57%] 2025-12-04T13:20:27.8963021Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32 PASSED [0.0100s] [ 57%] 2025-12-04T13:20:27.8963162Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32 PASSED [0.1071s] [ 57%] 2025-12-04T13:20:27.8963346Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32 PASSED [0.0116s] [ 57%] 2025-12-04T13:20:27.8963495Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32 PASSED [0.0058s] [ 57%] 2025-12-04T13:20:27.8963632Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32 PASSED [1.5035s] [ 57%] 2025-12-04T13:20:27.8963793Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32 PASSED [1.4974s] [ 57%] 2025-12-04T13:20:27.8963937Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32 PASSED [1.4967s] [ 57%] 2025-12-04T13:20:27.8964058Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32 PASSED [0.2805s] [ 57%] 2025-12-04T13:20:27.8964197Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32 PASSED [0.2790s] [ 57%] 2025-12-04T13:20:27.8964344Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32 PASSED [0.0100s] [ 57%] 2025-12-04T13:20:27.8964461Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32 PASSED [0.0119s] [ 57%] 2025-12-04T13:20:27.8964583Z test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32 PASSED [1.5043s] [ 57%] 2025-12-04T13:20:27.8964701Z test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32 PASSED [1.5195s] [ 57%] 2025-12-04T13:20:27.8964835Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32 PASSED [1.5065s] [ 57%] 2025-12-04T13:20:27.8964972Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32 PASSED [1.5270s] [ 57%] 2025-12-04T13:20:27.8965098Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32 PASSED [1.5208s] [ 57%] 2025-12-04T13:20:27.8965218Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32 PASSED [0.0582s] [ 57%] 2025-12-04T13:20:27.8965358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32 PASSED [0.0486s] [ 57%] 2025-12-04T13:20:27.8965476Z test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32 PASSED [0.0074s] [ 57%] 2025-12-04T13:20:27.8965645Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32 PASSED [0.0132s] [ 57%] 2025-12-04T13:20:27.8965787Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32 PASSED [0.0126s] [ 57%] 2025-12-04T13:20:27.8965908Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32 PASSED [1.5143s] [ 57%] 2025-12-04T13:20:27.8966037Z test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32 PASSED [1.5340s] [ 57%] 2025-12-04T13:20:27.8966192Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32 PASSED [0.0175s] [ 57%] 2025-12-04T13:20:27.8966358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32 PASSED [0.3138s] [ 57%] 2025-12-04T13:20:27.8966487Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32 PASSED [0.0126s] [ 57%] 2025-12-04T13:20:27.8966616Z test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_copy_cuda_float32 PASSED [0.0088s] [ 57%] 2025-12-04T13:20:27.8966751Z test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_multiple_cuda_float32 PASSED [1.5085s] [ 57%] 2025-12-04T13:20:27.8966869Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32 PASSED [1.5139s] [ 57%] 2025-12-04T13:20:27.8966993Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32 PASSED [1.5283s] [ 57%] 2025-12-04T13:20:27.8967119Z test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32 PASSED [0.4328s] [ 57%] 2025-12-04T13:20:27.8967234Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32 PASSED [1.5110s] [ 58%] 2025-12-04T13:20:27.8967466Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32 PASSED [1.5051s] [ 58%] 2025-12-04T13:20:27.8967586Z test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32 PASSED [1.5168s] [ 58%] 2025-12-04T13:20:27.8967716Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unsafe_split_cuda_float32 PASSED [1.5081s] [ 58%] 2025-12-04T13:20:27.8967852Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32 PASSED [1.5067s] [ 58%] 2025-12-04T13:20:27.8967970Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32 PASSED [1.5010s] [ 58%] 2025-12-04T13:20:27.8968094Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32 PASSED [1.5044s] [ 58%] 2025-12-04T13:20:27.8968236Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rmatmul___cuda_float32 PASSED [1.5091s] [ 58%] 2025-12-04T13:20:27.8968383Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay__segment_reduce_lengths_cuda_float32 PASSED [1.5243s] [ 58%] 2025-12-04T13:20:27.8968507Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmax_cuda_float32 PASSED [1.5179s] [ 58%] 2025-12-04T13:20:27.8968630Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_asinh_cuda_float32 PASSED [1.5348s] [ 58%] 2025-12-04T13:20:27.8968770Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_shapes_cuda_float32 PASSED [0.0045s] [ 58%] 2025-12-04T13:20:27.8968889Z 
test_ops.py::TestCompositeComplianceCUDA::test_view_replay_byte_cuda_float32 PASSED [1.5277s] [ 58%] 2025-12-04T13:20:27.8969007Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cat_cuda_float32 PASSED [1.5249s] [ 58%] 2025-12-04T13:20:27.8969126Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_chunk_cuda_float32 PASSED [1.5141s] [ 58%] 2025-12-04T13:20:27.8969253Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clamp_min_cuda_float32 PASSED [0.0051s] [ 58%] 2025-12-04T13:20:27.8969378Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_copysign_cuda_float32 PASSED [1.5140s] [ 58%] 2025-12-04T13:20:27.8969496Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cosh_cuda_float32 PASSED [1.5037s] [ 58%] 2025-12-04T13:20:27.8969615Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cross_cuda_float32 PASSED [1.5094s] [ 58%] 2025-12-04T13:20:27.8969748Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cummax_cuda_float32 PASSED [1.5231s] [ 58%] 2025-12-04T13:20:27.8969868Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diag_cuda_float32 PASSED [1.4960s] [ 58%] 2025-12-04T13:20:27.8970003Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_diagonal_scatter_cuda_float32 PASSED [1.5078s] [ 58%] 2025-12-04T13:20:27.8970122Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dist_cuda_float32 PASSED [1.5284s] [ 58%] 2025-12-04T13:20:27.8970264Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_no_rounding_mode_cuda_float32 PASSED [0.0056s] [ 58%] 2025-12-04T13:20:27.8970394Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_dot_cuda_float32 PASSED [1.5043s] [ 58%] 2025-12-04T13:20:27.8970513Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_cuda_float32 PASSED [0.0042s] [ 58%] 2025-12-04T13:20:27.8970646Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_empty_strided_cuda_float32 PASSED [1.5062s] [ 58%] 2025-12-04T13:20:27.8970764Z 
test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfc_cuda_float32 PASSED [1.5062s] [ 58%] 2025-12-04T13:20:27.8970886Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_erfinv_cuda_float32 PASSED [1.4988s] [ 58%] 2025-12-04T13:20:27.8971005Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_exp2_cuda_float32 PASSED [1.5023s] [ 58%] 2025-12-04T13:20:27.8971124Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_expm1_cuda_float32 PASSED [1.4974s] [ 58%] 2025-12-04T13:20:27.8971255Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_exponential_cuda_float32 PASSED [1.5328s] [ 58%] 2025-12-04T13:20:27.8971373Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_eye_cuda_float32 PASSED [1.5267s] [ 58%] 2025-12-04T13:20:27.8971500Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfftn_cuda_float32 PASSED [1.5119s] [ 58%] 2025-12-04T13:20:27.8971626Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ihfft_cuda_float32 PASSED [1.4995s] [ 58%] 2025-12-04T13:20:27.8971760Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_flatten_cuda_float32 PASSED [1.5180s] [ 59%] 2025-12-04T13:20:27.8971889Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_float_power_cuda_float32 PASSED [0.0052s] [ 59%] 2025-12-04T13:20:27.8972008Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_cuda_float32 PASSED [1.5250s] [ 59%] 2025-12-04T13:20:27.8972145Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_full_like_cuda_float32 PASSED [1.5042s] [ 59%] 2025-12-04T13:20:27.8972265Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ge_cuda_float32 PASSED [0.0051s] [ 59%] 2025-12-04T13:20:27.8972416Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_grid_sampler_3d_cuda_float32 SKIPPED [0.0002s] (Skipped!) 
[ 59%] 2025-12-04T13:20:27.8972544Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_heaviside_cuda_float32 PASSED [0.0041s] [ 59%] 2025-12-04T13:20:27.8972666Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32 PASSED [0.0031s] [ 59%] 2025-12-04T13:20:27.8972786Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_int_cuda_float32 PASSED [1.5157s] [ 59%] 2025-12-04T13:20:27.8972909Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isfinite_cuda_float32 PASSED [1.5176s] [ 59%] 2025-12-04T13:20:27.8973067Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_return_by_ref_cuda_float32 PASSED [0.2696s] [ 59%] 2025-12-04T13:20:27.8973186Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_le_cuda_float32 PASSED [0.0045s] [ 59%] 2025-12-04T13:20:27.8973345Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lerp_cuda_float32 PASSED [0.0047s] [ 59%] 2025-12-04T13:20:27.8973474Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eig_cuda_float32 PASSED [0.0405s] [ 59%] 2025-12-04T13:20:27.8973616Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_cuda_float32 PASSED [0.0098s] [ 59%] 2025-12-04T13:20:27.8973759Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_factor_ex_cuda_float32 PASSED [0.0083s] [ 59%] 2025-12-04T13:20:27.8973892Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_solve_cuda_float32 PASSED [0.0210s] [ 59%] 2025-12-04T13:20:27.8974031Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_matrix_rank_cuda_float32 PASSED [0.0286s] [ 59%] 2025-12-04T13:20:27.8974174Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_pinv_hermitian_cuda_float32 PASSED [0.0061s] [ 59%] 2025-12-04T13:20:27.8974308Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log10_cuda_float32 PASSED [1.5189s] [ 59%] 2025-12-04T13:20:27.8974435Z 
test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp_cuda_float32 PASSED [0.0050s] [ 59%] 2025-12-04T13:20:27.8974565Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_and_cuda_float32 PASSED [0.0042s] [ 59%] 2025-12-04T13:20:27.8974714Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logspace_tensor_overload_cuda_float32 PASSED [0.1808s] [ 59%] 2025-12-04T13:20:27.8974838Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lu_solve_cuda_float32 PASSED [0.0122s] [ 59%] 2025-12-04T13:20:27.8974966Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amax_cuda_float32 PASSED [0.0173s] [ 59%] 2025-12-04T13:20:27.8975097Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amin_cuda_float32 PASSED [1.5019s] [ 59%] 2025-12-04T13:20:27.8975228Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_median_cuda_float32 PASSED [1.4963s] [ 59%] 2025-12-04T13:20:27.8975357Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_std_cuda_float32 PASSED [1.5144s] [ 59%] 2025-12-04T13:20:27.8975504Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32 PASSED [1.4881s] [ 59%] 2025-12-04T13:20:27.8975642Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_minimum_cuda_float32 PASSED [0.0051s] [ 59%] 2025-12-04T13:20:27.8975760Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mode_cuda_float32 PASSED [1.6260s] [ 59%] 2025-12-04T13:20:27.8975884Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_movedim_cuda_float32 PASSED [1.5095s] [ 59%] 2025-12-04T13:20:27.8976004Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_msort_cuda_float32 PASSED [1.4830s] [ 59%] 2025-12-04T13:20:27.8976141Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mul_cuda_float32 PASSED [0.0051s] [ 59%] 2025-12-04T13:20:27.8976265Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nanmean_cuda_float32 PASSED 
[1.4922s] [ 59%] 2025-12-04T13:20:27.8976385Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nansum_cuda_float32 PASSED [1.4927s] [ 60%] 2025-12-04T13:20:27.8976524Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_native_batch_norm_cuda_float32 PASSED [1.5043s] [ 60%] 2025-12-04T13:20:27.8976645Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_neg_cuda_float32 PASSED [1.4938s] [ 60%] 2025-12-04T13:20:27.8976782Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_empty_strided_cuda_float32 PASSED [1.4956s] [ 60%] 2025-12-04T13:20:27.8976936Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_channel_shuffle_cuda_float32 PASSED [1.4974s] [ 60%] 2025-12-04T13:20:27.8977080Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_dropout_cuda_float32 PASSED [1.4931s] [ 60%] 2025-12-04T13:20:27.8977234Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_embedding_bag_cuda_float32 PASSED [1.5043s] [ 60%] 2025-12-04T13:20:27.8977411Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [1.5047s] [ 60%] 2025-12-04T13:20:27.8977600Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [1.5062s] [ 60%] 2025-12-04T13:20:27.8977754Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardsigmoid_cuda_float32 PASSED [1.4943s] [ 60%] 2025-12-04T13:20:27.8977901Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardswish_cuda_float32 PASSED [1.5020s] [ 60%] 2025-12-04T13:20:27.8978045Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_hardtanh_cuda_float32 PASSED [1.4995s] [ 60%] 2025-12-04T13:20:27.8978207Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_local_response_norm_cuda_float32 PASSED [0.0059s] [ 60%] 
2025-12-04T13:20:27.8978364Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_logsigmoid_cuda_float32 PASSED [1.4855s] [ 60%] 2025-12-04T13:20:27.8978525Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multi_margin_loss_cuda_float32 PASSED [1.5036s] [ 60%] 2025-12-04T13:20:27.8978670Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_nll_loss_cuda_float32 PASSED [1.5131s] [ 60%] 2025-12-04T13:20:27.8978820Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pad_constant_cuda_float32 PASSED [1.5008s] [ 60%] 2025-12-04T13:20:27.8978971Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_pixel_shuffle_cuda_float32 PASSED [0.0043s] [ 60%] 2025-12-04T13:20:27.8979113Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_prelu_cuda_float32 PASSED [1.5030s] [ 60%] 2025-12-04T13:20:27.8979250Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_selu_cuda_float32 PASSED [1.5046s] [ 60%] 2025-12-04T13:20:27.8979366Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32 2025-12-04T13:20:27.8979371Z 2025-12-04T13:20:27.8979540Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_ops/test_ops-a2907f10ae1cea5a.xml - 2025-12-04T13:20:27.8979617Z !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T13:20:27.8979773Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py:2653: KeyboardInterrupt 2025-12-04T13:20:27.8979851Z (to show a full traceback on KeyboardInterrupt use --full-trace) 2025-12-04T13:20:27.8979938Z == 2022 passed, 72 skipped, 3218 deselected, 16 xfailed in 1792.11s (0:29:52) == 2025-12-04T13:20:27.8980000Z Command took >30min, returning 124 2025-12-04T13:20:27.8980039Z Got exit code 124 2025-12-04T13:20:27.8980082Z Retrying single test... 
2025-12-04T13:20:27.8980203Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-d2f40d0772a4c51b.xml 2025-12-04T13:20:27.8980266Z ============================= test session starts ============================== 2025-12-04T13:20:27.8980378Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:20:27.8980420Z cachedir: .pytest_cache 2025-12-04T13:20:27.8980579Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:20:27.8980628Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:20:27.8980669Z configfile: pytest.ini 2025-12-04T13:20:27.8980833Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:20:27.8980917Z collecting ... collected 33666 items / 6701 deselected / 26965 selected 2025-12-04T13:20:27.8981119Z stepcurrent: skipping 5328 already run items. 
Running only test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32 2025-12-04T13:20:27.8981165Z Running 1 items in this shard 2025-12-04T13:20:27.8981167Z 2025-12-04T13:20:27.8981309Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32 PASSED [0.9329s] [100%] 2025-12-04T13:20:27.8981311Z 2025-12-04T13:20:27.8981483Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_ops/test_ops-d2f40d0772a4c51b.xml - 2025-12-04T13:20:27.8981549Z ====================== 1 passed, 6701 deselected in 2.20s ====================== 2025-12-04T13:20:27.8981590Z Got exit code 0 2025-12-04T13:20:27.8981672Z Test succeeded in new process, continuing with the rest of the tests 2025-12-04T13:20:27.8981789Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-a26ae297ecf7f94e.xml 2025-12-04T13:20:27.8981848Z ============================= test session starts ============================== 2025-12-04T13:20:27.8981958Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T13:20:27.8982019Z cachedir: .pytest_cache 2025-12-04T13:20:27.8982178Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T13:20:27.8982223Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T13:20:27.8982265Z configfile: pytest.ini 2025-12-04T13:20:27.8982424Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T13:20:27.8982507Z collecting ... collected 33666 items / 5329 deselected / 28337 selected 2025-12-04T13:20:27.8982562Z stepcurrent: skipping 5329 already run items. 
2025-12-04T13:20:27.8982608Z Running 1373 items in this shard 2025-12-04T13:20:27.8982611Z 2025-12-04T13:20:27.8982762Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softshrink_cuda_float32 PASSED [1.0965s] [ 0%] 2025-12-04T13:20:27.8982913Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_tanhshrink_cuda_float32 PASSED [0.9334s] [ 0%] 2025-12-04T13:20:27.8983039Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32 PASSED [0.9287s] [ 0%] 2025-12-04T13:20:27.8983171Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_pca_lowrank_cuda_float32 PASSED [3.9243s] [ 0%] 2025-12-04T13:20:27.8983362Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_polygamma_polygamma_n_3_cuda_float32 PASSED [0.8832s] [ 0%] 2025-12-04T13:20:27.8983482Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_qr_cuda_float32 PASSED [0.0594s] [ 0%] 2025-12-04T13:20:27.8983613Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resize_as__cuda_float32 PASSED [0.8891s] [ 0%] 2025-12-04T13:20:27.8983771Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_round_decimals_neg_3_cuda_float32 PASSED [0.8987s] [ 0%] 2025-12-04T13:20:27.8983913Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_prod_cuda_float32 PASSED [0.9514s] [ 0%] 2025-12-04T13:20:27.8984032Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sgn_cuda_float32 PASSED [0.9064s] [ 0%] 2025-12-04T13:20:27.8984181Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_blackman_cuda_float32 PASSED [0.9085s] [ 0%] 2025-12-04T13:20:27.8984337Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_general_cosine_cuda_float32 PASSED [0.8921s] [ 0%] 2025-12-04T13:20:27.8984480Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_hann_cuda_float32 PASSED [0.9013s] [ 0%] 2025-12-04T13:20:27.8984600Z 
test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sinh_cuda_float32 PASSED [0.8925s] [ 1%] 2025-12-04T13:20:27.8984724Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_softmax_cuda_float32 PASSED [0.9118s] [ 1%] 2025-12-04T13:20:27.8984862Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_y0_cuda_float32 PASSED [0.9003s] [ 1%] 2025-12-04T13:20:27.8985022Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_u_cuda_float32 PASSED [0.0087s] [ 1%] 2025-12-04T13:20:27.8985153Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_entr_cuda_float32 PASSED [0.8902s] [ 1%] 2025-12-04T13:20:27.8985301Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_erfcx_cuda_float32 PASSED [0.8956s] [ 1%] 2025-12-04T13:20:27.8985453Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_i1_cuda_float32 PASSED [0.8961s] [ 1%] 2025-12-04T13:20:27.8985584Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_ndtr_cuda_float32 PASSED [0.8933s] [ 1%] 2025-12-04T13:20:27.8985743Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_scaled_modified_bessel_k1_cuda_float32 PASSED [0.8931s] [ 1%] 2025-12-04T13:20:27.8985910Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_v_cuda_float32 PASSED [0.0088s] [ 1%] 2025-12-04T13:20:27.8986090Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_w_cuda_float32 PASSED [0.4698s] [ 1%] 2025-12-04T13:20:27.8986224Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_xlog1py_cuda_float32 PASSED [0.0239s] [ 1%] 2025-12-04T13:20:27.8986347Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_std_cuda_float32 PASSED [0.8974s] [ 1%] 2025-12-04T13:20:27.8986465Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sub_cuda_float32 PASSED [0.8974s] [ 1%] 
2025-12-04T13:20:27.8986594Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_to_size_cuda_float32 PASSED [0.8866s] [ 2%] 2025-12-04T13:20:27.8986712Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32 PASSED [0.9578s] [ 2%] 2025-12-04T13:20:27.8986831Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tan_cuda_float32 PASSED [0.8972s] [ 2%] 2025-12-04T13:20:27.8986960Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tensordot_cuda_float32 PASSED [0.8929s] [ 2%] 2025-12-04T13:20:27.8987081Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tile_cuda_float32 PASSED [0.0121s] [ 2%] 2025-12-04T13:20:27.8987198Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_to_cuda_float32 PASSED [0.8999s] [ 2%] 2025-12-04T13:20:27.8987479Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_torch_ops_aten__efficient_attention_forward_cuda_float32 SKIPPED [0.0011s] (Efficient attention on ROCM doesn't support custom_mask_type==2) [ 2%] 2025-12-04T13:20:27.8987599Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_trace_cuda_float32 PASSED [0.8933s] [ 2%] 2025-12-04T13:20:27.8987729Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_true_divide_cuda_float32 PASSED [0.0699s] [ 2%] 2025-12-04T13:20:27.8987860Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unique_cuda_float32 PASSED [0.3265s] [ 2%] 2025-12-04T13:20:27.8987993Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_split_cuda_float32 PASSED [0.0033s] [ 2%] 2025-12-04T13:20:27.8988126Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_copy_cuda_float32 PASSED [0.8929s] [ 2%] 2025-12-04T13:20:27.8988249Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_view_as_cuda_float32 PASSED [0.9155s] [ 2%] 2025-12-04T13:20:27.8988371Z test_ops.py::TestCompositeComplianceCUDA::test_view_replay_xlogy_cuda_float32 PASSED [0.0049s] [ 2%] 2025-12-04T13:20:27.8988469Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64 PASSED [0.9032s] [ 3%] 2025-12-04T13:20:27.8988576Z test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64 PASSED [0.0131s] [ 3%] 2025-12-04T13:20:27.8988680Z test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64 PASSED [0.0697s] [ 3%] 2025-12-04T13:20:27.8988805Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64 PASSED [0.8917s] [ 3%] 2025-12-04T13:20:27.8988912Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64 PASSED [0.8955s] [ 3%] 2025-12-04T13:20:27.8989025Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64 PASSED [0.8960s] [ 3%] 2025-12-04T13:20:27.8989208Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_copy_cuda_complex64 SKIPPED [0.0002s] (Errors when storage_offset is included) [ 3%] 2025-12-04T13:20:27.8989318Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64 PASSED [0.8881s] [ 3%] 2025-12-04T13:20:27.8989425Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64 PASSED [0.8938s] [ 3%] 2025-12-04T13:20:27.8989539Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64 PASSED [0.0073s] [ 3%] 2025-12-04T13:20:27.8989649Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumprod_cuda_complex64 PASSED [0.0094s] [ 3%] 2025-12-04T13:20:27.8989755Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64 PASSED [0.8946s] [ 3%] 2025-12-04T13:20:27.8989882Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64 PASSED [0.9008s] [ 3%] 2025-12-04T13:20:27.8990061Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_strided_cuda_complex64 SKIPPED [0.0002s] (Expected: empty_strided is not comparable) [ 4%] 2025-12-04T13:20:27.8990174Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64 PASSED [3.3153s] [ 4%] 2025-12-04T13:20:27.8990285Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64 PASSED [3.0086s] [ 4%] 2025-12-04T13:20:27.8990395Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64 PASSED [1.4387s] [ 4%] 2025-12-04T13:20:27.8990508Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64 PASSED [2.2904s] [ 4%] 2025-12-04T13:20:27.8990626Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64 PASSED [0.0111s] [ 4%] 2025-12-04T13:20:27.8990738Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64 PASSED [0.0040s] [ 4%] 2025-12-04T13:20:27.8990845Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64 PASSED [0.8375s] [ 4%] 2025-12-04T13:20:27.8990957Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64 PASSED [1.1316s] [ 4%] 2025-12-04T13:20:27.8991084Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64 XFAIL [0.0038s] [ 4%] 2025-12-04T13:20:27.8991213Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64 PASSED [0.8650s] [ 4%] 2025-12-04T13:20:27.8991325Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logaddexp_cuda_complex64 PASSED [0.0119s] [ 4%] 2025-12-04T13:20:27.8991438Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64 PASSED [0.0055s] [ 4%] 2025-12-04T13:20:27.8991563Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64 PASSED [0.8366s] [ 4%] 2025-12-04T13:20:27.8991670Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64 PASSED [0.8514s] [ 5%] 2025-12-04T13:20:27.8991780Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64 PASSED [0.8414s] [ 5%] 2025-12-04T13:20:27.8991907Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64 PASSED [0.8357s] [ 5%] 2025-12-04T13:20:27.8992053Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [0.8330s] [ 5%] 2025-12-04T13:20:27.8992160Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64 XFAIL [0.0041s] [ 5%] 2025-12-04T13:20:27.8992267Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64 PASSED [0.8334s] [ 5%] 2025-12-04T13:20:27.8992375Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_renorm_cuda_complex64 PASSED [0.8293s] [ 5%] 2025-12-04T13:20:27.8992484Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64 PASSED [0.0158s] [ 5%] 2025-12-04T13:20:27.8992598Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64 PASSED [0.8209s] [ 5%] 2025-12-04T13:20:27.8992709Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64 PASSED [0.8358s] [ 5%] 2025-12-04T13:20:27.8992847Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64 PASSED [0.8381s] [ 5%] 2025-12-04T13:20:27.8992958Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64 PASSED [0.8235s] [ 5%] 2025-12-04T13:20:27.8993067Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64 PASSED [0.8329s] [ 5%] 2025-12-04T13:20:27.8993175Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64 PASSED [0.8361s] [ 5%] 2025-12-04T13:20:27.8993326Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64 PASSED [0.8319s] [ 6%] 2025-12-04T13:20:27.8993424Z test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64 PASSED [0.8270s] [ 6%] 2025-12-04T13:20:27.8993544Z test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64 PASSED [0.8579s] [ 6%] 2025-12-04T13:20:27.8993645Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64 PASSED [0.8887s] [ 6%] 2025-12-04T13:20:27.8993761Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64 PASSED [0.8483s] 
[ 6%] 2025-12-04T13:20:27.8993860Z test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64 PASSED [0.8334s] [ 6%] 2025-12-04T13:20:27.8994027Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_copy_cuda_complex64 SKIPPED [0.0002s] (Errors when storage_offset is included) [ 6%] 2025-12-04T13:20:27.8994127Z test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64 PASSED [0.8323s] [ 6%] 2025-12-04T13:20:27.8994236Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64 PASSED [0.8272s] [ 6%] 2025-12-04T13:20:27.8994339Z test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64 PASSED [0.8473s] [ 6%] 2025-12-04T13:20:27.8994450Z test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64 XFAIL [0.0159s] [ 6%] 2025-12-04T13:20:27.8994641Z test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64 PASSED [0.8288s] [ 6%] 2025-12-04T13:20:27.8994741Z test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64 PASSED [0.0057s] [ 6%] 2025-12-04T13:20:27.8994853Z test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64 PASSED [0.0095s] [ 6%] 2025-12-04T13:20:27.8994953Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64 PASSED [0.8409s] [ 7%] 2025-12-04T13:20:27.8995062Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64 PASSED [0.8531s] [ 7%] 2025-12-04T13:20:27.8995180Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 7%] 2025-12-04T13:20:27.8995318Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64 SKIPPED [0.0001s] (Skipped!) 
[ 7%] 2025-12-04T13:20:27.8995425Z test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64 PASSED [0.8288s] [ 7%] 2025-12-04T13:20:27.8995530Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64 PASSED [2.3359s] [ 7%] 2025-12-04T13:20:27.8995636Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64 PASSED [0.8960s] [ 7%] 2025-12-04T13:20:27.8995744Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64 PASSED [1.4366s] [ 7%] 2025-12-04T13:20:27.8995847Z test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64 PASSED [0.0158s] [ 7%] 2025-12-04T13:20:27.8995948Z test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64 PASSED [0.8285s] [ 7%] 2025-12-04T13:20:27.8996052Z test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64 PASSED [0.8229s] [ 7%] 2025-12-04T13:20:27.8996151Z test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64 PASSED [0.8323s] [ 7%] 2025-12-04T13:20:27.8996257Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64 PASSED [0.8513s] [ 7%] 2025-12-04T13:20:27.8996362Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64 PASSED [0.8717s] [ 8%] 2025-12-04T13:20:27.8996460Z test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64 PASSED [0.8327s] [ 8%] 2025-12-04T13:20:27.8996575Z test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64 PASSED [0.8490s] [ 8%] 2025-12-04T13:20:27.8996674Z test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64 PASSED [0.8294s] [ 8%] 2025-12-04T13:20:27.8996777Z test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64 PASSED [0.8372s] [ 8%] 2025-12-04T13:20:27.8996889Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64 PASSED [0.1232s] [ 8%] 2025-12-04T13:20:27.8996998Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64 PASSED [0.0129s] [ 8%] 2025-12-04T13:20:27.8997108Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64 PASSED [0.0076s] [ 8%] 2025-12-04T13:20:27.8997323Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64 SKIPPED [0.0009s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 8%] 2025-12-04T13:20:27.8997433Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64 PASSED [1.0962s] [ 8%] 2025-12-04T13:20:27.8997566Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [1.0781s] [ 8%] 2025-12-04T13:20:27.8997681Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64 PASSED [0.0638s] [ 8%] 2025-12-04T13:20:27.8997799Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64 PASSED [0.0653s] [ 8%] 2025-12-04T13:20:27.8997930Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [0.0293s] [ 8%] 2025-12-04T13:20:27.8998039Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64 PASSED [0.0382s] [ 9%] 2025-12-04T13:20:27.8998147Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64 PASSED [0.0236s] [ 9%] 2025-12-04T13:20:27.8998254Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64 PASSED [0.0318s] [ 9%] 2025-12-04T13:20:27.8998371Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64 PASSED [0.0074s] [ 9%] 2025-12-04T13:20:27.8998488Z test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64 XFAIL [0.0039s] [ 9%] 2025-12-04T13:20:27.8998586Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64 PASSED [0.0067s] [ 9%] 2025-12-04T13:20:27.8998707Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64 PASSED [0.8921s] [ 9%] 2025-12-04T13:20:27.8998828Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64 PASSED [0.0092s] [ 9%] 2025-12-04T13:20:27.8998934Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64 PASSED [0.8677s] [ 9%] 2025-12-04T13:20:27.8999042Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64 PASSED [0.0087s] [ 9%] 2025-12-04T13:20:27.8999152Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64 PASSED [0.8852s] [ 9%] 2025-12-04T13:20:27.8999281Z test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64 PASSED [0.8803s] [ 9%] 2025-12-04T13:20:27.8999381Z test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64 PASSED [0.0095s] [ 9%] 2025-12-04T13:20:27.8999502Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64 SKIPPED [0.0002s] (Skipped!) [ 9%] 2025-12-04T13:20:27.8999635Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_channel_shuffle_cuda_complex64 PASSED [0.8769s] [ 10%] 2025-12-04T13:20:27.8999765Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64 PASSED [0.9134s] [ 10%] 2025-12-04T13:20:27.8999902Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64 PASSED [0.8764s] [ 10%] 2025-12-04T13:20:27.9000026Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_rms_norm_cuda_complex64 PASSED [0.8752s] [ 10%] 2025-12-04T13:20:27.9000164Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [0.8715s] [ 10%] 2025-12-04T13:20:27.9000302Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64 PASSED [1.0118s] [ 10%] 2025-12-04T13:20:27.9000409Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64 PASSED [0.8679s] [ 10%] 2025-12-04T13:20:27.9000514Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64 PASSED [0.8876s] [ 10%] 2025-12-04T13:20:27.9000619Z test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64 PASSED [0.8727s] [ 10%] 2025-12-04T13:20:27.9000719Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64 PASSED [0.4119s] [ 10%] 2025-12-04T13:20:27.9000829Z test_ops.py::TestMathBitsCUDA::test_conj_view_permute_copy_cuda_complex64 PASSED [0.8647s] [ 10%] 2025-12-04T13:20:27.9000939Z test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64 PASSED [0.9249s] [ 10%] 2025-12-04T13:20:27.9001045Z test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64 PASSED [0.8782s] [ 10%] 2025-12-04T13:20:27.9001154Z test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64 PASSED [0.8762s] [ 10%] 2025-12-04T13:20:27.9001257Z test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64 PASSED [0.8786s] [ 11%] 2025-12-04T13:20:27.9001356Z test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64 PASSED [0.9024s] [ 11%] 2025-12-04T13:20:27.9001457Z test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64 PASSED [0.8757s] [ 11%] 2025-12-04T13:20:27.9001586Z test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64 SKIPPED [0.0002s] (Skipped!) 
[ 11%] 2025-12-04T13:20:27.9001687Z test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64 PASSED [0.8812s] [ 11%] 2025-12-04T13:20:27.9001808Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_copy_cuda_complex64 PASSED [0.8868s] [ 11%] 2025-12-04T13:20:27.9001924Z test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_multiple_cuda_complex64 PASSED [0.8944s] [ 11%] 2025-12-04T13:20:27.9002033Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64 PASSED [0.8814s] [ 11%] 2025-12-04T13:20:27.9002146Z test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64 PASSED [0.8859s] [ 11%] 2025-12-04T13:20:27.9002242Z test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64 PASSED [0.8978s] [ 11%] 2025-12-04T13:20:27.9002340Z test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64 PASSED [1.0063s] [ 11%] 2025-12-04T13:20:27.9002452Z test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64 PASSED [0.8709s] [ 11%] 2025-12-04T13:20:27.9002552Z test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64 PASSED [0.8816s] [ 11%] 2025-12-04T13:20:27.9002659Z test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64 PASSED [0.0170s] [ 12%] 2025-12-04T13:20:27.9002767Z test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64 PASSED [0.0115s] [ 12%] 2025-12-04T13:20:27.9002868Z test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64 PASSED [0.0281s] [ 12%] 2025-12-04T13:20:27.9002972Z test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64 XFAIL [0.0041s] [ 12%] 2025-12-04T13:20:27.9003076Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64 PASSED [0.8767s] [ 12%] 2025-12-04T13:20:27.9003174Z test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64 PASSED [0.8663s] [ 12%] 2025-12-04T13:20:27.9003316Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64 PASSED [0.8605s] [ 12%] 2025-12-04T13:20:27.9003487Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64 SKIPPED [0.0016s] (Operation doesn't support conjugated inputs.) [ 12%] 2025-12-04T13:20:27.9003589Z test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64 PASSED [0.8660s] [ 12%] 2025-12-04T13:20:27.9003703Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128 PASSED [0.8892s] [ 12%] 2025-12-04T13:20:27.9003817Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128 PASSED [0.8687s] [ 12%] 2025-12-04T13:20:27.9003946Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128 PASSED [0.8524s] [ 12%] 2025-12-04T13:20:27.9004063Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128 PASSED [0.8647s] [ 12%] 2025-12-04T13:20:27.9004178Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128 PASSED [0.8563s] [ 12%] 2025-12-04T13:20:27.9004356Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128 SKIPPED [0.0002s] (Errors when storage_offset is included) [ 13%] 2025-12-04T13:20:27.9004569Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128 SKIPPED [0.0001s] (Errors when storage_offset is included) [ 13%] 2025-12-04T13:20:27.9004685Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128 PASSED [0.8692s] [ 13%] 2025-12-04T13:20:27.9004806Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128 PASSED [0.8567s] [ 13%] 2025-12-04T13:20:27.9004929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_block_diag_cuda_complex128 PASSED [0.8619s] [ 13%] 2025-12-04T13:20:27.9005057Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128 PASSED [0.8609s] [ 13%] 2025-12-04T13:20:27.9005171Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128 PASSED [0.8569s] [ 13%] 2025-12-04T13:20:27.9005296Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_count_nonzero_cuda_complex128 PASSED [0.8640s] [ 13%] 2025-12-04T13:20:27.9005419Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128 PASSED [0.8612s] [ 13%] 2025-12-04T13:20:27.9005533Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dot_cuda_complex128 PASSED [0.8722s] [ 13%] 2025-12-04T13:20:27.9005647Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128 PASSED [0.8550s] [ 13%] 2025-12-04T13:20:27.9005815Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128 SKIPPED [0.0002s] (Expected: empty is not comparable) [ 13%] 2025-12-04T13:20:27.9005954Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128 PASSED [0.8606s] [ 13%] 2025-12-04T13:20:27.9006068Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128 PASSED [0.8675s] [ 13%] 2025-12-04T13:20:27.9006186Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128 PASSED [1.7226s] [ 14%] 2025-12-04T13:20:27.9006320Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128 PASSED [1.4890s] [ 14%] 2025-12-04T13:20:27.9006435Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128 PASSED [0.8626s] [ 14%] 2025-12-04T13:20:27.9006550Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128 PASSED [0.8627s] [ 14%] 2025-12-04T13:20:27.9006669Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128 PASSED [0.8658s] [ 14%] 2025-12-04T13:20:27.9006786Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_item_cuda_complex128 PASSED [0.8612s] [ 14%] 2025-12-04T13:20:27.9006917Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128 PASSED [0.8608s] [ 14%] 2025-12-04T13:20:27.9007039Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128 
PASSED [0.8706s] [ 14%] 2025-12-04T13:20:27.9007166Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128 PASSED [0.9213s] [ 14%] 2025-12-04T13:20:27.9007298Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128 PASSED [0.8698s] [ 14%] 2025-12-04T13:20:27.9007440Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_tensor_overload_cuda_complex128 PASSED [0.8563s] [ 14%] 2025-12-04T13:20:27.9007559Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128 XFAIL [0.0037s] [ 14%] 2025-12-04T13:20:27.9007690Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128 PASSED [0.8668s] [ 14%] 2025-12-04T13:20:27.9007809Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128 PASSED [0.8525s] [ 15%] 2025-12-04T13:20:27.9007958Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128 PASSED [0.8609s] [ 15%] 2025-12-04T13:20:27.9008110Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [0.8685s] [ 15%] 2025-12-04T13:20:27.9008269Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128 SKIPPED [0.0002s] (Test expects tensor input) [ 15%] 2025-12-04T13:20:27.9008397Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_renorm_cuda_complex128 PASSED [0.8598s] [ 15%] 2025-12-04T13:20:27.9008510Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128 PASSED [0.0038s] [ 15%] 2025-12-04T13:20:27.9008623Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128 PASSED [0.8593s] [ 15%] 2025-12-04T13:20:27.9008736Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128 PASSED [0.8598s] [ 15%] 2025-12-04T13:20:27.9008879Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128 PASSED [0.8637s] [ 
15%] 2025-12-04T13:20:27.9008993Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128 PASSED [0.8613s] [ 15%] 2025-12-04T13:20:27.9009103Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128 PASSED [0.8615s] [ 15%] 2025-12-04T13:20:27.9009214Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128 PASSED [0.8535s] [ 15%] 2025-12-04T13:20:27.9009323Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128 PASSED [0.8645s] [ 15%] 2025-12-04T13:20:27.9009445Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128 PASSED [0.8671s] [ 15%] 2025-12-04T13:20:27.9009571Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128 PASSED [0.8531s] [ 16%] 2025-12-04T13:20:27.9009692Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128 PASSED [0.8566s] [ 16%] 2025-12-04T13:20:27.9009803Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128 PASSED [0.8687s] [ 16%] 2025-12-04T13:20:27.9009930Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_copy_cuda_complex128 PASSED [0.8590s] [ 16%] 2025-12-04T13:20:27.9010055Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128 PASSED [0.8565s] [ 16%] 2025-12-04T13:20:27.9010171Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128 PASSED [0.8666s] [ 16%] 2025-12-04T13:20:27.9010284Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128 PASSED [0.8607s] [ 16%] 2025-12-04T13:20:27.9010410Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_cuda_complex128 PASSED [0.8707s] [ 16%] 2025-12-04T13:20:27.9010520Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128 PASSED [0.8709s] [ 16%] 2025-12-04T13:20:27.9010632Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128 PASSED [0.8555s] [ 16%] 
2025-12-04T13:20:27.9010753Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_copy_cuda_complex128 PASSED [0.8530s] [ 16%] 2025-12-04T13:20:27.9010862Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128 PASSED [0.8609s] [ 16%] 2025-12-04T13:20:27.9010978Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128 PASSED [0.8581s] [ 16%] 2025-12-04T13:20:27.9011092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128 PASSED [0.8492s] [ 16%] 2025-12-04T13:20:27.9011205Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128 PASSED [0.8575s] [ 17%] 2025-12-04T13:20:27.9011335Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128 PASSED [0.8682s] [ 17%] 2025-12-04T13:20:27.9011461Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128 PASSED [0.8554s] [ 17%] 2025-12-04T13:20:27.9011584Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128 PASSED [0.8669s] [ 17%] 2025-12-04T13:20:27.9011694Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128 PASSED [0.8515s] [ 17%] 2025-12-04T13:20:27.9011817Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128 SKIPPED [0.0002s] (Skipped!) 
[ 17%] 2025-12-04T13:20:27.9011927Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp2_cuda_complex128 PASSED [1.0499s] [ 17%] 2025-12-04T13:20:27.9012054Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128 PASSED [0.8629s] [ 17%] 2025-12-04T13:20:27.9012166Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128 PASSED [1.2173s] [ 17%] 2025-12-04T13:20:27.9012277Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128 PASSED [0.8871s] [ 17%] 2025-12-04T13:20:27.9012389Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128 PASSED [0.8758s] [ 17%] 2025-12-04T13:20:27.9012496Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128 PASSED [0.8754s] [ 17%] 2025-12-04T13:20:27.9012607Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128 PASSED [0.8764s] [ 17%] 2025-12-04T13:20:27.9012719Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128 PASSED [0.8715s] [ 17%] 2025-12-04T13:20:27.9012829Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128 PASSED [0.8747s] [ 18%] 2025-12-04T13:20:27.9012949Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128 PASSED [0.8717s] [ 18%] 2025-12-04T13:20:27.9013058Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128 PASSED [0.8765s] [ 18%] 2025-12-04T13:20:27.9015800Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128 PASSED [0.8788s] [ 18%] 2025-12-04T13:20:27.9015954Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128 PASSED [1.1853s] [ 18%] 2025-12-04T13:20:27.9016092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128 XFAIL [0.1253s] [ 18%] 2025-12-04T13:20:27.9016199Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128 XFAIL [0.8782s] [ 18%] 2025-12-04T13:20:27.9016331Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128 PASSED [0.8848s] [ 18%] 2025-12-04T13:20:27.9016452Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_diagonal_cuda_complex128 PASSED [0.0042s] [ 18%] 2025-12-04T13:20:27.9016567Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128 PASSED [0.0690s] [ 18%] 2025-12-04T13:20:27.9016687Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128 PASSED [0.8368s] [ 18%] 2025-12-04T13:20:27.9016802Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128 PASSED [0.9005s] [ 18%] 2025-12-04T13:20:27.9016919Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128 PASSED [0.9010s] [ 18%] 2025-12-04T13:20:27.9017032Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128 PASSED [0.8443s] [ 19%] 2025-12-04T13:20:27.9017175Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128 PASSED [0.8438s] [ 19%] 2025-12-04T13:20:27.9017288Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128 PASSED [0.8952s] [ 19%] 2025-12-04T13:20:27.9018630Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128 PASSED [0.8477s] [ 19%] 2025-12-04T13:20:27.9018926Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128 PASSED [0.8436s] [ 19%] 2025-12-04T13:20:27.9019143Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128 PASSED [0.8477s] [ 19%] 2025-12-04T13:20:27.9019887Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128 PASSED [0.0052s] [ 19%] 2025-12-04T13:20:27.9020138Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128 PASSED [0.8473s] [ 19%] 2025-12-04T13:20:27.9020371Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128 PASSED [0.8366s] [ 19%] 2025-12-04T13:20:27.9020607Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128 PASSED [0.8367s] [ 19%] 2025-12-04T13:20:27.9020847Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128 PASSED [0.8410s] [ 19%] 2025-12-04T13:20:27.9021140Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128 PASSED [0.8623s] [ 19%] 2025-12-04T13:20:27.9021354Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128 PASSED [0.8510s] [ 19%] 2025-12-04T13:20:27.9021558Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128 PASSED [0.8335s] [ 19%] 2025-12-04T13:20:27.9021817Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128 SKIPPED [0.0002s] (Skipped!) [ 20%] 2025-12-04T13:20:27.9022144Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128 PASSED [0.8320s] [ 20%] 2025-12-04T13:20:27.9022394Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128 PASSED [0.8429s] [ 20%] 2025-12-04T13:20:27.9022637Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_rms_norm_cuda_complex128 PASSED [0.8478s] [ 20%] 2025-12-04T13:20:27.9022910Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [0.8438s] [ 20%] 2025-12-04T13:20:27.9023216Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128 PASSED [0.8363s] [ 20%] 2025-12-04T13:20:27.9023547Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_static_cuda_complex128 SKIPPED [0.0010s] (Only runs on cpu) [ 20%] 2025-12-04T13:20:27.9023819Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128 PASSED [0.8432s] [ 20%] 2025-12-04T13:20:27.9024017Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128 PASSED [0.8466s] [ 20%] 2025-12-04T13:20:27.9024229Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128 PASSED [0.8343s] [ 20%] 2025-12-04T13:20:27.9024633Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128 SKIPPED [0.0016s] (Operation not tested with tensors with negative bit.) [ 20%] 2025-12-04T13:20:27.9024850Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128 PASSED [0.8484s] [ 20%] 2025-12-04T13:20:27.9025044Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128 PASSED [0.0049s] [ 20%] 2025-12-04T13:20:27.9025251Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128 PASSED [0.8387s] [ 20%] 2025-12-04T13:20:27.9025449Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128 PASSED [0.8301s] [ 21%] 2025-12-04T13:20:27.9025655Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128 PASSED [0.8397s] [ 21%] 2025-12-04T13:20:27.9025861Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128 PASSED [1.0228s] [ 21%] 2025-12-04T13:20:27.9026064Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128 PASSED [0.8328s] [ 21%] 2025-12-04T13:20:27.9026284Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_copy_cuda_complex128 PASSED [0.8416s] [ 21%] 2025-12-04T13:20:27.9026496Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128 PASSED [0.8396s] [ 21%] 2025-12-04T13:20:27.9026693Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128 PASSED [0.8420s] [ 21%] 2025-12-04T13:20:27.9026906Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128 PASSED [0.8325s] [ 21%] 2025-12-04T13:20:27.9027126Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128 PASSED [0.8429s] [ 21%] 2025-12-04T13:20:27.9027337Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128 PASSED [0.8359s] [ 21%] 2025-12-04T13:20:27.9027552Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_copy_cuda_complex128 PASSED [0.8450s] [ 21%] 2025-12-04T13:20:27.9027762Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128 PASSED [0.8340s] [ 21%] 2025-12-04T13:20:27.9027967Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128 PASSED [0.8392s] [ 21%] 2025-12-04T13:20:27.9028197Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128 XFAIL [0.0055s] [ 21%] 2025-12-04T13:20:27.9028416Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsafe_split_cuda_complex128 PASSED [0.8341s] [ 22%] 2025-12-04T13:20:27.9028610Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128 PASSED [0.0050s] [ 22%] 2025-12-04T13:20:27.9028952Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128 SKIPPED [0.0012s] (Operation doesn't support conjugated inputs.) [ 22%] 2025-12-04T13:20:27.9029151Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128 PASSED [0.8305s] [ 22%] 2025-12-04T13:20:27.9029345Z test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64 PASSED [0.0144s] [ 22%] 2025-12-04T13:20:27.9029529Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64 PASSED [0.0111s] [ 22%] 2025-12-04T13:20:27.9029716Z test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64 PASSED [0.0136s] [ 22%] 2025-12-04T13:20:27.9029907Z test_ops.py::TestMathBitsCUDA::test_neg_view__chunk_cat_cuda_float64 PASSED [0.0097s] [ 22%] 2025-12-04T13:20:27.9030146Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64 PASSED [0.8790s] [ 22%] 2025-12-04T13:20:27.9030374Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64 PASSED [0.8997s] [ 22%] 2025-12-04T13:20:27.9030625Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64 PASSED [0.8906s] [ 22%] 2025-12-04T13:20:27.9030858Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64 PASSED [0.8850s] [ 22%] 2025-12-04T13:20:27.9031078Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64 PASSED [0.9045s] [ 22%] 2025-12-04T13:20:27.9031353Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64 PASSED [0.8879s] [ 23%] 2025-12-04T13:20:27.9031546Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64 PASSED [0.8946s] [ 23%] 2025-12-04T13:20:27.9031761Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_block_diag_cuda_float64 PASSED [0.8987s] [ 23%] 2025-12-04T13:20:27.9031914Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64 PASSED [0.8866s] [ 23%] 2025-12-04T13:20:27.9032083Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cauchy_cuda_float64 XFAIL [0.0049s] [ 23%] 2025-12-04T13:20:27.9032226Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64 PASSED [0.8932s] [ 23%] 2025-12-04T13:20:27.9032375Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64 PASSED [0.0072s] [ 23%] 2025-12-04T13:20:27.9032545Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64 PASSED [0.0086s] [ 23%] 2025-12-04T13:20:27.9032765Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64 SKIPPED [0.0001s] (Expected: empty is not comparable) [ 23%] 2025-12-04T13:20:27.9032910Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64 PASSED [0.0064s] [ 23%] 2025-12-04T13:20:27.9033060Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64 PASSED [2.2130s] [ 23%] 2025-12-04T13:20:27.9033204Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64 PASSED [1.7907s] [ 23%] 2025-12-04T13:20:27.9033405Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64 PASSED [1.4683s] [ 23%] 2025-12-04T13:20:27.9033562Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64 PASSED [0.8550s] [ 23%] 2025-12-04T13:20:27.9033718Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64 PASSED [1.7569s] [ 24%] 2025-12-04T13:20:27.9033861Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64 PASSED [0.8464s] [ 24%] 2025-12-04T13:20:27.9034007Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64 PASSED [0.8357s] [ 24%] 2025-12-04T13:20:27.9034147Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64 PASSED [0.8305s] [ 24%] 2025-12-04T13:20:27.9034324Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64 PASSED [0.0085s] [ 24%] 2025-12-04T13:20:27.9034469Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_geometric_cuda_float64 XFAIL [0.0040s] [ 24%] 2025-12-04T13:20:27.9034612Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64 PASSED [0.8449s] [ 24%] 2025-12-04T13:20:27.9034764Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64 PASSED [0.8326s] [ 24%] 2025-12-04T13:20:27.9034910Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64 PASSED [0.8244s] [ 24%] 2025-12-04T13:20:27.9035066Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64 PASSED [0.1839s] [ 24%] 2025-12-04T13:20:27.9035228Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64 PASSED [0.8819s] [ 24%] 2025-12-04T13:20:27.9035378Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64 XFAIL [0.0036s] [ 24%] 2025-12-04T13:20:27.9035548Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64 PASSED [0.8514s] [ 24%] 2025-12-04T13:20:27.9035694Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logaddexp_cuda_float64 PASSED [0.0087s] [ 24%] 2025-12-04T13:20:27.9035840Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64 
XFAIL [0.0031s] [ 25%] 2025-12-04T13:20:27.9035997Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64 PASSED [0.0061s] [ 25%] 2025-12-04T13:20:27.9036139Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64 PASSED [0.8392s] [ 25%] 2025-12-04T13:20:27.9036286Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64 PASSED [0.8349s] [ 25%] 2025-12-04T13:20:27.9036461Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64 PASSED [0.0146s] [ 25%] 2025-12-04T13:20:27.9036605Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64 PASSED [0.0062s] [ 25%] 2025-12-04T13:20:27.9036751Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64 PASSED [0.8363s] [ 25%] 2025-12-04T13:20:27.9036997Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64 SKIPPED [0.0002s] (Expected: dropout is not comparable) [ 25%] 2025-12-04T13:20:27.9037170Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64 PASSED [0.8317s] [ 25%] 2025-12-04T13:20:27.9037347Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64 PASSED [0.8421s] [ 25%] 2025-12-04T13:20:27.9037516Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64 PASSED [0.8484s] [ 25%] 2025-12-04T13:20:27.9037677Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64 PASSED [0.8284s] [ 25%] 2025-12-04T13:20:27.9037876Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [0.8380s] [ 25%] 2025-12-04T13:20:27.9038041Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64 PASSED [0.8384s] [ 26%] 2025-12-04T13:20:27.9038217Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64 PASSED [0.8279s] [ 26%] 2025-12-04T13:20:27.9038402Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64 PASSED [0.8450s] [ 26%] 2025-12-04T13:20:27.9038550Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64 PASSED [0.8524s] [ 26%] 2025-12-04T13:20:27.9038696Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rad2deg_cuda_float64 PASSED [0.8310s] [ 26%] 2025-12-04T13:20:27.9038835Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64 PASSED [0.8317s] [ 26%] 2025-12-04T13:20:27.9038971Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64 PASSED [0.8380s] [ 26%] 2025-12-04T13:20:27.9039166Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [0.8401s] [ 26%] 2025-12-04T13:20:27.9039341Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64 PASSED [0.0078s] [ 26%] 2025-12-04T13:20:27.9039498Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_copy_cuda_float64 PASSED [0.8409s] [ 26%] 2025-12-04T13:20:27.9039641Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64 PASSED [0.8461s] [ 26%] 2025-12-04T13:20:27.9039785Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64 PASSED [0.0095s] [ 26%] 2025-12-04T13:20:27.9039935Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64 PASSED [0.0078s] [ 26%] 2025-12-04T13:20:27.9040093Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_copy_cuda_float64 PASSED [0.0050s] [ 26%] 2025-12-04T13:20:27.9040245Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_copy_cuda_float64 PASSED [0.0065s] [ 27%] 2025-12-04T13:20:27.9040392Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64 PASSED [0.0066s] [ 27%] 2025-12-04T13:20:27.9040544Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64 PASSED [0.0082s] [ 27%] 2025-12-04T13:20:27.9040803Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_complex_cuda_float64 SKIPPED [0.0011s] (Operation not tested with tensors with negative bit.) [ 27%] 2025-12-04T13:20:27.9041024Z test_ops.py::TestMathBitsCUDA::test_neg_view__unsafe_masked_index_put_accumulate_cuda_float64 PASSED [0.8600s] [ 27%] 2025-12-04T13:20:27.9041160Z test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64 PASSED [0.8508s] [ 27%] 2025-12-04T13:20:27.9041302Z test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64 PASSED [0.8680s] [ 27%] 2025-12-04T13:20:27.9041429Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64 PASSED [0.8738s] [ 27%] 2025-12-04T13:20:27.9041581Z test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64 PASSED [0.8469s] [ 27%] 2025-12-04T13:20:27.9041721Z test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64 PASSED [0.8413s] [ 27%] 2025-12-04T13:20:27.9041941Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_copy_cuda_float64 SKIPPED [0.0002s] (Errors when storage_offset is included) [ 27%] 2025-12-04T13:20:27.9042077Z test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64 PASSED [0.8512s] [ 27%] 2025-12-04T13:20:27.9042226Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64 PASSED [0.8452s] [ 27%] 2025-12-04T13:20:27.9042361Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64 PASSED [0.8515s] [ 27%] 2025-12-04T13:20:27.9042535Z test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64 PASSED [0.8445s] [ 28%] 2025-12-04T13:20:27.9042687Z test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64 PASSED [0.8669s] [ 28%] 2025-12-04T13:20:27.9042798Z test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64 PASSED [0.8409s] [ 28%] 2025-12-04T13:20:27.9042905Z test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64 PASSED [0.8410s] [ 28%] 2025-12-04T13:20:27.9043013Z test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64 
PASSED [0.8501s] [ 28%] 2025-12-04T13:20:27.9043124Z test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64 PASSED [0.8466s] [ 28%] 2025-12-04T13:20:27.9043245Z test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64 PASSED [0.8488s] [ 28%] 2025-12-04T13:20:27.9043419Z test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64 PASSED [0.8417s] [ 28%] 2025-12-04T13:20:27.9043524Z test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64 PASSED [0.8464s] [ 28%] 2025-12-04T13:20:27.9043626Z test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64 PASSED [0.8467s] [ 28%] 2025-12-04T13:20:27.9043732Z test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64 PASSED [0.8499s] [ 28%] 2025-12-04T13:20:27.9043861Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64 PASSED [0.8653s] [ 28%] 2025-12-04T13:20:27.9043989Z test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64 PASSED [0.8428s] [ 28%] 2025-12-04T13:20:27.9044097Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64 PASSED [0.8433s] [ 28%] 2025-12-04T13:20:27.9044220Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64 PASSED [0.8471s] [ 29%] 2025-12-04T13:20:27.9044335Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64 PASSED [0.8532s] [ 29%] 2025-12-04T13:20:27.9044442Z test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64 PASSED [0.8379s] [ 29%] 2025-12-04T13:20:27.9044582Z test_ops.py::TestMathBitsCUDA::test_neg_view_empty_strided_cuda_float64 SKIPPED [0.0002s] (Skipped!) 
[ 29%] 2025-12-04T13:20:27.9044684Z test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64 PASSED [0.8339s] [ 29%] 2025-12-04T13:20:27.9044790Z test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64 PASSED [0.8424s] [ 29%] 2025-12-04T13:20:27.9044901Z test_ops.py::TestMathBitsCUDA::test_neg_view_expand_copy_cuda_float64 PASSED [0.8429s] [ 29%] 2025-12-04T13:20:27.9045015Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64 PASSED [1.5128s] [ 29%] 2025-12-04T13:20:27.9045122Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64 PASSED [3.9410s] [ 29%] 2025-12-04T13:20:27.9045240Z test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64 PASSED [0.0264s] [ 29%] 2025-12-04T13:20:27.9045360Z test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64 PASSED [1.1706s] [ 29%] 2025-12-04T13:20:27.9045471Z test_ops.py::TestMathBitsCUDA::test_neg_view_geometric_cuda_float64 XFAIL [0.0112s] [ 29%] 2025-12-04T13:20:27.9045607Z test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_3d_cuda_float64 SKIPPED [0.0001s] (Skipped!) 
[ 29%] 2025-12-04T13:20:27.9045735Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64 PASSED [1.1006s] [ 30%] 2025-12-04T13:20:27.9045858Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_amin_cuda_float64 PASSED [1.0881s] [ 30%] 2025-12-04T13:20:27.9045967Z test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64 PASSED [1.0742s] [ 30%] 2025-12-04T13:20:27.9046069Z test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64 PASSED [1.0672s] [ 30%] 2025-12-04T13:20:27.9046173Z test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64 PASSED [1.0748s] [ 30%] 2025-12-04T13:20:27.9046281Z test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64 PASSED [1.0983s] [ 30%] 2025-12-04T13:20:27.9046422Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64 XFAIL [0.1189s] [ 30%] 2025-12-04T13:20:27.9046525Z test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64 PASSED [1.1101s] [ 30%] 2025-12-04T13:20:27.9046644Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64 PASSED [0.0067s] [ 30%] 2025-12-04T13:20:27.9046756Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64 PASSED [0.0776s] [ 30%] 2025-12-04T13:20:27.9046883Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64 PASSED [0.3648s] [ 30%] 2025-12-04T13:20:27.9047021Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64 PASSED [0.0297s] [ 30%] 2025-12-04T13:20:27.9047149Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64 PASSED [1.1258s] [ 30%] 2025-12-04T13:20:27.9047284Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64 PASSED [1.1630s] [ 30%] 2025-12-04T13:20:27.9047396Z test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64 XFAIL [0.0037s] [ 31%] 2025-12-04T13:20:27.9047511Z test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64 PASSED [0.0230s] [ 31%] 
2025-12-04T13:20:27.9047624Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64 PASSED [0.0050s] [ 31%] 2025-12-04T13:20:27.9047740Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64 PASSED [0.0047s] [ 31%] 2025-12-04T13:20:27.9047858Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64 PASSED [0.0047s] [ 31%] 2025-12-04T13:20:27.9047981Z test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64 XFAIL [0.0030s] [ 31%] 2025-12-04T13:20:27.9048083Z test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64 PASSED [0.0047s] [ 31%] 2025-12-04T13:20:27.9048190Z test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64 PASSED [1.1371s] [ 31%] 2025-12-04T13:20:27.9048303Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64 PASSED [1.1383s] [ 31%] 2025-12-04T13:20:27.9048431Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64 PASSED [0.9480s] [ 31%] 2025-12-04T13:20:27.9048553Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64 PASSED [0.8981s] [ 31%] 2025-12-04T13:20:27.9048674Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64 PASSED [0.9317s] [ 31%] 2025-12-04T13:20:27.9048783Z test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64 PASSED [0.8562s] [ 31%] 2025-12-04T13:20:27.9048897Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64 PASSED [0.0156s] [ 31%] 2025-12-04T13:20:27.9049021Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64 PASSED [0.8589s] [ 32%] 2025-12-04T13:20:27.9049156Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64 PASSED [0.8562s] [ 32%] 2025-12-04T13:20:27.9049301Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64 PASSED [0.8528s] [ 32%] 2025-12-04T13:20:27.9049499Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64 SKIPPED [0.0002s] (Expected: 
new_empty_strided is not comparable) [ 32%] 2025-12-04T13:20:27.9049610Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64 PASSED [0.8288s] [ 32%] 2025-12-04T13:20:27.9049740Z test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64 PASSED [0.0238s] [ 32%] 2025-12-04T13:20:27.9049888Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64 PASSED [0.8485s] [ 32%] 2025-12-04T13:20:27.9050027Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64 PASSED [0.8657s] [ 32%] 2025-12-04T13:20:27.9050175Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64 PASSED [0.8675s] [ 32%] 2025-12-04T13:20:27.9050306Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64 PASSED [0.8517s] [ 32%] 2025-12-04T13:20:27.9050541Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64 SKIPPED [0.0017s] (Operation not tested with tensors with negative bit.) 
[ 32%] 2025-12-04T13:20:27.9050678Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64 PASSED [0.8437s] [ 32%] 2025-12-04T13:20:27.9050808Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64 PASSED [0.8574s] [ 32%] 2025-12-04T13:20:27.9050941Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64 PASSED [0.8364s] [ 32%] 2025-12-04T13:20:27.9051087Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64 PASSED [0.8522s] [ 33%] 2025-12-04T13:20:27.9051238Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64 PASSED [0.8573s] [ 33%] 2025-12-04T13:20:27.9051384Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64 PASSED [0.8661s] [ 33%] 2025-12-04T13:20:27.9051531Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64 PASSED [0.0252s] [ 33%] 2025-12-04T13:20:27.9051659Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64 PASSED [0.8411s] [ 33%] 2025-12-04T13:20:27.9051799Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64 PASSED [0.8463s] [ 33%] 2025-12-04T13:20:27.9051924Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64 PASSED [0.8486s] [ 33%] 2025-12-04T13:20:27.9052061Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64 PASSED [0.8449s] [ 33%] 2025-12-04T13:20:27.9052180Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64 PASSED [0.8487s] [ 33%] 2025-12-04T13:20:27.9052309Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64 PASSED [0.0083s] [ 33%] 2025-12-04T13:20:27.9052423Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64 PASSED [0.8638s] [ 33%] 2025-12-04T13:20:27.9052555Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64 PASSED [0.8435s] [ 33%] 2025-12-04T13:20:27.9052677Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64 PASSED [0.8373s] [ 33%] 2025-12-04T13:20:27.9052801Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64 PASSED [0.8366s] [ 34%] 2025-12-04T13:20:27.9052923Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64 PASSED [0.8675s] [ 34%] 2025-12-04T13:20:27.9053030Z test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64 PASSED [0.0104s] [ 34%] 2025-12-04T13:20:27.9053128Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64 PASSED [0.8849s] [ 34%] 2025-12-04T13:20:27.9053245Z test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64 PASSED [0.8387s] [ 34%] 2025-12-04T13:20:27.9053392Z test_ops.py::TestMathBitsCUDA::test_neg_view_permute_copy_cuda_float64 PASSED [0.8416s] [ 34%] 2025-12-04T13:20:27.9053519Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64 PASSED [0.8487s] [ 34%] 2025-12-04T13:20:27.9053617Z test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64 PASSED [0.0240s] [ 34%] 2025-12-04T13:20:27.9053743Z test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64 PASSED [0.8374s] [ 34%] 2025-12-04T13:20:27.9053840Z test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64 XFAIL [0.0044s] [ 34%] 2025-12-04T13:20:27.9053943Z test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64 PASSED [0.8405s] [ 34%] 2025-12-04T13:20:27.9054083Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64 SKIPPED [0.0002s] (Skipped!) [ 34%] 2025-12-04T13:20:27.9054222Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64 SKIPPED [0.0001s] (Skipped!) 
[ 34%] 2025-12-04T13:20:27.9054359Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64 SKIPPED [0.0001s] (Skipped!) [ 34%] 2025-12-04T13:20:27.9054465Z test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64 PASSED [0.8425s] [ 35%] 2025-12-04T13:20:27.9054562Z test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64 PASSED [0.8453s] [ 35%] 2025-12-04T13:20:27.9054663Z test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64 PASSED [0.9087s] [ 35%] 2025-12-04T13:20:27.9054774Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64 PASSED [0.8356s] [ 35%] 2025-12-04T13:20:27.9054894Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64 PASSED [1.0394s] [ 35%] 2025-12-04T13:20:27.9055032Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64 PASSED [0.0089s] [ 35%] 2025-12-04T13:20:27.9055180Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64 PASSED [0.0067s] [ 35%] 2025-12-04T13:20:27.9055319Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64 PASSED [0.0064s] [ 35%] 2025-12-04T13:20:27.9055426Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64 PASSED [0.8378s] [ 35%] 2025-12-04T13:20:27.9055575Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64 PASSED [0.3562s] [ 35%] 2025-12-04T13:20:27.9055705Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64 PASSED [0.8297s] [ 35%] 2025-12-04T13:20:27.9055832Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64 PASSED [0.8273s] [ 35%] 2025-12-04T13:20:27.9055945Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64 PASSED [0.8318s] [ 35%] 2025-12-04T13:20:27.9056064Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64 PASSED [0.8460s] [ 35%] 2025-12-04T13:20:27.9056166Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64 PASSED [0.8493s] [ 36%] 2025-12-04T13:20:27.9056278Z test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_copy_cuda_float64 PASSED [0.8453s] [ 36%] 2025-12-04T13:20:27.9056379Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64 PASSED [0.8501s] [ 36%] 2025-12-04T13:20:27.9056495Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64 PASSED [0.8366s] [ 36%] 2025-12-04T13:20:27.9056603Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64 PASSED [0.8407s] [ 36%] 2025-12-04T13:20:27.9056705Z test_ops.py::TestMathBitsCUDA::test_neg_view_t_copy_cuda_float64 PASSED [0.8389s] [ 36%] 2025-12-04T13:20:27.9056803Z test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64 PASSED [0.8489s] [ 36%] 2025-12-04T13:20:27.9056904Z test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64 PASSED [0.8570s] [ 36%] 2025-12-04T13:20:27.9057044Z test_ops.py::TestMathBitsCUDA::test_neg_view_torch_ops_aten__safe_softmax_default_cuda_float64 PASSED [0.8506s] [ 36%] 2025-12-04T13:20:27.9057168Z test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64 PASSED [0.0147s] [ 36%] 2025-12-04T13:20:27.9057274Z test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_copy_cuda_float64 PASSED [0.8444s] [ 36%] 2025-12-04T13:20:27.9057375Z test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64 PASSED [0.8516s] [ 36%] 2025-12-04T13:20:27.9057492Z test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64 PASSED [0.0127s] [ 36%] 2025-12-04T13:20:27.9057606Z test_ops.py::TestMathBitsCUDA::test_neg_view_unsafe_chunk_cuda_float64 PASSED [0.8348s] [ 36%] 2025-12-04T13:20:27.9057714Z test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64 PASSED [0.8535s] [ 37%] 2025-12-04T13:20:27.9057812Z test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64 PASSED [0.8460s] [ 37%] 2025-12-04T13:20:27.9057915Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64 PASSED [0.8454s] [ 37%] 2025-12-04T13:20:27.9058014Z test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64 PASSED [0.0155s] [ 37%] 2025-12-04T13:20:27.9058123Z test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32 PASSED [0.0352s] [ 37%] 2025-12-04T13:20:27.9058220Z test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64 PASSED [0.0107s] [ 37%] 2025-12-04T13:20:27.9058345Z test_ops.py::TestFakeTensorCUDA::test_fake__segment_reduce_lengths_cuda_float32 PASSED [0.0900s] [ 37%] 2025-12-04T13:20:27.9058467Z test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32 PASSED [0.0088s] [ 37%] 2025-12-04T13:20:27.9058570Z test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32 PASSED [0.0049s] [ 37%] 2025-12-04T13:20:27.9058668Z test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32 PASSED [0.0118s] [ 37%] 2025-12-04T13:20:27.9058772Z test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32 PASSED [0.8522s] [ 37%] 2025-12-04T13:20:27.9058887Z test_ops.py::TestFakeTensorCUDA::test_fake_alias_copy_cuda_float32 PASSED [0.8369s] [ 37%] 2025-12-04T13:20:27.9058989Z test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32 PASSED [0.8607s] [ 37%] 2025-12-04T13:20:27.9059091Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32 PASSED [0.8418s] [ 38%] 2025-12-04T13:20:27.9059217Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32 PASSED [0.8368s] [ 38%] 2025-12-04T13:20:27.9059330Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32 PASSED [0.8505s] [ 38%] 2025-12-04T13:20:27.9059431Z test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32 PASSED [0.8430s] [ 38%] 2025-12-04T13:20:27.9059548Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32 PASSED [0.8489s] [ 38%] 2025-12-04T13:20:27.9059671Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32 PASSED [0.4044s] [ 38%] 2025-12-04T13:20:27.9059783Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32 PASSED [0.0107s] [ 38%] 2025-12-04T13:20:27.9059922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32 PASSED [0.8446s] [ 38%] 2025-12-04T13:20:27.9060073Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__unsafe_masked_index_put_accumulate_cuda_float32 PASSED [0.8669s] [ 38%] 2025-12-04T13:20:27.9060185Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32 PASSED [0.8376s] [ 38%] 2025-12-04T13:20:27.9060314Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32 PASSED [0.8676s] [ 38%] 2025-12-04T13:20:27.9060422Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32 PASSED [0.8439s] [ 38%] 2025-12-04T13:20:27.9060539Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32 PASSED [0.8558s] [ 38%] 2025-12-04T13:20:27.9060644Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32 PASSED [0.8534s] [ 38%] 2025-12-04T13:20:27.9060774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_copy_cuda_float32 PASSED [0.8414s] [ 39%] 2025-12-04T13:20:27.9060905Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32 PASSED [0.8490s] [ 39%] 2025-12-04T13:20:27.9061022Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32 PASSED [0.8472s] [ 39%] 2025-12-04T13:20:27.9061141Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32 PASSED [0.8504s] [ 39%] 2025-12-04T13:20:27.9061263Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32 PASSED [0.8414s] [ 39%] 2025-12-04T13:20:27.9061373Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32 PASSED [0.8478s] [ 39%] 2025-12-04T13:20:27.9061498Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32 PASSED [0.0136s] [ 39%] 2025-12-04T13:20:27.9061608Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32 PASSED [0.8335s] [ 39%] 2025-12-04T13:20:27.9061744Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32 PASSED [0.8630s] [ 39%] 2025-12-04T13:20:27.9061857Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32 PASSED [0.8464s] [ 39%] 2025-12-04T13:20:27.9061989Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32 PASSED [0.0136s] [ 39%] 2025-12-04T13:20:27.9062099Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32 PASSED [0.0064s] [ 39%] 2025-12-04T13:20:27.9062220Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32 PASSED [0.8402s] [ 39%] 2025-12-04T13:20:27.9062329Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32 PASSED [0.8651s] [ 39%] 2025-12-04T13:20:27.9062448Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32 PASSED [0.0105s] [ 40%] 2025-12-04T13:20:27.9062558Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32 PASSED [0.8492s] [ 40%] 2025-12-04T13:20:27.9062691Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32 PASSED [0.8529s] [ 40%] 2025-12-04T13:20:27.9062809Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32 PASSED [0.8485s] [ 40%] 2025-12-04T13:20:27.9062934Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32 PASSED [0.8527s] [ 40%] 2025-12-04T13:20:27.9063056Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32 PASSED [0.8530s] [ 40%] 2025-12-04T13:20:27.9063165Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32 PASSED [0.8586s] [ 40%] 2025-12-04T13:20:27.9063321Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32 PASSED 
[0.0122s] [ 40%] 2025-12-04T13:20:27.9063454Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32 PASSED [0.8614s] [ 40%] 2025-12-04T13:20:27.9063572Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32 PASSED [0.0443s] [ 40%] 2025-12-04T13:20:27.9063695Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32 PASSED [1.3992s] [ 40%] 2025-12-04T13:20:27.9063812Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32 PASSED [0.0163s] [ 40%] 2025-12-04T13:20:27.9063922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64 PASSED [0.8478s] [ 40%] 2025-12-04T13:20:27.9064039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32 PASSED [0.8520s] [ 41%] 2025-12-04T13:20:27.9064153Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32 PASSED [0.8480s] [ 41%] 2025-12-04T13:20:27.9064267Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32 PASSED [0.8429s] [ 41%] 2025-12-04T13:20:27.9064375Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_item_cuda_float32 XFAIL [0.0055s] [ 41%] 2025-12-04T13:20:27.9064489Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32 PASSED [0.8705s] [ 41%] 2025-12-04T13:20:27.9064605Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32 PASSED [0.0479s] [ 41%] 2025-12-04T13:20:27.9064747Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32 PASSED [0.8540s] [ 41%] 2025-12-04T13:20:27.9064875Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32 PASSED [0.0373s] [ 41%] 2025-12-04T13:20:27.9065005Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32 PASSED [0.0823s] [ 41%] 2025-12-04T13:20:27.9065137Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32 PASSED [0.1200s] [ 41%] 2025-12-04T13:20:27.9065302Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32 SKIPPED [0.0012s] (Skip failing test) [ 41%] 2025-12-04T13:20:27.9065511Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32 SKIPPED [0.0006s] (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 41%] 2025-12-04T13:20:27.9065651Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32 PASSED [0.1150s] [ 41%] 2025-12-04T13:20:27.9065778Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32 PASSED [0.0321s] [ 41%] 2025-12-04T13:20:27.9065936Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32 SKIPPED [0.0013s] (Skip failing test) [ 42%] 2025-12-04T13:20:27.9066056Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32 PASSED [0.0332s] [ 42%] 2025-12-04T13:20:27.9066168Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32 PASSED [0.0051s] [ 42%] 2025-12-04T13:20:27.9066283Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32 PASSED [0.8492s] [ 42%] 2025-12-04T13:20:27.9066390Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32 PASSED [0.8473s] [ 42%] 2025-12-04T13:20:27.9066509Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_normal_cuda_float32 PASSED [0.8587s] [ 42%] 2025-12-04T13:20:27.9066647Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32 PASSED [0.0136s] [ 42%] 2025-12-04T13:20:27.9066794Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32 PASSED [0.1960s] [ 42%] 2025-12-04T13:20:27.9066928Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_tensor_overload_cuda_float32 PASSED [0.8680s] [ 42%] 2025-12-04T13:20:27.9067049Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32 PASSED [0.0956s] [ 42%] 2025-12-04T13:20:27.9067167Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32 PASSED [0.0888s] [ 42%] 2025-12-04T13:20:27.9067286Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32 PASSED [0.0324s] [ 42%] 2025-12-04T13:20:27.9067413Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32 PASSED [0.1334s] [ 42%] 2025-12-04T13:20:27.9067534Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32 PASSED [0.0404s] [ 42%] 2025-12-04T13:20:27.9067651Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32 PASSED [0.8523s] [ 43%] 2025-12-04T13:20:27.9067799Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32 PASSED [0.8728s] [ 43%] 2025-12-04T13:20:27.9067934Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32 PASSED [0.8710s] [ 43%] 2025-12-04T13:20:27.9068068Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32 PASSED [0.8527s] [ 43%] 2025-12-04T13:20:27.9068182Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32 PASSED [0.8572s] [ 43%] 2025-12-04T13:20:27.9068289Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32 PASSED [0.8639s] [ 43%] 2025-12-04T13:20:27.9068432Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32 SKIPPED [0.0017s] (Skip failing test) [ 43%] 2025-12-04T13:20:27.9068573Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 43%] 2025-12-04T13:20:27.9068721Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32 PASSED [0.0549s] [ 43%] 2025-12-04T13:20:27.9068869Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [0.0096s] [ 43%] 2025-12-04T13:20:27.9069017Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [0.8692s] [ 43%] 2025-12-04T13:20:27.9069161Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32 PASSED [0.8698s] [ 43%] 2025-12-04T13:20:27.9069297Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32 PASSED [0.0168s] [ 43%] 2025-12-04T13:20:27.9069450Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [0.0485s] [ 43%] 2025-12-04T13:20:27.9069599Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32 PASSED [1.6206s] [ 44%] 2025-12-04T13:20:27.9069750Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [0.0359s] [ 44%] 2025-12-04T13:20:27.9069896Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32 PASSED [0.0326s] [ 44%] 2025-12-04T13:20:27.9070025Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32 PASSED [0.8776s] [ 44%] 2025-12-04T13:20:27.9070192Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [0.8754s] [ 44%] 2025-12-04T13:20:27.9070318Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32 PASSED [0.9033s] [ 44%] 2025-12-04T13:20:27.9070464Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32 PASSED [0.0335s] [ 44%] 2025-12-04T13:20:27.9070612Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32 PASSED [0.0676s] [ 44%] 2025-12-04T13:20:27.9070745Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32 PASSED [1.0721s] [ 44%] 2025-12-04T13:20:27.9070879Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32 PASSED [0.4488s] [ 44%] 2025-12-04T13:20:27.9071021Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [0.1015s] [ 44%] 2025-12-04T13:20:27.9071161Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32 PASSED [1.0681s] [ 44%] 2025-12-04T13:20:27.9071313Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [0.1340s] [ 44%] 2025-12-04T13:20:27.9071453Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32 PASSED [0.4354s] [ 45%] 2025-12-04T13:20:27.9071590Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32 PASSED [0.0363s] [ 45%] 2025-12-04T13:20:27.9071731Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32 PASSED [0.0111s] [ 45%] 2025-12-04T13:20:27.9071873Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32 PASSED [0.0125s] [ 45%] 2025-12-04T13:20:27.9072017Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32 PASSED [0.0063s] [ 45%] 2025-12-04T13:20:27.9072142Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32 PASSED [0.9119s] [ 45%] 2025-12-04T13:20:27.9072276Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rms_norm_cuda_float32 PASSED [0.9159s] [ 45%] 2025-12-04T13:20:27.9072400Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32 PASSED [0.9212s] [ 45%] 
2025-12-04T13:20:27.9072535Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32 PASSED [0.0086s] [ 45%] 2025-12-04T13:20:27.9072665Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32 PASSED [0.9162s] [ 45%] 2025-12-04T13:20:27.9072781Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32 PASSED [0.9198s] [ 45%] 2025-12-04T13:20:27.9072895Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32 PASSED [0.9133s] [ 45%] 2025-12-04T13:20:27.9073014Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32 PASSED [0.9319s] [ 45%] 2025-12-04T13:20:27.9073129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32 PASSED [0.9239s] [ 45%] 2025-12-04T13:20:27.9073245Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32 PASSED [0.9131s] [ 46%] 2025-12-04T13:20:27.9073394Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32 PASSED [0.9239s] [ 46%] 2025-12-04T13:20:27.9073512Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32 PASSED [0.9128s] [ 46%] 2025-12-04T13:20:27.9073625Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32 PASSED [0.9155s] [ 46%] 2025-12-04T13:20:27.9073749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32 PASSED [0.9177s] [ 46%] 2025-12-04T13:20:27.9073860Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32 PASSED [0.9363s] [ 46%] 2025-12-04T13:20:27.9073976Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32 PASSED [0.9273s] [ 46%] 2025-12-04T13:20:27.9074084Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32 PASSED [0.9081s] [ 46%] 2025-12-04T13:20:27.9074223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32 PASSED [0.9334s] [ 46%] 2025-12-04T13:20:27.9074333Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32 PASSED [0.9104s] [ 46%] 2025-12-04T13:20:27.9074489Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32 PASSED [0.9091s] [ 46%] 2025-12-04T13:20:27.9074632Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32 PASSED [0.9074s] [ 46%] 2025-12-04T13:20:27.9074749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32 PASSED [0.0147s] [ 46%] 2025-12-04T13:20:27.9074863Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32 PASSED [0.9153s] [ 46%] 2025-12-04T13:20:27.9074972Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32 PASSED [0.0125s] [ 47%] 2025-12-04T13:20:27.9075087Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32 PASSED [0.9155s] [ 47%] 2025-12-04T13:20:27.9075214Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32 PASSED [1.4034s] [ 47%] 2025-12-04T13:20:27.9075321Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32 PASSED [0.0403s] [ 47%] 2025-12-04T13:20:27.9075471Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32 SKIPPED [0.0013s] (Skip failing test) [ 47%] 2025-12-04T13:20:27.9075579Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32 PASSED [0.0131s] [ 47%] 2025-12-04T13:20:27.9075834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__efficient_attention_forward_cuda_float32 SKIPPED [0.0006s] (Efficient attention on ROCM doesn't support custom_mask_type==2) [ 47%] 2025-12-04T13:20:27.9075952Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64 PASSED [0.9296s] [ 47%] 2025-12-04T13:20:27.9076061Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32 PASSED [0.9218s] [ 47%] 2025-12-04T13:20:27.9076177Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64 
PASSED [0.9261s] [ 47%] 2025-12-04T13:20:27.9076295Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_copy_cuda_float32 PASSED [0.9201s] [ 47%] 2025-12-04T13:20:27.9076409Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32 PASSED [0.0143s] [ 47%] 2025-12-04T13:20:27.9076544Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsafe_split_cuda_float32 PASSED [0.0050s] [ 47%] 2025-12-04T13:20:27.9076657Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32 PASSED [0.0086s] [ 47%] 2025-12-04T13:20:27.9076783Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32 PASSED [0.9113s] [ 48%] 2025-12-04T13:20:27.9076918Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64 PASSED [0.9114s] [ 48%] 2025-12-04T13:20:27.9077033Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32 PASSED [0.9252s] [ 48%] 2025-12-04T13:20:27.9077144Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32 PASSED [0.9119s] [ 48%] 2025-12-04T13:20:27.9077255Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32 PASSED [0.9134s] [ 48%] 2025-12-04T13:20:27.9077354Z test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64 PASSED [0.9316s] [ 48%] 2025-12-04T13:20:27.9077460Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64 PASSED [0.0122s] [ 48%] 2025-12-04T13:20:27.9077562Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64 PASSED [0.9263s] [ 48%] 2025-12-04T13:20:27.9077669Z test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32 PASSED [0.9228s] [ 48%] 2025-12-04T13:20:27.9077784Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32 PASSED [0.9085s] [ 48%] 2025-12-04T13:20:27.9077882Z test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32 PASSED [0.9174s] [ 48%] 2025-12-04T13:20:27.9077993Z 
test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32 PASSED [0.9104s] [ 48%] 2025-12-04T13:20:27.9078092Z test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32 PASSED [0.9213s] [ 48%] 2025-12-04T13:20:27.9078190Z test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32 PASSED [0.9100s] [ 49%] 2025-12-04T13:20:27.9078300Z test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32 PASSED [0.9216s] [ 49%] 2025-12-04T13:20:27.9078400Z test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32 PASSED [0.9177s] [ 49%] 2025-12-04T13:20:27.9078496Z test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32 PASSED [0.9126s] [ 49%] 2025-12-04T13:20:27.9078594Z test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32 PASSED [0.9249s] [ 49%] 2025-12-04T13:20:27.9078715Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32 PASSED [0.9161s] [ 49%] 2025-12-04T13:20:27.9078847Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32 PASSED [0.0491s] [ 49%] 2025-12-04T13:20:27.9078987Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32 PASSED [0.0929s] [ 49%] 2025-12-04T13:20:27.9079142Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__segment_reduce_lengths_cuda_float32 PASSED [0.2118s] [ 49%] 2025-12-04T13:20:27.9079270Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32 PASSED [0.0882s] [ 49%] 2025-12-04T13:20:27.9079424Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32 PASSED [0.0110s] [ 49%] 2025-12-04T13:20:27.9079551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32 PASSED [0.0218s] [ 49%] 2025-12-04T13:20:27.9079681Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32 PASSED [0.9434s] [ 49%] 2025-12-04T13:20:27.9079817Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32 PASSED [0.3864s] [ 49%] 2025-12-04T13:20:27.9079944Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32 PASSED [0.9131s] [ 50%] 2025-12-04T13:20:27.9080073Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32 PASSED [0.0599s] [ 50%] 2025-12-04T13:20:27.9080205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32 PASSED [0.2177s] [ 50%] 2025-12-04T13:20:27.9080370Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32 PASSED [0.1231s] [ 50%] 2025-12-04T13:20:27.9080493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32 PASSED [0.9278s] [ 50%] 2025-12-04T13:20:27.9080637Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32 PASSED [0.0401s] [ 50%] 2025-12-04T13:20:27.9080777Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32 PASSED [0.9439s] [ 50%] 2025-12-04T13:20:27.9080906Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32 PASSED [0.9355s] [ 50%] 2025-12-04T13:20:27.9081040Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_copy_cuda_float32 PASSED [0.9340s] [ 50%] 2025-12-04T13:20:27.9081168Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32 PASSED [0.9332s] [ 50%] 2025-12-04T13:20:27.9081298Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32 PASSED [0.0429s] [ 50%] 2025-12-04T13:20:27.9081431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32 PASSED [0.0805s] [ 50%] 2025-12-04T13:20:27.9081559Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32 PASSED [0.3442s] [ 50%] 2025-12-04T13:20:27.9081693Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32 PASSED [0.0606s] [ 50%] 2025-12-04T13:20:27.9081822Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32 PASSED [0.1776s] [ 51%] 2025-12-04T13:20:27.9081948Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32 PASSED [0.9176s] [ 51%] 2025-12-04T13:20:27.9082079Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32 PASSED [0.9348s] [ 51%] 2025-12-04T13:20:27.9082220Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32 PASSED [0.0581s] [ 51%] 2025-12-04T13:20:27.9082360Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32 PASSED [0.5432s] [ 51%] 2025-12-04T13:20:27.9082513Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 51%] 2025-12-04T13:20:27.9082653Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32 PASSED [0.2341s] [ 51%] 2025-12-04T13:20:27.9082818Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [0.2785s] [ 51%] 2025-12-04T13:20:27.9082983Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32 SKIPPED [0.0002s] (Skipped!) [ 51%] 2025-12-04T13:20:27.9083113Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32 PASSED [0.9232s] [ 51%] 2025-12-04T13:20:27.9083299Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32 PASSED [0.2434s] [ 51%] 2025-12-04T13:20:27.9083448Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32 SKIPPED [0.0002s] (Skipped!) 
[ 51%] 2025-12-04T13:20:27.9083592Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32 PASSED [0.0321s] [ 51%] 2025-12-04T13:20:27.9083728Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32 PASSED [0.2718s] [ 52%] 2025-12-04T13:20:27.9083874Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32 PASSED [1.0547s] [ 52%] 2025-12-04T13:20:27.9083999Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32 PASSED [0.0053s] [ 52%] 2025-12-04T13:20:27.9084132Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32 PASSED [0.0639s] [ 52%] 2025-12-04T13:20:27.9084276Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32 PASSED [0.0754s] [ 52%] 2025-12-04T13:20:27.9084409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32 PASSED [0.9862s] [ 52%] 2025-12-04T13:20:27.9084540Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32 PASSED [0.7732s] [ 52%] 2025-12-04T13:20:27.9084698Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32 PASSED [1.2311s] [ 52%] 2025-12-04T13:20:27.9084837Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32 PASSED [0.0235s] [ 52%] 2025-12-04T13:20:27.9084963Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32 PASSED [0.1469s] [ 52%] 2025-12-04T13:20:27.9085091Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32 PASSED [0.1316s] [ 52%] 2025-12-04T13:20:27.9085243Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [0.9727s] [ 52%] 2025-12-04T13:20:27.9085397Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32 
PASSED [0.1839s] [ 52%] 2025-12-04T13:20:27.9085577Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [0.2542s] [ 52%] 2025-12-04T13:20:27.9085908Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32 MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x76c827801000 size: 768 2025-12-04T13:20:27.9086093Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x76c827801000 size: 768 2025-12-04T13:20:27.9086303Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x76c827801200 size: 1024 2025-12-04T13:20:27.9086487Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x76c827801200 size: 1024 2025-12-04T13:20:27.9086692Z MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 1200, provided ptr: 0x76c827801400 size: 1024 2025-12-04T13:20:27.9086885Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 1200, provided ptr: 0x76c827801400 size: 1024 2025-12-04T13:20:27.9086929Z PASSED [0.4439s] [ 53%] 2025-12-04T13:20:27.9087105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32 PASSED [0.1383s] [ 53%] 2025-12-04T13:20:27.9087272Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [0.0955s] [ 53%] 2025-12-04T13:20:27.9087429Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32 PASSED [0.2832s] [ 53%] 2025-12-04T13:20:27.9087579Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32 PASSED [0.0465s] [ 53%] 
2025-12-04T13:20:27.9087723Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32 PASSED [0.0157s] [ 53%] 2025-12-04T13:20:27.9087872Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32 PASSED [0.0310s] [ 53%] 2025-12-04T13:20:27.9088027Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32 PASSED [0.0228s] [ 53%] 2025-12-04T13:20:27.9088187Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32 PASSED [0.0644s] [ 53%] 2025-12-04T13:20:27.9088333Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32 PASSED [0.9949s] [ 53%] 2025-12-04T13:20:27.9088502Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [0.0134s] [ 53%] 2025-12-04T13:20:27.9088662Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32 PASSED [1.0956s] [ 53%] 2025-12-04T13:20:27.9088805Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32 PASSED [0.2395s] [ 53%] 2025-12-04T13:20:27.9088969Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32 PASSED [0.0323s] [ 53%] 2025-12-04T13:20:27.9089121Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [0.0241s] [ 54%] 2025-12-04T13:20:27.9089252Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32 PASSED [0.0035s] [ 54%] 2025-12-04T13:20:27.9089380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32 PASSED [0.0789s] [ 54%] 2025-12-04T13:20:27.9089511Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32 PASSED [1.4379s] [ 54%] 
2025-12-04T13:20:27.9089642Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32 PASSED [0.0048s] [ 54%] 2025-12-04T13:20:27.9089765Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32 PASSED [0.9799s] [ 54%] 2025-12-04T13:20:27.9089901Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32 PASSED [0.9625s] [ 54%] 2025-12-04T13:20:27.9090032Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32 PASSED [0.0508s] [ 54%] 2025-12-04T13:20:27.9090161Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32 PASSED [0.0176s] [ 54%] 2025-12-04T13:20:27.9090307Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32 PASSED [0.0242s] [ 54%] 2025-12-04T13:20:27.9090454Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32 PASSED [0.1350s] [ 54%] 2025-12-04T13:20:27.9090577Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32 PASSED [0.9894s] [ 54%] 2025-12-04T13:20:27.9090706Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32 PASSED [0.9996s] [ 54%] 2025-12-04T13:20:27.9090848Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32 PASSED [0.0392s] [ 54%] 2025-12-04T13:20:27.9090999Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32 PASSED [0.0164s] [ 55%] 2025-12-04T13:20:27.9091127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32 PASSED [0.9804s] [ 55%] 2025-12-04T13:20:27.9091255Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32 PASSED [0.0286s] [ 55%] 2025-12-04T13:20:27.9091382Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32 PASSED [0.8073s] [ 55%] 2025-12-04T13:20:27.9091505Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32 PASSED [8.7997s] [ 55%] 2025-12-04T13:20:27.9091780Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_torch_ops_aten__efficient_attention_forward_cuda_float32 SKIPPED [0.0011s] (Efficient attention on ROCM doesn't support custom_mask_type==2) [ 55%] 2025-12-04T13:20:27.9091906Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32 PASSED [0.0081s] [ 55%] 2025-12-04T13:20:27.9092034Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32 PASSED [0.0886s] [ 55%] 2025-12-04T13:20:27.9092168Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32 PASSED [0.0100s] [ 55%] 2025-12-04T13:20:27.9092298Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32 PASSED [1.0016s] [ 55%] 2025-12-04T13:20:27.9092436Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32 PASSED [1.0032s] [ 55%] 2025-12-04T13:20:27.9092570Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32 PASSED [0.0486s] [ 55%] 2025-12-04T13:20:27.9092720Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__batch_norm_with_update_cuda_float32 PASSED [0.2309s] [ 55%] 2025-12-04T13:20:27.9092868Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32 PASSED [0.0842s] [ 56%] 2025-12-04T13:20:27.9093014Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32 PASSED [0.0263s] [ 56%] 2025-12-04T13:20:27.9093153Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32 PASSED [0.0102s] [ 56%] 2025-12-04T13:20:27.9093325Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32 PASSED [0.0130s] [ 56%] 2025-12-04T13:20:27.9093459Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32 PASSED [0.0327s] [ 56%] 2025-12-04T13:20:27.9093589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32 PASSED [0.9853s] [ 56%] 2025-12-04T13:20:27.9093730Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32 PASSED [0.2638s] [ 56%] 2025-12-04T13:20:27.9093862Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32 PASSED [1.0033s] [ 56%] 2025-12-04T13:20:27.9093995Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32 PASSED [0.0538s] [ 56%] 2025-12-04T13:20:27.9094127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32 PASSED [0.0482s] [ 56%] 2025-12-04T13:20:27.9094291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32 PASSED [0.1244s] [ 56%] 2025-12-04T13:20:27.9094425Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32 PASSED [0.0054s] [ 56%] 2025-12-04T13:20:27.9094554Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32 PASSED [0.0129s] [ 56%] 2025-12-04T13:20:27.9094691Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32 PASSED [0.0208s] [ 56%] 2025-12-04T13:20:27.9094816Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32 PASSED [0.9905s] [ 57%] 2025-12-04T13:20:27.9094968Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32 PASSED [1.0103s] [ 57%] 2025-12-04T13:20:27.9095106Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_copy_cuda_float32 PASSED [1.0001s] [ 57%] 2025-12-04T13:20:27.9095240Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32 PASSED [0.0363s] [ 57%] 
2025-12-04T13:20:27.9095373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32 PASSED [0.4146s] [ 57%] 2025-12-04T13:20:27.9095508Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32 PASSED [0.0317s] [ 57%] 2025-12-04T13:20:27.9095634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32 PASSED [0.9940s] [ 57%] 2025-12-04T13:20:27.9095767Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32 PASSED [0.9819s] [ 57%] 2025-12-04T13:20:27.9095897Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32 PASSED [0.0209s] [ 57%] 2025-12-04T13:20:27.9096025Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32 PASSED [0.9907s] [ 57%] 2025-12-04T13:20:27.9096152Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32 PASSED [0.0641s] [ 57%] 2025-12-04T13:20:27.9096295Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32 PASSED [0.0044s] [ 57%] 2025-12-04T13:20:27.9096430Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32 PASSED [0.0125s] [ 57%] 2025-12-04T13:20:27.9096566Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32 PASSED [0.9917s] [ 57%] 2025-12-04T13:20:27.9096714Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32 PASSED [1.0080s] [ 58%] 2025-12-04T13:20:27.9096845Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32 PASSED [1.0058s] [ 58%] 2025-12-04T13:20:27.9097006Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [0.2298s] [ 58%] 2025-12-04T13:20:27.9097147Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32 PASSED 
[0.8765s] [ 58%] 2025-12-04T13:20:27.9097301Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32 PASSED [0.3269s] [ 58%] 2025-12-04T13:20:27.9097453Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32 PASSED [2.6625s] [ 58%] 2025-12-04T13:20:27.9097594Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32 PASSED [0.1898s] [ 58%] 2025-12-04T13:20:27.9097727Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32 PASSED [0.0763s] [ 58%] 2025-12-04T13:20:27.9097860Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32 PASSED [0.8670s] [ 58%] 2025-12-04T13:20:27.9097994Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32 PASSED [0.2161s] [ 58%] 2025-12-04T13:20:27.9098130Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32 PASSED [0.8072s] [ 58%] 2025-12-04T13:20:27.9098292Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32 PASSED [1.2027s] [ 58%] 2025-12-04T13:20:27.9098429Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32 PASSED [4.3687s] [ 58%] 2025-12-04T13:20:27.9098574Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32 PASSED [0.2047s] [ 58%] 2025-12-04T13:20:27.9098709Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32 PASSED [1.4875s] [ 59%] 2025-12-04T13:20:27.9098852Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32 PASSED [0.1257s] [ 59%] 2025-12-04T13:20:27.9098988Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32 PASSED [0.0180s] [ 59%] 2025-12-04T13:20:27.9099143Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32 PASSED [1.0337s] [ 59%] 2025-12-04T13:20:27.9099274Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32 PASSED [0.0719s] [ 59%] 2025-12-04T13:20:27.9099405Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32 PASSED [0.0495s] [ 59%] 2025-12-04T13:20:27.9099557Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32 PASSED [0.0712s] [ 59%] 2025-12-04T13:20:27.9099708Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32 PASSED [0.0147s] [ 59%] 2025-12-04T13:20:27.9099839Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32 PASSED [0.0685s] [ 59%] 2025-12-04T13:20:27.9099967Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32 PASSED [0.0135s] [ 59%] 2025-12-04T13:20:27.9100095Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32 PASSED [0.0073s] [ 59%] 2025-12-04T13:20:27.9100241Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32 PASSED [0.0308s] [ 59%] 2025-12-04T13:20:27.9100368Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32 PASSED [0.0074s] [ 59%] 2025-12-04T13:20:27.9100518Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [0.0498s] [ 60%] 2025-12-04T13:20:27.9100683Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [0.0487s] [ 60%] 2025-12-04T13:20:27.9100816Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32 PASSED [0.0489s] [ 60%] 2025-12-04T13:20:27.9100948Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32 PASSED 
[0.1261s] [ 60%] 2025-12-04T13:20:27.9101116Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [0.0429s] [ 60%] 2025-12-04T13:20:27.9101277Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [0.0147s] [ 60%] 2025-12-04T13:20:27.9101458Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [0.2249s] [ 60%] 2025-12-04T13:20:27.9101607Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32 PASSED [0.0138s] [ 60%] 2025-12-04T13:20:27.9101754Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32 PASSED [0.0732s] [ 60%] 2025-12-04T13:20:27.9101915Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32 PASSED [0.2557s] [ 60%] 2025-12-04T13:20:27.9102062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32 PASSED [0.0336s] [ 60%] 2025-12-04T13:20:27.9102224Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32 PASSED [1.0235s] [ 60%] 2025-12-04T13:20:27.9102389Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [6.2826s] [ 60%] 2025-12-04T13:20:27.9102557Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32 PASSED [0.4332s] [ 60%] 2025-12-04T13:20:27.9102713Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32 PASSED [1.0492s] [ 61%] 2025-12-04T13:20:27.9102888Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32 PASSED [0.1990s] [ 61%] 2025-12-04T13:20:27.9103044Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32 PASSED [2.1044s] [ 61%] 2025-12-04T13:20:27.9103205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [0.1508s] [ 61%] 2025-12-04T13:20:27.9103413Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32 PASSED [0.0227s] [ 61%] 2025-12-04T13:20:27.9103572Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [0.0107s] [ 61%] 2025-12-04T13:20:27.9103722Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32 PASSED [0.0242s] [ 61%] 2025-12-04T13:20:27.9103873Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32 PASSED [0.0136s] [ 61%] 2025-12-04T13:20:27.9104040Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32 PASSED [0.1807s] [ 61%] 2025-12-04T13:20:27.9104181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_copy_cuda_float32 PASSED [1.0145s] [ 61%] 2025-12-04T13:20:27.9104332Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32 PASSED [1.0275s] [ 61%] 2025-12-04T13:20:27.9104464Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32 PASSED [0.0938s] [ 61%] 2025-12-04T13:20:27.9104615Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [0.0239s] [ 61%] 2025-12-04T13:20:27.9104787Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32 PASSED [0.0235s] [ 61%] 2025-12-04T13:20:27.9104916Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32 PASSED [0.0094s] [ 62%] 
2025-12-04T13:20:27.9105047Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32 PASSED [0.0045s] [ 62%] 2025-12-04T13:20:27.9105179Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32 PASSED [0.0459s] [ 62%] 2025-12-04T13:20:27.9105314Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32 PASSED [1.0319s] [ 62%] 2025-12-04T13:20:27.9105455Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32 PASSED [1.0182s] [ 62%] 2025-12-04T13:20:27.9105604Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32 PASSED [0.1362s] [ 62%] 2025-12-04T13:20:27.9105732Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32 PASSED [0.0262s] [ 62%] 2025-12-04T13:20:27.9105863Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32 PASSED [0.0052s] [ 62%] 2025-12-04T13:20:27.9105991Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32 PASSED [0.0105s] [ 62%] 2025-12-04T13:20:27.9106158Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 62%] 2025-12-04T13:20:27.9106301Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32 PASSED [0.0165s] [ 62%] 2025-12-04T13:20:27.9106431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32 PASSED [0.0882s] [ 62%] 2025-12-04T13:20:27.9106556Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32 PASSED [0.0294s] [ 62%] 2025-12-04T13:20:27.9106689Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_copy_cuda_float32 PASSED [0.0072s] [ 63%] 2025-12-04T13:20:27.9106824Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32 PASSED [0.0415s] [ 63%] 2025-12-04T13:20:27.9106962Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32 PASSED [0.0192s] [ 63%] 2025-12-04T13:20:27.9107092Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32 PASSED [1.0474s] [ 63%] 2025-12-04T13:20:27.9107223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32 PASSED [0.0650s] [ 63%] 2025-12-04T13:20:27.9107362Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsafe_chunk_cuda_float32 PASSED [0.0128s] [ 63%] 2025-12-04T13:20:27.9107500Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsafe_split_cuda_float32 PASSED [1.0260s] [ 63%] 2025-12-04T13:20:27.9107634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32 PASSED [0.0898s] [ 63%] 2025-12-04T13:20:27.9107778Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32 PASSED [0.0141s] [ 63%] 2025-12-04T13:20:27.9107919Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32 PASSED [0.0094s] [ 63%] 2025-12-04T13:20:27.9108046Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32 PASSED [1.0221s] [ 
63%] 2025-12-04T13:20:27.9108202Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32 PASSED [1.0222s] [ 63%] 2025-12-04T13:20:27.9108332Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32 PASSED [1.0341s] [ 63%] 2025-12-04T13:20:27.9108438Z test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32 PASSED [1.0089s] [ 63%] 2025-12-04T13:20:27.9108536Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32 PASSED [1.0361s] [ 64%] 2025-12-04T13:20:27.9108656Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32 PASSED [1.0091s] [ 64%] 2025-12-04T13:20:27.9108757Z test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32 PASSED [1.0269s] [ 64%] 2025-12-04T13:20:27.9108869Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32 PASSED [1.0262s] [ 64%] 2025-12-04T13:20:27.9108965Z test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32 PASSED [0.1529s] [ 64%] 2025-12-04T13:20:27.9109065Z test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32 PASSED [0.0057s] [ 64%] 2025-12-04T13:20:27.9109166Z test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32 PASSED [0.0060s] [ 64%] 2025-12-04T13:20:27.9109263Z test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32 PASSED [0.0301s] [ 64%] 2025-12-04T13:20:27.9109368Z test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32 PASSED [0.0081s] [ 64%] 2025-12-04T13:20:27.9109475Z test_ops.py::TestFakeTensorCUDA::test_fake_expand_copy_cuda_float32 PASSED [0.0086s] [ 64%] 2025-12-04T13:20:27.9109576Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32 PASSED [1.0294s] [ 64%] 2025-12-04T13:20:27.9109675Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32 PASSED [1.0162s] [ 64%] 2025-12-04T13:20:27.9109777Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32 PASSED [1.0338s] [ 64%] 2025-12-04T13:20:27.9109877Z 
test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32 PASSED [1.0271s] [ 64%] 2025-12-04T13:20:27.9109992Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32 PASSED [1.0376s] [ 65%] 2025-12-04T13:20:27.9110092Z test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32 PASSED [1.0113s] [ 65%] 2025-12-04T13:20:27.9110188Z test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64 PASSED [0.0145s] [ 65%] 2025-12-04T13:20:27.9110283Z test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32 PASSED [0.0104s] [ 65%] 2025-12-04T13:20:27.9110388Z test_ops.py::TestFakeTensorCUDA::test_fake_geometric_cuda_float32 PASSED [0.0066s] [ 65%] 2025-12-04T13:20:27.9110481Z test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32 PASSED [0.0101s] [ 65%] 2025-12-04T13:20:27.9110590Z test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32 PASSED [0.0081s] [ 65%] 2025-12-04T13:20:27.9110691Z test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32 PASSED [1.0107s] [ 65%] 2025-12-04T13:20:27.9110807Z test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_mean_cuda_float32 PASSED [1.0402s] [ 65%] 2025-12-04T13:20:27.9110908Z test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32 PASSED [0.0566s] [ 65%] 2025-12-04T13:20:27.9111006Z test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32 PASSED [1.0351s] [ 65%] 2025-12-04T13:20:27.9111102Z test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32 PASSED [1.0157s] [ 65%] 2025-12-04T13:20:27.9111204Z test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32 PASSED [1.0299s] [ 65%] 2025-12-04T13:20:27.9111333Z test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64 SKIPPED [0.0018s] (Skip failing test) [ 65%] 2025-12-04T13:20:27.9111435Z test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32 PASSED [0.0117s] [ 66%] 2025-12-04T13:20:27.9111533Z test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32 PASSED [1.0152s] [ 66%] 
2025-12-04T13:20:27.9111749Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32 SKIPPED [0.0011s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 66%] 2025-12-04T13:20:27.9111876Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32 PASSED [0.0949s] [ 66%] 2025-12-04T13:20:27.9111988Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32 PASSED [0.0122s] [ 66%] 2025-12-04T13:20:27.9112142Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32 SKIPPED [0.0012s] (Skip failing test) [ 66%] 2025-12-04T13:20:27.9112259Z test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32 PASSED [1.0331s] [ 66%] 2025-12-04T13:20:27.9112369Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32 PASSED [0.0135s] [ 66%] 2025-12-04T13:20:27.9112466Z test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32 PASSED [1.0212s] [ 66%] 2025-12-04T13:20:27.9112575Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32 PASSED [0.0901s] [ 66%] 2025-12-04T13:20:27.9112686Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32 PASSED [1.0528s] [ 66%] 2025-12-04T13:20:27.9112801Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32 PASSED [0.0528s] [ 66%] 2025-12-04T13:20:27.9112905Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32 PASSED [0.1285s] [ 66%] 2025-12-04T13:20:27.9113016Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32 PASSED [0.0369s] [ 67%] 2025-12-04T13:20:27.9113116Z test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32 PASSED [0.0102s] [ 67%] 2025-12-04T13:20:27.9113219Z test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32 PASSED [0.0099s] [ 67%] 2025-12-04T13:20:27.9113394Z test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32 SKIPPED [0.0011s] (Skip failing test) [ 67%] 2025-12-04T13:20:27.9113498Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32 PASSED [0.0119s] [ 67%] 2025-12-04T13:20:27.9113596Z test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32 PASSED [0.0082s] [ 67%] 2025-12-04T13:20:27.9113713Z test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32 PASSED [0.0081s] [ 67%] 2025-12-04T13:20:27.9113814Z test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32 PASSED [0.0126s] [ 67%] 2025-12-04T13:20:27.9113951Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [0.0097s] [ 67%] 2025-12-04T13:20:27.9114084Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [1.0307s] [ 67%] 2025-12-04T13:20:27.9114220Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [1.0400s] [ 67%] 2025-12-04T13:20:27.9114544Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32 MIOpen(HIP): Warning [IsEnoughWorkspace] [GetSolutionsFallback AI] Solver , workspace required: 2400, provided ptr: 0x76c824600c00 size: 1024 2025-12-04T13:20:27.9114732Z MIOpen(HIP): Warning [IsEnoughWorkspace] [EvaluateInvokers] Solver , workspace required: 2400, provided ptr: 0x76c824600c00 size: 1024 2025-12-04T13:20:27.9114778Z PASSED [1.0813s] [ 67%] 2025-12-04T13:20:27.9114909Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32 PASSED [0.0202s] [ 67%] 2025-12-04T13:20:27.9115050Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [0.0328s] [ 67%] 2025-12-04T13:20:27.9115178Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32 PASSED [1.1028s] [ 68%] 2025-12-04T13:20:27.9115294Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32 PASSED [1.0360s] [ 68%] 2025-12-04T13:20:27.9115446Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [1.0394s] [ 68%] 2025-12-04T13:20:27.9115586Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [0.0674s] [ 68%] 2025-12-04T13:20:27.9115708Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32 PASSED [0.0318s] [ 68%] 2025-12-04T13:20:27.9115845Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32 PASSED [0.0131s] [ 68%] 2025-12-04T13:20:27.9115967Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32 PASSED [0.0085s] [ 68%] 2025-12-04T13:20:27.9116089Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32 PASSED [1.0393s] [ 68%] 2025-12-04T13:20:27.9116223Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32 PASSED [1.0295s] [ 68%] 2025-12-04T13:20:27.9116354Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32 PASSED [1.0404s] [ 68%] 2025-12-04T13:20:27.9116487Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32 PASSED [0.9866s] [ 68%] 2025-12-04T13:20:27.9116621Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32 PASSED [0.0270s] [ 68%] 2025-12-04T13:20:27.9116754Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [0.0891s] [ 68%] 2025-12-04T13:20:27.9116878Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32 PASSED [0.0167s] [ 68%] 2025-12-04T13:20:27.9117005Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32 PASSED [0.0106s] [ 69%] 2025-12-04T13:20:27.9117132Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32 PASSED [0.0059s] [ 69%] 2025-12-04T13:20:27.9117251Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rms_norm_cuda_float32 PASSED [0.0127s] [ 69%] 2025-12-04T13:20:27.9117379Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32 PASSED [0.0138s] [ 69%] 2025-12-04T13:20:27.9117482Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32 PASSED [0.0053s] [ 69%] 2025-12-04T13:20:27.9117580Z test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32 PASSED [1.0207s] [ 69%] 2025-12-04T13:20:27.9117689Z test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32 PASSED [0.0391s] [ 69%] 2025-12-04T13:20:27.9117790Z test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32 PASSED [1.0395s] [ 69%] 2025-12-04T13:20:27.9117891Z test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32 PASSED [1.0360s] [ 69%] 2025-12-04T13:20:27.9117987Z test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32 PASSED [1.0370s] [ 69%] 2025-12-04T13:20:27.9118087Z test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32 PASSED [1.0323s] [ 69%] 2025-12-04T13:20:27.9118218Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32 PASSED [1.0408s] [ 69%] 2025-12-04T13:20:27.9118340Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32 PASSED [1.0425s] [ 69%] 2025-12-04T13:20:27.9118457Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32 PASSED [1.0337s] [ 69%] 2025-12-04T13:20:27.9118558Z test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32 PASSED [1.0211s] [ 70%] 2025-12-04T13:20:27.9118654Z test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32 PASSED [1.0633s] [ 70%] 2025-12-04T13:20:27.9118770Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32 PASSED [1.0438s] [ 70%] 2025-12-04T13:20:27.9118898Z test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32 PASSED [0.2909s] [ 70%] 2025-12-04T13:20:27.9119008Z 
test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32 PASSED [1.0296s] [ 70%] 2025-12-04T13:20:27.9119137Z test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32 PASSED [0.0142s] [ 70%] 2025-12-04T13:20:27.9119265Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32 PASSED [1.0310s] [ 70%] 2025-12-04T13:20:27.9119409Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32 PASSED [0.0123s] [ 70%] 2025-12-04T13:20:27.9119518Z test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32 PASSED [1.0207s] [ 70%] 2025-12-04T13:20:27.9119619Z test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32 PASSED [1.0343s] [ 70%] 2025-12-04T13:20:27.9119718Z test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32 PASSED [1.0414s] [ 70%] 2025-12-04T13:20:27.9119817Z test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32 PASSED [1.0509s] [ 70%] 2025-12-04T13:20:27.9119926Z test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32 PASSED [0.2643s] [ 70%] 2025-12-04T13:20:27.9120025Z test_ops.py::TestFakeTensorCUDA::test_fake_t_copy_cuda_float32 PASSED [1.0308s] [ 71%] 2025-12-04T13:20:27.9120121Z test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32 PASSED [1.0168s] [ 71%] 2025-12-04T13:20:27.9120220Z test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32 PASSED [1.0350s] [ 71%] 2025-12-04T13:20:27.9120324Z test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32 PASSED [1.0252s] [ 71%] 2025-12-04T13:20:27.9120427Z test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32 PASSED [1.0219s] [ 71%] 2025-12-04T13:20:27.9120525Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32 PASSED [1.0198s] [ 71%] 2025-12-04T13:20:27.9120624Z test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32 PASSED [1.0189s] [ 71%] 2025-12-04T13:20:27.9120718Z test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32 
PASSED [1.0241s] [ 71%] 2025-12-04T13:20:27.9120818Z test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32 PASSED [1.0199s] [ 71%] 2025-12-04T13:20:27.9120921Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32 PASSED [1.0226s] [ 71%] 2025-12-04T13:20:27.9121042Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32 PASSED [1.0420s] [ 71%] 2025-12-04T13:20:27.9121153Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64 PASSED [0.0140s] [ 71%] 2025-12-04T13:20:27.9121276Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64 PASSED [0.0118s] [ 71%] 2025-12-04T13:20:27.9121409Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32 PASSED [1.0476s] [ 71%] 2025-12-04T13:20:27.9121539Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__unsafe_masked_index_cuda_float32 PASSED [1.0560s] [ 72%] 2025-12-04T13:20:27.9121672Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32 PASSED [1.0253s] [ 72%] 2025-12-04T13:20:27.9121788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32 PASSED [1.0304s] [ 72%] 2025-12-04T13:20:27.9121896Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32 PASSED [1.0411s] [ 72%] 2025-12-04T13:20:27.9122021Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32 PASSED [1.0445s] [ 72%] 2025-12-04T13:20:27.9122132Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32 PASSED [1.0399s] [ 72%] 2025-12-04T13:20:27.9122247Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32 PASSED [1.0354s] [ 72%] 2025-12-04T13:20:27.9122365Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32 PASSED [1.0328s] [ 72%] 2025-12-04T13:20:27.9122475Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64 PASSED [1.0295s] [ 72%] 2025-12-04T13:20:27.9122602Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64 PASSED [0.0138s] [ 72%] 2025-12-04T13:20:27.9122710Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32 PASSED [0.0094s] [ 72%] 2025-12-04T13:20:27.9122838Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32 PASSED [1.0459s] [ 72%] 2025-12-04T13:20:27.9122945Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32 PASSED [1.0319s] [ 72%] 2025-12-04T13:20:27.9123057Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32 PASSED [1.0267s] [ 72%] 2025-12-04T13:20:27.9123179Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32 PASSED [1.0329s] [ 73%] 2025-12-04T13:20:27.9123332Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32 PASSED [0.9974s] [ 73%] 2025-12-04T13:20:27.9123466Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32 SKIPPED [0.0017s] (Skip failing test) [ 73%] 2025-12-04T13:20:27.9123579Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32 PASSED [0.0065s] [ 73%] 2025-12-04T13:20:27.9123707Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32 PASSED [0.9729s] [ 73%] 2025-12-04T13:20:27.9123835Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32 PASSED [0.9705s] [ 73%] 2025-12-04T13:20:27.9123943Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32 PASSED [0.2618s] [ 73%] 2025-12-04T13:20:27.9124057Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32 PASSED [0.9514s] [ 73%] 2025-12-04T13:20:27.9124174Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32 PASSED [0.9586s] [ 73%] 2025-12-04T13:20:27.9124287Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32 PASSED [0.9656s] [ 73%] 2025-12-04T13:20:27.9124393Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32 PASSED 
[0.0488s] [ 73%] 2025-12-04T13:20:27.9124515Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32 PASSED [0.9611s] [ 73%] 2025-12-04T13:20:27.9124629Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32 PASSED [0.9740s] [ 73%] 2025-12-04T13:20:27.9124740Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32 PASSED [0.9535s] [ 73%] 2025-12-04T13:20:27.9124846Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64 PASSED [0.0135s] [ 74%] 2025-12-04T13:20:27.9124958Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32 PASSED [0.1990s] [ 74%] 2025-12-04T13:20:27.9125114Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_3d_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 74%] 2025-12-04T13:20:27.9125225Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32 PASSED [0.0087s] [ 74%] 2025-12-04T13:20:27.9125336Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32 PASSED [0.0604s] [ 74%] 2025-12-04T13:20:27.9125440Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32 PASSED [0.9497s] [ 74%] 2025-12-04T13:20:27.9125556Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32 PASSED [0.0135s] [ 74%] 2025-12-04T13:20:27.9125681Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32 PASSED [0.9570s] [ 74%] 2025-12-04T13:20:27.9125868Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32 SKIPPED [0.0017s] (Skip failing test) [ 74%] 2025-12-04T13:20:27.9125981Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32 PASSED [0.0121s] [ 74%] 2025-12-04T13:20:27.9126092Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32 PASSED [0.0119s] [ 74%] 2025-12-04T13:20:27.9126214Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32 PASSED [0.0195s] [ 74%] 
2025-12-04T13:20:27.9126339Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_diagonal_cuda_float32 PASSED [0.0131s] [ 74%] 2025-12-04T13:20:27.9126455Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32 PASSED [0.0114s] [ 75%] 2025-12-04T13:20:27.9126575Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32 PASSED [0.0109s] [ 75%] 2025-12-04T13:20:27.9126801Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32 SKIPPED [0.0007s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 75%] 2025-12-04T13:20:27.9126932Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32 PASSED [0.0065s] [ 75%] 2025-12-04T13:20:27.9127077Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32 PASSED [0.0372s] [ 75%] 2025-12-04T13:20:27.9127237Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32 SKIPPED [0.0013s] (Skip failing test) [ 75%] 2025-12-04T13:20:27.9127366Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32 PASSED [0.1127s] [ 75%] 2025-12-04T13:20:27.9127533Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED [0.0015s] (Skip failing test) [ 75%] 2025-12-04T13:20:27.9127664Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32 PASSED [0.1454s] [ 75%] 2025-12-04T13:20:27.9127788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32 PASSED [0.0113s] [ 75%] 2025-12-04T13:20:27.9127948Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32 SKIPPED [0.0013s] (Skip failing test) [ 75%] 2025-12-04T13:20:27.9128065Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32 PASSED [0.0172s] [ 75%] 2025-12-04T13:20:27.9128186Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32 PASSED [0.9630s] [ 75%] 2025-12-04T13:20:27.9128302Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32 PASSED [0.9647s] [ 75%] 2025-12-04T13:20:27.9128438Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_tensor_overload_cuda_float32 PASSED [0.8344s] [ 76%] 2025-12-04T13:20:27.9128579Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32 SKIPPED [0.0014s] (Skip failing test) [ 76%] 2025-12-04T13:20:27.9128698Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32 PASSED [0.0423s] [ 76%] 2025-12-04T13:20:27.9128812Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32 PASSED [0.0876s] [ 76%] 2025-12-04T13:20:27.9128935Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32 PASSED [0.0228s] [ 76%] 2025-12-04T13:20:27.9129068Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32 PASSED [0.0121s] [ 76%] 2025-12-04T13:20:27.9129186Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32 PASSED [0.1012s] [ 76%] 2025-12-04T13:20:27.9129303Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32 PASSED [0.0257s] [ 76%] 2025-12-04T13:20:27.9129422Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32 PASSED [0.0808s] [ 76%] 2025-12-04T13:20:27.9129537Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32 PASSED [0.0869s] [ 76%] 2025-12-04T13:20:27.9129665Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32 PASSED [0.0115s] [ 76%] 2025-12-04T13:20:27.9129814Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32 PASSED [2.3768s] [ 76%] 2025-12-04T13:20:27.9129945Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32 PASSED [0.0063s] [ 76%] 
2025-12-04T13:20:27.9130056Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32 PASSED [0.0118s] [ 76%] 2025-12-04T13:20:27.9130217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED [0.0013s] (Skip failing test) [ 77%] 2025-12-04T13:20:27.9130379Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED [0.0011s] (Skip failing test) [ 77%] 2025-12-04T13:20:27.9130495Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32 PASSED [0.9590s] [ 77%] 2025-12-04T13:20:27.9130623Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32 PASSED [0.9634s] [ 77%] 2025-12-04T13:20:27.9130770Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [0.9858s] [ 77%] 2025-12-04T13:20:27.9130906Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32 PASSED [0.9932s] [ 77%] 2025-12-04T13:20:27.9131060Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_channel_shuffle_cuda_float32 PASSED [0.9780s] [ 77%] 2025-12-04T13:20:27.9131191Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32 PASSED [1.0024s] [ 77%] 2025-12-04T13:20:27.9131317Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32 PASSED [1.0182s] [ 77%] 2025-12-04T13:20:27.9131476Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32 PASSED [0.9881s] [ 77%] 2025-12-04T13:20:27.9131620Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32 PASSED [0.0598s] [ 77%] 2025-12-04T13:20:27.9131788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [0.0139s] [ 77%] 2025-12-04T13:20:27.9131941Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [0.0374s] [ 77%] 2025-12-04T13:20:27.9132075Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32 PASSED [0.9745s] [ 78%] 2025-12-04T13:20:27.9132227Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [0.0582s] [ 78%] 2025-12-04T13:20:27.9132385Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest-exact_cuda_float32 PASSED [0.9808s] [ 78%] 2025-12-04T13:20:27.9132521Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32 PASSED [0.9839s] [ 78%] 2025-12-04T13:20:27.9132667Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32 PASSED [0.0477s] [ 78%] 2025-12-04T13:20:27.9132803Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32 PASSED [0.0127s] [ 78%] 2025-12-04T13:20:27.9132958Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32 PASSED [0.0810s] [ 78%] 2025-12-04T13:20:27.9133099Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32 PASSED [0.9393s] [ 78%] 2025-12-04T13:20:27.9133223Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32 PASSED [0.9812s] [ 78%] 2025-12-04T13:20:27.9133423Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_head_attention_forward_cuda_float32 PASSED [4.5627s] [ 78%] 2025-12-04T13:20:27.9133575Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [0.0710s] [ 78%] 2025-12-04T13:20:27.9133727Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32 PASSED [1.0127s] [ 78%] 2025-12-04T13:20:27.9133877Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_negative_cuda_float32 PASSED [0.9782s] [ 78%] 2025-12-04T13:20:27.9134006Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32 PASSED [0.9766s] [ 78%] 2025-12-04T13:20:27.9134150Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_complex_cuda_complex64 PASSED [0.9772s] [ 79%] 2025-12-04T13:20:27.9134280Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32 PASSED [0.9871s] [ 79%] 2025-12-04T13:20:27.9134413Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32 PASSED [0.9557s] [ 79%] 2025-12-04T13:20:27.9134530Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32 PASSED [0.0066s] [ 79%] 2025-12-04T13:20:27.9134644Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32 PASSED [0.1246s] [ 79%] 2025-12-04T13:20:27.9134755Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32 PASSED [0.0139s] [ 79%] 2025-12-04T13:20:27.9134890Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32 PASSED [0.0102s] [ 79%] 2025-12-04T13:20:27.9135014Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32 PASSED [0.0374s] [ 79%] 2025-12-04T13:20:27.9135135Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32 PASSED [0.0191s] [ 79%] 2025-12-04T13:20:27.9135243Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32 PASSED [0.0043s] [ 79%] 2025-12-04T13:20:27.9135354Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32 PASSED [0.0063s] [ 79%] 2025-12-04T13:20:27.9135480Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32 PASSED [0.0093s] [ 79%] 2025-12-04T13:20:27.9135594Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32 PASSED [0.0283s] [ 79%] 2025-12-04T13:20:27.9135722Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32 PASSED [0.0050s] [ 79%] 2025-12-04T13:20:27.9135833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32 PASSED [0.9942s] [ 80%] 2025-12-04T13:20:27.9135946Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32 PASSED [1.0045s] [ 80%] 2025-12-04T13:20:27.9136077Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32 PASSED [0.9976s] [ 80%] 2025-12-04T13:20:27.9136183Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32 PASSED [0.9769s] [ 80%] 2025-12-04T13:20:27.9136318Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32 PASSED [0.0255s] [ 80%] 2025-12-04T13:20:27.9136459Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32 PASSED [0.0151s] [ 80%] 2025-12-04T13:20:27.9136569Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32 PASSED [0.0038s] [ 80%] 2025-12-04T13:20:27.9136683Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32 PASSED [0.0079s] [ 80%] 2025-12-04T13:20:27.9136859Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32 SKIPPED [0.0012s] (Skip failing test) [ 80%] 2025-12-04T13:20:27.9137001Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32 PASSED [0.0053s] [ 80%] 2025-12-04T13:20:27.9137157Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [0.0102s] [ 80%] 2025-12-04T13:20:27.9137283Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32 PASSED [0.0160s] [ 80%] 2025-12-04T13:20:27.9137391Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32 PASSED [0.9792s] [ 80%] 2025-12-04T13:20:27.9137527Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32 PASSED [0.9773s] [ 
80%] 2025-12-04T13:20:27.9137642Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32 PASSED [0.9909s] [ 81%] 2025-12-04T13:20:27.9137750Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32 PASSED [0.3013s] [ 81%] 2025-12-04T13:20:27.9137862Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_copy_cuda_float32 PASSED [0.9783s] [ 81%] 2025-12-04T13:20:27.9137972Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32 PASSED [0.9859s] [ 81%] 2025-12-04T13:20:27.9138138Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch__scaled_mm_cuda_float8_e4m3fn SKIPPED [0.0011s] (Requires CUDA SM >= 8.9) [ 81%] 2025-12-04T13:20:27.9138308Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch__scaled_mm_v2_cuda_float8_e4m3fn SKIPPED [0.0008s] (Requires CUDA SM >= 8.9) [ 81%] 2025-12-04T13:20:27.9138426Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_copy_cuda_float32 PASSED [0.0082s] [ 81%] 2025-12-04T13:20:27.9138546Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32 PASSED [0.0130s] [ 81%] 2025-12-04T13:20:27.9138662Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32 PASSED [0.0175s] [ 81%] 2025-12-04T13:20:27.9138775Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32 PASSED [0.0175s] [ 81%] 2025-12-04T13:20:27.9138895Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32 PASSED [0.9893s] [ 81%] 2025-12-04T13:20:27.9139018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64 PASSED [0.9777s] [ 81%] 2025-12-04T13:20:27.9139131Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32 PASSED [0.9829s] [ 81%] 2025-12-04T13:20:27.9139252Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32 PASSED [0.0183s] [ 82%] 2025-12-04T13:20:27.9139375Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_bfloat16 PASSED [0.0079s] [ 82%] 
2025-12-04T13:20:27.9139491Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int32 PASSED [0.0067s] [ 82%] 2025-12-04T13:20:27.9139611Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_int8 PASSED [0.0121s] [ 82%] 2025-12-04T13:20:27.9139755Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float32 PASSED [0.0390s] [ 82%] 2025-12-04T13:20:27.9139876Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int8 PASSED [0.0243s] [ 82%] 2025-12-04T13:20:27.9140018Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_float32 PASSED [0.2441s] [ 82%] 2025-12-04T13:20:27.9140137Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_bfloat16 PASSED [0.9826s] [ 82%] 2025-12-04T13:20:27.9140251Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_float64 PASSED [0.9759s] [ 82%] 2025-12-04T13:20:27.9140369Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_int64 PASSED [0.9816s] [ 82%] 2025-12-04T13:20:27.9140485Z test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int16 PASSED [0.9784s] [ 82%] 2025-12-04T13:20:27.9140597Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int16 PASSED [0.0071s] [ 82%] 2025-12-04T13:20:27.9140720Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int32 PASSED [0.0063s] [ 82%] 2025-12-04T13:20:27.9140830Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int8 PASSED [0.0061s] [ 82%] 2025-12-04T13:20:27.9140940Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_complex32 PASSED [0.0033s] [ 83%] 2025-12-04T13:20:27.9141052Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_float64 PASSED [0.9836s] [ 83%] 2025-12-04T13:20:27.9141160Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_full_cuda_int32 PASSED [0.9799s] [ 83%] 2025-12-04T13:20:27.9141281Z 
test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_bfloat16 PASSED [0.0116s] [ 83%] 2025-12-04T13:20:27.9141411Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex128 PASSED [0.0107s] [ 83%] 2025-12-04T13:20:27.9141531Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_complex64 PASSED [0.0107s] [ 83%] 2025-12-04T13:20:27.9141651Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float64 PASSED [0.0105s] [ 83%] 2025-12-04T13:20:27.9141763Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_uint8 PASSED [0.0075s] [ 83%] 2025-12-04T13:20:27.9141903Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_bfloat16 PASSED [0.0355s] [ 83%] 2025-12-04T13:20:27.9142045Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_complex128 PASSED [0.0354s] [ 83%] 2025-12-04T13:20:27.9142183Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_float64 PASSED [0.0376s] [ 83%] 2025-12-04T13:20:27.9142315Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_tensor_overload_cuda_uint8 PASSED [0.0231s] [ 83%] 2025-12-04T13:20:27.9142459Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_complex128 PASSED [0.2230s] [ 83%] 2025-12-04T13:20:27.9142591Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_tensor_overload_cuda_int32 PASSED [0.2033s] [ 83%] 2025-12-04T13:20:27.9142717Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_bfloat16 PASSED [0.0032s] [ 84%] 2025-12-04T13:20:27.9142823Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int32 PASSED [0.9892s] [ 84%] 2025-12-04T13:20:27.9142932Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_int8 PASSED [0.9786s] [ 84%] 2025-12-04T13:20:27.9143066Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_bfloat16 PASSED [0.9780s] [ 84%] 
2025-12-04T13:20:27.9143176Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int16 PASSED [0.9896s] [ 84%] 2025-12-04T13:20:27.9143327Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32 PASSED [0.9840s] [ 84%] 2025-12-04T13:20:27.9143440Z test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8 PASSED [0.9744s] [ 84%] 2025-12-04T13:20:27.9143546Z test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32 SKIPPED [0.0017s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9143664Z test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9143783Z test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9143897Z test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9144014Z test_ops.py::TestTagsCUDA::test_tags__chunk_cat_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9144153Z test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9144266Z test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 84%] 2025-12-04T13:20:27.9144401Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9144559Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9144696Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9144834Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9144948Z test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32 SKIPPED 
[0.0013s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145077Z test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145218Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145343Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145466Z test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145598Z test_ops.py::TestTagsCUDA::test_tags__refs_count_nonzero_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145717Z test_ops.py::TestTagsCUDA::test_tags__refs_equal_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145842Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9145964Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 85%] 2025-12-04T13:20:27.9146087Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146209Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146332Z test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146470Z test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146580Z test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146704Z test_ops.py::TestTagsCUDA::test_tags__refs_geometric_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146837Z 
test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9146951Z test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147069Z test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147194Z test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32 SKIPPED [0.0014s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147315Z test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147440Z test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147550Z test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147672Z test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 86%] 2025-12-04T13:20:27.9147801Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_cross_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9147929Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148045Z test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148173Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148310Z test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148435Z test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148581Z 
test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148704Z test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148826Z test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9148963Z test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9149084Z test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9149220Z test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9149373Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED [0.0016s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9149510Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 87%] 2025-12-04T13:20:27.9149667Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9149799Z test_ops.py::TestTagsCUDA::test_tags__refs_normal__in_place_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9149918Z test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150031Z test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150167Z test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150281Z test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150411Z 
test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150566Z test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150716Z test_ops.py::TestTagsCUDA::test_tags__refs_split_with_sizes_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150833Z test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9150951Z test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9151074Z test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32 SKIPPED [0.0016s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9151187Z test_ops.py::TestTagsCUDA::test_tags__refs_stft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 88%] 2025-12-04T13:20:27.9151312Z test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9151441Z test_ops.py::TestTagsCUDA::test_tags__refs_take_along_dim_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9151557Z test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9151674Z test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9151800Z test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9151923Z test_ops.py::TestTagsCUDA::test_tags__refs_unbind_copy_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152063Z test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152192Z 
test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152313Z test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152449Z test_ops.py::TestTagsCUDA::test_tags__segment_reduce_offsets_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152566Z test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152695Z test_ops.py::TestTagsCUDA::test_tags_alias_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152812Z test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9152925Z test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 89%] 2025-12-04T13:20:27.9153064Z test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153178Z test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153333Z test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153452Z test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153562Z test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153691Z test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153818Z test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9153956Z test_ops.py::TestTagsCUDA::test_tags_cauchy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 
90%] 2025-12-04T13:20:27.9154072Z test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154193Z test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154308Z test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154441Z test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154565Z test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154689Z test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 90%] 2025-12-04T13:20:27.9154800Z test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32 SKIPPED [0.0014s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9154919Z test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155041Z test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155159Z test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155272Z test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155401Z test_ops.py::TestTagsCUDA::test_tags_empty_permuted_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155514Z test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155627Z test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155759Z test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 
2025-12-04T13:20:27.9155882Z test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9155999Z test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9156110Z test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9156231Z test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9156363Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 91%] 2025-12-04T13:20:27.9156487Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9156604Z test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9156720Z test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9156840Z test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9156953Z test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157061Z test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157175Z test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157289Z test_ops.py::TestTagsCUDA::test_tags_grid_sampler_3d_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 92%] 2025-12-04T13:20:27.9157411Z test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157518Z test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157662Z test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157775Z test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9157894Z test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64 SKIPPED [0.0011s] (Only runs on cpu) [ 92%] 2025-12-04T13:20:27.9158017Z test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158159Z test_ops.py::TestTagsCUDA::test_tags_index_reduce_amin_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158275Z test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158384Z test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32 SKIPPED [0.0014s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158501Z test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158617Z test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158772Z test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9158899Z test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159011Z test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64 SKIPPED [0.0012s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159139Z test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32 SKIPPED [0.0011s] (Only runs on 
cpu) [ 93%] 2025-12-04T13:20:27.9159269Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159417Z test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159550Z test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159674Z test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 93%] 2025-12-04T13:20:27.9159803Z test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9159925Z test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160060Z test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160186Z test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160307Z test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160430Z test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160543Z test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160664Z test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160788Z test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9160917Z test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9161038Z 
test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9161170Z test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9161298Z test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9161436Z test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 94%] 2025-12-04T13:20:27.9161549Z test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9161684Z test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9161792Z test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9161922Z test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162037Z test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162173Z test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32 SKIPPED [0.0015s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162292Z test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162439Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162593Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162735Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9162894Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9163050Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9163209Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9163438Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 95%] 2025-12-04T13:20:27.9163584Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9163726Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9163880Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164028Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164185Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164352Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_head_attention_forward_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164505Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164642Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9164811Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_scaled_dot_product_attention_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 
2025-12-04T13:20:27.9164929Z test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9165074Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9165216Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32 SKIPPED [0.0015s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9165336Z test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 96%] 2025-12-04T13:20:27.9165466Z test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9165583Z test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9165714Z test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9165836Z test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9165970Z test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166094Z test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166211Z test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166340Z test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166457Z test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32 SKIPPED [0.0017s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166589Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166708Z test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32 SKIPPED [0.0011s] (Only 
runs on cpu) [ 97%] 2025-12-04T13:20:27.9166820Z test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9166967Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9167099Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 97%] 2025-12-04T13:20:27.9167213Z test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9167352Z test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9167473Z test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9167594Z test_ops.py::TestTagsCUDA::test_tags_sparse_mm_reduce_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 98%] 2025-12-04T13:20:27.9167718Z test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 98%] 2025-12-04T13:20:27.9167846Z test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9167968Z test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168128Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168255Z test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168383Z test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168509Z test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168635Z test_ops.py::TestTagsCUDA::test_tags_squeeze_copy_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168751Z test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168870Z test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 98%] 2025-12-04T13:20:27.9168986Z test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169102Z test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169223Z test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169364Z test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169477Z test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169601Z test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32 SKIPPED [0.0011s] 
(Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169728Z test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32 SKIPPED [0.0013s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169846Z test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9169972Z test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9170087Z test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32 SKIPPED [0.0012s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9170208Z test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32 SKIPPED [0.0011s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9170321Z test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 SKIPPED [0.0014s] (Only runs on cpu) [ 99%] 2025-12-04T13:20:27.9170497Z test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_no_rounding_mode_cuda_float32 PASSED [0.0469s] [ 99%] 2025-12-04T13:20:27.9170666Z test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_trunc_rounding_cuda_float32 PASSED [0.0019s] [100%] 2025-12-04T13:20:27.9170671Z 2025-12-04T13:20:27.9170856Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_ops/test_ops-a26ae297ecf7f94e.xml - 2025-12-04T13:20:27.9170950Z == 1084 passed, 270 skipped, 5329 deselected, 19 xfailed in 686.65s (0:11:26) == 2025-12-04T13:20:27.9171314Z The following tests failed and then succeeded when run in a new process['test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8', 'test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_silu_cuda_float32'] 2025-12-04T13:20:27.9171318Z 2025-12-04T13:20:27.9171439Z FINISHED PRINTING LOG FILE of test_ops 3/5 (test/test-reports/test_ops_3.5_0a83ba8be83064cd_.log) 2025-12-04T13:20:27.9171445Z 2025-12-04T13:20:27.9171537Z Finished test_ops 3/5 ... 
[2025-12-04 13:20:27.658219][2258411.924889564], took 72.27min 2025-12-04T13:20:27.9171786Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:20:27.9171883Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:20:27.9172013Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:20:27.9172070Z Uploading artifacts took 0.00 seconds 2025-12-04T13:20:27.9172164Z Running test_decomp 4/11 ... [2025-12-04 13:20:27.664457][2258411.931130643] 2025-12-04T13:20:27.9172219Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:20:27.9172533Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_decomp.py', '--shard-id=4', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:20:27.664625] 2025-12-04T13:33:12.5301380Z 2025-12-04T13:33:12.5302380Z test_decomp 4/11 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_4.11_9380825e6ac11cbd_.log 2025-12-04T13:33:12.5405075Z Running 847 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cdouble_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_det_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lu_unpack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_median_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_unfold_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_hamming_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_cauchy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__unsafe_masked_index_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_addcmul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_inf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_hypot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_embedding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_std_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_tril_indices_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_GRU_eval_mode_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_train_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_uniform_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_decomp.py::HasDecompTest::test_mm_decompose_mm_dde
2025-12-04T13:33:12.5491654Z
2025-12-04T13:33:12.5491765Z Finished test_decomp 4/11 ... [2025-12-04 13:33:12.530570][2259176.797238192], took 12.75min
2025-12-04T13:33:12.5492140Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T13:33:12.5492526Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T13:33:12.5492732Z Running test_decomp 10/11 ... [2025-12-04 13:33:12.537169][2259176.803842795]
2025-12-04T13:33:12.5492898Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T13:33:12.5493303Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_decomp.py', '--shard-id=10', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:33:12.537356] 2025-12-04T13:41:54.7524011Z 2025-12-04T13:41:54.7525540Z test_decomp 10/11 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_10.11_699c8d89599c03bd_.log 2025-12-04T13:41:54.7624308Z Running 822 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_allclose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_and_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_min_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_complex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exponential_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frac_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kron_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_householder_product_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumprod_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_one_hot_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_smooth_l1_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softshrink_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_nuc_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pinverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rand_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_short_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sort_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_ndtri_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensor_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_consecutive_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_complex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_alias_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_max_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_min_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_complex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sinc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_erfinv_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_frexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_maximum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_native_layer_norm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_repeat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_round_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_tril_indices_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_triu_indices_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_eval_mode_cuda_float64 2025-12-04T13:41:54.7707358Z 2025-12-04T13:41:54.7707469Z Finished test_decomp 10/11 ... [2025-12-04 13:41:54.752836][2259699.019504919], took 8.70min 2025-12-04T13:41:54.7707855Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:41:54.7708245Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:41:54.7708465Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T13:41:54.7708642Z Uploading artifacts took 0.00 seconds 2025-12-04T13:41:54.7708827Z Running nn/test_multihead_attention 1/1 ... [2025-12-04 13:41:54.759357][2259699.026030099] 2025-12-04T13:41:54.7709015Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:41:54.7709404Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'nn/test_multihead_attention.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:41:54.759550] 2025-12-04T13:43:24.3746913Z 2025-12-04T13:43:24.3748019Z nn/test_multihead_attention 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_multihead_attention_1.1_57633f3f13ced814_.log 2025-12-04T13:43:24.3756456Z Running 20 items in this shard: test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attention_average_attn_weights_False, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attention_average_attn_weights_True, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attn_3d_attn_mask, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attn_fast_path_invalid_shape, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attn_invalid_shape, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attn_nested_tensor_outside_fast_path, test/nn/test_multihead_attention.py::TestMultiheadAttentionNN::test_multihead_attn_no_bias, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_fast_path_check_with_mask_does_not_break_in_compile_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_batch_first_cuda_float16, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_batch_first_cuda_float32, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_batch_first_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_cuda_float16, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_cuda_float32, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attention_dtype_cuda_float64, 
test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attn_fast_path_query_and_bias_have_different_dtypes_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attn_fast_path_small_test_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attn_in_proj_bias_none_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_attn_in_proj_weight_none_cuda_float64, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_self_attn_two_masks_fast_path_cuda, test/nn/test_multihead_attention.py::TestMultiheadAttentionNNDeviceTypeCUDA::test_multihead_self_attn_two_masks_fast_path_mock_cuda 2025-12-04T13:43:24.3761963Z 2025-12-04T13:43:24.3762143Z Finished nn/test_multihead_attention 1/1 ... [2025-12-04 13:43:24.374402][2259788.641070849], took 1.49min 2025-12-04T13:43:24.3762733Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:43:24.3809080Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:43:24.3811230Z Running higher_order_ops/test_invoke_quant 1/1 ... [2025-12-04 13:43:24.380979][2259788.647652758] 2025-12-04T13:43:24.3811804Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:43:24.3812711Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'higher_order_ops/test_invoke_quant.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:43:24.381167] 2025-12-04T13:43:34.8608417Z 2025-12-04T13:43:34.8609218Z higher_order_ops/test_invoke_quant 1/1 was successful, full logs can be found in artifacts with path test/test-reports/higher_order_ops.test_invoke_quant_1.1_bc04501e4d833c3d_.log 2025-12-04T13:43:34.8612175Z Running 14 items in this shard: test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantEager::test_simple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantAotEager::test_simple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_construct_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_inline, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_multiple, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_pattern_matching, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_prologue, test/higher_order_ops/test_invoke_quant.py::TestInvokeQuantInductor::test_simple 2025-12-04T13:43:34.8615342Z 2025-12-04T13:43:34.8615532Z Finished higher_order_ops/test_invoke_quant 1/1 ... 
[2025-12-04 13:43:34.860482][2259799.127152343], took 0.17min 2025-12-04T13:43:34.8616670Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:43:34.8666149Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:43:34.8669842Z Running higher_order_ops/test_local_map 1/1 ... [2025-12-04 13:43:34.866670][2259799.133343129] 2025-12-04T13:43:34.8670203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:43:34.8670891Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'higher_order_ops/test_local_map.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:43:34.866863] 2025-12-04T13:43:41.3482498Z 2025-12-04T13:43:41.3482833Z higher_order_ops/test_local_map 1/1 was successful, full logs can be found in artifacts with path test/test-reports/higher_order_ops.test_local_map_1.1_64800fdd86bb5a18_.log 2025-12-04T13:43:41.3484779Z Running 12 items in this shard: test/higher_order_ops/test_local_map.py::TestLocalMap::test_filtered_gradients, test/higher_order_ops/test_local_map.py::TestLocalMap::test_fx_annotations, test/higher_order_ops/test_local_map.py::TestLocalMap::test_local_map_dynamo_mismatch_placements, test/higher_order_ops/test_local_map.py::TestLocalMap::test_local_map_dynamo_reordered_inputs, test/higher_order_ops/test_local_map.py::TestLocalMap::test_local_map_with_local_shapes_dynamo_tracing, test/higher_order_ops/test_local_map.py::TestLocalMap::test_local_map_with_local_shapes_hop_tracing, test/higher_order_ops/test_local_map.py::TestLocalMap::test_none_gradients, test/higher_order_ops/test_local_map.py::TestLocalMap::test_none_placements, test/higher_order_ops/test_local_map.py::TestLocalMap::test_sac, 
test/higher_order_ops/test_local_map.py::TestLocalMap::test_sac_deferred, test/higher_order_ops/test_local_map.py::TestLocalMap::test_simple, test/higher_order_ops/test_local_map.py::TestLocalMap::test_symint_activations 2025-12-04T13:43:41.3486137Z 2025-12-04T13:43:41.3486262Z Finished higher_order_ops/test_local_map 1/1 ... [2025-12-04 13:43:41.347963][2259805.6146332], took 0.11min 2025-12-04T13:43:41.3491541Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:43:41.3543331Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:43:41.3543599Z Running higher_order_ops/test_invoke_subgraph 1/1 ... [2025-12-04 13:43:41.354221][2259805.620894134] 2025-12-04T13:43:41.3543804Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:43:41.3546210Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'higher_order_ops/test_invoke_subgraph.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:43:41.354418] 2025-12-04T13:44:12.4643786Z 2025-12-04T13:44:12.4644587Z higher_order_ops/test_invoke_subgraph 1/1 was successful, full logs can be found in artifacts with path test/test-reports/higher_order_ops.test_invoke_subgraph_1.1_b3e002e2c5d30e2a_.log 2025-12-04T13:44:12.4665844Z Running 73 items in this shard: test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraph::test_aot_function, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraph::test_multiple, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraph::test_simple, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_ac, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_ac_rng, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_ac_rng_cudagraphs, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_auto_functionalize, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_autograd_function, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_buffer_mutation_errors_under_training, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_buffer_mutation_works_under_no_grad, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_bwd_partitioning, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_complex, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_const_tensor, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dce, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dce_recursive, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dedupe, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_strides_in_backward, 
test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_different_symint, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_differing_strides_for_grad_outs, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_div, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dropout_checks_joint_graph, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dropout_checks_joint_graph_inference, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_dynamic, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_fail_with_direct_invoke_subgraph, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_fake_tensor_checking, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_gen_schema, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_gen_schema_with_buffer_mutation, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_grad_accuracy_check, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_input_aliasing, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_mutation, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_mutation_inference_mode, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_mutation_mutiple_times, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_mutation_mutiple_times_fake_tensor_cahche_hit, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_input_output_aliasing, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_kwargs_only, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_list, 
test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_mod_attr_aliasing, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_module, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_module_forward, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_module_method, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_nonlocal_list_mutation_hidden, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_nonlocal_update, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_normalize_gm, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_output_output_aliasing, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_pending_unbacked, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_preserves_output_strides, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_preserves_strides, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_redundant_compile_region, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_return_none, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_return_none_from_fwd, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_return_size, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_sdpa, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_simple, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_simple_module, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_symint_from_fwd_to_bwd, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_triton_kernel_native, 
test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_tuple_of_tuple, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_udf_output, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_unbacked1, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_unbacked2, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_unbacked_symbol, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphCompile::test_view_to_reshape, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportNonstrict::test_multiple_module, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportNonstrict::test_pending_unbacked, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportNonstrict::test_simple_func, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportNonstrict::test_simple_method, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportNonstrict::test_unbacked, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportStrict::test_multiple_module, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportStrict::test_pending_unbacked, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportStrict::test_simple_func, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportStrict::test_simple_method, test/higher_order_ops/test_invoke_subgraph.py::TestInvokeSubgraphExportStrict::test_unbacked, test/higher_order_ops/test_invoke_subgraph.py::NegativeTesting::test_graph_break 2025-12-04T13:44:12.4678645Z 2025-12-04T13:44:12.4678807Z Finished higher_order_ops/test_invoke_subgraph 1/1 ... 
[2025-12-04 13:44:12.464129][2259836.730797854], took 0.52min 2025-12-04T13:44:12.4679292Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:44:12.4708931Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:44:12.4710323Z Running test_utils 1/1 ... [2025-12-04 13:44:12.470919][2259836.737591991] 2025-12-04T13:44:12.4710512Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:44:12.4712262Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:12.471105] 2025-12-04T13:44:35.5218218Z 2025-12-04T13:44:35.5219253Z test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_1.1_a86518468ed58dfc_.log 2025-12-04T13:44:35.5935925Z Running 6014 items in this shard: test/test_utils.py::TestCheckpoint::test_checkpoint, test/test_utils.py::TestCheckpoint::test_checkpoint_module_list, test/test_utils.py::TestCheckpoint::test_checkpoint_no_tensors, test/test_utils.py::TestCheckpoint::test_checkpoint_non_tensor, test/test_utils.py::TestCheckpoint::test_checkpoint_non_tensor_inputs_outputs, test/test_utils.py::TestCheckpoint::test_checkpoint_not_preserve_rng_state_and_without_reentrant, test/test_utils.py::TestCheckpoint::test_checkpoint_partial_grad, test/test_utils.py::TestCheckpoint::test_checkpoint_rng_cpu, test/test_utils.py::TestCheckpoint::test_checkpoint_rng_gpu, test/test_utils.py::TestCheckpoint::test_checkpoint_sequential_deprecated_multiple_args, test/test_utils.py::TestCheckpoint::test_checkpoint_sequential_deprecated_no_args, test/test_utils.py::TestCheckpoint::test_checkpoint_trigger, test/test_utils.py::TestCheckpoint::test_checkpoint_valid, 
test/test_utils.py::TestCheckpoint::test_checkpointing_without_reentrant_early_free, test/test_utils.py::TestCheckpoint::test_get_device_states_recursive, test/test_utils.py::TestCheckpoint::test_infer_device_state_recursive_meta, test/test_utils.py::TestCheckpoint::test_infer_device_state_recursive_multi_gpu, test/test_utils.py::TestDataLoaderUtils::test_multi_drop, test/test_utils.py::TestDataLoaderUtils::test_multi_keep, test/test_utils.py::TestDataLoaderUtils::test_random_seed, test/test_utils.py::TestDataLoaderUtils::test_single_drop, test/test_utils.py::TestDataLoaderUtils::test_single_keep, test/test_utils.py::TestCollectEnv::test_smoke, test/test_utils.py::TestHipify::test_import_hipify, test/test_utils.py::TestHipifyTrie::test_add_and_search_trie, test/test_utils.py::TestHipifyTrie::test_add_multiple_and_search_trie, test/test_utils.py::TestHipifyTrie::test_char_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_prefix_words_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_quote_escape, test/test_utils.py::TestHipifyTrie::test_single_export_trie_to_regex, test/test_utils.py::TestHipifyTrie::test_special_char_export_trie_to_regex, test/test_utils.py::TestAssert::test_assert_scriptable, test/test_utils.py::TestAssert::test_assert_true, test/test_utils.py::TestStandaloneCPPJIT::test_load_standalone, test/test_utils.py::TestRenderUtils::test_basic, test/test_utils.py::TestDeviceUtilsCUDA::test_basic_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_decorator_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_decorator_generator_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_H_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_T_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___getitem___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___radd___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rand___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rdiv___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmatmul___cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmod___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rmul___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___ror___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rpow___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rsub___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops___rxor___cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__batch_norm_with_update_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__chunk_cat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__native_batch_norm_legit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_lengths_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__segment_reduce_offsets_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__softmax_backward_data_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops__upsample_bilinear2d_aa_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_abs_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acos_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_acosh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addbmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcdiv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addcmul_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmm_decomposed_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addmv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_addr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_alias_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_all_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_allclose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_aminmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_angle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_any_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_arange_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argsort_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_argwhere_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_partial_views_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_as_strided_scatter_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_asinh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atanh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_1d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_2d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_atleast_3d_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_baddbmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bernoulli_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bfloat16_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bincount_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_and_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_left_shift_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_not_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_or_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_right_shift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bitwise_xor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_block_diag_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bool_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_shapes_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_broadcast_to_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_bucketize_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_byte_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cartesian_prod_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cauchy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdist_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cdouble_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ceil_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cfloat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chalf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_char_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_inverse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cholesky_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_chunk_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_max_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clamp_min_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_clone_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_column_stack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_combinations_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_complex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_conj_physical_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_constant_pad_nd_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_contiguous_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_copysign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_corrcoef_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cos_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cosh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_count_nonzero_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cov_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cross_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cummin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumprod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumsum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_cumulative_trapezoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_deg2rad_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diag_embed_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagflat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diagonal_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_diff_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_digamma_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_floor_rounding_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_no_rounding_mode_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_div_trunc_rounding_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_double_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_dstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_einsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_permuted_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_empty_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eq_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_equal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_erfinv_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expand_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_expm1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_exponential_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e4m3fn, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e4m3fnuz, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e5m2, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_float8_e5m2fnuz, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_eye_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft2_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_fftshift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_hfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ifftshift_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_ihfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_irfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfft_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fft_rfftn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flatten_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flip_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fliplr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_flipud_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_float_power_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_floor_divide_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_fmod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frac_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_frexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_full_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gather_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gcd_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ge_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geometric_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_geqrf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gradient_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_grid_sampler_3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_gt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_half_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hash_tensor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_heaviside_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_histc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_hypot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_i0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igamma_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igammac_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_igammac_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_imag_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_put_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_mean_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_reduce_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_index_select_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_inner_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_int_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isclose_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isfinite_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isinf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isnan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isneginf_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isposinf_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_isreal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_istft_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_istft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_item_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_2inputs_2outputs_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_binary_return_by_ref_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_jiterator_unary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kron_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_kthvalue_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lcm_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ldexp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_le_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lerp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lgamma_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cholesky_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cond_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_cross_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_det_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_diagonal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eig_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvals_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_eigvalsh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_householder_product_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_inv_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_factor_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_ldl_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lstsq_grad_oriented_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_factor_ex_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_lu_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_power_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_matrix_rank_hermitian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_multi_dot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_norm_subgradients_at_zero_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_hermitian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_pinv_singular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_qr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_slogdet_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_ex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_solve_triangular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_svdvals_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorinv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_tensorsolve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vander_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vecdot_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linalg_vector_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_linspace_tensor_overload_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log10_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log1p_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_normal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_log_softmax_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logaddexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logcumsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logdet_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_and_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_not_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_or_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logical_xor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logspace_tensor_overload_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_logsumexp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_long_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_lu_unpack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mH_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mT_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_argmin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumprod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_cumsum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_fill_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_log_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logaddexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_logsumexp_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_median_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_normalize_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_select_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_softmin_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_std_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_sum_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_masked_var_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matmul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_matrix_exp_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_pool2d_with_indices_backward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_no_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_max_reduction_with_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_maximum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_median_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_list_of_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_meshgrid_variadic_tensors_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_binary_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_no_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_min_reduction_with_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_minimum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mode_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_movedim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_msort_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mul_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_multinomial_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mv_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nan_to_num_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanmedian_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanquantile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nanquantile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nansum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_narrow_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_batch_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_dropout_backward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_native_layer_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ne_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_neg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_empty_strided_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_full_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_ones_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_new_zeros_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nextafter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_alpha_dropout_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_avg_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_celu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_channel_shuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_conv_transpose3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cosine_similarity_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_cross_entropy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_ctc_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_ctc_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_dropout_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_elu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_bag_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_embedding_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_fractional_max_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gaussian_nll_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_gelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_glu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_grid_sample_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_group_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardsigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardswish_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hardtanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_hinge_embedding_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_huber_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_instance_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_area_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bicubic_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_linear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_nearest_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_interpolate_trilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_kl_div_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_l1_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_layer_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_leaky_relu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_linear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_local_response_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_logsigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_margin_ranking_loss_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_pool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool1d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool2d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_max_unpool3d_grad_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mish_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_mse_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_head_attention_forward_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multi_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_margin_loss_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_nll_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_normalize_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_one_hot_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_circular_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_constant_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_reflect_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pad_replicate_negative_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pairwise_distance_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pdist_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pdist_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_shuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_pixel_unshuffle_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_poisson_nll_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_prelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu6_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_relu_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rms_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_rrelu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_selu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_complex_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_complex_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_silu_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_smooth_l1_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_soft_margin_loss_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softmin_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softplus_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_softsign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_tanhshrink_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_threshold_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_unfold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_bilinear_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nn_functional_upsample_nearest_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_nonzero_static_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_fro_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_inf_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_norm_nuc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_in_place_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_normal_number_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ones_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ormqr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_outer_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pca_lowrank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_permute_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pinverse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polar_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polar_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_2_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_3_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_polygamma_polygamma_n_4_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_positive_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_pow_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_put_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_qr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_quantile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_quantile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rad2deg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rand_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randint_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_randn_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_ravel_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_real_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reciprocal_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_remainder_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_renorm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_repeat_interleave_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_reshape_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resize_as__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_conj_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_resolve_neg_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_roll_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rot90_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_round_decimals_neg_3_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsqrt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_rsub_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scalar_tensor_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_add_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amax_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_amin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_mean_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_prod_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_scatter_reduce_sum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_searchsorted_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_select_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sgn_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_short_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sigmoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sign_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_bartlett_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_bartlett_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_blackman_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_blackman_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_cosine_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_cosine_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_exponential_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_exponential_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_gaussian_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_gaussian_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_cosine_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_cosine_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_hamming_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_general_hamming_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hamming_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hamming_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hann_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_hann_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_kaiser_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_kaiser_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_nuttall_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signal_windows_nuttall_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_signbit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sin_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sinh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_slice_scatter_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_softmax_with_dtype_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sort_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_mm_reduce_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sparse_sampled_addmm_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_airy_ai_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_j1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_bessel_y1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_u_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_v_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_chebyshev_polynomial_w_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_entr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_erfcx_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_h_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_hermite_polynomial_he_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i0e_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_i1e_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_laguerre_polynomial_l_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_legendre_polynomial_p_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_log_ndtr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_i1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_modified_bessel_k1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtr_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_ndtri_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_scaled_modified_bessel_k1_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_spherical_bessel_j0_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_xlog1py_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_special_zeta_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_list_args_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_split_with_sizes_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sqrt_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_square_cuda_uint8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_squeeze_multiple_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_mean_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_std_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_stft_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sub_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_sum_to_size_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_svd_lowrank_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_bool, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_t_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_along_dim_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_take_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tan_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tanh_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_int8, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensor_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tensordot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tile_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_to_sparse_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_topk_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch__scaled_mm_cuda_float8_e4m3fn, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trace_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_transpose_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapezoid_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trapz_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triangular_solve_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_indices_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_tril_indices_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_indices_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_triu_indices_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_true_divide_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_trunc_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unbind_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unflatten_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_bfloat16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unfold_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_uniform_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_consecutive_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unique_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unravel_index_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_chunk_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsafe_split_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_complex64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_unsqueeze_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_mean_unbiased_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_var_unbiased_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vdot_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_complex_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_real_cuda_complex128, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_as_real_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_copy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int32, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_view_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vsplit_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_vstack_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_where_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int16, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_xlogy_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zero__cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_float64, 
test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_bfloat16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_bool, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex128, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_complex64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_float64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int16, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int32, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int64, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_int8, test/test_utils.py::TestDeviceUtilsCUDA::test_device_mode_ops_zeros_like_cuda_uint8, test/test_utils.py::TestDeviceUtilsCUDA::test_get_default_device_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_get_default_device_more_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_nn_module_cuda, test/test_utils.py::TestDeviceUtilsCUDA::test_set_default_device_cuda, test/test_utils.py::TestCppExtensionUtils::test_cc_compiler_is_ok, test/test_utils.py::TestCppExtensionUtils::test_cpp_compiler_is_ok, test/test_utils.py::TestTraceback::test_basic, 
test/test_utils.py::TestTraceback::test_captured_traceback, test/test_utils.py::TestTraceback::test_captured_traceback_format_all, test/test_utils.py::TestTraceback::test_captured_traceback_format_all_cached, test/test_utils.py::TestTraceback::test_format_traceback_short, test/test_utils.py::TestTryImport::test_import_existing, test/test_utils.py::TestTryImport::test_import_imported, test/test_utils.py::TestTryImport::test_import_missing, test/test_utils.py::TestDeprecate::test_deprecated 2025-12-04T13:44:35.6613089Z 2025-12-04T13:44:35.6613204Z Finished test_utils 1/1 ... [2025-12-04 13:44:35.525393][2259859.792062737], took 0.38min 2025-12-04T13:44:35.6613617Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:44:35.6613976Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:44:35.6614216Z Running profiler/test_memory_profiler 1/1 ... [2025-12-04 13:44:35.531616][2259859.798288061] 2025-12-04T13:44:35.6614412Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:44:35.6614818Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_memory_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 13:44:35.531814] 2025-12-04T13:44:40.1555715Z 2025-12-04T13:44:40.1556974Z profiler/test_memory_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_memory_profiler_1.1_c8df726909b0fd6e_.log 2025-12-04T13:44:40.1565595Z Running 33 items in this shard: test/profiler/test_memory_profiler.py::TestMemoryProfiler::test_config_check, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_module_and_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_from_optimizer_set_to_none, test/profiler/test_memory_profiler.py::TestIdentifyGradients::test_extract_gradients_low_level, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_complicated, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_non_op_allocations, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_simple_inplace, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_stacked, test/profiler/test_memory_profiler.py::TestDataFlow::test_data_flow_graph_with_annotations, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_backward, test/profiler/test_memory_profiler.py::TestDataFlow::test_match_schemas_tensorlist, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_sequential_fwd_bwd, 
test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_categories_e2e_simple_module_fwd_bwd_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_bwd, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_inputs_fwd_lazy, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_lazily_initialized, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_manual_optimizer_step, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_memory_timeline, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients, test/profiler/test_memory_profiler.py::TestMemoryProfilerE2E::test_parameters_and_gradients_set_to_none, test/profiler/test_memory_profiler.py::TestMemoryProfilerTimelineCUDA::test_memory_timeline_no_id_cuda 2025-12-04T13:44:40.1571928Z 2025-12-04T13:44:40.1572066Z Finished profiler/test_memory_profiler 1/1 ... [2025-12-04 13:44:40.155300][2259864.421969881], took 0.08min 2025-12-04T13:44:40.1572527Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T13:44:40.1616986Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T13:44:40.1618719Z Running functorch/test_aotdispatch 1/1 ... 
[2025-12-04 13:44:40.161763][2259864.428435952] 2025-12-04T13:44:40.1618934Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T13:44:40.1620701Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'functorch/test_aotdispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:40.161952] 2025-12-04T13:45:49.9000587Z 2025-12-04T13:45:49.9001580Z functorch/test_aotdispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_aotdispatch_1.1_ff23e5502d01598f_.log 2025-12-04T13:45:49.9090631Z Running 537 items in this shard: test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_default_partitioner_saves_symints_not_tensors_for_bw, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_bases_out_of_order, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_return, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_module, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_inplace_view_with_detach, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_set__steals_view_chain, 
test/functorch/test_aotdispatch.py::TestAOTAutograd::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutograd::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_ban_dropout_mut_pre_dispatch, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_multiple_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_forward_mutation_no_buffer_mut, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_functionalized_rng_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_dupes_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_input_requiring_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_input_mutation_on_parameter_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_metadata_mutation_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_module_joint, 
test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_multiple_outputs_require_grad_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_buffer_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_inplace, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_composite_implicit_linear, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_contiguous, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_conv_and_bn, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_composite_implicit, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_simple, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_func_view, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_1, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_map_2, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_outdtype, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_reshape, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_autograd_op, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_predispatch_with_cond_nested, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_basic, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_simplified_pytrees_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_synthetic_bases_banned, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_unbacked_arg, test/functorch/test_aotdispatch.py::TestAOTExport::test_aot_export_with_torch_cond, 
test/functorch/test_aotdispatch.py::TestPartitioning::test_autocast, test/functorch/test_aotdispatch.py::TestPartitioning::test_contiguous, test/functorch/test_aotdispatch.py::TestPartitioning::test_custom_partitioner_fn, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_getitem, test/functorch/test_aotdispatch.py::TestPartitioning::test_default_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_generate_gives_inference_graph, test/functorch/test_aotdispatch.py::TestPartitioning::test_meta_tensor_inplace_op, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_output_tensor_shape_tensor, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_raise_getitems, test/functorch/test_aotdispatch.py::TestPartitioning::test_min_cut_partitioner_save_shape, test/functorch/test_aotdispatch.py::TestPartitioning::test_preserve_random, test/functorch/test_aotdispatch.py::TestPartitioning::test_quantize_activation_duplicate_nodes, test/functorch/test_aotdispatch.py::TestPartitioning::test_recompute_partitioning, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_incorrect_backward, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_inference, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_input_mutation_and_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_alias, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad, 
test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_output_requires_grad_in_no_grad_views, test/functorch/test_aotdispatch.py::TestAOTDispatch::test_aot_dispatch_simple, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_dynamic, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_fake_tensor_gm_raises, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_module_simplified_preserves_stack_trace_from_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_aot_test_subclasses_with_tensor_factories, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_flex_attn_noncontiguous_tangents, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_dense, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_nested_tensor_tangent, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_grads_no_force_contiguous_subclass, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inductor_freezing_with_subclasses, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_inference_python_dispatcher, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_layer_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_lift_fresh_copy_in_graph, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cpu, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_False_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_False_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cpu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_noncontig_nonmemformat_tangents_dynamic_shapes_True_test_subclasses_True_device_cuda, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rms_norm, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_rrelu_with_noise_mutation, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_all, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_donated, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_base_saved_tensors_hooks_filtering_mode_no_static, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_donated_buffers, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_params, 
test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_saved_tensors_hooks_recompile, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_subclass_parameters_torture_case, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_tangent_type_coercion, test/functorch/test_aotdispatch.py::TestAOTModuleSimplified::test_wrong_guess_tangent_type, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_autocast_disable_guard, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_default_partitioner_saves_symints_not_tensors_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_and_output_alias, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_return, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_invalid_requires_grad_fake, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_list_codegen, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_non_tensor_and_none_inputs, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_nonidempotent_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_single, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_intermediate_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_False, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclass_metadata_mutation_req_grad_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithDynamo::test_view_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_aot_eager_view_replay_for_aliased_outputs_True_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_False_dynamic_shapes_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_alias_of_intermediate_detach_backend_inductor_view_replay_for_aliased_outputs_True_dynamic_shapes_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_autocast_disable_guard, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_data, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_forward_inputs_create_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_mutation_on_grad_out, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_custom, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_off, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_backward_pass_autocast_on, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batch_norm_amp, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_batchnorm_inference, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_batch_norm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_buffer_copied_in_graph_with_different_shapes, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_compilation_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_complex_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_composite_impl_compile, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_autograd, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_custom_tensor_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_default_partitioner_saves_symints_not_tensors_for_bw, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_returned_as_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dupe_arg_torture, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_duplicated_arguments_on_tensor_overlap, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_dynamic_shape_output_not_in_bw_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_embedding_bag_view_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization1, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_fw_bw_mutation_no_functionalization2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_grad_context, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inference_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inner_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_aliased_with_mutation_output_alias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_data_and_metadata_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_inplace_requires_grad_true, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_metadata_mutation_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_alias_everything, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_none_require_gradients, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_and_output_alias, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_bases_out_of_order, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_aliases_other_input2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_and_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_batchnorm, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_false_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_hidden_from_autograd_aliasing, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_is_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_metadata2, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_modifies_autograd_meta_of_aliases, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_output_view_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_detach_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_requires_grad_no_grad_inference_graph, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_return, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__input_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_set__nop, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_simple_with_none_and_nontensor, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_before_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_down_and_set_, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_mutation_storage_resize_up, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_aliase_custom_autograd_function, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_metadata_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_mutate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_input_output_view_simple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_unsqueeze_with_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_inputs_overlapping_with_mutation_guard_base, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_dupe_left_bias, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_invalid_requires_grad_fake, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_list_codegen, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_activations_dynamic_with_nested, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mark_outputs_dynamic_use_autograd_True, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mem_leak_from_save_for_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_module, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_multi_output_list, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutates_input_noncontiguous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutation_of_input_in_fw_and_bw, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_mutations_in_bw_detached_from_tangent, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_complicated_inps_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_homogenous, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nested_subclasses_non_nested_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_new_inp_requires_grad_now, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_no_grad_input_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_non_tensor_and_none_inputs, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_nonidempotent_amp, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_multi_output_view_should_raise_autograd_error, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_input_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_different_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_and_returned_flipped, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_and_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_inplace_view_with_detach, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multi_output_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_multiple_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_mutation_linear, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_no_grad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_returned_multiple_times, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_single, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_intermediate_view_meta_replay, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_multiple_inputs_get_correct_one, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_aliases_output_view_meta_replay, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_all_alias_types, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_dict, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_output_op_depending_on_symint, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_outputs_are_aliased, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_real_weights_in_symbolic_mode_with_inplace_ops, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_saved_tensors_hooks_mutations_raise, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_bad, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__and_data_mutation_good, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__not_allowed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_set__steals_view_chain, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_single_output, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_output_requires_grad_input_doesnt, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_non_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_some_outputs_dont_require_grad_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_squeeze_mutation, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_False, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclass_metadata_mutation_req_grad_True, 
test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_subclasses_mixed_mode, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_synthetic_base_base_attribute_is_none, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_and_inplace_view, test/functorch/test_aotdispatch.py::TestAOTAutogradWithCache::test_view_detach
2025-12-04T13:45:49.9166240Z
2025-12-04T13:45:49.9166381Z Finished functorch/test_aotdispatch 1/1 ... [2025-12-04 13:45:49.900026][2259934.166695865], took 1.16min
2025-12-04T13:45:49.9166795Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T13:45:49.9167156Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T13:45:49.9167399Z Running test_fx 2/3 ... [2025-12-04 13:45:49.906245][2259934.17291853]
2025-12-04T13:45:49.9167564Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T13:45:49.9167929Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_fx.py', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:45:49.906449]
2025-12-04T13:51:57.2877959Z
2025-12-04T13:51:57.2878938Z test_fx 2/3 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_2.3_d9f7320966108a78_.log
2025-12-04T13:51:57.2934866Z Running 430 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_do_not_fold_impure_subgraph, test/test_fx.py::TestConstFold::test_fold_pure_subgraph, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const,
test/test_fx.py::AnnotationsTest::test_broadcasting1, test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, 
test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_boxed_run_argument_validation, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_dict_proxy_keys, 
test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2
2025-12-04T13:51:57.2985717Z
2025-12-04T13:51:57.2985814Z Finished test_fx 2/3 ... [2025-12-04 13:51:57.287917][2260301.554587087], took 6.12min
2025-12-04T13:51:57.2986186Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T13:51:57.2986548Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T13:51:57.2986763Z Running functorch/test_ops 2/5 ... [2025-12-04 13:51:57.294668][2260301.561340946]
2025-12-04T13:51:57.2986941Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T13:51:57.2987323Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=2', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:51:57.294845]
2025-12-04T14:02:57.2931899Z
2025-12-04T14:02:57.2934031Z functorch/test_ops 2/5 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_2.5_619fefe02476e57e_.log
2025-12-04T14:02:57.3218680Z Running 2097 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_cross_entropy_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_layer_norm_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_log_softmax_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atleast_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_baddbmm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ge_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mH_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softshrink_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_short_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_transpose_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_det_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_meshgrid_variadic_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sgn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsafe_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_irfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_softmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_with_bias_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_silu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_xlogy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amax_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_ge_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_gt_cuda_complex128, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_lt_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_dsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_positive_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_real_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_special_grad_op_jvp_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unsqueeze_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unsqueeze_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__segment_reduce_lengths_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lgamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_maximum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mul_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_pixel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__segment_reduce_lengths_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linspace_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_ones_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_amax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_uniform_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_arange_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_empty_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_alpha_dropout_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_margin_ranking_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_resolve_neg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_to_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyTakeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___getitem___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_abs_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_allclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_any_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argwhere_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bernoulli_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_block_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bmm_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cartesian_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_column_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_4inputs_with_extra_args_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_hermitian_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_multi_dot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorsolve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log1p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logcumsumexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumprod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_log_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_logaddexp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_minimum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nan_to_num_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ne_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_celu_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_channel_shuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_similarity_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_trilinear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_linear_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_unshuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rrelu_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_smooth_l1_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polar_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_4_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_like_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_neg_3_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsub_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_blackman_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_sampled_addmm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_airy_ai_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_h_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_zeta_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tensor_split_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_to_sparse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unbind_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vdot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bfloat16_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fliplr_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_put_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_functorch_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pixel_shuffle_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_reciprocal_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_transpose_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lu_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ne_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_general_hamming_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bool_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpySortAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bfloat16_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_hfftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_householder_product_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_median_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pairwise_distance_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_cosine_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_neg_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_i0e_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_combinations_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_half_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hypot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pinverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpySortAutogradFunction_cuda_float32 2025-12-04T14:02:57.3482269Z 2025-12-04T14:02:57.3482392Z Finished functorch/test_ops 2/5 ... 
[2025-12-04 14:02:57.295128][2260961.561794654], took 11.00min 2025-12-04T14:02:57.3482817Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:02:57.3483178Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:02:57.3483440Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T14:02:57.3483629Z Uploading artifacts took 0.00 seconds 2025-12-04T14:02:57.3483798Z Running nn/test_pruning 1/1 ... [2025-12-04 14:02:57.302010][2260961.568683641] 2025-12-04T14:02:57.3483973Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:02:57.3484350Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'nn/test_pruning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:02:57.302188] 2025-12-04T14:02:59.5200224Z 2025-12-04T14:02:59.5201243Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_72a28eaa3a238960_.log 2025-12-04T14:02:59.5211324Z Running 34 items in this shard: test/nn/test_pruning.py::TestPruningNN::test_compute_nparams_to_prune, test/nn/test_pruning.py::TestPruningNN::test_custom_from_mask_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_identity_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning_with_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning_importance_scores, 
test/nn/test_pruning.py::TestPruningNN::test_multiple_pruning_calls, test/nn/test_pruning.py::TestPruningNN::test_prune, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores_mimic_default, test/nn/test_pruning.py::TestPruningNN::test_pruning_container, test/nn/test_pruning.py::TestPruningNN::test_pruning_container_compute_mask, test/nn/test_pruning.py::TestPruningNN::test_pruning_id_consistency, test/nn/test_pruning.py::TestPruningNN::test_pruning_rollback, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_model, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_state_dict, test/nn/test_pruning.py::TestPruningNN::test_random_pruning, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_0perc, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_new_weight, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_orig, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_pickle, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_sizes, test/nn/test_pruning.py::TestPruningNN::test_random_structured_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_exception, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_rnn_pruning, test/nn/test_pruning.py::TestPruningNN::test_unstructured_pruning_same_magnitude, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount_init 2025-12-04T14:02:59.5216574Z 2025-12-04T14:02:59.5216683Z Finished nn/test_pruning 1/1 ... 
[2025-12-04 14:02:59.519687][2260963.786357119], took 0.04min 2025-12-04T14:02:59.5217074Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:02:59.5268681Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:02:59.5269802Z Running optim/test_lrscheduler 1/1 ... [2025-12-04 14:02:59.526847][2260963.793520762] 2025-12-04T14:02:59.5270084Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:02:59.5272109Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'optim/test_lrscheduler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:02:59.527022] 2025-12-04T14:03:01.2699941Z 2025-12-04T14:03:01.2700837Z optim/test_lrscheduler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_lrscheduler_1.1_b434134f397e262c_.log 2025-12-04T14:03:01.2701970Z 2025-12-04T14:03:01.2702230Z Finished optim/test_lrscheduler 1/1 ... [2025-12-04 14:03:01.269590][2260965.536258465], took 0.03min 2025-12-04T14:03:01.2715723Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:01.2768018Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:01.2770008Z Running profiler/test_cpp_thread 1/1 ... [2025-12-04 14:03:01.276819][2260965.543492176] 2025-12-04T14:03:01.2770359Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:01.2771196Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_cpp_thread.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:03:01.276999] 2025-12-04T14:03:12.6091278Z 2025-12-04T14:03:12.6092409Z profiler/test_cpp_thread 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_cpp_thread_1.1_699245acb53d1acd_.log 2025-12-04T14:03:12.6095605Z Running 6 items in this shard: test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_profile_memory_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_with_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestCUDA::test_without_enable_profiler_in_child_thread_cuda, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_profile_memory_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_with_enable_profiler_in_child_thread_xpu, test/profiler/test_cpp_thread.py::CppThreadTestXPU::test_without_enable_profiler_in_child_thread_xpu 2025-12-04T14:03:12.6097909Z 2025-12-04T14:03:12.6098254Z Finished profiler/test_cpp_thread 1/1 ... [2025-12-04 14:03:12.608763][2260976.875433475], took 0.19min 2025-12-04T14:03:12.6104670Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:12.6156074Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:12.6158289Z Running profiler/test_execution_trace 1/1 ... [2025-12-04 14:03:12.615606][2260976.882279952] 2025-12-04T14:03:12.6158640Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:12.6159539Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_execution_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:03:12.615783] 2025-12-04T14:03:18.6524728Z 2025-12-04T14:03:18.6525928Z profiler/test_execution_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_execution_trace_1.1_b301df4e6ddbde13_.log 2025-12-04T14:03:18.6528263Z Running 13 items in this shard: test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_alone_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_disabled_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_enabled_with_kineto_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_env_enabled_with_pt2_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_nested_tensor_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_no_capture_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_record_integral_tensor_data_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_record_integral_tensor_range_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_repeat_in_loop_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_start_stop_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_with_kineto_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_execution_trace_with_pt2_cuda, test/profiler/test_execution_trace.py::TestExecutionTraceCUDA::test_triton_fx_graph_with_et_cuda 2025-12-04T14:03:18.6530436Z 2025-12-04T14:03:18.6530574Z Finished profiler/test_execution_trace 1/1 ... 
[2025-12-04 14:03:18.652152][2260982.918821659], took 0.10min 2025-12-04T14:03:18.6537846Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:18.6589445Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:18.6591218Z Running profiler/test_profiler_tree 1/1 ... [2025-12-04 14:03:18.659022][2260982.925695356] 2025-12-04T14:03:18.6591426Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:18.6593153Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_profiler_tree.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:18.659200] 2025-12-04T14:03:20.8771699Z 2025-12-04T14:03:20.8772428Z profiler/test_profiler_tree 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_profiler_tree_1.1_69511fc49df354e3_.log 2025-12-04T14:03:20.8774357Z Running 10 items in this shard: test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_detailed, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_cuda_with_stream, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_memory_and_stack, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_record_function, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_modules, 
test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_dispatch, test/profiler/test_profiler_tree.py::TestProfilerTree::test_profiler_experimental_tree_with_stack_and_torch_function 2025-12-04T14:03:20.8775963Z 2025-12-04T14:03:20.8776088Z Finished profiler/test_profiler_tree 1/1 ... [2025-12-04 14:03:20.876792][2260985.143461366], took 0.04min 2025-12-04T14:03:20.8787171Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:20.8839274Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:20.8841055Z Running profiler/test_python_tracer 1/1 ... [2025-12-04 14:03:20.883892][2260985.15056529] 2025-12-04T14:03:20.8841510Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:20.8842604Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_python_tracer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:20.884074] 2025-12-04T14:03:27.9079644Z 2025-12-04T14:03:27.9080334Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_5dfbb6cf3ab13e88_.log 2025-12-04T14:03:27.9081103Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events 2025-12-04T14:03:27.9082057Z 2025-12-04T14:03:27.9082190Z Finished profiler/test_python_tracer 1/1 ... 
[2025-12-04 14:03:27.907697][2260992.174367813], took 0.12min 2025-12-04T14:03:27.9089588Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:27.9140131Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:27.9142666Z Running profiler/test_record_function 1/1 ... [2025-12-04 14:03:27.914194][2260992.180866396] 2025-12-04T14:03:27.9142866Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:27.9144870Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_record_function.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:27.914384] 2025-12-04T14:03:30.0817966Z 2025-12-04T14:03:30.0819063Z profiler/test_record_function 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_record_function_1.1_716360bc18df4f73_.log 2025-12-04T14:03:30.0821575Z Running 6 items in this shard: test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_delegation_with_profiler, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_datapipe_with_record_function_fork, test/profiler/test_record_function.py::TestRecordFunction::test_python_dispatch_mode_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_python_subclass_record_function, test/profiler/test_record_function.py::TestRecordFunction::test_record_function 2025-12-04T14:03:30.0823652Z 2025-12-04T14:03:30.0823957Z Finished profiler/test_record_function 1/1 ... 
[2025-12-04 14:03:30.081406][2260994.348077005], took 0.04min 2025-12-04T14:03:30.0827273Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:30.0877699Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:30.0879403Z Running profiler/test_torch_tidy 1/1 ... [2025-12-04 14:03:30.087768][2260994.35444148] 2025-12-04T14:03:30.0879699Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:30.0880859Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'profiler/test_torch_tidy.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:30.087943] 2025-12-04T14:03:35.1132051Z 2025-12-04T14:03:35.1132896Z profiler/test_torch_tidy 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_torch_tidy_1.1_944e994c23d6d39b_.log 2025-12-04T14:03:35.1137141Z Running 22 items in this shard: test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_id_uniqueness, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocation_ids_with_other_ops, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_allocations, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_extra_fields, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_impl_reuse, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_mkldnn_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_module_and_optimizer_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_nnmodule_params, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer, 
test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_adam, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_optimizer_parameters_sgd, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_pointers_and_ids, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_refcounts, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_scalar_ins, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_sparse_tensors, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_lists, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensor_properties, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_full, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_keep_alive, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_scalar_args, test/profiler/test_torch_tidy.py::TestTorchTidyProfiler::test_tensorimpl_invalidation_set 2025-12-04T14:03:35.1141123Z 2025-12-04T14:03:35.1141269Z Finished profiler/test_torch_tidy 1/1 ... [2025-12-04 14:03:35.112791][2260999.379462625], took 0.08min 2025-12-04T14:03:35.1141766Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:35.1192656Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:35.1194406Z Running test_accelerator 1/1 ... [2025-12-04 14:03:35.119272][2260999.385945048] 2025-12-04T14:03:35.1194620Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:35.1195727Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_accelerator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:03:35.119441] 2025-12-04T14:03:37.7878825Z 2025-12-04T14:03:37.7879800Z test_accelerator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_accelerator_1.1_7908829d504adda9_.log 2025-12-04T14:03:37.7883805Z Running 12 items in this shard: test/test_accelerator.py::TestAccelerator::test_current_accelerator, test/test_accelerator.py::TestAccelerator::test_current_stream_query, test/test_accelerator.py::TestAccelerator::test_device_context_manager, test/test_accelerator.py::TestAccelerator::test_generic_event_behavior, test/test_accelerator.py::TestAccelerator::test_generic_multi_device_behavior, test/test_accelerator.py::TestAccelerator::test_generic_stream_behavior, test/test_accelerator.py::TestAccelerator::test_get_memory_info, test/test_accelerator.py::TestAccelerator::test_memory_stats, test/test_accelerator.py::TestAccelerator::test_multi_device_context_manager, test/test_accelerator.py::TestAccelerator::test_multi_device_stream_context_manager, test/test_accelerator.py::TestAccelerator::test_pin_memory_on_non_blocking_copy, test/test_accelerator.py::TestAccelerator::test_stream_context_manager 2025-12-04T14:03:37.7886669Z 2025-12-04T14:03:37.7886923Z Finished test_accelerator 1/1 ... [2025-12-04 14:03:37.787478][2261002.054149394], took 0.04min 2025-12-04T14:03:37.7888413Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:37.7939755Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:37.7941719Z Running test_appending_byte_serializer 1/1 ... 
[2025-12-04 14:03:37.793981][2261002.060654346] 2025-12-04T14:03:37.7942189Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:37.7942854Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_appending_byte_serializer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:37.794158] 2025-12-04T14:03:39.9119507Z 2025-12-04T14:03:39.9120585Z test_appending_byte_serializer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_appending_byte_serializer_1.1_1a00ba86ef024dbb_.log 2025-12-04T14:03:39.9122785Z Running 3 items in this shard: test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_checksum, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_class, test/test_appending_byte_serializer.py::TestAppendingByteSerializer::test_write_and_read_int 2025-12-04T14:03:39.9124130Z 2025-12-04T14:03:39.9124451Z Finished test_appending_byte_serializer 1/1 ... [2025-12-04 14:03:39.911586][2261004.17825618], took 0.04min 2025-12-04T14:03:39.9135154Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:39.9185993Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:39.9187539Z Running test_as_strided 1/1 ... [2025-12-04 14:03:39.918579][2261004.185252465] 2025-12-04T14:03:39.9187868Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:39.9188948Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_as_strided.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:03:39.918756] 2025-12-04T14:03:42.0364730Z 2025-12-04T14:03:42.0365258Z test_as_strided 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_as_strided_1.1_cd7a06fcdebb0a6f_.log 2025-12-04T14:03:42.0366026Z Running 2 items in this shard: test/test_as_strided.py::TestAsStrided::test_size_10_exhaustive, test/test_as_strided.py::TestAsStrided::test_subset_property 2025-12-04T14:03:42.0366442Z 2025-12-04T14:03:42.0366632Z Finished test_as_strided 1/1 ... [2025-12-04 14:03:42.036154][2261006.302825079], took 0.04min 2025-12-04T14:03:42.0381163Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:42.0432910Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:42.0434669Z Running test_autocast 1/1 ... [2025-12-04 14:03:42.043282][2261006.309955032] 2025-12-04T14:03:42.0434963Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:42.0436805Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:03:42.043461] 2025-12-04T14:03:47.2264578Z 2025-12-04T14:03:47.2265629Z test_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autocast_1.1_0e1a2321d62e7b14_.log 2025-12-04T14:03:47.2270934Z Running 20 items in this shard: test/test_autocast.py::TestAutocastCPU::test_autocast_disabled_with_fp32_dtype, test/test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_16, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_rnn, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_16, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_need_autocast_promote, test/test_autocast.py::TestAutocastCPU::test_cpu_autocast_deprecated_warning, test/test_autocast.py::TestAutocastCPU::test_generic_autocast, test/test_autocast.py::TestAutocastGPU::test_autocast_prioritize, test/test_autocast.py::TestAutocastGPU::test_cache_disabled, test/test_autocast.py::TestAutocastGPU::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_bfloat16_supported, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_error_message, test/test_autocast.py::TestTorchAutocast::test_autocast_fast_dtype, test/test_autocast.py::TestTorchAutocast::test_invalid_device, test/test_autocast.py::TestTorchAutocast::test_non_string_device 2025-12-04T14:03:47.2273214Z 2025-12-04T14:03:47.2273410Z Finished test_autocast 1/1 ... 
[2025-12-04 14:03:47.226129][2261011.492797848], took 0.09min 2025-12-04T14:03:47.2278733Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:47.2330552Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:47.2331881Z Running test_bundled_inputs 1/1 ... [2025-12-04 14:03:47.233061][2261011.499734274] 2025-12-04T14:03:47.2332081Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:47.2334049Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_bundled_inputs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:47.233239] 2025-12-04T14:03:50.8599753Z 2025-12-04T14:03:50.8600812Z test_bundled_inputs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_bundled_inputs_1.1_145e9e28405155fc_.log 2025-12-04T14:03:50.8605320Z Running 12 items in this shard: test/test_bundled_inputs.py::TestBundledInputs::test_bad_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_dict_args, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_fail, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_non_mutator, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_success, test/test_bundled_inputs.py::TestBundledInputs::test_large_tensor_with_inflation, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_both_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_neither_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_non_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_rejected_tensors, 
test/test_bundled_inputs.py::TestBundledInputs::test_single_tensors 2025-12-04T14:03:50.8608973Z 2025-12-04T14:03:50.8609308Z Finished test_bundled_inputs 1/1 ... [2025-12-04 14:03:50.859602][2261015.126271129], took 0.06min 2025-12-04T14:03:50.8612617Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:50.8664121Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:50.8666020Z Running test_comparison_utils 1/1 ... [2025-12-04 14:03:50.866412][2261015.133085437] 2025-12-04T14:03:50.8666226Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:50.8666800Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_comparison_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:50.866592] 2025-12-04T14:03:52.9840706Z 2025-12-04T14:03:52.9841640Z test_comparison_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_comparison_utils_1.1_c1ec35cb1e929f71_.log 2025-12-04T14:03:52.9844302Z Running 7 items in this shard: test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert, test/test_comparison_utils.py::TestComparisonUtils::test_all_equal_no_assert_nones, test/test_comparison_utils.py::TestComparisonUtils::test_assert_device, test/test_comparison_utils.py::TestComparisonUtils::test_assert_dtype, test/test_comparison_utils.py::TestComparisonUtils::test_assert_layout, test/test_comparison_utils.py::TestComparisonUtils::test_assert_sizes, test/test_comparison_utils.py::TestComparisonUtils::test_assert_strides 2025-12-04T14:03:52.9845876Z 2025-12-04T14:03:52.9846105Z Finished test_comparison_utils 1/1 ... 
[2025-12-04 14:03:52.983701][2261017.250372097], took 0.04min 2025-12-04T14:03:52.9856391Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:52.9907934Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:52.9909944Z Running test_compile_benchmark_util 1/1 ... [2025-12-04 14:03:52.990825][2261017.257498741] 2025-12-04T14:03:52.9910304Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:52.9911005Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_compile_benchmark_util.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:52.990992] 2025-12-04T14:03:58.2630101Z 2025-12-04T14:03:58.2631218Z test_compile_benchmark_util 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_compile_benchmark_util_1.1_322ce39dde1759cd_.log 2025-12-04T14:03:58.2632507Z Running 1 items in this shard: test/test_compile_benchmark_util.py::TestCompileBenchmarkUtil::test_training_and_inference 2025-12-04T14:03:58.2633035Z 2025-12-04T14:03:58.2633659Z Finished test_compile_benchmark_util 1/1 ... [2025-12-04 14:03:58.262675][2261022.529346167], took 0.09min 2025-12-04T14:03:58.2643646Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:03:58.2693300Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:03:58.2694583Z Running test_complex 1/1 ... 
[2025-12-04 14:03:58.269290][2261022.535963438] 2025-12-04T14:03:58.2695933Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:03:58.2697001Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_complex.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:03:58.269465] 2025-12-04T14:04:00.8876723Z 2025-12-04T14:04:00.8877637Z test_complex 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_complex_1.1_8ad2fdb4404fa88b_.log 2025-12-04T14:04:00.8882253Z Running 15 items in this shard: test/test_complex.py::TestComplexTensorCUDA::test_all_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_all_cuda_complex64, test/test_complex.py::TestComplexTensorCUDA::test_any_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_any_cuda_complex64, test/test_complex.py::TestComplexTensorCUDA::test_conj_copy_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_conj_copy_cuda_complex64, test/test_complex.py::TestComplexTensorCUDA::test_dtype_inference_cuda_float16, test/test_complex.py::TestComplexTensorCUDA::test_dtype_inference_cuda_float32, test/test_complex.py::TestComplexTensorCUDA::test_dtype_inference_cuda_float64, test/test_complex.py::TestComplexTensorCUDA::test_eq_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_eq_cuda_complex64, test/test_complex.py::TestComplexTensorCUDA::test_ne_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_ne_cuda_complex64, test/test_complex.py::TestComplexTensorCUDA::test_to_list_cuda_complex128, test/test_complex.py::TestComplexTensorCUDA::test_to_list_cuda_complex64 2025-12-04T14:04:00.8886174Z 2025-12-04T14:04:00.8886412Z Finished test_complex 1/1 ... 
[2025-12-04 14:04:00.887290][2261025.15395961], took 0.04min
2025-12-04T14:04:00.8893729Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T14:04:00.8945214Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T14:04:00.8947033Z Running test_cpp_api_parity 1/1 ... [2025-12-04 14:04:00.894498][2261025.161172082]
2025-12-04T14:04:00.8947517Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T14:04:00.8948083Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_cpp_api_parity.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:04:00.894672]
2025-12-04T14:04:47.9882782Z
2025-12-04T14:04:47.9883640Z test_cpp_api_parity 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_api_parity_1.1_96acf22107ebf4af_.log
2025-12-04T14:04:47.9947275Z Running 488 items in this shard: test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCELoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_BCEWithLogitsLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad1size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad2size1_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv1d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_padded_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_depthwise_with_multiplier_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_groups_thnn_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_reflect_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv2d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_1x1x1_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_circular_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_dilated_strided_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_same_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_pad_valid_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_replicate_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_stride_padding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zero_batch_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Conv3d_zeros_stride2_pad2_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose1d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_groups_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose2d_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ConvTranspose3d_dilated_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CosineEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_CrossMapLRN2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_max_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_mean_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_EmbeddingBag_sum_padding_idx_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_discontiguous_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Embedding_sparse_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Flatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Fold_no_batch_dim_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_HingeEmbeddingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_LayerNorm_3d_no_affine_large_feature_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Linear_no_bias_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MarginRankingLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_MultiLabelSoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_mean_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_NLLLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_lhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_broadcast_rhs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PairwiseDistance_with_non_default_args_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelShuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_PixelUnshuffle_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_RReLU_with_up_down_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_ReplicationPad3d_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SampleModule_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_SoftMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerDecoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_gelu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TransformerEncoderLayer_relu_activation_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Transformer_multilayer_coder_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_mean_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_none_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_TripletMarginLoss_no_batch_dim_sum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unflatten_no_batch_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_Unfold_int_input_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCELoss_weights_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_legacy_enum_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_BCEWithLogitsLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_margin_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HingeEmbeddingLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_HuberLoss_delta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_no_reduce_scalar_log_target_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_log_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_KLDivLoss_with_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_complex_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_L1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MSELoss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_0d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiLabelSoftMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_1d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_margin_no_reduce_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_p_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_MultiMarginLoss_weights_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss2d_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLossNd_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_NLLLoss_no_reduce_weights_ignore_index_neg_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_PoissonNLLLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_beta_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_no_reduce_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SmoothL1Loss_zero_beta_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_SoftMarginLoss_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_align_corners_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bicubic_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_shared_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_scale_tuple_skewed_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_bilinear_tuple_2d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_linear_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_1d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_launch_configs_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_2d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_1d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_2d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_nearest_tuple_3d_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_3d_zero_dim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_scale_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_align_corners_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_interpolate_trilinear_tuple_3d_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_lastdim_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_log_softmax_spatial_special_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_multimarginloss_1d_input_0d_target_no_reduce_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_has_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_sample_functional_no_parity_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim0_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_dim3_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_functional_scalar_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_cuda, 
test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_lastdim_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_dtype_cuda, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special, test/test_cpp_api_parity.py::TestCppApiParity::test_torch_nn_functional_softmax_spatial_special_cuda 2025-12-04T14:04:48.0006323Z 2025-12-04T14:04:48.0006435Z Finished test_cpp_api_parity 1/1 ... [2025-12-04 14:04:47.988493][2261072.255163447], took 0.78min 2025-12-04T14:04:48.0006820Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:04:48.0007172Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:04:48.0007411Z Running test_fx_passes 1/1 ... [2025-12-04 14:04:47.995268][2261072.261940246] 2025-12-04T14:04:48.0007582Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:04:48.0007947Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_fx_passes.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:04:47.995449] 2025-12-04T14:04:50.3130917Z 2025-12-04T14:04:50.3131798Z test_fx_passes 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_passes_1.1_17d632380da9555c_.log 2025-12-04T14:04:50.3143733Z Running 53 items in this shard: test/test_fx_passes.py::TestFXGraphPasses::test_fuser_pass_deep_model, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition0, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition1, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition10, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition11, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition2, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition3, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition4, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition5, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition6, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition7, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition8, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_partition9, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition0, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition1, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition2, test/test_fx_passes.py::TestFXGraphPasses::test_fuser_util_xfail_partition3, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn0_expected_partition0_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn10_expected_partition10_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn11_expected_partition11_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn12_expected_partition12_bookend_non_compute_pass_False, 
test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn13_expected_partition13_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn14_expected_partition14_bookend_non_compute_pass_True, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn15_expected_partition15_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn16_expected_partition16_bookend_non_compute_pass_True, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn17_expected_partition17_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn18_expected_partition18_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn1_expected_partition1_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn2_expected_partition2_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn3_expected_partition3_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn4_expected_partition4_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn5_expected_partition5_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn6_expected_partition6_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn7_expected_partition7_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn8_expected_partition8_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_fn9_expected_partition9_bookend_non_compute_pass_False, test/test_fx_passes.py::TestFXGraphPasses::test_partitioner_independent_output_fn0_expected_partition0, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model0, 
test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model1, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model10, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model11, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model12, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model13, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model14, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model15, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model2, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model3, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model4, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model5, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model6, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model7, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model8, test/test_fx_passes.py::TestFXMatcherUtils::test_subgraph_matcher_test_model9 2025-12-04T14:04:50.3152968Z 2025-12-04T14:04:50.3153095Z Finished test_fx_passes 1/1 ... [2025-12-04 14:04:50.312846][2261074.579517254], took 0.04min 2025-12-04T14:04:50.3153614Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:04:50.3194413Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:04:50.3195721Z Running test_fx_reinplace_pass 1/1 ... 
[2025-12-04 14:04:50.319471][2261074.586144065] 2025-12-04T14:04:50.3195916Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:04:50.3197847Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_fx_reinplace_pass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:04:50.319649] 2025-12-04T14:04:52.9873705Z 2025-12-04T14:04:52.9874622Z test_fx_reinplace_pass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_reinplace_pass_1.1_da952ca597bf1f40_.log 2025-12-04T14:04:52.9878152Z Running 12 items in this shard: test/test_fx_reinplace_pass.py::TestReinplacePass::test_out_node_updated, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_basic, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_different_metadata, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_index_mutation, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_overlapping_memory, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_op, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_invalid2, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_scatter_twice_with_different_view_op_valid, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_sym_input, test/test_fx_reinplace_pass.py::TestReinplacePass::test_reinplace_with_view 2025-12-04T14:04:52.9881114Z 2025-12-04T14:04:52.9881322Z Finished test_fx_reinplace_pass 1/1 ... 
[2025-12-04 14:04:52.987045][2261077.253716764], took 0.04min 2025-12-04T14:04:52.9884905Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:04:52.9936381Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:04:52.9937510Z Running test_hop_infra 1/1 ... [2025-12-04 14:04:52.993609][2261077.260282046] 2025-12-04T14:04:52.9937763Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:04:52.9939419Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_hop_infra.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:04:52.993784] 2025-12-04T14:04:55.3611186Z 2025-12-04T14:04:55.3612787Z test_hop_infra 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hop_infra_1.1_45133bf8a3f426fa_.log 2025-12-04T14:04:55.3613942Z Running 3 items in this shard: test/test_hop_infra.py::TestHOPInfra::test_all_hops_are_imported, test/test_hop_infra.py::TestHOPInfra::test_all_hops_have_opinfo, test/test_hop_infra.py::TestHOPInfra::test_imports_from_all_work 2025-12-04T14:04:55.3614575Z 2025-12-04T14:04:55.3614789Z Finished test_hop_infra 1/1 ... [2025-12-04 14:04:55.360787][2261079.627458533], took 0.04min 2025-12-04T14:04:55.3622479Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:04:55.3673370Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:04:55.3675097Z Running test_hub 1/1 ... 
[2025-12-04 14:04:55.367353][2261079.634026344] 2025-12-04T14:04:55.3675392Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:04:55.3676326Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_hub.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:04:55.367528] 2025-12-04T14:05:19.5704510Z 2025-12-04T14:05:19.5705349Z test_hub 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_hub_1.1_b3b51169f4b06704_.log 2025-12-04T14:05:19.5707574Z Running 20 items in this shard: test/test_hub.py::TestHub::test_download_url_to_file, test/test_hub.py::TestHub::test_get_set_dir, test/test_hub.py::TestHub::test_hub_parse_repo_info, test/test_hub.py::TestHub::test_list_entrypoints, test/test_hub.py::TestHub::test_load_commit_from_forked_repo, test/test_hub.py::TestHub::test_load_from_branch, test/test_hub.py::TestHub::test_load_from_github, test/test_hub.py::TestHub::test_load_from_local_dir, test/test_hub.py::TestHub::test_load_legacy_zip_checkpoint, test/test_hub.py::TestHub::test_load_state_dict_from_url, test/test_hub.py::TestHub::test_load_zip_1_6_checkpoint, test/test_hub.py::TestHub::test_trust_repo_builtin_trusted_owners, test/test_hub.py::TestHub::test_trust_repo_check_no, test/test_hub.py::TestHub::test_trust_repo_check_yes, test/test_hub.py::TestHub::test_trust_repo_false_emptystring, test/test_hub.py::TestHub::test_trust_repo_false_no, test/test_hub.py::TestHub::test_trust_repo_legacy, test/test_hub.py::TestHub::test_trust_repo_none, test/test_hub.py::TestHub::test_trust_repo_true, test/test_hub.py::TestHub::test_trusted_repo_false_yes 2025-12-04T14:05:19.5709599Z 2025-12-04T14:05:19.5709724Z Finished test_hub 1/1 ... 
[2025-12-04 14:05:19.570079][2261103.836748782], took 0.40min 2025-12-04T14:05:19.5717190Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:05:19.5768538Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:05:19.5769776Z Running test_jit_autocast 1/1 ... [2025-12-04 14:05:19.576749][2261103.843422462] 2025-12-04T14:05:19.5770105Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:05:19.5771799Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_jit_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:05:19.576945] 2025-12-04T14:05:33.6072752Z 2025-12-04T14:05:33.6073996Z test_jit_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_autocast_1.1_10f15fab724628a0_.log 2025-12-04T14:05:33.6081341Z Running 54 items in this shard: test/test_jit_autocast.py::TestAutocast::test_autocast_api, test/test_jit_autocast.py::TestAutocast::test_autocast_api_not_supported, test/test_jit_autocast.py::TestAutocast::test_autocast_autodiff, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator, test/test_jit_autocast.py::TestAutocast::test_autocast_decorator_outside_jit, test/test_jit_autocast.py::TestAutocast::test_autocast_mixed_dtypes, test/test_jit_autocast.py::TestAutocast::test_callees, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_off, test/test_jit_autocast.py::TestAutocast::test_callees_with_autocast_on, test/test_jit_autocast.py::TestAutocast::test_conditional_autocast, test/test_jit_autocast.py::TestAutocast::test_control_flow, test/test_jit_autocast.py::TestAutocast::test_divergent_autocast, test/test_jit_autocast.py::TestAutocast::test_divergent_types, 
test/test_jit_autocast.py::TestAutocast::test_duplicate_inputs, test/test_jit_autocast.py::TestAutocast::test_eager_and_script, test/test_jit_autocast.py::TestAutocast::test_explicit_casts, test/test_jit_autocast.py::TestAutocast::test_fp32_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_policy_with_fp64, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy, test/test_jit_autocast.py::TestAutocast::test_fp32_set_opt_dtype_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_ignore_amp, test/test_jit_autocast.py::TestAutocast::test_implicitly_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_inplace, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_cpu, test/test_jit_autocast.py::TestAutocast::test_jit_autocast_softmax_gpu, test/test_jit_autocast.py::TestAutocast::test_jit_call_method_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_executor_under_autocast, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_basic, test/test_jit_autocast.py::TestAutocast::test_jit_freeze_autocast_constants, test/test_jit_autocast.py::TestAutocast::test_jit_generic_autocast, test/test_jit_autocast.py::TestAutocast::test_linear_bf16, test/test_jit_autocast.py::TestAutocast::test_minimal, test/test_jit_autocast.py::TestAutocast::test_minimal_cpu, test/test_jit_autocast.py::TestAutocast::test_minimal_off, test/test_jit_autocast.py::TestAutocast::test_nested_autocast, test/test_jit_autocast.py::TestAutocast::test_promote_policy, test/test_jit_autocast.py::TestAutocast::test_promote_policy_fp64, test/test_jit_autocast.py::TestAutocast::test_reused_autocast, test/test_jit_autocast.py::TestAutocast::test_reused_autocast_expr, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state, test/test_jit_autocast.py::TestAutocast::test_runtime_autocast_state_expr, test/test_jit_autocast.py::TestAutocast::test_script_and_tracing, 
test/test_jit_autocast.py::TestAutocast::test_script_and_tracing_with_autocast, test/test_jit_autocast.py::TestAutocast::test_script_module, test/test_jit_autocast.py::TestAutocast::test_tracing_and_script, test/test_jit_autocast.py::TestAutocast::test_tracing_with_autocast_and_script, test/test_jit_autocast.py::TestJitTraceAutocast::test_cat_promote, test/test_jit_autocast.py::TestJitTraceAutocast::test_generate_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nchw_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_nhwc_autocast_jit_trace_model, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cpu, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_cuda, test/test_jit_autocast.py::TestJitTraceAutocast::test_script_autocast_enable_and_check, test/test_jit_autocast.py::TestJitTraceAutocast::test_scripted_aliasing
2025-12-04T14:05:33.6086690Z
2025-12-04T14:05:33.6086798Z Finished test_jit_autocast 1/1 ... [2025-12-04 14:05:33.607003][2261117.873671858], took 0.23min
2025-12-04T14:05:33.6089787Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T14:05:33.6138490Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T14:05:33.6140444Z Running test_jit_disabled 1/1 ... [2025-12-04 14:05:33.613947][2261117.880620744]
2025-12-04T14:05:33.6140628Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T14:05:33.6142405Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_jit_disabled.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:05:33.614138]
2025-12-04T14:05:35.6819548Z
2025-12-04T14:05:35.6820736Z test_jit_disabled 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_disabled_1.1_c9282843da034b2d_.log
2025-12-04T14:05:35.6822250Z Running 3 items in this shard: test/test_jit_disabled.py::TestJitDisabled::test_attribute, test/test_jit_disabled.py::TestJitDisabled::test_recursive_script, test/test_jit_disabled.py::TestJitDisabled::test_script_module_construction
2025-12-04T14:05:35.6824262Z
2025-12-04T14:05:35.6824562Z Finished test_jit_disabled 1/1 ... [2025-12-04 14:05:35.681698][2261119.948368039], took 0.03min
2025-12-04T14:05:35.6837158Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T14:05:35.6888667Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T14:05:35.6889225Z Running test_jit_fuser_te 1/1 ... [2025-12-04 14:05:35.688791][2261119.955464302]
2025-12-04T14:05:35.6889584Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T14:05:35.6891592Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_jit_fuser_te.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:05:35.688982]
2025-12-04T14:13:14.9917492Z
2025-12-04T14:13:14.9918356Z test_jit_fuser_te 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_jit_fuser_te_1.1_0df08b5c042e3222_.log
2025-12-04T14:13:15.0757469Z Running 6825 items in this shard: test/test_jit_fuser_te.py::TestFuserCommon::test_autodiff_fallback, test/test_jit_fuser_te.py::TestTEFuserStatic::test_abs, test/test_jit_fuser_te.py::TestTEFuserStatic::test_adaptive_avg_pool2d, test/test_jit_fuser_te.py::TestTEFuserStatic::test_add_bool, test/test_jit_fuser_te.py::TestTEFuserStatic::test_addcmul, test/test_jit_fuser_te.py::TestTEFuserStatic::test_arg_configurations_smoke, test/test_jit_fuser_te.py::TestTEFuserStatic::test_autocast_down, test/test_jit_fuser_te.py::TestTEFuserStatic::test_autocast_up, test/test_jit_fuser_te.py::TestTEFuserStatic::test_batch_norm, test/test_jit_fuser_te.py::TestTEFuserStatic::test_binary_div_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_binary_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_binary_pow, test/test_jit_fuser_te.py::TestTEFuserStatic::test_binary_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_binary_tensor_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_bitwise_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_broadcast, test/test_jit_fuser_te.py::TestTEFuserStatic::test_cat_2k_args, test/test_jit_fuser_te.py::TestTEFuserStatic::test_cat_graph_opt, test/test_jit_fuser_te.py::TestTEFuserStatic::test_channels_last_dims_dynamic, test/test_jit_fuser_te.py::TestTEFuserStatic::test_checks_cat_inputs, test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk, test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk_correctness, test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk_distributes, test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk_motion_deduplicates_inputs, test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk_mul_one,
test/test_jit_fuser_te.py::TestTEFuserStatic::test_chunk_multiple, test/test_jit_fuser_te.py::TestTEFuserStatic::test_clamp, test/test_jit_fuser_te.py::TestTEFuserStatic::test_clamp_double, test/test_jit_fuser_te.py::TestTEFuserStatic::test_clamp_int, test/test_jit_fuser_te.py::TestTEFuserStatic::test_comparison_eq_ne, test/test_jit_fuser_te.py::TestTEFuserStatic::test_comparison_ge_le, test/test_jit_fuser_te.py::TestTEFuserStatic::test_comparison_gt_lt, test/test_jit_fuser_te.py::TestTEFuserStatic::test_concat, test/test_jit_fuser_te.py::TestTEFuserStatic::test_concat_invariant, test/test_jit_fuser_te.py::TestTEFuserStatic::test_constant_chunk_shapes, test/test_jit_fuser_te.py::TestTEFuserStatic::test_conv2d, test/test_jit_fuser_te.py::TestTEFuserStatic::test_conv2d_depthwise, test/test_jit_fuser_te.py::TestTEFuserStatic::test_cuda_half, test/test_jit_fuser_te.py::TestTEFuserStatic::test_dims, test/test_jit_fuser_te.py::TestTEFuserStatic::test_disabled, test/test_jit_fuser_te.py::TestTEFuserStatic::test_div_bool, test/test_jit_fuser_te.py::TestTEFuserStatic::test_dynamic_cat, test/test_jit_fuser_te.py::TestTEFuserStatic::test_dynamic_shapes, test/test_jit_fuser_te.py::TestTEFuserStatic::test_eq_unsqueeze_type_as, test/test_jit_fuser_te.py::TestTEFuserStatic::test_erf, test/test_jit_fuser_te.py::TestTEFuserStatic::test_exhaust_specializations, test/test_jit_fuser_te.py::TestTEFuserStatic::test_exp, test/test_jit_fuser_te.py::TestTEFuserStatic::test_fusion_reuse_multi_gpu, test/test_jit_fuser_te.py::TestTEFuserStatic::test_gelu, test/test_jit_fuser_te.py::TestTEFuserStatic::test_hardsigmoid_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserStatic::test_hardswish_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserStatic::test_inlined_optimized_graph, test/test_jit_fuser_te.py::TestTEFuserStatic::test_isnan, test/test_jit_fuser_te.py::TestTEFuserStatic::test_kernel_cache_multi_gpu, test/test_jit_fuser_te.py::TestTEFuserStatic::test_lerp, 
test/test_jit_fuser_te.py::TestTEFuserStatic::test_list_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_lstm, test/test_jit_fuser_te.py::TestTEFuserStatic::test_lstm_concat, test/test_jit_fuser_te.py::TestTEFuserStatic::test_lstm_gates_permutations, test/test_jit_fuser_te.py::TestTEFuserStatic::test_lstm_traced, test/test_jit_fuser_te.py::TestTEFuserStatic::test_masked_fill, test/test_jit_fuser_te.py::TestTEFuserStatic::test_matmul, test/test_jit_fuser_te.py::TestTEFuserStatic::test_milstm, test/test_jit_fuser_te.py::TestTEFuserStatic::test_minmax, test/test_jit_fuser_te.py::TestTEFuserStatic::test_minmax_int_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_mul_bool, test/test_jit_fuser_te.py::TestTEFuserStatic::test_neg_pow, test/test_jit_fuser_te.py::TestTEFuserStatic::test_nonzero_device_cuda, test/test_jit_fuser_te.py::TestTEFuserStatic::test_nop, test/test_jit_fuser_te.py::TestTEFuserStatic::test_pow_multiple_dtype, test/test_jit_fuser_te.py::TestTEFuserStatic::test_profiler, test/test_jit_fuser_te.py::TestTEFuserStatic::test_rand_broadcast_cuda, test/test_jit_fuser_te.py::TestTEFuserStatic::test_rand_cuda, test/test_jit_fuser_te.py::TestTEFuserStatic::test_rand_diamond, test/test_jit_fuser_te.py::TestTEFuserStatic::test_relu, test/test_jit_fuser_te.py::TestTEFuserStatic::test_relu_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserStatic::test_remove_output_used_only_in_size, test/test_jit_fuser_te.py::TestTEFuserStatic::test_scalar, test/test_jit_fuser_te.py::TestTEFuserStatic::test_scalar_arg, test/test_jit_fuser_te.py::TestTEFuserStatic::test_scalar_only_inputs, test/test_jit_fuser_te.py::TestTEFuserStatic::test_skip_grad_in_check, test/test_jit_fuser_te.py::TestTEFuserStatic::test_small_constant, test/test_jit_fuser_te.py::TestTEFuserStatic::test_sub_gt_and, test/test_jit_fuser_te.py::TestTEFuserStatic::test_sum_dim, test/test_jit_fuser_te.py::TestTEFuserStatic::test_sum_keepdim_cast, 
test/test_jit_fuser_te.py::TestTEFuserStatic::test_sum_simple, test/test_jit_fuser_te.py::TestTEFuserStatic::test_superslomo, test/test_jit_fuser_te.py::TestTEFuserStatic::test_tensor_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_ternary_norm_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_ternary_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_threshold, test/test_jit_fuser_te.py::TestTEFuserStatic::test_to_device, test/test_jit_fuser_te.py::TestTEFuserStatic::test_to_dtype, test/test_jit_fuser_te.py::TestTEFuserStatic::test_torch_to, test/test_jit_fuser_te.py::TestTEFuserStatic::test_type_as_cat, test/test_jit_fuser_te.py::TestTEFuserStatic::test_typecheck, test/test_jit_fuser_te.py::TestTEFuserStatic::test_unary_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_unrolled_cat, test/test_jit_fuser_te.py::TestTEFuserStatic::test_unsqueeze_size_calculation, test/test_jit_fuser_te.py::TestTEFuserStatic::test_unsqueeze_var_dim, test/test_jit_fuser_te.py::TestTEFuserStatic::test_unsupported_dtypes, test/test_jit_fuser_te.py::TestTEFuserStatic::test_where_and_typing, test/test_jit_fuser_te.py::TestTEFuserStatic::test_where_ops, test/test_jit_fuser_te.py::TestTEFuserStatic::test_with_strict_fusion, test/test_jit_fuser_te.py::TestTEFuserStatic::test_zero_element_tensors, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_abs, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_adaptive_avg_pool2d, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_add_bool, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_addcmul, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_arg_configurations_smoke, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_autocast_down, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_autocast_up, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_batch_norm, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_binary_div_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_binary_ops, 
test/test_jit_fuser_te.py::TestTEFuserDynamic::test_binary_pow, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_binary_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_binary_tensor_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_bitwise_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_broadcast, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_cat_2k_args, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_cat_graph_opt, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_channels_last_dims_dynamic, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_checks_cat_inputs, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk_correctness, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk_distributes, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk_motion_deduplicates_inputs, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk_mul_one, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_chunk_multiple, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_clamp, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_clamp_double, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_clamp_int, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_comparison_eq_ne, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_comparison_ge_le, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_comparison_gt_lt, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_concat, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_concat_invariant, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_constant_chunk_shapes, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_conv2d, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_conv2d_depthwise, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_cuda_half, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_dims, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_disabled, 
test/test_jit_fuser_te.py::TestTEFuserDynamic::test_div_bool, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_dynamic_cat, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_dynamic_shapes, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_eq_unsqueeze_type_as, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_erf, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_exhaust_specializations, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_exp, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_fusion_reuse_multi_gpu, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_gelu, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_hardsigmoid_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_hardswish_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_inlined_optimized_graph, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_isnan, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_kernel_cache_multi_gpu, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_lerp, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_list_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_lstm, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_lstm_concat, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_lstm_gates_permutations, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_lstm_traced, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_masked_fill, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_matmul, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_milstm, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_minmax, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_minmax_int_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_mul_bool, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_neg_pow, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_nonzero_device_cuda, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_nop, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_pow_multiple_dtype, 
test/test_jit_fuser_te.py::TestTEFuserDynamic::test_profiler, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_rand_broadcast_cuda, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_rand_cuda, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_rand_diamond, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_relu, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_relu_fwd_bwd, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_remove_output_used_only_in_size, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_scalar, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_scalar_arg, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_scalar_only_inputs, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_skip_grad_in_check, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_small_constant, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_sub_gt_and, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_sum_dim, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_sum_keepdim_cast, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_sum_simple, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_superslomo, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_tensor_scalar_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_ternary_norm_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_ternary_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_threshold, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_to_device, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_to_dtype, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_torch_to, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_type_as_cat, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_typecheck, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_unary_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_unrolled_cat, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_unsqueeze_size_calculation, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_unsqueeze_var_dim, 
test/test_jit_fuser_te.py::TestTEFuserDynamic::test_unsupported_dtypes, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_where_and_typing, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_where_ops, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_with_strict_fusion, test/test_jit_fuser_te.py::TestTEFuserDynamic::test_zero_element_tensors, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_failures___rmatmul___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_failures_frac_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_failures_matmul_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_H_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_T_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___getitem___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___radd___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rand___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rdiv___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmod___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rmul___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___ror___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rpow___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rsub___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness___rxor___cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__batch_norm_with_update_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__batch_norm_with_update_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__batch_norm_with_update_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__batch_norm_with_update_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__chunk_cat_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__native_batch_norm_legit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__native_batch_norm_legit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__native_batch_norm_legit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__native_batch_norm_legit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_lengths_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_lengths_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_lengths_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_lengths_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_offsets_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_offsets_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_offsets_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__segment_reduce_offsets_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__softmax_backward_data_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__softmax_backward_data_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__softmax_backward_data_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__softmax_backward_data_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__upsample_bilinear2d_aa_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__upsample_bilinear2d_aa_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__upsample_bilinear2d_aa_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness__upsample_bilinear2d_aa_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_abs_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acos_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_acosh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_add_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addbmm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcdiv_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addcmul_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmm_decomposed_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addmv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_addr_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_alias_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_all_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_allclose_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_amin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_aminmax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_angle_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_any_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_arange_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argmin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argsort_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_argwhere_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_partial_views_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_as_strided_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_asinh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atan_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atanh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_1d_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_2d_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_atleast_3d_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_baddbmm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bernoulli_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bernoulli_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bernoulli_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bernoulli_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bfloat16_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bincount_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bincount_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bincount_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bincount_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bincount_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_and_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_left_shift_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_left_shift_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_left_shift_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_left_shift_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_left_shift_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_not_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_or_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_right_shift_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_right_shift_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_right_shift_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_right_shift_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_right_shift_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bitwise_xor_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_block_diag_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bmm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bool_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_shapes_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_tensors_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_broadcast_to_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_bucketize_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_byte_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cartesian_prod_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cat_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cauchy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cauchy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cauchy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cauchy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdist_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cdouble_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ceil_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cfloat_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chalf_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_char_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_inverse_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_inverse_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_inverse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_inverse_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_solve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_solve_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cholesky_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_chunk_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_max_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clamp_min_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_clone_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_column_stack_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_combinations_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_complex_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_complex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_complex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_conj_physical_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_constant_pad_nd_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_contiguous_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_copysign_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_corrcoef_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cos_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cosh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_count_nonzero_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cov_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cross_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cummin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumprod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumsum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_cumulative_trapezoid_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_deg2rad_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diag_embed_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagflat_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diagonal_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_diff_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_digamma_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dist_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_floor_rounding_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_no_rounding_mode_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_div_trunc_rounding_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dot_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_double_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dsplit_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_dstack_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_einsum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_like_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_permuted_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_empty_strided_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eq_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_equal_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erf_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfc_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_erfinv_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exp_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_as_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expand_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_expm1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exponential_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exponential_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exponential_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_exponential_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float8_e4m3fn, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float8_e4m3fnuz, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float8_e5m2, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_float8_e5m2fnuz, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_eye_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_fftshift_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_hfftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft2_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ifftshift_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_ihfftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_irfftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfft_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fft_rfftn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fill_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flatten_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flip_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fliplr_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_flipud_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_float_power_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_floor_divide_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_fmod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_frexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_frexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_frexp_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_frexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_uint16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_uint32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_full_like_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gather_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gcd_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gcd_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gcd_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gcd_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gcd_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ge_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geometric_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geqrf_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geqrf_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geqrf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_geqrf_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gradient_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_grid_sampler_3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_gt_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_half_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hash_tensor_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_heaviside_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_histc_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hsplit_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hstack_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hypot_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hypot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hypot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_hypot_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_i0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_igamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_igamma_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_igammac_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_igammac_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_imag_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_imag_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_imag_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_add_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_fill_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_put_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amax_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_amin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_mean_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_reduce_prod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_index_select_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_inner_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_int_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isclose_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isfinite_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isinf_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isnan_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isneginf_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isposinf_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_isreal_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_istft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_istft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_item_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_2inputs_2outputs_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_binary_return_by_ref_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_jiterator_unary_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kron_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_kthvalue_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lcm_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lcm_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lcm_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lcm_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lcm_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ldexp_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_le_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lerp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lgamma_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_ex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_ex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cholesky_ex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cond_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cond_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cond_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cond_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_cross_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_det_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_det_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_det_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_det_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_diagonal_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eig_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eig_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eig_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eig_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvals_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvals_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvals_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvals_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvalsh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvalsh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvalsh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_eigvalsh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_householder_product_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_householder_product_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_householder_product_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_householder_product_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_ex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_ex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_inv_ex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_ex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_ex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_factor_ex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_solve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_solve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_ldl_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_grad_oriented_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_grad_oriented_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_grad_oriented_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lstsq_grad_oriented_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_ex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_ex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_factor_ex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_solve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_solve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_lu_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_power_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_power_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_power_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_power_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_hermitian_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_hermitian_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_hermitian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_matrix_rank_hermitian_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_multi_dot_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_norm_subgradients_at_zero_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_hermitian_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_hermitian_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_hermitian_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_hermitian_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_singular_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_singular_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_singular_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_pinv_singular_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_qr_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_qr_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_qr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_qr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_slogdet_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_slogdet_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_slogdet_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_slogdet_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_ex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_ex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_ex_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_ex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_triangular_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_triangular_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_triangular_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_solve_triangular_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svd_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svd_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svd_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svdvals_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svdvals_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svdvals_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_svdvals_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorinv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorinv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorinv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorsolve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorsolve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorsolve_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_tensorsolve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vander_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vecdot_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linalg_vector_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_linspace_tensor_overload_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log10_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log1p_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_normal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_normal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_normal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_normal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_log_softmax_with_dtype_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp2_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logaddexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logcumsumexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logdet_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logdet_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logdet_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logdet_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_and_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_not_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_or_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logical_xor_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logit_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logspace_tensor_overload_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_logsumexp_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_long_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lt_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_solve_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_solve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_unpack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_unpack_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_unpack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_lu_unpack_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mH_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mT_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_amin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_argmin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumprod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_cumsum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_fill_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_log_softmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_log_softmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_log_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_log_softmax_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logaddexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logaddexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logaddexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logaddexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_logsumexp_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_median_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_median_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_median_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_median_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_normalize_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_prod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_select_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_softmin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_std_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_sum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_masked_var_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_matrix_exp_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_binary_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_pool2d_with_indices_backward_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_pool2d_with_indices_backward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_pool2d_with_indices_backward_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_no_dim_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_max_reduction_with_dim_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_maximum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_median_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_list_of_tensors_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_meshgrid_variadic_tensors_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_binary_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_no_dim_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_min_reduction_with_dim_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_minimum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mode_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_movedim_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_msort_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mul_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_multinomial_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_multinomial_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_multinomial_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_multinomial_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mv_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_mvlgamma_mvlgamma_p_5_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nan_to_num_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanmedian_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanquantile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nanquantile_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nansum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_narrow_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_batch_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_batch_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_batch_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_batch_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_dropout_backward_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_dropout_backward_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_dropout_backward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_dropout_backward_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_layer_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_layer_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_layer_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_native_layer_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ne_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_neg_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_empty_strided_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_full_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_ones_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_new_zeros_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nextafter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nextafter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nextafter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nextafter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool1d_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_alpha_dropout_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_alpha_dropout_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_alpha_dropout_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_alpha_dropout_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool3d_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_avg_pool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_bilinear_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_bilinear_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_bilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_bilinear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_celu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_celu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_celu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_celu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_channel_shuffle_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_conv_transpose3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_similarity_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_similarity_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_similarity_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cosine_similarity_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cross_entropy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cross_entropy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cross_entropy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_cross_entropy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_ctc_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_ctc_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout3d_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_dropout_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_elu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_elu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_elu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_elu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_bag_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_bag_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_bag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_bag_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_embedding_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_fractional_max_pool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gaussian_nll_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gaussian_nll_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gaussian_nll_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gelu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gelu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_gelu_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_glu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_glu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_glu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_glu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_grid_sample_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_grid_sample_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_grid_sample_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_grid_sample_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_group_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_group_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_group_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_group_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardshrink_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardshrink_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardshrink_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardsigmoid_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardsigmoid_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardsigmoid_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardsigmoid_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardswish_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardswish_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardswish_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardswish_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hardtanh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hinge_embedding_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hinge_embedding_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_hinge_embedding_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_huber_loss_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_huber_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_huber_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_huber_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_instance_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_instance_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_instance_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_instance_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_area_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_area_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_area_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_area_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bicubic_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bicubic_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bicubic_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bilinear_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_bilinear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_linear_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_linear_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_linear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_linear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_nearest_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_trilinear_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_trilinear_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_trilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_interpolate_trilinear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_kl_div_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_kl_div_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_kl_div_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_kl_div_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_l1_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_layer_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_layer_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_layer_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_layer_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_leaky_relu_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_leaky_relu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_leaky_relu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_leaky_relu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_linear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_local_response_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_local_response_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_local_response_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_local_response_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_logsigmoid_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_logsigmoid_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_logsigmoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_logsigmoid_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_margin_ranking_loss_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool3d_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_pool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_grad_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool1d_grad_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_grad_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool2d_grad_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_grad_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_max_unpool3d_grad_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mish_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mish_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mish_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mish_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mse_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mse_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mse_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_mse_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_head_attention_forward_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_head_attention_forward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_head_attention_forward_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_margin_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multi_margin_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_margin_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_margin_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_nll_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_nll_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_nll_loss_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_nll_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_normalize_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_one_hot_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_circular_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_constant_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_reflect_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pad_replicate_negative_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pairwise_distance_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pdist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pdist_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_shuffle_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_pixel_unshuffle_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_poisson_nll_loss_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_prelu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_prelu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_prelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_prelu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu6_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_relu_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rms_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rrelu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rrelu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rrelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_rrelu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_scaled_dot_product_attention_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_selu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_selu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_selu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_selu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_complex_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_complex_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_silu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_smooth_l1_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_smooth_l1_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_smooth_l1_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_soft_margin_loss_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_soft_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_soft_margin_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softmin_with_dtype_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softplus_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softplus_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softplus_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softplus_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softshrink_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softshrink_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softshrink_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_softsign_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_tanhshrink_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_threshold_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_loss_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_unfold_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_bilinear_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_bilinear_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_bilinear_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_nearest_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_nearest_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_nearest_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_nearest_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nn_functional_upsample_nearest_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_nonzero_static_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_fro_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_inf_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_nuc_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_nuc_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_nuc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_norm_nuc_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_in_place_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_number_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_number_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_number_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_normal_number_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ones_like_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ormqr_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ormqr_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ormqr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ormqr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_outer_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pca_lowrank_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pca_lowrank_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pca_lowrank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pca_lowrank_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_copy_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_permute_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pinverse_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pinverse_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pinverse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pinverse_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polar_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polar_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_2_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_3_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_polygamma_polygamma_n_4_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_positive_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_pow_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_prod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_put_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_qr_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_qr_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_qr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_qr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_quantile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_quantile_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rad2deg_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rand_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randint_like_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_randn_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_ravel_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_real_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reciprocal_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_remainder_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_renorm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_repeat_interleave_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_as_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_reshape_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize__cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resize_as__cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_conj_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_resolve_neg_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_roll_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rot90_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_0_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_0_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_0_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_3_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_3_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_3_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_neg_3_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_neg_3_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_neg_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_round_decimals_neg_3_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsqrt_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_rsub_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scalar_tensor_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_add_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amax_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_amin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_mean_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_prod_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_scatter_reduce_sum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_searchsorted_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_select_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sgn_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_short_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sigmoid_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sign_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_bartlett_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_bartlett_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_blackman_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_blackman_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_cosine_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_cosine_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_exponential_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_exponential_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_gaussian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_gaussian_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_general_cosine_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_general_cosine_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_general_hamming_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_general_hamming_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_hamming_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_hamming_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_hann_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_hann_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_kaiser_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_kaiser_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_nuttall_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signal_windows_nuttall_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_signbit_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sin_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinc_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sinh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_slice_scatter_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_softmax_with_dtype_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sort_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_mm_reduce_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_mm_reduce_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_mm_reduce_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_mm_reduce_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_sampled_addmm_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_sampled_addmm_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_sampled_addmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sparse_sampled_addmm_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_airy_ai_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_j1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_bessel_y1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_t_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_u_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_v_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_chebyshev_polynomial_w_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_entr_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_erfcx_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_h_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_hermite_polynomial_he_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i0e_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_i1e_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_laguerre_polynomial_l_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_legendre_polynomial_p_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_log_ndtr_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_i1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_modified_bessel_k1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtr_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_ndtri_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_scaled_modified_bessel_k1_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_spherical_bessel_j0_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_xlog1py_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_special_zeta_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_int64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_list_args_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_complex32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_split_with_sizes_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sqrt_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_square_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_squeeze_multiple_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stack_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_mean_unbiased_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_std_unbiased_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stft_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stft_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_stft_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sub_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_sum_to_size_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_lowrank_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_lowrank_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_lowrank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_svd_lowrank_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_t_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_along_dim_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_take_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tan_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tanh_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_int32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensor_split_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tensordot_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tile_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_to_sparse_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_topk_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch__scaled_mm_cuda_float8_e4m3fn, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trace_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_transpose_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapezoid_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trapz_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triangular_solve_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triangular_solve_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triangular_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triangular_solve_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_indices_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_tril_indices_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_indices_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_triu_indices_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_true_divide_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_trunc_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unbind_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unflatten_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unfold_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_uniform_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_consecutive_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_uint16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_uint32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_uint64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unique_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unravel_index_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unravel_index_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unravel_index_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unravel_index_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unravel_index_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_chunk_cuda_uint8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsafe_split_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_float64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_unsqueeze_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_float16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_mean_unbiased_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_var_unbiased_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_bfloat16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vdot_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_complex_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_complex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_complex_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_real_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_as_real_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_copy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_int16, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_view_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vsplit_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_complex64, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_vstack_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_where_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_bool, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_xlogy_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zero__cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_complex128, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_int8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_bfloat16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_bool, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_complex128, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_complex32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_complex64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_float16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_float64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_int16, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_int32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_int64, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_int8, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_nnc_correctness_zeros_like_cuda_uint8, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_H_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_T_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported___getitem___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported___rpow___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported___rsub___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__batch_norm_with_update_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__chunk_cat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__native_batch_norm_legit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__segment_reduce_lengths_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__segment_reduce_offsets_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__softmax_backward_data_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__unsafe_masked_index_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__unsafe_masked_index_put_accumulate_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported__upsample_bilinear2d_aa_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_acosh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_addbmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_addcdiv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_addmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_addmv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_addr_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_alias_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_all_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_allclose_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_aminmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_angle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_any_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_arange_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_argmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_argmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_argsort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_argwhere_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_as_strided_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_as_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_as_strided_partial_views_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_as_strided_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_asinh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_atanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_atleast_1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_atleast_2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_atleast_3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_baddbmm_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_bernoulli_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_bfloat16_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_block_diag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_bmm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_broadcast_shapes_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_broadcast_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_broadcast_to_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_bucketize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cartesian_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cauchy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cdist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cdouble_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cfloat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_chalf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cholesky_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cholesky_inverse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cholesky_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_chunk_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_clamp_max_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_clamp_min_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_clone_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_column_stack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_combinations_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_complex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_conj_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_conj_physical_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_constant_pad_nd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_copysign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_corrcoef_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_count_nonzero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cov_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cross_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cummax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cummin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cumprod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cumsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_cumulative_trapezoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_deg2rad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diag_embed_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diagflat_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diagonal_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diagonal_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diagonal_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_diff_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_digamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_dist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_dot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_dsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_dstack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_einsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_empty_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_empty_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_empty_permuted_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_empty_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_equal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_erfinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_exp2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_expand_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_exponential_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_eye_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_fft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_fft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_fftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_fftshift_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_hfft2_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_hfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_hfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ifft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ifft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ifftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ifftshift_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ihfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ihfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_ihfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_irfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_irfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_irfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_rfft2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_rfft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fft_rfftn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_flatten_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_flip_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fliplr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_flipud_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_float_power_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_floor_divide_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fmax_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_fmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_frexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_full_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_full_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_gather_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_geometric_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_geqrf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_gradient_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_grid_sampler_2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_grid_sampler_3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_hash_tensor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_heaviside_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_histc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_hsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_hstack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_hypot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_i0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_igamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_igammac_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_put_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_reduce_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_reduce_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_reduce_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_reduce_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_index_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_inner_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isclose_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isfinite_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isinf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isneginf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isposinf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_isreal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_item_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_jiterator_2inputs_2outputs_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_jiterator_4inputs_with_extra_args_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_jiterator_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_jiterator_binary_return_by_ref_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_jiterator_unary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_kron_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_kthvalue_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_ldexp_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_cholesky_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_cholesky_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_cond_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_cross_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_det_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_diagonal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_eig_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_eigh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_eigvals_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_eigvalsh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_householder_product_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_inv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_inv_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_ldl_factor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_ldl_factor_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_ldl_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lstsq_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lstsq_grad_oriented_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lu_factor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lu_factor_ex_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_lu_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_matrix_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_matrix_power_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_matrix_rank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_matrix_rank_hermitian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_multi_dot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_norm_subgradients_at_zero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_pinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_pinv_hermitian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_pinv_singular_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_qr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_slogdet_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_solve_ex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_solve_triangular_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_svd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_svdvals_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_tensorinv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_tensorsolve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_vander_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_vecdot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linalg_vector_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linspace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_linspace_tensor_overload_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_log_normal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_log_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_log_softmax_with_dtype_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logaddexp2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logaddexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logcumsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logdet_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logical_and_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logical_not_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logical_or_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logical_xor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logspace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logspace_tensor_overload_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_logsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_lu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_lu_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_lu_unpack_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mH_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mT_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_argmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_argmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_cumprod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_cumsum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_log_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_logaddexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_logsumexp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_median_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_normalize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_softmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_std_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_sum_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_masked_var_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_matrix_exp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_max_pool2d_with_indices_backward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_max_reduction_no_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_max_reduction_with_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_maximum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_median_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_meshgrid_list_of_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_meshgrid_variadic_tensors_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_min_reduction_no_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_min_reduction_with_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_minimum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mode_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_movedim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_msort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_multinomial_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mv_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nan_to_num_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nanmean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nanmedian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nanquantile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nansum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_narrow_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_narrow_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_native_batch_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_native_dropout_backward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_native_layer_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_new_empty_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_new_empty_strided_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_new_full_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_new_ones_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_new_zeros_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nextafter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_alpha_dropout_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_avg_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_avg_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_avg_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_batch_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_bilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_binary_cross_entropy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_celu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_channel_shuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv_transpose1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv_transpose2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_conv_transpose3d_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_cosine_embedding_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_cosine_similarity_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_cross_entropy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_ctc_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_dropout2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_dropout3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_dropout_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_elu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_embedding_bag_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_embedding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_fractional_max_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_fractional_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_gaussian_nll_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_gelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_glu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_grid_sample_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_group_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_hinge_embedding_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_huber_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_instance_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_area_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_bicubic_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_bilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_linear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_nearest_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_interpolate_trilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_kl_div_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_l1_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_layer_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_linear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_local_response_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_logsigmoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_margin_ranking_loss_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_pool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_pool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_pool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool1d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool1d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool2d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool2d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool3d_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_max_unpool3d_grad_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_mish_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_mse_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_multi_head_attention_forward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_multi_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_multilabel_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_nll_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_normalize_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pad_circular_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pad_constant_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pad_reflect_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pad_replicate_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pad_replicate_negative_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pairwise_distance_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pdist_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pixel_shuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_pixel_unshuffle_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_poisson_nll_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_prelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_rms_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_rrelu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_selu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_silu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_smooth_l1_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_soft_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_softmin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_softshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_triplet_margin_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_unfold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_upsample_bilinear_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nn_functional_upsample_nearest_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nonzero_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_nonzero_static_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_norm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_norm_fro_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_norm_inf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_norm_nuc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_normal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_normal_in_place_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_normal_number_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_ones_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_ones_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_ormqr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_outer_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_pca_lowrank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_permute_copy_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_pinverse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polar_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polygamma_polygamma_n_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polygamma_polygamma_n_1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polygamma_polygamma_n_2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polygamma_polygamma_n_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_polygamma_polygamma_n_4_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_positive_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_put_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_qr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_quantile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_rad2deg_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_rand_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_randint_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_randint_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_randn_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_randn_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_ravel_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_real_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_renorm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_repeat_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_repeat_interleave_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_resize__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_resize_as__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_resolve_conj_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_resolve_neg_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_roll_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_rot90_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_round_decimals_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_round_decimals_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_round_decimals_neg_3_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scalar_tensor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_reduce_amax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_reduce_amin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_reduce_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_reduce_prod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_scatter_reduce_sum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_searchsorted_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_select_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_select_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sgn_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_bartlett_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_blackman_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_cosine_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_exponential_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_gaussian_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_general_cosine_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_general_hamming_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_hamming_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_hann_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_kaiser_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signal_windows_nuttall_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_signbit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sinc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_slice_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_slice_scatter_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_softmax_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_softmax_with_dtype_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sort_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sparse_mm_reduce_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sparse_sampled_addmm_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_airy_ai_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_bessel_j0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_bessel_j1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_bessel_y0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_bessel_y1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_chebyshev_polynomial_t_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_chebyshev_polynomial_u_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_chebyshev_polynomial_v_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_chebyshev_polynomial_w_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_entr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_erfcx_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_hermite_polynomial_h_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_hermite_polynomial_he_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_i0e_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_i1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_i1e_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_laguerre_polynomial_l_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_legendre_polynomial_p_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_log_ndtr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_modified_bessel_i0_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_modified_bessel_i1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_modified_bessel_k0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_modified_bessel_k1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_ndtr_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_ndtri_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_scaled_modified_bessel_k0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_scaled_modified_bessel_k1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_spherical_bessel_j0_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_xlog1py_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_special_zeta_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_split_list_args_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_split_with_sizes_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_split_with_sizes_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_square_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_squeeze_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_squeeze_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_squeeze_multiple_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_stack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_std_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_std_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_std_mean_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_std_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_stft_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_sum_to_size_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_svd_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_svd_lowrank_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_t_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_take_along_dim_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_take_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_tensor_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_tensordot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_tile_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_to_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_to_sparse_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_topk_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_trace_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_transpose_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_trapezoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_trapz_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_triangular_solve_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_tril_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_triu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unbind_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unbind_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unflatten_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unfold_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unfold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_uniform_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unique_consecutive_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unique_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unsafe_chunk_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unsafe_split_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_unsqueeze_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_var_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_var_mean_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_var_mean_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_var_unbiased_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_vdot_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_view_as_complex_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_view_copy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_vsplit_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_vstack_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_xlogy_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_zero__cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_zeros_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_unsupported_zeros_like_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working___radd___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working___rdiv___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working___rmod___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working___rmul___cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_abs_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_acos_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_add_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_addcmul_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_addmm_decomposed_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_asin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_atan2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_atan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_bool_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_byte_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_ceil_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_char_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_clamp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_contiguous_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_cos_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_cosh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_div_floor_rounding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_div_no_rounding_mode_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_div_trunc_rounding_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_double_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_eq_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_erf_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_erfc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_exp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_expand_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_expand_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_expm1_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_float_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_floor_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_fmod_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_ge_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_gt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_half_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_int_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_isnan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_le_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_lerp_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_lgamma_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_log10_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_log1p_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_log2_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_log_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_long_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_lt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_masked_fill_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_max_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_mean_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_min_binary_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_mm_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_mul_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_ne_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_neg_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_hardshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_hardsigmoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_hardswish_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_hardtanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_leaky_relu_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_relu6_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_relu_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_softplus_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_softsign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_tanhshrink_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_nn_functional_threshold_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_permute_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_pow_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_reciprocal_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_remainder_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_reshape_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_reshape_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_round_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_rsqrt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_rsub_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_short_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sigmoid_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sign_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sin_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sinh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sqrt_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sub_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_sum_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_t_cuda_float32, 
test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_tan_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_tanh_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_transpose_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_true_divide_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_trunc_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_unsqueeze_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_view_as_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_view_cuda_float32, test/test_jit_fuser_te.py::TestNNCOpInfoCUDA::test_working_where_cuda_float32 2025-12-04T14:13:15.1558587Z 2025-12-04T14:13:15.1558698Z Finished test_jit_fuser_te 1/1 ... [2025-12-04 14:13:14.995712][2261579.262379597], took 7.66min 2025-12-04T14:13:15.1559076Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:13:15.1559434Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:13:15.1559638Z Running test_mkldnn 1/1 ... [2025-12-04 14:13:15.002295][2261579.268967398] 2025-12-04T14:13:15.1559806Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:13:15.1560180Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_mkldnn.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:13:15.002517] 2025-12-04T14:13:17.2027507Z 2025-12-04T14:13:17.2029072Z test_mkldnn 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_mkldnn_1.1_839a9f26cca864a9_.log 2025-12-04T14:13:17.2029823Z Running 0 items in this shard: 2025-12-04T14:13:17.2030044Z 2025-12-04T14:13:17.2030322Z Finished test_mkldnn 1/1 ... 
[2025-12-04 14:13:17.202408][2261581.469076789], took 0.04min 2025-12-04T14:13:17.2039260Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:13:17.2090538Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:13:17.2091470Z Running test_nestedtensor 2/2 ... [2025-12-04 14:13:17.208960][2261581.475632591] 2025-12-04T14:13:17.2091946Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:13:17.2093856Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:13:17.209160] 2025-12-04T14:35:45.8870912Z 2025-12-04T14:35:45.8871863Z PRINTING LOG FILE of test_nestedtensor 2/2 (test/test-reports/test_nestedtensor_2.2_7846e0a7f873b97b_.log) 2025-12-04T14:35:45.8872616Z Test results will be stored in test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c8e74065146f9707.xml 2025-12-04T14:35:45.8873165Z ============================= test session starts ============================== 2025-12-04T14:35:45.8873721Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T14:35:45.8874163Z cachedir: .pytest_cache 2025-12-04T14:35:45.8874680Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:35:45.8875265Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T14:35:45.8875567Z configfile: pytest.ini 2025-12-04T14:35:45.8876149Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:35:45.8876695Z collecting ... 
collected 1644 items 2025-12-04T14:35:45.8877021Z stepcurrent: Cannot find last run test, not skipping 2025-12-04T14:35:45.9017641Z Running 836 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_default_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_dim, test/test_nestedtensor.py::TestNestedTensor::test_fill_, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_ones_like, test/test_nestedtensor.py::TestNestedTensor::test_nested_namespace, 
test/test_nestedtensor.py::TestNestedTensor::test_nested_tensor_matching_dim, test/test_nestedtensor.py::TestNestedTensor::test_nested_view_from_buffer_overflow_errors, test/test_nestedtensor.py::TestNestedTensor::test_numel, test/test_nestedtensor.py::TestNestedTensor::test_repr_string, test/test_nestedtensor.py::TestNestedTensor::test_size, test/test_nestedtensor.py::TestNestedTensor::test_stride, test/test_nestedtensor.py::TestNestedTensor::test_to, test/test_nestedtensor.py::TestNestedTensor::test_to_padded_tensor_on_empty_tensor, test/test_nestedtensor.py::TestNestedTensor::test_unbind_0, test/test_nestedtensor.py::TestNestedTensor::test_unbind_1, test/test_nestedtensor.py::TestNestedTensor::test_unbind_dim, test/test_nestedtensor.py::TestNestedTensor::test_zero_, test/test_nestedtensor.py::TestNestedInt::test_with_factor, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_device_checks_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_strided_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sum_dim_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isinf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isnan_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isneginf_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sin_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_abs_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1024_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_256_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_32_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_513_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_masked_fill_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_list_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_fused_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_plus_transpose_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_relu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_split_with_sizes_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_values_grad_with_broadcast_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_apply__cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_transposed_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_broadcast_shapes_on_in_graph_constructed_njt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_max_seq_len_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_min_seq_len_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_with_custom_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_True_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_same_size_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_with_pinned_memory_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layout_under_torch_dispatch_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_ones_like_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_rand_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randint_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_zeros_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_backward_memory_usage_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_activation_checkpoint_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_fx_trace_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_njt_cat_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_permute_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_pin_memory_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_flop_counter_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_tensor_attributes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_threshold_backward_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_copy_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_transposed_inputs_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_0_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_equals_2_bad_dim_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unsafe_view_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_views_inherit_ragged_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_copysign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erf_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nan_to_num_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsqrt_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_xlogy_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_celu_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmod___cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bool_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_count_nonzero_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ge_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hash_tensor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_heaviside_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_int_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isclose_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_long_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_maximum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nanmean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_neg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_remainder_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_laguerre_polynomial_l_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_spherical_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmod___cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bool_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_byte_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erf_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_heaviside_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_int_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_or_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nanmean_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_threshold_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polar_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_real_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_h_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_he_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k1_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_zeta_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_trunc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_non_contiguous_mutation_cuda 2025-12-04T14:35:45.9147717Z 2025-12-04T14:35:45.9147881Z test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20 PASSED [0.0104s] [ 0%] 
2025-12-04T14:35:45.9148283Z test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10 PASSED [0.0054s] [ 0%] 2025-12-04T14:35:45.9148620Z test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20 PASSED [0.0072s] [ 0%] 2025-12-04T14:35:45.9148953Z test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10 PASSED [0.0079s] [ 0%] 2025-12-04T14:35:45.9149286Z test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20 PASSED [0.0055s] [ 0%] 2025-12-04T14:35:45.9149621Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10 PASSED [0.0054s] [ 0%] 2025-12-04T14:35:45.9149955Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20 PASSED [0.0075s] [ 0%] 2025-12-04T14:35:45.9150292Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_10 PASSED [0.0065s] [ 0%] 2025-12-04T14:35:45.9150625Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10 PASSED [0.0075s] [ 1%] 2025-12-04T14:35:45.9150967Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_10 PASSED [0.0067s] [ 1%] 2025-12-04T14:35:45.9151316Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_20 PASSED [0.0062s] [ 1%] 2025-12-04T14:35:45.9151663Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_10 PASSED [0.0076s] [ 1%] 2025-12-04T14:35:45.9152010Z test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_4_max_seq_len_3_vocab_size_20 PASSED [0.0074s] [ 1%] 2025-12-04T14:35:45.9152314Z test_nestedtensor.py::TestNestedTensor::test_default_nested_tensor PASSED [0.0020s] [ 
1%] 2025-12-04T14:35:45.9152596Z test_nestedtensor.py::TestNestedTensor::test_dim PASSED [0.0004s] [ 1%] 2025-12-04T14:35:45.9152820Z test_nestedtensor.py::TestNestedTensor::test_fill_ PASSED [0.0048s] [ 1%] 2025-12-04T14:35:45.9153061Z test_nestedtensor.py::TestNestedTensor::test_like_functions_ones_like PASSED [0.0061s] [ 2%] 2025-12-04T14:35:45.9153345Z test_nestedtensor.py::TestNestedTensor::test_nested_namespace PASSED [0.0063s] [ 2%] 2025-12-04T14:35:45.9153599Z test_nestedtensor.py::TestNestedTensor::test_nested_tensor_matching_dim PASSED [0.0006s] [ 2%] 2025-12-04T14:35:45.9153881Z test_nestedtensor.py::TestNestedTensor::test_nested_view_from_buffer_overflow_errors PASSED [0.0005s] [ 2%] 2025-12-04T14:35:45.9154138Z test_nestedtensor.py::TestNestedTensor::test_numel PASSED [0.0332s] [ 2%] 2025-12-04T14:35:45.9154362Z test_nestedtensor.py::TestNestedTensor::test_repr_string PASSED [0.0067s] [ 2%] 2025-12-04T14:35:45.9154583Z test_nestedtensor.py::TestNestedTensor::test_size PASSED [0.0005s] [ 2%] 2025-12-04T14:35:45.9154811Z test_nestedtensor.py::TestNestedTensor::test_stride PASSED [0.0005s] [ 2%] 2025-12-04T14:35:45.9155030Z test_nestedtensor.py::TestNestedTensor::test_to PASSED [0.0279s] [ 2%] 2025-12-04T14:35:45.9155277Z test_nestedtensor.py::TestNestedTensor::test_to_padded_tensor_on_empty_tensor PASSED [0.0010s] [ 3%] 2025-12-04T14:35:45.9155528Z test_nestedtensor.py::TestNestedTensor::test_unbind_0 PASSED [0.0119s] [ 3%] 2025-12-04T14:35:45.9155747Z test_nestedtensor.py::TestNestedTensor::test_unbind_1 PASSED [0.0143s] [ 3%] 2025-12-04T14:35:45.9155965Z test_nestedtensor.py::TestNestedTensor::test_unbind_dim PASSED [0.0074s] [ 3%] 2025-12-04T14:35:45.9156182Z test_nestedtensor.py::TestNestedTensor::test_zero_ PASSED [0.0067s] [ 3%] 2025-12-04T14:35:45.9156398Z test_nestedtensor.py::TestNestedInt::test_with_factor PASSED [0.0005s] [ 3%] 2025-12-04T14:35:45.9156685Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float32 SKIPPED 
[0.0347s] (Only runs on cpu) [ 3%] 2025-12-04T14:35:45.9157075Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64 SKIPPED [0.0011s] (Only runs on cpu) [ 3%] 2025-12-04T14:35:45.9157403Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_bfloat16 PASSED [0.8183s] [ 4%] 2025-12-04T14:35:45.9157709Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float16 PASSED [0.7825s] [ 4%] 2025-12-04T14:35:45.9158009Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float32 PASSED [2.2249s] [ 4%] 2025-12-04T14:35:45.9158306Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float64 PASSED [0.2033s] [ 4%] 2025-12-04T14:35:45.9158618Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float32 PASSED [0.9047s] [ 4%] 2025-12-04T14:35:45.9158937Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16 PASSED [0.0027s] [ 4%] 2025-12-04T14:35:45.9159245Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float32 PASSED [0.0090s] [ 4%] 2025-12-04T14:35:45.9159541Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float64 PASSED [0.0852s] [ 4%] 2025-12-04T14:35:45.9159835Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_device_checks_cuda PASSED [0.0008s] [ 5%] 2025-12-04T14:35:45.9160153Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float64 PASSED [0.0089s] [ 5%] 2025-12-04T14:35:45.9160483Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float64 PASSED [0.0128s] [ 5%] 2025-12-04T14:35:45.9160797Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_embedding_strided_cuda PASSED [0.0097s] [ 5%] 2025-12-04T14:35:45.9161104Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float32 PASSED [0.0065s] [ 5%] 2025-12-04T14:35:45.9161444Z 
test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float16 PASSED [0.0229s] [ 5%] 2025-12-04T14:35:45.9161823Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float32 PASSED [0.0029s] [ 5%] 2025-12-04T14:35:45.9162147Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float64 PASSED [0.0026s] [ 5%] 2025-12-04T14:35:45.9162468Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int16 PASSED [0.0021s] [ 5%] 2025-12-04T14:35:45.9162789Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float32 PASSED [0.0065s] [ 6%] 2025-12-04T14:35:45.9163112Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float64 PASSED [0.0026s] [ 6%] 2025-12-04T14:35:45.9163465Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int8 PASSED [0.0022s] [ 6%] 2025-12-04T14:35:45.9163791Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_bfloat16 PASSED [0.0049s] [ 6%] 2025-12-04T14:35:45.9164135Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float16 PASSED [0.0021s] [ 6%] 2025-12-04T14:35:45.9164468Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float32 PASSED [0.0018s] [ 6%] 2025-12-04T14:35:45.9164800Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float64 PASSED [0.0022s] [ 6%] 2025-12-04T14:35:45.9165128Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int32 PASSED [0.0017s] [ 6%] 2025-12-04T14:35:45.9165454Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int64 PASSED [0.0020s] [ 7%] 2025-12-04T14:35:45.9165779Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_uint8 PASSED [0.0019s] [ 7%] 
2025-12-04T14:35:45.9166185Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_bfloat16 PASSED [0.0046s] [ 7%] 2025-12-04T14:35:45.9166557Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float32 PASSED [0.0021s] [ 7%] 2025-12-04T14:35:45.9166889Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int16 PASSED [0.0028s] [ 7%] 2025-12-04T14:35:45.9167213Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int32 PASSED [0.0029s] [ 7%] 2025-12-04T14:35:45.9167538Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_uint8 PASSED [0.0037s] [ 7%] 2025-12-04T14:35:45.9167865Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_bfloat16 PASSED [0.0022s] [ 7%] 2025-12-04T14:35:45.9168190Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float16 PASSED [0.0017s] [ 8%] 2025-12-04T14:35:45.9168512Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float32 PASSED [0.0013s] [ 8%] 2025-12-04T14:35:45.9168841Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float64 PASSED [0.0014s] [ 8%] 2025-12-04T14:35:45.9169160Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int64 PASSED [0.0010s] [ 8%] 2025-12-04T14:35:45.9169481Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_uint8 PASSED [0.0010s] [ 8%] 2025-12-04T14:35:45.9169803Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_bfloat16 PASSED [0.0012s] [ 8%] 2025-12-04T14:35:45.9170124Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float32 PASSED [0.0011s] [ 8%] 2025-12-04T14:35:45.9170441Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int16 PASSED [0.0020s] [ 8%] 
2025-12-04T14:35:45.9170759Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int64 PASSED [0.0010s] [ 8%] 2025-12-04T14:35:45.9171103Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float16 PASSED [0.0211s] [ 9%] 2025-12-04T14:35:45.9171424Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float32 PASSED [0.0009s] [ 9%] 2025-12-04T14:35:45.9171758Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_noncontiguous_cuda_float64 PASSED [0.0007s] [ 9%] 2025-12-04T14:35:45.9172082Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float32 PASSED [0.0400s] [ 9%] 2025-12-04T14:35:45.9172384Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float64 PASSED [0.1505s] [ 9%] 2025-12-04T14:35:45.9172699Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_noncontiguous_cuda_float64 PASSED [0.0024s] [ 9%] 2025-12-04T14:35:45.9173044Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float32 PASSED [0.0060s] [ 9%] 2025-12-04T14:35:45.9173435Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float64 PASSED [0.0070s] [ 9%] 2025-12-04T14:35:45.9173759Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_narrow_cuda_float16 PASSED [0.0138s] [ 10%] 2025-12-04T14:35:45.9174081Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float16 PASSED [0.0011s] [ 10%] 2025-12-04T14:35:45.9174430Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32 PASSED [0.0011s] [ 10%] 2025-12-04T14:35:45.9174809Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float16 PASSED [0.0236s] [ 10%] 2025-12-04T14:35:45.9175220Z 
test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float32 PASSED [0.0215s] [ 10%] 2025-12-04T14:35:45.9175629Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32 PASSED [0.0213s] [ 10%] 2025-12-04T14:35:45.9176027Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32 PASSED [0.0060s] [ 10%] 2025-12-04T14:35:45.9176359Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float16 PASSED [0.0035s] [ 10%] 2025-12-04T14:35:45.9176718Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float16 PASSED [0.0012s] [ 11%] 2025-12-04T14:35:45.9177096Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float32 PASSED [0.0011s] [ 11%] 2025-12-04T14:35:45.9177475Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float64 PASSED [0.0014s] [ 11%] 2025-12-04T14:35:45.9177828Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float16 PASSED [0.0041s] [ 11%] 2025-12-04T14:35:45.9178167Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float16 PASSED [0.0030s] [ 11%] 2025-12-04T14:35:45.9178512Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_in_place_cuda_float32 PASSED [0.0022s] [ 11%] 2025-12-04T14:35:45.9178864Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float32 PASSED [0.0025s] [ 11%] 2025-12-04T14:35:45.9179222Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float64 PASSED [0.0021s] [ 11%] 2025-12-04T14:35:45.9179587Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_False_cuda_float32 PASSED 
[0.0011s] [ 11%] 2025-12-04T14:35:45.9179953Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16 PASSED [0.0014s] [ 12%] 2025-12-04T14:35:45.9180329Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sum_dim_cuda_float32 SKIPPED [0.0005s] (Only runs on cpu) [ 12%] 2025-12-04T14:35:45.9180707Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float16 PASSED [0.0014s] [ 12%] 2025-12-04T14:35:45.9181040Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_3_cuda PASSED [0.0325s] [ 12%] 2025-12-04T14:35:45.9181436Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float16 PASSED [0.0081s] [ 12%] 2025-12-04T14:35:45.9181861Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float32 PASSED [0.0066s] [ 12%] 2025-12-04T14:35:45.9182281Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float32 PASSED [0.0056s] [ 12%] 2025-12-04T14:35:45.9182701Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float64 PASSED [0.0074s] [ 12%] 2025-12-04T14:35:45.9183121Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32 PASSED [0.0066s] [ 13%] 2025-12-04T14:35:45.9183594Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float64 PASSED [0.0065s] [ 13%] 2025-12-04T14:35:45.9183955Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float64 PASSED [0.0203s] [ 13%] 2025-12-04T14:35:45.9184266Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32 PASSED [0.0152s] [ 13%] 2025-12-04T14:35:45.9184596Z 
test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float16 PASSED [0.0013s] [ 13%] 2025-12-04T14:35:45.9184929Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64 PASSED [0.0010s] [ 13%] 2025-12-04T14:35:45.9185262Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float32 PASSED [0.0010s] [ 13%] 2025-12-04T14:35:45.9185642Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float16 PASSED [0.0011s] [ 13%] 2025-12-04T14:35:45.9186002Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float16 PASSED [0.0012s] [ 13%] 2025-12-04T14:35:45.9186356Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32 PASSED [0.0013s] [ 14%] 2025-12-04T14:35:45.9186705Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float16 PASSED [0.0011s] [ 14%] 2025-12-04T14:35:45.9187043Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_simple_cuda_float32 PASSED [0.0011s] [ 14%] 2025-12-04T14:35:45.9187397Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16 PASSED [0.0038s] [ 14%] 2025-12-04T14:35:45.9187781Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32 PASSED [0.0012s] [ 14%] 2025-12-04T14:35:45.9188129Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16 PASSED [0.0011s] [ 14%] 2025-12-04T14:35:45.9188433Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float32 PASSED [0.0010s] [ 14%] 2025-12-04T14:35:45.9188736Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float64 PASSED [0.0011s] [ 14%] 2025-12-04T14:35:45.9189076Z 
test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16 PASSED [0.0011s] [ 15%] 2025-12-04T14:35:45.9189418Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs__cuda PASSED [0.0013s] [ 15%] 2025-12-04T14:35:45.9189718Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs_cuda PASSED [0.0012s] [ 15%] 2025-12-04T14:35:45.9190018Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu__cuda PASSED [0.0095s] [ 15%] 2025-12-04T14:35:45.9190354Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isinf_cuda PASSED [0.0017s] [ 15%] 2025-12-04T14:35:45.9190657Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isnan_cuda PASSED [0.0015s] [ 15%] 2025-12-04T14:35:45.9190964Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isneginf_cuda PASSED [0.0010s] [ 15%] 2025-12-04T14:35:45.9191269Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu__cuda PASSED [0.0012s] [ 15%] 2025-12-04T14:35:45.9191568Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu__cuda PASSED [0.0065s] [ 16%] 2025-12-04T14:35:45.9191869Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_silu_cuda PASSED [0.0015s] [ 16%] 2025-12-04T14:35:45.9192167Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_sin_cuda PASSED [0.0041s] [ 16%] 2025-12-04T14:35:45.9192473Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh__cuda PASSED [0.0045s] [ 16%] 2025-12-04T14:35:45.9192793Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float32 PASSED [0.0012s] [ 16%] 2025-12-04T14:35:45.9193113Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float32 PASSED [0.0014s] [ 16%] 2025-12-04T14:35:45.9193431Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float64 
PASSED [0.0011s] [ 16%] 2025-12-04T14:35:45.9193762Z test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float64 PASSED [0.0127s] [ 16%] 2025-12-04T14:35:45.9194089Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_abs_backward_cuda PASSED [0.0338s] [ 16%] 2025-12-04T14:35:45.9194406Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda PASSED [0.0084s] [ 17%] 2025-12-04T14:35:45.9194795Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda PASSED [0.0028s] [ 17%] 2025-12-04T14:35:45.9195127Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda PASSED [0.0176s] [ 17%] 2025-12-04T14:35:45.9195425Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda PASSED [0.0042s] [ 17%] 2025-12-04T14:35:45.9195738Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_128_cuda PASSED [0.0369s] [ 17%] 2025-12-04T14:35:45.9196068Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_4_cuda PASSED [0.0039s] [ 17%] 2025-12-04T14:35:45.9196395Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1024_cuda PASSED [0.0033s] [ 17%] 2025-12-04T14:35:45.9196717Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_128_cuda PASSED [0.0038s] [ 17%] 2025-12-04T14:35:45.9197040Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_256_cuda PASSED [0.0031s] [ 18%] 2025-12-04T14:35:45.9197365Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_32_cuda PASSED [0.0031s] [ 18%] 2025-12-04T14:35:45.9197684Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_4_cuda PASSED [0.0031s] [ 18%] 2025-12-04T14:35:45.9198002Z 
test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda PASSED [0.0031s] [ 18%] 2025-12-04T14:35:45.9198323Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_513_cuda PASSED [0.0031s] [ 18%] 2025-12-04T14:35:45.9198635Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_masked_fill_backward_cuda PASSED [0.0030s] [ 18%] 2025-12-04T14:35:45.9198945Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_gradcheck_cuda PASSED [0.0660s] [ 18%] 2025-12-04T14:35:45.9199261Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_list_cuda PASSED [0.0035s] [ 18%] 2025-12-04T14:35:45.9199619Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_fused_cuda PASSED [0.0024s] [ 19%] 2025-12-04T14:35:45.9199960Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_plus_transpose_cuda PASSED [0.0076s] [ 19%] 2025-12-04T14:35:45.9200298Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda PASSED [0.0726s] [ 19%] 2025-12-04T14:35:45.9200626Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_backward_cuda PASSED [0.0042s] [ 19%] 2025-12-04T14:35:45.9200943Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_softmax_cuda PASSED [0.0035s] [ 19%] 2025-12-04T14:35:45.9201264Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda PASSED [0.0045s] [ 19%] 2025-12-04T14:35:45.9201602Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_gradcheck_cuda PASSED [0.0167s] [ 19%] 2025-12-04T14:35:45.9201921Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_relu_backward_cuda PASSED [0.0126s] [ 19%] 2025-12-04T14:35:45.9202211Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda PASSED [0.0034s] [ 19%] 2025-12-04T14:35:45.9202520Z 
test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_split_with_sizes_flow_through_cuda PASSED [0.0244s] [ 20%] 2025-12-04T14:35:45.9202844Z test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_values_grad_with_broadcast_cuda PASSED [0.0029s] [ 20%] 2025-12-04T14:35:45.9203147Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_apply__cuda_float32 PASSED [0.0016s] [ 20%] 2025-12-04T14:35:45.9203574Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_False_cuda_float16 PASSED [0.0007s] [ 20%] 2025-12-04T14:35:45.9204093Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16 PASSED [0.0009s] [ 20%] 2025-12-04T14:35:45.9204581Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float16 PASSED [0.0005s] [ 20%] 2025-12-04T14:35:45.9205064Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0005s] [ 20%] 2025-12-04T14:35:45.9205551Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16 PASSED [0.0005s] [ 20%] 2025-12-04T14:35:45.9206034Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float64 PASSED [0.0006s] [ 21%] 2025-12-04T14:35:45.9206523Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float16 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9207014Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_False_cuda_float64 PASSED [0.0007s] [ 21%] 2025-12-04T14:35:45.9207502Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float16 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9207989Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9208475Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9208967Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9209487Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0005s] [ 21%] 2025-12-04T14:35:45.9209971Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float16 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9210456Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16 PASSED [0.0007s] [ 22%] 2025-12-04T14:35:45.9210944Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9211432Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float16 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9211915Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9212398Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float16 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9212879Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9213389Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_True_cuda_float64 PASSED [0.0007s] [ 22%] 2025-12-04T14:35:45.9213904Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0005s] [ 22%] 2025-12-04T14:35:45.9214393Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float32 PASSED [0.0005s] [ 23%] 2025-12-04T14:35:45.9214879Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0005s] [ 23%] 2025-12-04T14:35:45.9215366Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0005s] [ 23%] 2025-12-04T14:35:45.9215853Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0005s] [ 23%] 2025-12-04T14:35:45.9216340Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16 PASSED [0.0007s] [ 23%] 2025-12-04T14:35:45.9216824Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0005s] [ 23%] 2025-12-04T14:35:45.9217308Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float64 PASSED [0.0031s] [ 23%] 2025-12-04T14:35:45.9217790Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0023s] [ 23%] 2025-12-04T14:35:45.9218273Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0035s] [ 24%] 2025-12-04T14:35:45.9218793Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0044s] [ 24%] 2025-12-04T14:35:45.9219274Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float16 PASSED [0.0054s] [ 24%] 2025-12-04T14:35:45.9219756Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0035s] [ 24%] 2025-12-04T14:35:45.9220238Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float64 PASSED [0.0029s] [ 24%] 2025-12-04T14:35:45.9220722Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0014s] [ 24%] 2025-12-04T14:35:45.9221216Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0050s] [ 24%] 2025-12-04T14:35:45.9221702Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0016s] [ 24%] 2025-12-04T14:35:45.9222188Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0057s] [ 25%] 2025-12-04T14:35:45.9222674Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0023s] [ 25%] 2025-12-04T14:35:45.9223159Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float16 PASSED [0.0022s] [ 25%] 2025-12-04T14:35:45.9223714Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float32 PASSED [0.0022s] [ 25%] 2025-12-04T14:35:45.9224197Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0023s] [ 25%] 2025-12-04T14:35:45.9224680Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0029s] [ 25%] 2025-12-04T14:35:45.9232997Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64 PASSED [0.0033s] [ 25%] 2025-12-04T14:35:45.9233561Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16 PASSED [0.0014s] [ 25%] 2025-12-04T14:35:45.9234057Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0013s] [ 25%] 2025-12-04T14:35:45.9234550Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16 PASSED [0.0046s] [ 26%] 2025-12-04T14:35:45.9235037Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float32 PASSED [0.0074s] [ 26%] 2025-12-04T14:35:45.9235525Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0065s] [ 26%] 2025-12-04T14:35:45.9236011Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0017s] [ 26%] 2025-12-04T14:35:45.9236574Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0015s] [ 26%] 2025-12-04T14:35:45.9237059Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0054s] [ 26%] 2025-12-04T14:35:45.9237544Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float32 PASSED [0.0022s] [ 26%] 2025-12-04T14:35:45.9238033Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0022s] [ 26%] 2025-12-04T14:35:45.9238518Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32 PASSED [0.0028s] [ 27%] 2025-12-04T14:35:45.9239007Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0030s] [ 27%] 2025-12-04T14:35:45.9239488Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32 PASSED [0.0027s] [ 27%] 2025-12-04T14:35:45.9239967Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64 PASSED [0.0028s] [ 27%] 2025-12-04T14:35:45.9240450Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float32 PASSED [0.0013s] [ 27%] 2025-12-04T14:35:45.9240978Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float64 PASSED [0.0013s] [ 27%] 2025-12-04T14:35:45.9241470Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float32 PASSED [0.0053s] [ 27%] 2025-12-04T14:35:45.9241957Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float64 PASSED [0.0076s] [ 27%] 2025-12-04T14:35:45.9242444Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64 PASSED [0.0015s] [ 27%] 2025-12-04T14:35:45.9242842Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_cuda PASSED [0.0073s] [ 28%] 2025-12-04T14:35:45.9243160Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_transposed_cuda PASSED [0.0087s] [ 28%] 2025-12-04T14:35:45.9243556Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_broadcast_shapes_on_in_graph_constructed_njt_cuda_float32 PASSED [1.6784s] [ 28%] 2025-12-04T14:35:45.9243999Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_preserves_metadata_cache_cuda_float32 SKIPPED [0.0002s] (torch.compile is not supported on python 3.12+) [ 28%] 2025-12-04T14:35:45.9244485Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_max_seq_len_cuda_float32 SKIPPED [0.0001s] (torch.compile is not supported on python 3.12+) [ 28%] 2025-12-04T14:35:45.9244963Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_with_dynamic_min_seq_len_cuda_float32 SKIPPED [0.0001s] (torch.compile is not supported on python 3.12+) [ 28%] 2025-12-04T14:35:45.9245380Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_composite_op_with_custom_mode_cuda_float32 PASSED [0.0022s] [ 28%] 2025-12-04T14:35:45.9245745Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float16 PASSED [0.0029s] [ 28%] 
2025-12-04T14:35:45.9246156Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float64 PASSED [0.0024s] [ 29%] 2025-12-04T14:35:45.9246499Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda PASSED [0.0013s] [ 29%] 2025-12-04T14:35:45.9246913Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 29%] 2025-12-04T14:35:45.9247422Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_True_cuda SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 29%] 2025-12-04T14:35:45.9247823Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_same_size_cuda PASSED [0.0018s] [ 29%] 2025-12-04T14:35:45.9248207Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float16 PASSED [0.0023s] [ 29%] 2025-12-04T14:35:45.9248678Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_False_cuda_float64 PASSED [0.0020s] [ 29%] 2025-12-04T14:35:45.9249147Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float16 PASSED [0.0063s] [ 29%] 2025-12-04T14:35:45.9249615Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float32 PASSED [0.0051s] [ 30%] 2025-12-04T14:35:45.9250079Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float64 PASSED [0.0049s] [ 30%] 2025-12-04T14:35:45.9250567Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float16 PASSED [0.0027s] [ 30%] 2025-12-04T14:35:45.9251117Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0025s] [ 30%] 2025-12-04T14:35:45.9251632Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float16 PASSED [0.0026s] [ 30%] 2025-12-04T14:35:45.9252145Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float64 PASSED [0.0056s] [ 30%] 2025-12-04T14:35:45.9252654Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0053s] [ 30%] 2025-12-04T14:35:45.9253166Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_True_cuda_float64 PASSED [0.0053s] [ 30%] 2025-12-04T14:35:45.9253646Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_with_pinned_memory_cuda PASSED [0.0034s] [ 30%] 2025-12-04T14:35:45.9254107Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0111s] [ 31%] 2025-12-04T14:35:45.9254648Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0056s] [ 31%] 2025-12-04T14:35:45.9255188Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0123s] [ 31%] 2025-12-04T14:35:45.9255729Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0120s] [ 31%] 2025-12-04T14:35:45.9256293Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0118s] [ 31%] 2025-12-04T14:35:45.9256829Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0124s] [ 31%] 2025-12-04T14:35:45.9257284Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float16 PASSED [0.0012s] [ 31%] 2025-12-04T14:35:45.9257661Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float64 PASSED [0.0010s] [ 31%] 2025-12-04T14:35:45.9258077Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float32 PASSED [0.0012s] [ 32%] 2025-12-04T14:35:45.9258536Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float16 PASSED [0.0013s] [ 32%] 2025-12-04T14:35:45.9258995Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float32 PASSED [0.0011s] [ 32%] 2025-12-04T14:35:45.9259447Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float32 PASSED [0.0022s] [ 32%] 
2025-12-04T14:35:45.9259899Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64 PASSED [0.0019s] [ 32%] 2025-12-04T14:35:45.9260363Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0027s] [ 32%] 2025-12-04T14:35:45.9260873Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0048s] [ 32%] 2025-12-04T14:35:45.9261350Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0050s] [ 32%] 2025-12-04T14:35:45.9262187Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32 W1204 14:13:28.075000 1638787 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 
2025-12-04T14:35:45.9262839Z PASSED [0.0015s] [ 33%] 2025-12-04T14:35:45.9263068Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layout_under_torch_dispatch_mode_cuda PASSED [0.0011s] [ 33%] 2025-12-04T14:35:45.9263430Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda PASSED [0.0011s] [ 33%] 2025-12-04T14:35:45.9263738Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda PASSED [0.0010s] [ 33%] 2025-12-04T14:35:45.9264045Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_empty_like_cuda PASSED [0.0080s] [ 33%] 2025-12-04T14:35:45.9264350Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_ones_like_cuda PASSED [0.0188s] [ 33%] 2025-12-04T14:35:45.9264654Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_rand_like_cuda PASSED [0.0082s] [ 33%] 2025-12-04T14:35:45.9264960Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randint_like_cuda PASSED [0.0139s] [ 33%] 2025-12-04T14:35:45.9265267Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randn_like_cuda PASSED [0.0080s] [ 33%] 2025-12-04T14:35:45.9265607Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_zeros_like_cuda PASSED [0.0186s] [ 34%] 2025-12-04T14:35:45.9265935Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_backward_memory_usage_cuda_float32 PASSED [0.1612s] [ 34%] 2025-12-04T14:35:45.9266284Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_activation_checkpoint_cuda PASSED [0.0054s] [ 34%] 2025-12-04T14:35:45.9266655Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_fx_trace_cuda SKIPPED [0.0006s] (Only runs on cpu) [ 34%] 2025-12-04T14:35:45.9267045Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_from_jagged_pass_min_max_False_cuda_float32 PASSED [0.0032s] [ 34%] 2025-12-04T14:35:45.9267379Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_njt_cat_cuda PASSED [0.0056s] [ 34%] 2025-12-04T14:35:45.9267673Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_pointwise_cuda PASSED [0.0023s] [ 34%] 2025-12-04T14:35:45.9268020Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float32 PASSED [0.0018s] [ 34%] 2025-12-04T14:35:45.9268395Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float64 PASSED [0.0018s] [ 35%] 2025-12-04T14:35:45.9268770Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float64 PASSED [0.0014s] [ 35%] 2025-12-04T14:35:45.9269251Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0005s] [ 35%] 2025-12-04T14:35:45.9269829Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 35%] 2025-12-04T14:35:45.9270439Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0016s] [ 35%] 2025-12-04T14:35:45.9271010Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0014s] [ 35%] 2025-12-04T14:35:45.9271578Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0017s] [ 35%] 2025-12-04T14:35:45.9272148Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0014s] [ 35%] 2025-12-04T14:35:45.9272717Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0014s] [ 36%] 2025-12-04T14:35:45.9273327Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0014s] [ 36%] 2025-12-04T14:35:45.9273898Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0014s] [ 36%] 2025-12-04T14:35:45.9274471Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0005s] [ 36%] 2025-12-04T14:35:45.9275054Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 36%] 2025-12-04T14:35:45.9275667Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0007s] [ 36%] 2025-12-04T14:35:45.9276241Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0037s] [ 36%] 2025-12-04T14:35:45.9276815Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0039s] [ 36%] 2025-12-04T14:35:45.9277385Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0039s] [ 36%] 2025-12-04T14:35:45.9277965Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0035s] [ 37%] 2025-12-04T14:35:45.9278541Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0034s] [ 37%] 2025-12-04T14:35:45.9279160Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0007s] [ 37%] 2025-12-04T14:35:45.9279826Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 37%] 2025-12-04T14:35:45.9280524Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0044s] [ 37%] 2025-12-04T14:35:45.9281180Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 
PASSED [0.0046s] [ 37%] 2025-12-04T14:35:45.9281836Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0005s] [ 37%] 2025-12-04T14:35:45.9282496Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 37%] 2025-12-04T14:35:45.9283158Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0007s] [ 38%] 2025-12-04T14:35:45.9283848Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0033s] [ 38%] 2025-12-04T14:35:45.9284509Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0040s] [ 38%] 2025-12-04T14:35:45.9285172Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0041s] [ 38%] 2025-12-04T14:35:45.9285860Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0041s] [ 38%] 2025-12-04T14:35:45.9286513Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0044s] [ 38%] 2025-12-04T14:35:45.9287170Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0034s] [ 38%] 2025-12-04T14:35:45.9287832Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0032s] [ 38%] 2025-12-04T14:35:45.9288490Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0033s] [ 38%] 2025-12-04T14:35:45.9289149Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0034s] [ 39%] 2025-12-04T14:35:45.9289778Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0005s] [ 39%] 2025-12-04T14:35:45.9290411Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 39%] 2025-12-04T14:35:45.9291003Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0034s] [ 39%] 2025-12-04T14:35:45.9291592Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0034s] [ 39%] 2025-12-04T14:35:45.9292180Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0071s] [ 39%] 2025-12-04T14:35:45.9292776Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0070s] [ 39%] 2025-12-04T14:35:45.9293403Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0077s] [ 39%] 2025-12-04T14:35:45.9293994Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0072s] [ 40%] 2025-12-04T14:35:45.9294580Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0072s] [ 40%] 2025-12-04T14:35:45.9295175Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0077s] [ 40%] 2025-12-04T14:35:45.9295786Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0006s] [ 40%] 2025-12-04T14:35:45.9296356Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0005s] [ 40%] 2025-12-04T14:35:45.9296916Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0005s] [ 40%] 2025-12-04T14:35:45.9297475Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0011s] [ 40%] 2025-12-04T14:35:45.9298040Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0015s] [ 40%] 2025-12-04T14:35:45.9298597Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0010s] [ 41%] 2025-12-04T14:35:45.9299154Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0010s] [ 41%] 2025-12-04T14:35:45.9299573Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_permute_cuda PASSED [0.0021s] [ 41%] 2025-12-04T14:35:45.9299858Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_pin_memory_cuda PASSED [0.0054s] [ 41%] 2025-12-04T14:35:45.9300214Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_False_cuda PASSED [0.0017s] [ 
41%] 2025-12-04T14:35:45.9300640Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 41%] 2025-12-04T14:35:45.9301124Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float32 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 41%] 2025-12-04T14:35:45.9301603Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_bfloat16 SKIPPED [0.0001s] (ROCm doesn't support flash attention or mem_efficient attention for NT) [ 41%] 2025-12-04T14:35:45.9302079Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_cuda_float32 SKIPPED [0.0001s] (ROCm doesn't support flash attention or mem_efficient attention for NT) [ 41%] 2025-12-04T14:35:45.9302551Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_flop_counter_cuda SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 42%] 2025-12-04T14:35:45.9302975Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16 PASSED [0.0057s] [ 42%] 2025-12-04T14:35:45.9303379Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float64 PASSED [0.0046s] [ 42%] 2025-12-04T14:35:45.9303737Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_packed_in_proj_cuda_float32 PASSED [0.0077s] [ 42%] 2025-12-04T14:35:45.9304121Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_False_cuda_float32 PASSED [0.0029s] [ 42%] 2025-12-04T14:35:45.9304553Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_False_cuda_float32 PASSED [0.0022s] [ 42%] 2025-12-04T14:35:45.9305018Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0039s] [ 42%] 2025-12-04T14:35:45.9305536Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0046s] [ 42%] 2025-12-04T14:35:45.9306021Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0043s] [ 43%] 2025-12-04T14:35:45.9306576Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0021s] [ 43%] 2025-12-04T14:35:45.9307196Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0019s] [ 43%] 2025-12-04T14:35:45.9307742Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32 PASSED [0.0049s] [ 43%] 2025-12-04T14:35:45.9308206Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32 PASSED [0.0047s] [ 43%] 2025-12-04T14:35:45.9308668Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32 PASSED [0.0055s] [ 43%] 2025-12-04T14:35:45.9309127Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32 PASSED [0.0053s] [ 43%] 2025-12-04T14:35:45.9309605Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0043s] [ 43%] 2025-12-04T14:35:45.9310154Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0043s] [ 44%] 2025-12-04T14:35:45.9310659Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0049s] [ 44%] 2025-12-04T14:35:45.9311153Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0010s] [ 44%] 2025-12-04T14:35:45.9311629Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0011s] [ 44%] 2025-12-04T14:35:45.9312101Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0010s] [ 44%] 2025-12-04T14:35:45.9312588Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32 PASSED [0.0019s] [ 44%] 2025-12-04T14:35:45.9313084Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32 PASSED [0.0017s] [ 44%] 2025-12-04T14:35:45.9313610Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32 PASSED [0.0018s] [ 44%] 2025-12-04T14:35:45.9314103Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_True_log_softmax_cuda_float32 
PASSED [0.0018s] [ 44%] 2025-12-04T14:35:45.9314498Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_cuda PASSED [0.0028s] [ 45%] 2025-12-04T14:35:45.9314910Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0022s] [ 45%] 2025-12-04T14:35:45.9315473Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0020s] [ 45%] 2025-12-04T14:35:45.9316015Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0020s] [ 45%] 2025-12-04T14:35:45.9316545Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32 PASSED [0.0019s] [ 45%] 2025-12-04T14:35:45.9317070Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0020s] [ 45%] 2025-12-04T14:35:45.9317597Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32 PASSED [0.0019s] [ 45%] 2025-12-04T14:35:45.9318122Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32 PASSED [0.0022s] [ 45%] 2025-12-04T14:35:45.9318644Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32 PASSED [0.0021s] [ 46%] 2025-12-04T14:35:45.9319059Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_tensor_attributes_cuda PASSED [0.0012s] [ 46%] 2025-12-04T14:35:45.9319365Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_threshold_backward_cuda PASSED [0.0012s] [ 46%] 2025-12-04T14:35:45.9319656Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_copy_cuda PASSED [0.0011s] [ 46%] 2025-12-04T14:35:45.9319967Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda PASSED [0.0013s] [ 46%] 2025-12-04T14:35:45.9320395Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float16 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 46%] 2025-12-04T14:35:45.9320968Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float32 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 46%] 2025-12-04T14:35:45.9321539Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float64 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 46%] 2025-12-04T14:35:45.9322117Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 47%] 2025-12-04T14:35:45.9322699Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 47%] 2025-12-04T14:35:45.9323309Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float16 SKIPPED [0.0006s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 47%] 2025-12-04T14:35:45.9323881Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 47%] 2025-12-04T14:35:45.9324453Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float32 SKIPPED [0.0005s] (skipCUDAIfRocm: test doesn't currently work on the ROCm stack) [ 47%] 2025-12-04T14:35:45.9324992Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_bool PASSED [0.0017s] [ 47%] 2025-12-04T14:35:45.9325378Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float16 PASSED [0.0016s] [ 47%] 2025-12-04T14:35:45.9325767Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float32 PASSED [0.0016s] [ 47%] 2025-12-04T14:35:45.9326148Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_True_cuda_bool PASSED [0.0007s] [ 47%] 2025-12-04T14:35:45.9326532Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float16 PASSED [0.0016s] [ 48%] 2025-12-04T14:35:45.9326920Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float32 PASSED [0.0016s] [ 48%] 2025-12-04T14:35:45.9327309Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float64 PASSED [0.0016s] [ 48%] 2025-12-04T14:35:45.9327693Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float16 PASSED [0.0031s] [ 48%] 2025-12-04T14:35:45.9328078Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float32 PASSED [0.0029s] [ 48%] 2025-12-04T14:35:45.9328459Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64 PASSED [0.0039s] [ 48%] 2025-12-04T14:35:45.9328846Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float16 PASSED [0.0019s] [ 48%] 2025-12-04T14:35:45.9329227Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_bool PASSED [0.0006s] [ 48%] 2025-12-04T14:35:45.9329659Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float16 PASSED [0.0031s] [ 49%] 2025-12-04T14:35:45.9330047Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float32 PASSED [0.0031s] [ 49%] 2025-12-04T14:35:45.9330414Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_transposed_inputs_cuda PASSED [0.0142s] [ 49%] 2025-12-04T14:35:45.9330746Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32 PASSED [0.0031s] [ 49%] 2025-12-04T14:35:45.9331069Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_0_cuda PASSED [0.0010s] [ 49%] 2025-12-04T14:35:45.9331396Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_2_cuda PASSED [0.0014s] [ 49%] 2025-12-04T14:35:45.9331719Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_3_cuda PASSED [0.0012s] [ 49%] 2025-12-04T14:35:45.9332068Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_equals_2_bad_dim_cuda PASSED [0.0009s] [ 49%] 2025-12-04T14:35:45.9332422Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda PASSED [0.0036s] [ 50%] 2025-12-04T14:35:45.9332735Z test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unsafe_view_cuda PASSED [0.0016s] [ 50%] 2025-12-04T14:35:45.9333036Z 
test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_views_inherit_ragged_dim_cuda PASSED [0.0015s] [ 50%] 2025-12-04T14:35:45.9333382Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rsub___cuda_float32 PASSED [0.4369s] [ 50%] 2025-12-04T14:35:45.9333690Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acosh_cuda_float32 PASSED [0.4540s] [ 50%] 2025-12-04T14:35:45.9334009Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32 SKIPPED [0.9449s] (Skipped!) [ 50%] 2025-12-04T14:35:45.9334404Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 50%] 2025-12-04T14:35:45.9334738Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32 SKIPPED [0.7114s] (Skipped!) [ 50%] 2025-12-04T14:35:45.9335071Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 50%] 2025-12-04T14:35:45.9335389Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_angle_cuda_float32 PASSED [0.4414s] [ 50%] 2025-12-04T14:35:45.9335693Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32 PASSED [0.4672s] [ 50%] 2025-12-04T14:35:45.9335995Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32 PASSED [0.4443s] [ 51%] 2025-12-04T14:35:45.9336296Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32 PASSED [0.4477s] [ 51%] 2025-12-04T14:35:45.9336600Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atanh_cuda_float32 PASSED [0.4481s] [ 51%] 2025-12-04T14:35:45.9336912Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_bfloat16_cuda_float32 PASSED [0.4441s] [ 51%] 2025-12-04T14:35:45.9337224Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32 PASSED [0.3467s] [ 51%] 2025-12-04T14:35:45.9337528Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_ceil_cuda_float32 PASSED [0.4487s] [ 51%] 2025-12-04T14:35:45.9337829Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chunk_cuda_float32 PASSED [0.9857s] [ 51%] 2025-12-04T14:35:45.9338156Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_max_cuda_float32 SKIPPED [1.4923s] (Skipped!) [ 51%] 2025-12-04T14:35:45.9338505Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_max_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 51%] 2025-12-04T14:35:45.9338828Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_cuda_float32 PASSED [0.3972s] [ 52%] 2025-12-04T14:35:45.9339192Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_copysign_cuda_float32 SKIPPED [1.0360s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9339540Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_copysign_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9339864Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cos_cuda_float32 PASSED [0.4194s] [ 52%] 2025-12-04T14:35:45.9340201Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_floor_rounding_cuda_float32 SKIPPED [1.3236s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9340574Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_div_floor_rounding_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9340913Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_erf_cuda_float32 PASSED [0.4227s] [ 52%] 2025-12-04T14:35:45.9341217Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_expm1_cuda_float32 PASSED [0.4837s] [ 52%] 2025-12-04T14:35:45.9341550Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_power_cuda_float32 SKIPPED [1.4550s] (Skipped!) 
[ 52%] 2025-12-04T14:35:45.9341900Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_power_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9342228Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_floor_cuda_float32 PASSED [0.4220s] [ 52%] 2025-12-04T14:35:45.9342547Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmod_cuda_float32 SKIPPED [1.4342s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9342880Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmod_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 52%] 2025-12-04T14:35:45.9343223Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_hypot_cuda_float32 SKIPPED [1.4365s] (Skipped!) [ 53%] 2025-12-04T14:35:45.9343607Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_hypot_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 53%] 2025-12-04T14:35:45.9343980Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_lgamma_cuda_float32 PASSED [0.7602s] [ 53%] 2025-12-04T14:35:45.9344306Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_linalg_vector_norm_cuda_float32 PASSED [0.4435s] [ 53%] 2025-12-04T14:35:45.9344630Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log1p_cuda_float32 PASSED [0.4450s] [ 53%] 2025-12-04T14:35:45.9344943Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32 PASSED [0.7135s] [ 53%] 2025-12-04T14:35:45.9345263Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amin_cuda_float32 PASSED [0.3545s] [ 53%] 2025-12-04T14:35:45.9345581Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_mean_cuda_float32 PASSED [0.7013s] [ 53%] 2025-12-04T14:35:45.9345898Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32 PASSED [0.3547s] [ 53%] 2025-12-04T14:35:45.9346224Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_select_cuda_float32 
PASSED [0.3495s] [ 54%] 2025-12-04T14:35:45.9346544Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_sum_cuda_float32 PASSED [0.4226s] [ 54%] 2025-12-04T14:35:45.9346892Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_reduction_with_dim_cuda_float32 PASSED [0.7472s] [ 54%] 2025-12-04T14:35:45.9347240Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_maximum_cuda_float32 SKIPPED [1.0749s] (Skipped!) [ 54%] 2025-12-04T14:35:45.9347583Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_maximum_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 54%] 2025-12-04T14:35:45.9347921Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32 SKIPPED [0.8940s] (Skipped!) [ 54%] 2025-12-04T14:35:45.9348253Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 54%] 2025-12-04T14:35:45.9348648Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mul_cuda_float32 SKIPPED [1.1586s] (Skipped!) [ 54%] 2025-12-04T14:35:45.9348979Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mul_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 54%] 2025-12-04T14:35:45.9349320Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [0.3990s] [ 54%] 2025-12-04T14:35:45.9349670Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [0.3781s] [ 54%] 2025-12-04T14:35:45.9350006Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nan_to_num_cuda_float32 PASSED [0.3595s] [ 55%] 2025-12-04T14:35:45.9350323Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nanmean_cuda_float32 PASSED [0.4298s] [ 55%] 2025-12-04T14:35:45.9350634Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nansum_cuda_float32 PASSED [0.3653s] [ 55%] 2025-12-04T14:35:45.9350941Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_neg_cuda_float32 PASSED [0.3655s] [ 55%] 2025-12-04T14:35:45.9351263Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_celu_cuda_float32 PASSED [0.3690s] [ 55%] 2025-12-04T14:35:45.9351616Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32 PASSED [0.3739s] [ 55%] 2025-12-04T14:35:45.9351985Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32 SKIPPED [1.0048s] (Skipped!) [ 55%] 2025-12-04T14:35:45.9352364Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 55%] 2025-12-04T14:35:45.9352723Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_mish_cuda_float32 PASSED [0.3689s] [ 55%] 2025-12-04T14:35:45.9353064Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_prelu_cuda_float32 PASSED [0.5628s] [ 55%] 2025-12-04T14:35:45.9353487Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rrelu_cuda_float32 PASSED [0.5784s] [ 56%] 2025-12-04T14:35:45.9353826Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_selu_cuda_float32 PASSED [0.3461s] [ 56%] 2025-12-04T14:35:45.9354173Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softshrink_cuda_float32 PASSED [0.3745s] [ 56%] 2025-12-04T14:35:45.9354525Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32 PASSED [0.4111s] [ 56%] 2025-12-04T14:35:45.9354851Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polar_cuda_float32 PASSED [0.6005s] [ 56%] 2025-12-04T14:35:45.9355159Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_positive_cuda_float32 PASSED [0.3645s] [ 56%] 2025-12-04T14:35:45.9355479Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32 SKIPPED [1.1131s] (Skipped!) [ 56%] 2025-12-04T14:35:45.9355817Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 56%] 2025-12-04T14:35:45.9356158Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 ('RERUN', {'yellow': True}) [0.6679s] [ 56%] 2025-12-04T14:35:45.9356510Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 ('RERUN', {'yellow': True}) [0.6683s] [ 56%] 2025-12-04T14:35:45.9356838Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 FAILED [0.6594s] [ 56%] 2025-12-04T14:35:45.9357010Z 2025-12-04T14:35:45.9357069Z ==================================== RERUNS ==================================== 2025-12-04T14:35:45.9357264Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9357448Z Traceback (most recent call last): 2025-12-04T14:35:45.9357735Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9357980Z method(*args, **kwargs) 2025-12-04T14:35:45.9358206Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9358438Z method(*args, **kwargs) 2025-12-04T14:35:45.9358659Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9358887Z with policy(): 2025-12-04T14:35:45.9359102Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9359337Z raise RuntimeError(msg) 2025-12-04T14:35:45.9359756Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 306176 and is now reported as 322560 on device 0. CUDA driver allocated memory was 2529165312 and is now 2531262464. 
2025-12-04T14:35:45.9360154Z 2025-12-04T14:35:45.9360234Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9360544Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9360778Z 2025-12-04T14:35:45.9360868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9361090Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9361272Z Traceback (most recent call last): 2025-12-04T14:35:45.9361507Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9361742Z method(*args, **kwargs) 2025-12-04T14:35:45.9361964Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9362230Z method(*args, **kwargs) 2025-12-04T14:35:45.9362457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9362684Z with policy(): 2025-12-04T14:35:45.9362899Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9363132Z raise RuntimeError(msg) 2025-12-04T14:35:45.9363583Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 322560 and is now reported as 338944 on device 0. CUDA driver allocated memory was 2531262464 and is now 2533359616. 
2025-12-04T14:35:45.9363965Z 2025-12-04T14:35:45.9364042Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9364349Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9364586Z 2025-12-04T14:35:45.9364674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9364854Z =================================== FAILURES =================================== 2025-12-04T14:35:45.9365044Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9365224Z Traceback (most recent call last): 2025-12-04T14:35:45.9365457Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9365691Z method(*args, **kwargs) 2025-12-04T14:35:45.9365910Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9366139Z method(*args, **kwargs) 2025-12-04T14:35:45.9366357Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9366621Z with policy(): 2025-12-04T14:35:45.9366833Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9367066Z raise RuntimeError(msg) 2025-12-04T14:35:45.9367478Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 338944 and is now reported as 355328 on device 0. CUDA driver allocated memory was 2533359616 and is now 2535456768. 
2025-12-04T14:35:45.9367861Z 2025-12-04T14:35:45.9367936Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9368242Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9368477Z 2025-12-04T14:35:45.9368564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9368895Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c8e74065146f9707.xml - 2025-12-04T14:35:45.9369188Z =========================== short test summary info ============================ 2025-12-04T14:35:45.9369784Z FAILED [0.6594s] test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 338944 and is now reported as 355328 on device 0. CUDA driver allocated memory was 2533359616 and is now 2535456768. 2025-12-04T14:35:45.9370304Z 2025-12-04T14:35:45.9370378Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9370683Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9370949Z 2025-12-04T14:35:45.9371043Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9371232Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T14:35:45.9371412Z ============= 1 failed, 440 passed, 48 skipped, 2 rerun in 45.77s ============== 2025-12-04T14:35:45.9371560Z Got exit code 1 2025-12-04T14:35:45.9371661Z Retrying single test... 
2025-12-04T14:35:45.9371874Z Test results will be stored in test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c0492afac795e1e4.xml 2025-12-04T14:35:45.9372119Z ============================= test session starts ============================== 2025-12-04T14:35:45.9372332Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T14:35:45.9372523Z cachedir: .pytest_cache 2025-12-04T14:35:45.9372756Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:35:45.9373003Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T14:35:45.9373127Z configfile: pytest.ini 2025-12-04T14:35:45.9373390Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:35:45.9373670Z collecting ... collected 1644 items / 835 deselected / 809 selected 2025-12-04T14:35:45.9373975Z stepcurrent: skipping 475 already run items. Running only test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 2025-12-04T14:35:45.9374243Z Running 1 items in this shard 2025-12-04T14:35:45.9374318Z 2025-12-04T14:35:45.9374819Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 W1204 14:14:10.841000 1639202 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 2025-12-04T14:35:45.9375543Z [W1204 14:14:10.120607561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9375737Z 2025-12-04T14:35:45.9375892Z [W1204 14:14:10.263095101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9376083Z 2025-12-04T14:35:45.9376236Z [W1204 14:14:11.268615218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9376426Z 2025-12-04T14:35:45.9376580Z [W1204 14:14:11.272443381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9376768Z 2025-12-04T14:35:45.9376921Z [W1204 14:14:11.276204215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9377113Z 2025-12-04T14:35:45.9377269Z [W1204 14:14:11.280070307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9377458Z 2025-12-04T14:35:45.9377613Z [W1204 14:14:11.297873401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9377805Z 2025-12-04T14:35:45.9377956Z [W1204 14:14:11.301022094 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9378147Z 2025-12-04T14:35:45.9378300Z [W1204 14:14:11.307050714 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9378491Z 2025-12-04T14:35:45.9378642Z [W1204 14:14:11.309602595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9378834Z 2025-12-04T14:35:45.9379032Z [W1204 14:14:11.313466778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9379224Z 2025-12-04T14:35:45.9379376Z [W1204 14:14:11.315792373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9379566Z 2025-12-04T14:35:45.9379716Z [W1204 14:14:11.320807138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9379908Z 2025-12-04T14:35:45.9380061Z [W1204 14:14:11.323241752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9380252Z 2025-12-04T14:35:45.9380405Z [W1204 14:14:11.326995655 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9380594Z 2025-12-04T14:35:45.9380752Z [W1204 14:14:11.329255772 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9380941Z 2025-12-04T14:35:45.9381095Z [W1204 14:14:11.333081324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9381286Z 2025-12-04T14:35:45.9381439Z [W1204 14:14:11.337095664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9381627Z 2025-12-04T14:35:45.9381780Z [W1204 14:14:11.343696206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9381968Z 2025-12-04T14:35:45.9382120Z [W1204 14:14:11.346653401 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9382308Z 2025-12-04T14:35:45.9382490Z [W1204 14:14:11.352778600 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9382684Z 2025-12-04T14:35:45.9382836Z [W1204 14:14:11.355384261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9383027Z 2025-12-04T14:35:45.9383179Z [W1204 14:14:11.358279268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9383413Z 2025-12-04T14:35:45.9383566Z [W1204 14:14:11.360725021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9383758Z 2025-12-04T14:35:45.9383910Z [W1204 14:14:11.379059177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9384102Z 2025-12-04T14:35:45.9384253Z [W1204 14:14:11.382963799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9384448Z 2025-12-04T14:35:45.9384599Z [W1204 14:14:11.387896765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9384792Z 2025-12-04T14:35:45.9384943Z [W1204 14:14:11.391675008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9385135Z 2025-12-04T14:35:45.9385290Z [W1204 14:14:11.395172446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9385480Z 2025-12-04T14:35:45.9385633Z [W1204 14:14:11.399047168 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9385823Z 2025-12-04T14:35:45.9385976Z [W1204 14:14:11.405040598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9386198Z 2025-12-04T14:35:45.9386356Z [W1204 14:14:11.407952915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9386544Z 2025-12-04T14:35:45.9386698Z [W1204 14:14:11.413845867 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9386886Z 2025-12-04T14:35:45.9387039Z [W1204 14:14:11.416408118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9387229Z 2025-12-04T14:35:45.9387383Z [W1204 14:14:11.420245641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9387574Z 2025-12-04T14:35:45.9387725Z [W1204 14:14:11.422600446 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9387918Z 2025-12-04T14:35:45.9388071Z [W1204 14:14:11.427625631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9388263Z 2025-12-04T14:35:45.9388415Z [W1204 14:14:11.430057984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9388607Z 2025-12-04T14:35:45.9388758Z [W1204 14:14:11.433813558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9389193Z 2025-12-04T14:35:45.9389346Z [W1204 14:14:11.436078054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9389538Z 2025-12-04T14:35:45.9389690Z [W1204 14:14:11.439549182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9389883Z 2025-12-04T14:35:45.9390103Z [W1204 14:14:11.442053215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9390296Z 2025-12-04T14:35:45.9390450Z [W1204 14:14:11.446090705 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9390641Z 2025-12-04T14:35:45.9390794Z [W1204 14:14:11.448611337 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9390983Z 2025-12-04T14:35:45.9391136Z [W1204 14:14:11.454300232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9391326Z 2025-12-04T14:35:45.9391478Z [W1204 14:14:11.456849374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9391668Z 2025-12-04T14:35:45.9391825Z [W1204 14:14:11.459666942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9392015Z 2025-12-04T14:35:45.9392169Z [W1204 14:14:11.462071776 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9392361Z 2025-12-04T14:35:45.9392415Z ('RERUN', {'yellow': True}) [0.7535s] [100%] 2025-12-04T14:35:45.9392778Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 [W1204 14:14:11.903581144 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9393089Z 2025-12-04T14:35:45.9393242Z [W1204 14:14:11.907624784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9393465Z 2025-12-04T14:35:45.9393617Z [W1204 14:14:11.912525270 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9393850Z 2025-12-04T14:35:45.9394002Z [W1204 14:14:11.916309974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9394195Z 2025-12-04T14:35:45.9394346Z [W1204 14:14:11.919807892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9394537Z 2025-12-04T14:35:45.9394689Z [W1204 14:14:11.923763962 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9394880Z 2025-12-04T14:35:45.9395031Z [W1204 14:14:11.929886121 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9395222Z 2025-12-04T14:35:45.9395375Z [W1204 14:14:11.932806737 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9395567Z 2025-12-04T14:35:45.9395722Z [W1204 14:14:11.938547241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9395912Z 2025-12-04T14:35:45.9396066Z [W1204 14:14:11.941091693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9396256Z 2025-12-04T14:35:45.9396409Z [W1204 14:14:11.944948406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9396601Z 2025-12-04T14:35:45.9396754Z [W1204 14:14:11.947414049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9396943Z 2025-12-04T14:35:45.9397097Z [W1204 14:14:11.952278136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9397288Z 2025-12-04T14:35:45.9397476Z [W1204 14:14:11.954691310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9397668Z 2025-12-04T14:35:45.9397821Z [W1204 14:14:11.958432284 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9398015Z 2025-12-04T14:35:45.9398167Z [W1204 14:14:11.960711430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9398360Z 2025-12-04T14:35:45.9398511Z [W1204 14:14:11.964543703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9398701Z 2025-12-04T14:35:45.9398852Z [W1204 14:14:11.967179553 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9399046Z 2025-12-04T14:35:45.9399198Z [W1204 14:14:11.971075045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9399390Z 2025-12-04T14:35:45.9399542Z [W1204 14:14:11.973388000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9399733Z 2025-12-04T14:35:45.9399884Z [W1204 14:14:11.979026746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9400075Z 2025-12-04T14:35:45.9400228Z [W1204 14:14:11.981593778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9400417Z 2025-12-04T14:35:45.9400571Z [W1204 14:14:11.984413046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9400789Z 2025-12-04T14:35:45.9400946Z [W1204 14:14:11.986780300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9401135Z 2025-12-04T14:35:45.9401288Z [W1204 14:14:11.004881919 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9401477Z 2025-12-04T14:35:45.9401631Z [W1204 14:14:11.008857930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9401820Z 2025-12-04T14:35:45.9401973Z [W1204 14:14:11.013706648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9402161Z 2025-12-04T14:35:45.9402315Z [W1204 14:14:11.017462691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9402506Z 2025-12-04T14:35:45.9402661Z [W1204 14:14:11.020920560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9402852Z 2025-12-04T14:35:45.9403004Z [W1204 14:14:11.024866261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9403195Z 2025-12-04T14:35:45.9403382Z [W1204 14:14:11.030519416 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9403572Z 2025-12-04T14:35:45.9403723Z [W1204 14:14:11.033317224 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9403914Z 2025-12-04T14:35:45.9404065Z [W1204 14:14:11.039223896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9404257Z 2025-12-04T14:35:45.9404441Z [W1204 14:14:11.041766488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9404635Z 2025-12-04T14:35:45.9404787Z [W1204 14:14:11.045607210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9404980Z 2025-12-04T14:35:45.9405133Z [W1204 14:14:11.048078764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9405321Z 2025-12-04T14:35:45.9405473Z [W1204 14:14:11.052977560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9405661Z 2025-12-04T14:35:45.9405815Z [W1204 14:14:11.055385184 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9406002Z 2025-12-04T14:35:45.9406156Z [W1204 14:14:11.059100049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9406350Z 2025-12-04T14:35:45.9406505Z [W1204 14:14:11.061365595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9406694Z 2025-12-04T14:35:45.9406848Z [W1204 14:14:11.064855053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9407036Z 2025-12-04T14:35:45.9407190Z [W1204 14:14:11.067465394 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9407381Z 2025-12-04T14:35:45.9407531Z [W1204 14:14:11.071493393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9407722Z 2025-12-04T14:35:45.9407873Z [W1204 14:14:11.073897668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9408100Z 2025-12-04T14:35:45.9408256Z [W1204 14:14:11.079522823 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9408450Z 2025-12-04T14:35:45.9408601Z [W1204 14:14:11.082088885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9408793Z 2025-12-04T14:35:45.9408944Z [W1204 14:14:11.084874443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9409135Z 2025-12-04T14:35:45.9409286Z [W1204 14:14:11.087269888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9409478Z 2025-12-04T14:35:45.9409528Z ('RERUN', {'yellow': True}) [0.6287s] [100%] 2025-12-04T14:35:45.9409885Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 [W1204 14:14:12.549932900 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9410195Z 2025-12-04T14:35:45.9410349Z [W1204 14:14:12.553928220 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9410537Z 2025-12-04T14:35:45.9410691Z [W1204 14:14:12.558847676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9410882Z 2025-12-04T14:35:45.9411036Z [W1204 14:14:12.562702959 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9411224Z 2025-12-04T14:35:45.9411378Z [W1204 14:14:12.566277505 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9411568Z 2025-12-04T14:35:45.9411750Z [W1204 14:14:12.570286365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9411940Z 2025-12-04T14:35:45.9412093Z [W1204 14:14:12.576447893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9412282Z 2025-12-04T14:35:45.9412437Z [W1204 14:14:12.579372690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9412629Z 2025-12-04T14:35:45.9412780Z [W1204 14:14:12.585114414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9412974Z 2025-12-04T14:35:45.9413126Z [W1204 14:14:12.587673935 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9413354Z 2025-12-04T14:35:45.9413508Z [W1204 14:14:12.591621866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9413701Z 2025-12-04T14:35:45.9413853Z [W1204 14:14:12.594078120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9414044Z 2025-12-04T14:35:45.9414194Z [W1204 14:14:12.598963677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9414387Z 2025-12-04T14:35:45.9414538Z [W1204 14:14:12.601411410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9414731Z 2025-12-04T14:35:45.9414881Z [W1204 14:14:12.605140824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9415078Z 2025-12-04T14:35:45.9415230Z [W1204 14:14:12.607419380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9415452Z 2025-12-04T14:35:45.9415605Z [W1204 14:14:12.611181444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9415794Z 2025-12-04T14:35:45.9415949Z [W1204 14:14:12.613821324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9416137Z 2025-12-04T14:35:45.9416291Z [W1204 14:14:12.617742136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9416480Z 2025-12-04T14:35:45.9416634Z [W1204 14:14:12.620088991 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9416823Z 2025-12-04T14:35:45.9416976Z [W1204 14:14:12.625833455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9417170Z 2025-12-04T14:35:45.9417324Z [W1204 14:14:12.628422426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9417518Z 2025-12-04T14:35:45.9417670Z [W1204 14:14:12.631265084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9417862Z 2025-12-04T14:35:45.9418013Z [W1204 14:14:12.633646828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9418205Z 2025-12-04T14:35:45.9418357Z [W1204 14:14:12.651879175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9418548Z 2025-12-04T14:35:45.9418701Z [W1204 14:14:12.655888175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9418896Z 2025-12-04T14:35:45.9419078Z [W1204 14:14:12.660784282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9419274Z 2025-12-04T14:35:45.9419426Z [W1204 14:14:12.664562796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9419619Z 2025-12-04T14:35:45.9419775Z [W1204 14:14:12.668083333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9419964Z 2025-12-04T14:35:45.9420120Z [W1204 14:14:12.672047354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9420311Z 2025-12-04T14:35:45.9420466Z [W1204 14:14:12.677705159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9420656Z 2025-12-04T14:35:45.9420810Z [W1204 14:14:12.680538747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9420999Z 2025-12-04T14:35:45.9421152Z [W1204 14:14:12.686555847 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9421342Z 2025-12-04T14:35:45.9421495Z [W1204 14:14:12.689135278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9421684Z 2025-12-04T14:35:45.9421838Z [W1204 14:14:12.693016210 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9422028Z 2025-12-04T14:35:45.9422179Z [W1204 14:14:12.695492903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9422394Z 2025-12-04T14:35:45.9422548Z [W1204 14:14:12.700441629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9422740Z 2025-12-04T14:35:45.9422891Z [W1204 14:14:12.702865743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9423082Z 2025-12-04T14:35:45.9423234Z [W1204 14:14:12.706620027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9423468Z 2025-12-04T14:35:45.9423619Z [W1204 14:14:12.708875243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9423811Z 2025-12-04T14:35:45.9423962Z [W1204 14:14:12.712427430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9424153Z 2025-12-04T14:35:45.9424309Z [W1204 14:14:12.715611192 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9424502Z 2025-12-04T14:35:45.9424657Z [W1204 14:14:12.720768425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9424845Z 2025-12-04T14:35:45.9424998Z [W1204 14:14:12.723333177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9425188Z 2025-12-04T14:35:45.9425342Z [W1204 14:14:12.729268588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9425530Z 2025-12-04T14:35:45.9425683Z [W1204 14:14:12.731899849 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9425873Z 2025-12-04T14:35:45.9426056Z [W1204 14:14:12.734740026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9426247Z 2025-12-04T14:35:45.9426402Z [W1204 14:14:12.737186750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9426590Z 2025-12-04T14:35:45.9426633Z FAILED [0.6305s] [100%] 2025-12-04T14:35:45.9426697Z 2025-12-04T14:35:45.9426755Z ==================================== RERUNS ==================================== 2025-12-04T14:35:45.9426951Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9427134Z Traceback (most recent call last): 2025-12-04T14:35:45.9427378Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9427616Z method(*args, **kwargs) 2025-12-04T14:35:45.9427840Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9428079Z method(*args, **kwargs) 2025-12-04T14:35:45.9428299Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9428528Z with policy(): 
2025-12-04T14:35:45.9428741Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9428975Z raise RuntimeError(msg) 2025-12-04T14:35:45.9429390Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 0 and is now reported as 16384 on device 0. CUDA driver allocated memory was 807403520 and is now 859832320. 2025-12-04T14:35:45.9429761Z 2025-12-04T14:35:45.9429842Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9430156Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9430421Z 2025-12-04T14:35:45.9430516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9430723Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9431195Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 
2025-12-04T14:35:45.9431648Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9431854Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9432041Z Traceback (most recent call last): 2025-12-04T14:35:45.9432278Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9432519Z method(*args, **kwargs) 2025-12-04T14:35:45.9432746Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9432978Z method(*args, **kwargs) 2025-12-04T14:35:45.9433202Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9433469Z with policy(): 2025-12-04T14:35:45.9433685Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9433921Z raise RuntimeError(msg) 2025-12-04T14:35:45.9434337Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 16384 and is now reported as 32768 on device 0. CUDA driver allocated memory was 859832320 and is now 874512384. 
2025-12-04T14:35:45.9434713Z 2025-12-04T14:35:45.9434838Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9435152Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9435387Z 2025-12-04T14:35:45.9435476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9435679Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9436139Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 2025-12-04T14:35:45.9436587Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9436749Z =================================== FAILURES =================================== 2025-12-04T14:35:45.9436946Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9437128Z Traceback (most recent call last): 2025-12-04T14:35:45.9437365Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9437601Z method(*args, **kwargs) 2025-12-04T14:35:45.9437823Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9438059Z method(*args, **kwargs) 2025-12-04T14:35:45.9438281Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9438511Z with policy(): 2025-12-04T14:35:45.9438724Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9438960Z 
raise RuntimeError(msg) 2025-12-04T14:35:45.9439417Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 32768 and is now reported as 49152 on device 0. CUDA driver allocated memory was 874512384 and is now 889192448. 2025-12-04T14:35:45.9439794Z 2025-12-04T14:35:45.9439870Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9440178Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9440411Z 2025-12-04T14:35:45.9440498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9440699Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9441159Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 2025-12-04T14:35:45.9441607Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9441909Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c0492afac795e1e4.xml - 2025-12-04T14:35:45.9442199Z =========================== short test summary info ============================ 2025-12-04T14:35:45.9442779Z FAILED [0.6305s] test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 32768 and is now reported as 49152 on device 0. CUDA driver allocated memory was 874512384 and is now 889192448. 
2025-12-04T14:35:45.9443330Z 2025-12-04T14:35:45.9443407Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9443847Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9444128Z 2025-12-04T14:35:45.9444215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9444404Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T14:35:45.9444579Z ================== 1 failed, 835 deselected, 2 rerun in 2.07s ================== 2025-12-04T14:35:45.9444729Z Got exit code 1 2025-12-04T14:35:45.9444834Z Retrying single test... 2025-12-04T14:35:45.9445048Z Test results will be stored in test-reports/python-pytest/test_nestedtensor/test_nestedtensor-6d96b44d51a97d13.xml 2025-12-04T14:35:45.9445294Z ============================= test session starts ============================== 2025-12-04T14:35:45.9445509Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T14:35:45.9445708Z cachedir: .pytest_cache 2025-12-04T14:35:45.9445943Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:35:45.9446193Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T14:35:45.9446322Z configfile: pytest.ini 2025-12-04T14:35:45.9446552Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:35:45.9446833Z collecting ... collected 1644 items / 835 deselected / 809 selected 2025-12-04T14:35:45.9447137Z stepcurrent: skipping 475 already run items. 
Running only test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 2025-12-04T14:35:45.9447410Z Running 1 items in this shard 2025-12-04T14:35:45.9447486Z 2025-12-04T14:35:45.9447983Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 W1204 14:14:16.412000 1639339 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 2025-12-04T14:35:45.9448709Z [W1204 14:14:16.691497814 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9448909Z 2025-12-04T14:35:45.9449066Z [W1204 14:14:16.837099987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9449260Z 2025-12-04T14:35:45.9449416Z [W1204 14:14:16.842365468 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9449606Z 2025-12-04T14:35:45.9449765Z [W1204 14:14:16.846304599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9449958Z 2025-12-04T14:35:45.9450115Z [W1204 14:14:16.850309199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9450306Z 2025-12-04T14:35:45.9450462Z [W1204 14:14:16.854259980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9450653Z 2025-12-04T14:35:45.9450810Z [W1204 14:14:16.872434648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9451002Z 2025-12-04T14:35:45.9451156Z [W1204 14:14:16.875480923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9451348Z 2025-12-04T14:35:45.9451501Z [W1204 14:14:16.881242257 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9451693Z 2025-12-04T14:35:45.9451878Z [W1204 14:14:16.883646271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9452076Z 2025-12-04T14:35:45.9452228Z [W1204 14:14:16.887341796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9452421Z 2025-12-04T14:35:45.9452572Z [W1204 14:14:16.889643541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9452764Z 2025-12-04T14:35:45.9452916Z [W1204 14:14:16.894465359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9453109Z 2025-12-04T14:35:45.9453287Z [W1204 14:14:16.896777634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9453483Z 2025-12-04T14:35:45.9453641Z [W1204 14:14:16.900413890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9453837Z 2025-12-04T14:35:45.9453993Z [W1204 14:14:16.902584848 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9454184Z 2025-12-04T14:35:45.9454342Z [W1204 14:14:16.906159994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9454533Z 2025-12-04T14:35:45.9454689Z [W1204 14:14:16.908598678 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9454882Z 2025-12-04T14:35:45.9455039Z [W1204 14:14:16.912363631 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9455229Z 2025-12-04T14:35:45.9455385Z [W1204 14:14:16.914708406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9455613Z 2025-12-04T14:35:45.9455772Z [W1204 14:14:16.920166155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9455964Z 2025-12-04T14:35:45.9456120Z [W1204 14:14:16.922626018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9456314Z 2025-12-04T14:35:45.9456466Z [W1204 14:14:16.925374247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9456658Z 2025-12-04T14:35:45.9456811Z [W1204 14:14:16.927686782 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9457006Z 2025-12-04T14:35:45.9457159Z [W1204 14:14:16.945132661 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9457355Z 2025-12-04T14:35:45.9457508Z [W1204 14:14:16.948865116 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9457701Z 2025-12-04T14:35:45.9457854Z [W1204 14:14:16.953602735 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9458047Z 2025-12-04T14:35:45.9458198Z [W1204 14:14:16.957179211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9458393Z 2025-12-04T14:35:45.9458545Z [W1204 14:14:16.960482922 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9458738Z 2025-12-04T14:35:45.9458896Z [W1204 14:14:16.964177427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9459089Z 2025-12-04T14:35:45.9459275Z [W1204 14:14:16.969748483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9459467Z 2025-12-04T14:35:45.9459624Z [W1204 14:14:16.972517722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9459814Z 2025-12-04T14:35:45.9459971Z [W1204 14:14:16.978114078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9460163Z 2025-12-04T14:35:45.9460319Z [W1204 14:14:16.980551312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9460510Z 2025-12-04T14:35:45.9460665Z [W1204 14:14:16.984285036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9460856Z 2025-12-04T14:35:45.9461012Z [W1204 14:14:16.986543402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9461206Z 2025-12-04T14:35:45.9461358Z [W1204 14:14:16.991345390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9461552Z 2025-12-04T14:35:45.9461703Z [W1204 14:14:16.993638646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9461894Z 2025-12-04T14:35:45.9462047Z [W1204 14:14:16.997253132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9462242Z 2025-12-04T14:35:45.9462394Z [W1204 14:14:16.999405390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9462615Z 2025-12-04T14:35:45.9462769Z [W1204 14:14:16.002733500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9462962Z 2025-12-04T14:35:45.9463114Z [W1204 14:14:16.005167494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9463356Z 2025-12-04T14:35:45.9463510Z [W1204 14:14:16.009112895 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9463703Z 2025-12-04T14:35:45.9463861Z [W1204 14:14:16.011591568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9464050Z 2025-12-04T14:35:45.9464206Z [W1204 14:14:16.017230973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9464396Z 2025-12-04T14:35:45.9464554Z [W1204 14:14:16.019782005 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9464744Z 2025-12-04T14:35:45.9464898Z [W1204 14:14:16.022549464 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9465088Z 2025-12-04T14:35:45.9465244Z [W1204 14:14:16.024921348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9465434Z 2025-12-04T14:35:45.9465489Z ('RERUN', {'yellow': True}) [0.7425s] [100%] 2025-12-04T14:35:45.9465854Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 [W1204 14:14:17.476828361 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9466168Z 2025-12-04T14:35:45.9466321Z [W1204 14:14:17.480657084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9466554Z 2025-12-04T14:35:45.9466709Z [W1204 14:14:17.485313845 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9466904Z 2025-12-04T14:35:45.9467057Z [W1204 14:14:17.488940770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9467252Z 2025-12-04T14:35:45.9467404Z [W1204 14:14:17.492350049 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9467598Z 2025-12-04T14:35:45.9467751Z [W1204 14:14:17.496145873 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9467945Z 2025-12-04T14:35:45.9468097Z [W1204 14:14:17.501870767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9468292Z 2025-12-04T14:35:45.9468446Z [W1204 14:14:17.504653275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9468640Z 2025-12-04T14:35:45.9468796Z [W1204 14:14:17.510037415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9468987Z 2025-12-04T14:35:45.9469143Z [W1204 14:14:17.512432439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9469334Z 2025-12-04T14:35:45.9469490Z [W1204 14:14:17.516146634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9469680Z 2025-12-04T14:35:45.9469838Z [W1204 14:14:17.518473499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9470060Z 2025-12-04T14:35:45.9470220Z [W1204 14:14:17.523115199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9470413Z 2025-12-04T14:35:45.9470571Z [W1204 14:14:17.525451864 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9470761Z 2025-12-04T14:35:45.9470916Z [W1204 14:14:17.529031511 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9471111Z 2025-12-04T14:35:45.9471262Z [W1204 14:14:17.531199558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9471454Z 2025-12-04T14:35:45.9471606Z [W1204 14:14:17.534755285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9471800Z 2025-12-04T14:35:45.9471953Z [W1204 14:14:17.537287727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9472148Z 2025-12-04T14:35:45.9472301Z [W1204 14:14:17.541049131 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9472494Z 2025-12-04T14:35:45.9472646Z [W1204 14:14:17.543290568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9472840Z 2025-12-04T14:35:45.9472995Z [W1204 14:14:17.548724996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9473192Z 2025-12-04T14:35:45.9473373Z [W1204 14:14:17.551188950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9473568Z 2025-12-04T14:35:45.9473753Z [W1204 14:14:17.553982438 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9473945Z 2025-12-04T14:35:45.9474102Z [W1204 14:14:17.556304233 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9474292Z 2025-12-04T14:35:45.9474447Z [W1204 14:14:17.573657234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9474638Z 2025-12-04T14:35:45.9474796Z [W1204 14:14:17.577466647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9474986Z 2025-12-04T14:35:45.9475144Z [W1204 14:14:17.582154256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9475334Z 2025-12-04T14:35:45.9475492Z [W1204 14:14:17.585769793 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9475683Z 2025-12-04T14:35:45.9475839Z [W1204 14:14:17.589110382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9476032Z 2025-12-04T14:35:45.9476184Z [W1204 14:14:17.592921226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9476376Z 2025-12-04T14:35:45.9476527Z [W1204 14:14:17.598295765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9476721Z 2025-12-04T14:35:45.9476873Z [W1204 14:14:17.600992335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9477066Z 2025-12-04T14:35:45.9477219Z [W1204 14:14:17.606630480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9477451Z 2025-12-04T14:35:45.9477604Z [W1204 14:14:17.609075764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9477797Z 2025-12-04T14:35:45.9477953Z [W1204 14:14:17.612755909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9478148Z 2025-12-04T14:35:45.9478299Z [W1204 14:14:17.615091654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9478493Z 2025-12-04T14:35:45.9478649Z [W1204 14:14:17.619746834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9478839Z 2025-12-04T14:35:45.9478994Z [W1204 14:14:17.622039020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9479186Z 2025-12-04T14:35:45.9479344Z [W1204 14:14:17.625634546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9479534Z 2025-12-04T14:35:45.9479689Z [W1204 14:14:17.627792414 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9479878Z 2025-12-04T14:35:45.9480034Z [W1204 14:14:17.631122134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9480225Z 2025-12-04T14:35:45.9480379Z [W1204 14:14:17.633614227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9480569Z 2025-12-04T14:35:45.9480724Z [W1204 14:14:17.637571148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9480920Z 2025-12-04T14:35:45.9481098Z [W1204 14:14:17.639874254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9481293Z 2025-12-04T14:35:45.9481446Z [W1204 14:14:17.645271623 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9481638Z 2025-12-04T14:35:45.9481789Z [W1204 14:14:17.647714366 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9481983Z 2025-12-04T14:35:45.9482135Z [W1204 14:14:17.650408166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9482327Z 2025-12-04T14:35:45.9482480Z [W1204 14:14:17.652698242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9482674Z 2025-12-04T14:35:45.9482725Z ('RERUN', {'yellow': True}) [0.6156s] [100%] 2025-12-04T14:35:45.9483083Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 [W1204 14:14:17.079739267 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9483429Z 2025-12-04T14:35:45.9483586Z [W1204 14:14:17.083534310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9483779Z 2025-12-04T14:35:45.9483991Z [W1204 14:14:17.088208260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9484181Z 2025-12-04T14:35:45.9484338Z [W1204 14:14:17.091894145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9484528Z 2025-12-04T14:35:45.9484689Z [W1204 14:14:17.095322604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9484913Z 2025-12-04T14:35:45.9490050Z [W1204 14:14:17.099243245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9490251Z 2025-12-04T14:35:45.9490409Z [W1204 14:14:17.105152607 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9490600Z 2025-12-04T14:35:45.9490755Z [W1204 14:14:17.107977684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9490944Z 2025-12-04T14:35:45.9491099Z [W1204 14:14:17.113451872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9491290Z 2025-12-04T14:35:45.9491443Z [W1204 14:14:17.115927085 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9491643Z 2025-12-04T14:35:45.9491794Z [W1204 14:14:17.119717259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9491986Z 2025-12-04T14:35:45.9492141Z [W1204 14:14:17.122108773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9492332Z 2025-12-04T14:35:45.9492483Z [W1204 14:14:17.126887702 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9492674Z 2025-12-04T14:35:45.9492825Z [W1204 14:14:17.129289206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9493017Z 2025-12-04T14:35:45.9493169Z [W1204 14:14:17.133015310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9493439Z 2025-12-04T14:35:45.9493592Z [W1204 14:14:17.135256676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9493786Z 2025-12-04T14:35:45.9493939Z [W1204 14:14:17.138944311 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9494128Z 2025-12-04T14:35:45.9494282Z [W1204 14:14:17.141546802 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9494472Z 2025-12-04T14:35:45.9494626Z [W1204 14:14:17.145375035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9494816Z 2025-12-04T14:35:45.9494970Z [W1204 14:14:17.147680701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9495161Z 2025-12-04T14:35:45.9495320Z [W1204 14:14:17.153325026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9495509Z 2025-12-04T14:35:45.9495664Z [W1204 14:14:17.155838229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9495854Z 2025-12-04T14:35:45.9496008Z [W1204 14:14:17.158640487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9496200Z 2025-12-04T14:35:45.9496352Z [W1204 14:14:17.160995801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9496543Z 2025-12-04T14:35:45.9496697Z [W1204 14:14:17.178489370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9496927Z 2025-12-04T14:35:45.9497086Z [W1204 14:14:17.182329313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9497278Z 2025-12-04T14:35:45.9497429Z [W1204 14:14:17.187015243 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9497621Z 2025-12-04T14:35:45.9497772Z [W1204 14:14:17.190605519 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9497965Z 2025-12-04T14:35:45.9498117Z [W1204 14:14:17.193942799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9498308Z 2025-12-04T14:35:45.9498460Z [W1204 14:14:17.197722442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9498653Z 2025-12-04T14:35:45.9498809Z [W1204 14:14:17.203117032 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9498998Z 2025-12-04T14:35:45.9499151Z [W1204 14:14:17.205797722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9499340Z 2025-12-04T14:35:45.9499493Z [W1204 14:14:17.211420608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9499682Z 2025-12-04T14:35:45.9499836Z [W1204 14:14:17.213860441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9500025Z 2025-12-04T14:35:45.9500178Z [W1204 14:14:17.217634825 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9500368Z 2025-12-04T14:35:45.9500549Z [W1204 14:14:17.219975480 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9500739Z 2025-12-04T14:35:45.9500893Z [W1204 14:14:17.224679449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9501086Z 2025-12-04T14:35:45.9501238Z [W1204 14:14:17.226990065 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9501429Z 2025-12-04T14:35:45.9501583Z [W1204 14:14:17.230598031 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9501774Z 2025-12-04T14:35:45.9501926Z [W1204 14:14:17.232757869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9502116Z 2025-12-04T14:35:45.9502269Z [W1204 14:14:17.236115948 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9502462Z 2025-12-04T14:35:45.9502613Z [W1204 14:14:17.238772139 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9502807Z 2025-12-04T14:35:45.9502958Z [W1204 14:14:17.243548477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9503148Z 2025-12-04T14:35:45.9503349Z [W1204 14:14:17.246208537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9503541Z 2025-12-04T14:35:45.9503776Z [W1204 14:14:17.251987241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9503966Z 2025-12-04T14:35:45.9504124Z [W1204 14:14:17.254479324 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9504350Z 2025-12-04T14:35:45.9504503Z [W1204 14:14:17.257205263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 2025-12-04T14:35:45.9504693Z 2025-12-04T14:35:45.9504846Z [W1204 14:14:17.259542578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1... 
2025-12-04T14:35:45.9505035Z 2025-12-04T14:35:45.9505079Z FAILED [0.6035s] [100%] 2025-12-04T14:35:45.9505144Z 2025-12-04T14:35:45.9505202Z ==================================== RERUNS ==================================== 2025-12-04T14:35:45.9505397Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9505579Z Traceback (most recent call last): 2025-12-04T14:35:45.9505827Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9506070Z method(*args, **kwargs) 2025-12-04T14:35:45.9506298Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9506531Z method(*args, **kwargs) 2025-12-04T14:35:45.9506751Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9506979Z with policy(): 2025-12-04T14:35:45.9507193Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9507425Z raise RuntimeError(msg) 2025-12-04T14:35:45.9507831Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 0 and is now reported as 16384 on device 0. CUDA driver allocated memory was 807403520 and is now 859832320. 
2025-12-04T14:35:45.9508206Z 2025-12-04T14:35:45.9508321Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9508631Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9508864Z 2025-12-04T14:35:45.9508956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9509161Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9509627Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 2025-12-04T14:35:45.9510077Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9510280Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9510464Z Traceback (most recent call last): 2025-12-04T14:35:45.9510708Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9510942Z method(*args, **kwargs) 2025-12-04T14:35:45.9511168Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9511398Z method(*args, **kwargs) 2025-12-04T14:35:45.9511618Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9511845Z with policy(): 2025-12-04T14:35:45.9512059Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9512291Z raise RuntimeError(msg) 2025-12-04T14:35:45.9512703Z RuntimeError: CUDA driver API confirmed a leak in 
__main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 16384 and is now reported as 32768 on device 0. CUDA driver allocated memory was 859832320 and is now 874512384. 2025-12-04T14:35:45.9513108Z 2025-12-04T14:35:45.9513186Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9513524Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9513759Z 2025-12-04T14:35:45.9513848Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9514048Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9514505Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 
2025-12-04T14:35:45.9514950Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9515111Z =================================== FAILURES =================================== 2025-12-04T14:35:45.9515299Z __________ TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 __________ 2025-12-04T14:35:45.9515478Z Traceback (most recent call last): 2025-12-04T14:35:45.9515715Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9515951Z method(*args, **kwargs) 2025-12-04T14:35:45.9516176Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3329, in wrapper 2025-12-04T14:35:45.9516409Z method(*args, **kwargs) 2025-12-04T14:35:45.9516632Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 3328, in wrapper 2025-12-04T14:35:45.9516863Z with policy(): 2025-12-04T14:35:45.9517152Z File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 2705, in __exit__ 2025-12-04T14:35:45.9517390Z raise RuntimeError(msg) 2025-12-04T14:35:45.9517805Z RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 32768 and is now reported as 49152 on device 0. CUDA driver allocated memory was 874512384 and is now 889192448. 
2025-12-04T14:35:45.9518184Z 2025-12-04T14:35:45.9518259Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9518572Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9518807Z 2025-12-04T14:35:45.9518897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9519099Z ----------------------------- Captured stderr call ----------------------------- 2025-12-04T14:35:45.9519565Z /opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/backends/cudnn/__init__.py:159: UserWarning: cuDNN Benchmark limit is not supported in MIOpen and will have no effect. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/cuda/Module.cpp:1960.) 2025-12-04T14:35:45.9520015Z torch._C._cuda_set_cudnn_benchmark_limit(_benchmark_limit) 2025-12-04T14:35:45.9520316Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-6d96b44d51a97d13.xml - 2025-12-04T14:35:45.9520605Z =========================== short test summary info ============================ 2025-12-04T14:35:45.9521187Z FAILED [0.6035s] test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 - RuntimeError: CUDA driver API confirmed a leak in __main__.TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32! Caching allocator allocated memory was 32768 and is now reported as 49152 on device 0. CUDA driver allocated memory was 874512384 and is now 889192448. 
2025-12-04T14:35:45.9521744Z 2025-12-04T14:35:45.9521823Z To execute this test, run the following from the base repo dir: 2025-12-04T14:35:45.9522132Z PYTORCH_TEST_WITH_ROCM=1 PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/test_nestedtensor.py TestNestedTensorOpInfoCUDA.test_backward_prod_cuda_float32 2025-12-04T14:35:45.9522368Z 2025-12-04T14:35:45.9522456Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0 2025-12-04T14:35:45.9522647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!! 2025-12-04T14:35:45.9522820Z ================== 1 failed, 835 deselected, 2 rerun in 2.02s ================== 2025-12-04T14:35:45.9522968Z Got exit code 1 2025-12-04T14:35:45.9523169Z FAILED CONSISTENTLY: test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32 2025-12-04T14:35:45.9523513Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set 2025-12-04T14:35:45.9523837Z Test results will be stored in test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b2d820ac08da.xml 2025-12-04T14:35:45.9524081Z ============================= test session starts ============================== 2025-12-04T14:35:45.9524295Z platform linux -- Python 3.12.5, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.12/bin/python 2025-12-04T14:35:45.9524488Z cachedir: .pytest_cache 2025-12-04T14:35:45.9524716Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow] 2025-12-04T14:35:45.9524960Z rootdir: /var/lib/jenkins/pytorch 2025-12-04T14:35:45.9525084Z configfile: pytest.ini 2025-12-04T14:35:45.9525315Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0 2025-12-04T14:35:45.9525595Z collecting ... 
collected 1644 items / 476 deselected / 1168 selected 2025-12-04T14:35:45.9525767Z stepcurrent: skipping 476 already run items. 2025-12-04T14:35:45.9525909Z Running 360 items in this shard 2025-12-04T14:35:45.9526022Z 2025-12-04T14:35:45.9526531Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_reciprocal_cuda_float32 W1204 14:14:21.960000 1639476 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile. 2025-12-04T14:35:45.9527106Z PASSED [0.3152s] [ 0%] 2025-12-04T14:35:45.9527326Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_0_cuda_float32 PASSED [0.1408s] [ 0%] 2025-12-04T14:35:45.9527665Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_round_decimals_3_cuda_float32 PASSED [0.1393s] [ 0%] 2025-12-04T14:35:45.9527987Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rsqrt_cuda_float32 PASSED [0.1467s] [ 1%] 2025-12-04T14:35:45.9528299Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_select_cuda_float32 PASSED [0.2826s] [ 1%] 2025-12-04T14:35:45.9528607Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sin_cuda_float32 PASSED [0.1016s] [ 1%] 2025-12-04T14:35:45.9528909Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinc_cuda_float32 PASSED [0.5040s] [ 1%] 2025-12-04T14:35:45.9529210Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sinh_cuda_float32 PASSED [0.1077s] [ 2%] 2025-12-04T14:35:45.9529525Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32 PASSED [0.4158s] [ 2%] 2025-12-04T14:35:45.9529850Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i0e_cuda_float32 PASSED [0.3676s] [ 2%] 2025-12-04T14:35:45.9529995Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1e_cuda_float32 PASSED [0.1277s] [ 3%] 2025-12-04T14:35:45.9530139Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_ndtr_cuda_float32 PASSED [0.1249s] [ 3%] 2025-12-04T14:35:45.9530328Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32 PASSED [0.0630s] [ 3%] 2025-12-04T14:35:45.9530463Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32 PASSED [0.3173s] [ 3%] 2025-12-04T14:35:45.9530623Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_true_divide_cuda_float32 SKIPPED [0.8390s] (Skipped!) [ 4%] 2025-12-04T14:35:45.9530783Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_true_divide_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 4%] 2025-12-04T14:35:45.9530915Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_trunc_cuda_float32 PASSED [0.2898s] [ 4%] 2025-12-04T14:35:45.9531074Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unsqueeze_cuda_float32 SKIPPED [0.6650s] (Skipped!) [ 4%] 2025-12-04T14:35:45.9531229Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unsqueeze_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 4%] 2025-12-04T14:35:45.9531377Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_var_unbiased_cuda_float32 PASSED [0.2529s] [ 5%] 2025-12-04T14:35:45.9531526Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32 SKIPPED [0.6647s] (Skipped!) [ 5%] 2025-12-04T14:35:45.9531679Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 5%] 2025-12-04T14:35:45.9531827Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_xlogy_cuda_float32 SKIPPED [1.1108s] (Skipped!) [ 5%] 2025-12-04T14:35:45.9531977Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_xlogy_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 5%] 2025-12-04T14:35:45.9532140Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32 SKIPPED [5.5400s] (Skipped!) [ 5%] 2025-12-04T14:35:45.9532330Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 5%] 2025-12-04T14:35:45.9532497Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32 SKIPPED [5.9124s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9532661Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9532824Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmod___cuda_float32 SKIPPED [7.0906s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9532983Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rmod___cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9533145Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32 SKIPPED [0.6733s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9533336Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9533505Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32 SKIPPED [1.2892s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9533665Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 6%] 2025-12-04T14:35:45.9533828Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32 SKIPPED [1.2975s] (Skipped!) [ 7%] 2025-12-04T14:35:45.9533987Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 7%] 2025-12-04T14:35:45.9534150Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32 SKIPPED [1.2614s] (Skipped!) [ 7%] 2025-12-04T14:35:45.9534310Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 7%] 2025-12-04T14:35:45.9534472Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan2_cuda_float32 SKIPPED [6.7217s] (Skipped!) [ 7%] 2025-12-04T14:35:45.9534659Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan2_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 7%] 2025-12-04T14:35:45.9534821Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32 SKIPPED [1.3410s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9534981Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535143Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32 SKIPPED [1.1133s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535309Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535471Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32 SKIPPED [1.1073s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535638Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535797Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32 SKIPPED [2.7783s] (Skipped!) [ 8%] 2025-12-04T14:35:45.9535958Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 8%] 2025-12-04T14:35:45.9536124Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_max_cuda_float32 SKIPPED [8.1761s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9536293Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_clamp_max_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9536469Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_physical_cuda_float32 SKIPPED [1.9367s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9536669Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_physical_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9536835Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_deg2rad_cuda_float32 SKIPPED [2.1001s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9537003Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_deg2rad_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 9%] 2025-12-04T14:35:45.9537170Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32 SKIPPED [2.0839s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9537332Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9537512Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_floor_rounding_cuda_float32 SKIPPED [6.4819s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9537690Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_floor_rounding_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9537858Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_double_cuda_float32 SKIPPED [1.8619s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9538019Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_double_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 10%] 2025-12-04T14:35:45.9538181Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32 SKIPPED [2.0136s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9538337Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 10%] 2025-12-04T14:35:45.9538497Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32 SKIPPED [2.2786s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9538654Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9538815Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp_cuda_float32 SKIPPED [2.2825s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9538996Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9539158Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32 SKIPPED [2.3023s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9539319Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9539480Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmin_cuda_float32 SKIPPED [8.4626s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9539643Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmin_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 11%] 2025-12-04T14:35:45.9539803Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32 SKIPPED [6.3634s] (Skipped!) [ 12%] 2025-12-04T14:35:45.9539969Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 12%] 2025-12-04T14:35:45.9540146Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32 SKIPPED [1.4224s] (Skipped!) [ 12%] 2025-12-04T14:35:45.9540328Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 12%] 2025-12-04T14:35:45.9540487Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log10_cuda_float32 SKIPPED [1.8935s] (Skipped!) [ 12%] 2025-12-04T14:35:45.9540648Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log10_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 12%] 2025-12-04T14:35:45.9540807Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log1p_cuda_float32 SKIPPED [1.9191s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9540993Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log1p_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541151Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log_cuda_float32 SKIPPED [1.8975s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541311Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_log_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541477Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logaddexp_cuda_float32 SKIPPED [8.7522s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541647Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logaddexp_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541806Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32 SKIPPED [2.2369s] (Skipped!) [ 13%] 2025-12-04T14:35:45.9541967Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 13%] 2025-12-04T14:35:45.9542146Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_logsumexp_cuda_float32 SKIPPED [1.5980s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9542322Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_logsumexp_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9542492Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_mean_cuda_float32 SKIPPED [1.5909s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9542658Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_mean_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9542826Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_norm_cuda_float32 SKIPPED [1.5954s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9542991Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_norm_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 14%] 2025-12-04T14:35:45.9543188Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32 SKIPPED [1.5978s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9543389Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9543557Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_binary_cuda_float32 SKIPPED [5.0277s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9543723Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_max_binary_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9543883Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32 SKIPPED [4.9402s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9544041Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 15%] 2025-12-04T14:35:45.9544206Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32 SKIPPED [4.9945s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9544372Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 15%] 2025-12-04T14:35:45.9544556Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED [2.3196s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9544737Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9544916Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED [2.3224s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545095Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545259Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nan_to_num_cuda_float32 SKIPPED [2.3255s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545465Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nan_to_num_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545627Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nanmean_cuda_float32 SKIPPED [1.7668s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545791Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nanmean_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 16%] 2025-12-04T14:35:45.9545954Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_narrow_cuda_float32 SKIPPED [2.4722s] (Skipped!) [ 17%] 2025-12-04T14:35:45.9546118Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_narrow_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 17%] 2025-12-04T14:35:45.9546274Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_neg_cuda_float32 SKIPPED [2.1965s] (Skipped!) [ 17%] 2025-12-04T14:35:45.9546443Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_neg_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 17%] 2025-12-04T14:35:45.9546625Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_celu_cuda_float32 SKIPPED [2.3372s] (Skipped!) [ 17%] 2025-12-04T14:35:45.9546804Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_celu_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 17%] 2025-12-04T14:35:45.9546998Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardsigmoid_cuda_float32 SKIPPED [2.4273s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9547186Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardsigmoid_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9547370Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32 SKIPPED [2.4676s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9547582Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9547761Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32 SKIPPED [2.4697s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9547936Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9548124Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softshrink_cuda_float32 SKIPPED [2.7947s] (Skipped!) 
[ 18%] 2025-12-04T14:35:45.9548309Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softshrink_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 18%] 2025-12-04T14:35:45.9548494Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32 SKIPPED [2.5444s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9548678Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9548864Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32 SKIPPED [2.4597s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9549048Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9549229Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_4_cuda_float32 SKIPPED [2.4692s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9549412Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_4_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 19%] 2025-12-04T14:35:45.9549569Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_pow_cuda_float32 SKIPPED [7.6579s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9549752Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_pow_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9549911Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32 SKIPPED [3.3334s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9550070Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9550231Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32 SKIPPED [2.5220s] (Skipped!) 
[ 20%] 2025-12-04T14:35:45.9550395Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9550551Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32 SKIPPED [2.4016s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9550711Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 20%] 2025-12-04T14:35:45.9550881Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_reciprocal_cuda_float32 SKIPPED [2.5293s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551050Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_reciprocal_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551219Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_remainder_cuda_float32 SKIPPED [8.0450s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551384Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_remainder_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551559Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32 SKIPPED [2.6937s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551731Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9551931Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_3_cuda_float32 SKIPPED [2.7192s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9552103Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_3_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 21%] 2025-12-04T14:35:45.9552265Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32 SKIPPED [2.7963s] (Skipped!) 
[ 22%] 2025-12-04T14:35:45.9552423Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 22%] 2025-12-04T14:35:45.9552582Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32 SKIPPED [2.2613s] (Skipped!) [ 22%] 2025-12-04T14:35:45.9552739Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 22%] 2025-12-04T14:35:45.9552903Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32 SKIPPED [3.5744s] (Skipped!) [ 22%] 2025-12-04T14:35:45.9553067Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 22%] 2025-12-04T14:35:45.9553225Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32 SKIPPED [2.4985s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9553411Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9553572Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sigmoid_cuda_float32 SKIPPED [2.6523s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9553735Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sigmoid_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9553891Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32 SKIPPED [2.6391s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9554079Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9554248Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32 SKIPPED [2.7604s] (Skipped!) 
[ 23%] 2025-12-04T14:35:45.9554419Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 23%] 2025-12-04T14:35:45.9554587Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1e_cuda_float32 SKIPPED [2.9366s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9554756Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i1e_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9554914Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_cuda_float32 SKIPPED [4.0766s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9555077Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9555235Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32 SKIPPED [2.8519s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9555394Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 24%] 2025-12-04T14:35:45.9555561Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32 SKIPPED [2.8435s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9555721Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9555885Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_squeeze_cuda_float32 SKIPPED [2.6546s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9556046Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_squeeze_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9556252Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_unbiased_cuda_float32 SKIPPED [2.3389s] (Skipped!) 
[ 25%] 2025-12-04T14:35:45.9556420Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_std_unbiased_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9556577Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32 SKIPPED [6.6667s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9556733Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 25%] 2025-12-04T14:35:45.9556890Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32 SKIPPED [4.9775s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557044Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557200Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32 SKIPPED [3.0003s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557358Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557518Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32 SKIPPED [3.0026s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557676Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9557844Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32 SKIPPED [7.9683s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9558012Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 26%] 2025-12-04T14:35:45.9558170Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_trunc_cuda_float32 SKIPPED [4.1198s] (Skipped!) 
[ 27%] 2025-12-04T14:35:45.9558354Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_trunc_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 27%] 2025-12-04T14:35:45.9558520Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32 SKIPPED [5.7333s] (Skipped!) [ 27%] 2025-12-04T14:35:45.9558688Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 27%] 2025-12-04T14:35:45.9558842Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32 SKIPPED [3.6833s] (Skipped!) [ 27%] 2025-12-04T14:35:45.9558999Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 27%] 2025-12-04T14:35:45.9559166Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32 SKIPPED [3.6825s] (Skipped!) [ 28%] 2025-12-04T14:35:45.9559335Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32 SKIPPED [0.0001s] (Skipped!) [ 28%] 2025-12-04T14:35:45.9559496Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32 SKIPPED [5.8118s] (Skipped!) [ 28%] 2025-12-04T14:35:45.9559656Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 28%] 2025-12-04T14:35:45.9559801Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmod___cuda_float32 PASSED [5.8036s] [ 28%] 2025-12-04T14:35:45.9559949Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rmul___cuda_float32 PASSED [5.5785s] [ 28%] 2025-12-04T14:35:45.9560097Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rpow___cuda_float32 PASSED [5.0848s] [ 29%] 2025-12-04T14:35:45.9560241Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rsub___cuda_float32 PASSED [1.4841s] [ 29%] 2025-12-04T14:35:45.9560388Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_abs_cuda_float32 PASSED [1.9064s] [ 29%] 2025-12-04T14:35:45.9560551Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acos_cuda_float32 PASSED [1.9113s] [ 30%] 2025-12-04T14:35:45.9560694Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acosh_cuda_float32 PASSED [1.9052s] [ 30%] 2025-12-04T14:35:45.9560832Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_add_cuda_float32 PASSED [5.0444s] [ 30%] 2025-12-04T14:35:45.9560972Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_all_cuda_float32 PASSED [5.5022s] [ 30%] 2025-12-04T14:35:45.9561111Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amax_cuda_float32 PASSED [5.2521s] [ 31%] 2025-12-04T14:35:45.9561252Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amin_cuda_float32 PASSED [5.2755s] [ 31%] 2025-12-04T14:35:45.9561392Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_angle_cuda_float32 PASSED [2.1745s] [ 31%] 2025-12-04T14:35:45.9561534Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_any_cuda_float32 PASSED [5.4424s] [ 31%] 2025-12-04T14:35:45.9561675Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32 PASSED 
[3.5037s] [ 32%] 2025-12-04T14:35:45.9561818Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32 PASSED [3.6415s] [ 32%] 2025-12-04T14:35:45.9561958Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan2_cuda_float32 PASSED [5.3908s] [ 32%] 2025-12-04T14:35:45.9562098Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan_cuda_float32 PASSED [2.0489s] [ 33%] 2025-12-04T14:35:45.9562239Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atanh_cuda_float32 PASSED [2.0465s] [ 33%] 2025-12-04T14:35:45.9562386Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bfloat16_cuda_float32 PASSED [2.0741s] [ 33%] 2025-12-04T14:35:45.9562558Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bmm_cuda_float32 PASSED [2.5318s] [ 33%] 2025-12-04T14:35:45.9562699Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_bool_cuda_float32 PASSED [2.0612s] [ 34%] 2025-12-04T14:35:45.9562845Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cdouble_cuda_float32 PASSED [2.1087s] [ 34%] 2025-12-04T14:35:45.9562983Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ceil_cuda_float32 PASSED [2.0887s] [ 34%] 2025-12-04T14:35:45.9563126Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cfloat_cuda_float32 PASSED [2.2322s] [ 35%] 2025-12-04T14:35:45.9563297Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chalf_cuda_float32 PASSED [2.2416s] [ 35%] 2025-12-04T14:35:45.9563459Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chunk_cuda_float32 SKIPPED [4.5707s] (Skipped!) [ 35%] 2025-12-04T14:35:45.9563621Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_chunk_cuda_float32 SKIPPED [0.0001s] (Skipped!) 
[ 35%] 2025-12-04T14:35:45.9563771Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32 PASSED [5.7385s] [ 35%] 2025-12-04T14:35:45.9563911Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clone_cuda_float32 PASSED [3.9837s] [ 36%] 2025-12-04T14:35:45.9564058Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_complex_cuda_float32 PASSED [1.7929s] [ 36%] 2025-12-04T14:35:45.9564196Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_cuda_float32 PASSED [2.1764s] [ 36%] 2025-12-04T14:35:45.9564352Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32 PASSED [2.1997s] [ 36%] 2025-12-04T14:35:45.9564503Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_count_nonzero_cuda_float32 PASSED [1.7802s] [ 37%] 2025-12-04T14:35:45.9564697Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_no_rounding_mode_cuda_float32 PASSED [5.3063s] [ 37%] 2025-12-04T14:35:45.9564838Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32 PASSED [5.5454s] [ 37%] 2025-12-04T14:35:45.9564978Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp2_cuda_float32 PASSED [3.1489s] [ 38%] 2025-12-04T14:35:45.9565118Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_exp_cuda_float32 PASSED [3.1391s] [ 38%] 2025-12-04T14:35:45.9565260Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_float_cuda_float32 PASSED [3.1115s] [ 38%] 2025-12-04T14:35:45.9565403Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmin_cuda_float32 PASSED [6.9642s] [ 38%] 2025-12-04T14:35:45.9565542Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmod_cuda_float32 PASSED [5.6205s] [ 39%] 2025-12-04T14:35:45.9565688Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frexp_cuda_float32 PASSED [2.4953s] [ 39%] 2025-12-04T14:35:45.9565825Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ge_cuda_float32 PASSED [5.6492s] [ 39%] 2025-12-04T14:35:45.9565968Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32 PASSED [5.5405s] [ 40%] 2025-12-04T14:35:45.9566119Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hash_tensor_cuda_float32 PASSED [2.0898s] [ 40%] 2025-12-04T14:35:45.9566269Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_heaviside_cuda_float32 PASSED [6.7665s] [ 40%] 2025-12-04T14:35:45.9566409Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_hypot_cuda_float32 PASSED [5.7479s] [ 40%] 2025-12-04T14:35:45.9566546Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_i0_cuda_float32 PASSED [2.6096s] [ 41%] 2025-12-04T14:35:45.9566717Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_igamma_cuda_float32 PASSED [5.7935s] [ 41%] 2025-12-04T14:35:45.9566858Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_int_cuda_float32 PASSED [2.5165s] [ 41%] 2025-12-04T14:35:45.9567002Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isclose_cuda_float32 PASSED [8.9217s] [ 41%] 2025-12-04T14:35:45.9567150Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isfinite_cuda_float32 PASSED [2.7940s] [ 42%] 2025-12-04T14:35:45.9567292Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32 PASSED [2.5718s] [ 42%] 2025-12-04T14:35:45.9567431Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ldexp_cuda_float32 PASSED [5.9266s] [ 42%] 2025-12-04T14:35:45.9567571Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_le_cuda_float32 PASSED [6.8547s] [ 43%] 
2025-12-04T14:35:45.9567716Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lgamma_cuda_float32 PASSED [3.5182s] [ 43%] 2025-12-04T14:35:45.9567858Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log10_cuda_float32 PASSED [3.4995s] [ 43%] 2025-12-04T14:35:45.9567997Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log1p_cuda_float32 PASSED [3.5057s] [ 43%] 2025-12-04T14:35:45.9568138Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log2_cuda_float32 PASSED [3.4986s] [ 44%] 2025-12-04T14:35:45.9568277Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log_cuda_float32 PASSED [3.5077s] [ 44%] 2025-12-04T14:35:45.9568429Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_and_cuda_float32 PASSED [7.0174s] [ 44%] 2025-12-04T14:35:45.9568568Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logit_cuda_float32 PASSED [3.8093s] [ 45%] 2025-12-04T14:35:45.9568735Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_long_cuda_float32 PASSED [3.6944s] [ 45%] 2025-12-04T14:35:45.9568871Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lt_cuda_float32 PASSED [6.9724s] [ 45%] 2025-12-04T14:35:45.9569024Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amax_cuda_float32 PASSED [3.1076s] [ 45%] 2025-12-04T14:35:45.9569183Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32 PASSED [3.0968s] [ 46%] 2025-12-04T14:35:45.9569338Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_select_cuda_float32 PASSED [3.3409s] [ 46%] 2025-12-04T14:35:45.9569490Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32 PASSED [3.1412s] [ 46%] 2025-12-04T14:35:45.9569638Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_var_cuda_float32 PASSED [3.1397s] [ 46%] 2025-12-04T14:35:45.9569791Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_binary_cuda_float32 PASSED [6.9087s] [ 47%] 2025-12-04T14:35:45.9569935Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_maximum_cuda_float32 PASSED [6.9961s] [ 47%] 2025-12-04T14:35:45.9570103Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_reduction_with_dim_cuda_float32 PASSED [4.5910s] [ 47%] 2025-12-04T14:35:45.9570247Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_minimum_cuda_float32 PASSED [6.3440s] [ 48%] 2025-12-04T14:35:45.9570387Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mul_cuda_float32 PASSED [6.8920s] [ 48%] 2025-12-04T14:35:45.9570551Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [3.7993s] [ 48%] 2025-12-04T14:35:45.9570716Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [3.8165s] [ 48%] 2025-12-04T14:35:45.9570887Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nan_to_num_cuda_float32 PASSED [3.8221s] [ 49%] 2025-12-04T14:35:45.9571033Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nanmean_cuda_float32 PASSED [3.3043s] [ 49%] 2025-12-04T14:35:45.9571171Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ne_cuda_float32 PASSED [7.0057s] [ 49%] 2025-12-04T14:35:45.9571312Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_neg_cuda_float32 PASSED [3.7631s] [ 50%] 2025-12-04T14:35:45.9571470Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_elu_cuda_float32 PASSED [3.8058s] [ 50%] 2025-12-04T14:35:45.9571641Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_embedding_cuda_float32 PASSED [3.4075s] [ 50%] 2025-12-04T14:35:45.9571813Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardshrink_cuda_float32 PASSED [3.9467s] [ 50%] 2025-12-04T14:35:45.9571979Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_prelu_cuda_float32 PASSED [3.2524s] [ 51%] 2025-12-04T14:35:45.9572141Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32 PASSED [3.8442s] [ 51%] 2025-12-04T14:35:45.9572300Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu_cuda_float32 PASSED [3.8976s] [ 51%] 2025-12-04T14:35:45.9572467Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32 PASSED [3.8995s] [ 51%] 2025-12-04T14:35:45.9572625Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_selu_cuda_float32 PASSED [3.9632s] [ 52%] 2025-12-04T14:35:45.9572791Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softplus_cuda_float32 PASSED [3.9549s] [ 52%] 2025-12-04T14:35:45.9572960Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softshrink_cuda_float32 PASSED [4.1801s] [ 52%] 2025-12-04T14:35:45.9573151Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softsign_cuda_float32 PASSED [3.9759s] [ 53%] 2025-12-04T14:35:45.9573351Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_threshold_cuda_float32 PASSED [3.9956s] [ 53%] 2025-12-04T14:35:45.9573495Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polar_cuda_float32 PASSED [3.4701s] [ 53%] 2025-12-04T14:35:45.9573662Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_2_cuda_float32 PASSED [3.9242s] [ 53%] 2025-12-04T14:35:45.9573832Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32 PASSED [3.9401s] [ 54%] 2025-12-04T14:35:45.9573981Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_positive_cuda_float32 PASSED [3.8585s] [ 54%] 2025-12-04T14:35:45.9574125Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32 PASSED [7.2582s] [ 54%] 2025-12-04T14:35:45.9574269Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32 PASSED [5.5291s] [ 55%] 2025-12-04T14:35:45.9574410Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_real_cuda_float32 PASSED [3.9602s] [ 55%] 2025-12-04T14:35:45.9574559Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_remainder_cuda_float32 PASSED [7.2492s] [ 55%] 2025-12-04T14:35:45.9574699Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_cuda_float32 PASSED [3.8739s] [ 55%] 2025-12-04T14:35:45.9574864Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32 PASSED [3.8582s] [ 56%] 2025-12-04T14:35:45.9575006Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_select_cuda_float32 PASSED [5.2946s] [ 56%] 2025-12-04T14:35:45.9575183Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sign_cuda_float32 PASSED [3.9188s] [ 56%] 2025-12-04T14:35:45.9575323Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinc_cuda_float32 PASSED [3.9806s] [ 56%] 2025-12-04T14:35:45.9575463Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sinh_cuda_float32 PASSED [3.9105s] [ 57%] 2025-12-04T14:35:45.9575622Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_y0_cuda_float32 PASSED [3.9040s] [ 57%] 2025-12-04T14:35:45.9575803Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_u_cuda_float32 PASSED [7.0461s] [ 57%] 2025-12-04T14:35:45.9575981Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_v_cuda_float32 PASSED [6.5772s] [ 58%] 2025-12-04T14:35:45.9576135Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_erfcx_cuda_float32 PASSED [3.5635s] [ 58%] 2025-12-04T14:35:45.9576286Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1_cuda_float32 PASSED [3.5470s] [ 58%] 2025-12-04T14:35:45.9576440Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i1e_cuda_float32 PASSED [3.3900s] [ 58%] 2025-12-04T14:35:45.9576620Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_laguerre_polynomial_l_cuda_float32 PASSED [6.5003s] [ 59%] 2025-12-04T14:35:45.9576790Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i1_cuda_float32 PASSED [3.5904s] [ 59%] 2025-12-04T14:35:45.9576964Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k0_cuda_float32 PASSED [3.5626s] [ 59%] 2025-12-04T14:35:45.9577135Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k1_cuda_float32 PASSED [3.5698s] [ 60%] 2025-12-04T14:35:45.9577328Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtr_cuda_float32 PASSED [3.6590s] [ 60%] 2025-12-04T14:35:45.9577480Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32 PASSED [3.5125s] [ 60%] 2025-12-04T14:35:45.9577672Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [3.3607s] [ 60%] 2025-12-04T14:35:45.9577851Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_scaled_modified_bessel_k0_cuda_float32 PASSED [3.4802s] [ 61%] 2025-12-04T14:35:45.9578041Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32 PASSED [6.6173s] [ 61%] 2025-12-04T14:35:45.9578229Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_u_cuda_float32 PASSED [6.7840s] [ 61%] 2025-12-04T14:35:45.9578408Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_spherical_bessel_j0_cuda_float32 PASSED [3.6627s] [ 61%] 2025-12-04T14:35:45.9578552Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_split_cuda_float32 PASSED [4.2463s] [ 62%] 2025-12-04T14:35:45.9578693Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sqrt_cuda_float32 PASSED [3.6395s] [ 62%] 2025-12-04T14:35:45.9578836Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32 PASSED [3.6655s] [ 62%] 2025-12-04T14:35:45.9578981Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_squeeze_cuda_float32 PASSED [3.4868s] [ 63%] 2025-12-04T14:35:45.9579135Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_unbiased_cuda_float32 PASSED [3.2246s] [ 63%] 2025-12-04T14:35:45.9579273Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sub_cuda_float32 PASSED [6.8487s] [ 63%] 2025-12-04T14:35:45.9579438Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sum_cuda_float32 PASSED [7.8517s] [ 63%] 2025-12-04T14:35:45.9579577Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32 PASSED [4.4571s] [ 64%] 
2025-12-04T14:35:45.9579715Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_to_cuda_float32 PASSED [5.1625s] [ 64%] 2025-12-04T14:35:45.9579853Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_var_cuda_float32 PASSED [4.0094s] [ 64%] 2025-12-04T14:35:45.9579995Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32 PASSED [5.9292s] [ 65%] 2025-12-04T14:35:45.9580127Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___radd___cuda_float32 PASSED [3.4921s] [ 65%] 2025-12-04T14:35:45.9580261Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rdiv___cuda_float32 PASSED [3.5020s] [ 65%] 2025-12-04T14:35:45.9580393Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmod___cuda_float32 PASSED [3.5063s] [ 65%] 2025-12-04T14:35:45.9580529Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acosh_cuda_float32 PASSED [3.1157s] [ 66%] 2025-12-04T14:35:45.9580659Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_all_cuda_float32 PASSED [3.3363s] [ 66%] 2025-12-04T14:35:45.9580789Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amax_cuda_float32 PASSED [3.3615s] [ 66%] 2025-12-04T14:35:45.9580917Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amin_cuda_float32 PASSED [3.3575s] [ 66%] 2025-12-04T14:35:45.9581049Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_angle_cuda_float32 PASSED [3.1246s] [ 67%] 2025-12-04T14:35:45.9581182Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_any_cuda_float32 PASSED [3.3282s] [ 67%] 2025-12-04T14:35:45.9581311Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asin_cuda_float32 PASSED [3.1318s] [ 67%] 2025-12-04T14:35:45.9581469Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32 PASSED [3.1227s] [ 68%] 2025-12-04T14:35:45.9581597Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan2_cuda_float32 PASSED [3.4669s] [ 68%] 2025-12-04T14:35:45.9581727Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atanh_cuda_float32 PASSED [3.1312s] [ 68%] 2025-12-04T14:35:45.9581855Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bool_cuda_float32 PASSED [3.1317s] [ 68%] 2025-12-04T14:35:45.9581984Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_byte_cuda_float32 PASSED [3.1426s] [ 69%] 2025-12-04T14:35:45.9582114Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cfloat_cuda_float32 PASSED [3.1293s] [ 69%] 2025-12-04T14:35:45.9582244Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_char_cuda_float32 PASSED [3.1274s] [ 69%] 2025-12-04T14:35:45.9582380Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32 PASSED [3.4959s] [ 70%] 2025-12-04T14:35:45.9582524Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_physical_cuda_float32 PASSED [3.1331s] [ 70%] 2025-12-04T14:35:45.9582652Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cosh_cuda_float32 PASSED [3.1230s] [ 70%] 2025-12-04T14:35:45.9582787Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_deg2rad_cuda_float32 PASSED [3.1076s] [ 70%] 2025-12-04T14:35:45.9582918Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32 PASSED [3.1071s] [ 71%] 2025-12-04T14:35:45.9583051Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_double_cuda_float32 PASSED [3.0926s] [ 71%] 2025-12-04T14:35:45.9583178Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_eq_cuda_float32 PASSED [3.4198s] [ 71%] 2025-12-04T14:35:45.9583345Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erf_cuda_float32 PASSED [3.1202s] [ 71%] 2025-12-04T14:35:45.9583513Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfc_cuda_float32 PASSED [3.1284s] [ 
72%] 2025-12-04T14:35:45.9583646Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32 PASSED [3.2742s] [ 72%] 2025-12-04T14:35:45.9583776Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32 PASSED [3.1469s] [ 72%] 2025-12-04T14:35:45.9583904Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_expm1_cuda_float32 PASSED [3.1175s] [ 73%] 2025-12-04T14:35:45.9584033Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fill_cuda_float32 PASSED [3.1339s] [ 73%] 2025-12-04T14:35:45.9584161Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_cuda_float32 PASSED [3.1276s] [ 73%] 2025-12-04T14:35:45.9584290Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmax_cuda_float32 PASSED [3.5139s] [ 73%] 2025-12-04T14:35:45.9584417Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmin_cuda_float32 PASSED [3.5066s] [ 74%] 2025-12-04T14:35:45.9584550Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmod_cuda_float32 PASSED [3.4952s] [ 74%] 2025-12-04T14:35:45.9584679Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_frexp_cuda_float32 PASSED [3.1229s] [ 74%] 2025-12-04T14:35:45.9584808Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ge_cuda_float32 PASSED [3.4551s] [ 75%] 2025-12-04T14:35:45.9584934Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_gt_cuda_float32 PASSED [3.4418s] [ 75%] 2025-12-04T14:35:45.9585063Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_half_cuda_float32 PASSED [3.1208s] [ 75%] 2025-12-04T14:35:45.9585199Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_heaviside_cuda_float32 PASSED [3.4765s] [ 75%] 2025-12-04T14:35:45.9585327Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_i0_cuda_float32 PASSED [3.1282s] [ 76%] 2025-12-04T14:35:45.9585454Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_int_cuda_float32 PASSED 
[3.1205s] [ 76%] 2025-12-04T14:35:45.9585620Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isfinite_cuda_float32 PASSED [3.1176s] [ 76%] 2025-12-04T14:35:45.9585749Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isinf_cuda_float32 PASSED [3.1144s] [ 76%] 2025-12-04T14:35:45.9585880Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isnan_cuda_float32 PASSED [3.1082s] [ 77%] 2025-12-04T14:35:45.9586014Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isneginf_cuda_float32 PASSED [3.1325s] [ 77%] 2025-12-04T14:35:45.9586150Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32 PASSED [3.1102s] [ 77%] 2025-12-04T14:35:45.9586298Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_cuda_float32 PASSED [3.1242s] [ 78%] 2025-12-04T14:35:45.9586443Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_unary_cuda_float32 PASSED [6.1380s] [ 78%] 2025-12-04T14:35:45.9586578Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ldexp_cuda_float32 PASSED [3.5138s] [ 78%] 2025-12-04T14:35:45.9586703Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_le_cuda_float32 PASSED [3.4501s] [ 78%] 2025-12-04T14:35:45.9586836Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_lgamma_cuda_float32 PASSED [3.1337s] [ 79%] 2025-12-04T14:35:45.9586963Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32 PASSED [3.1133s] [ 79%] 2025-12-04T14:35:45.9587093Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log1p_cuda_float32 PASSED [3.1314s] [ 79%] 2025-12-04T14:35:45.9587221Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log2_cuda_float32 PASSED [3.1178s] [ 80%] 2025-12-04T14:35:45.9587350Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32 PASSED [3.1070s] [ 80%] 2025-12-04T14:35:45.9587485Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logaddexp_cuda_float32 PASSED [3.4681s] [ 80%] 2025-12-04T14:35:45.9587648Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_logical_or_cuda_float32 PASSED [3.4317s] [ 80%] 2025-12-04T14:35:45.9587794Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_logsumexp_cuda_float32 PASSED [6.1537s] [ 81%] 2025-12-04T14:35:45.9587935Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_mean_cuda_float32 PASSED [6.1433s] [ 81%] 2025-12-04T14:35:45.9588073Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_norm_cuda_float32 PASSED [6.1456s] [ 81%] 2025-12-04T14:35:45.9588215Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32 PASSED [6.1509s] [ 81%] 2025-12-04T14:35:45.9588350Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32 PASSED [6.1512s] [ 82%] 2025-12-04T14:35:45.9588509Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32 PASSED [3.2825s] [ 82%] 2025-12-04T14:35:45.9588644Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mean_cuda_float32 PASSED [3.3651s] [ 82%] 2025-12-04T14:35:45.9588772Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mul_cuda_float32 PASSED [3.4955s] [ 83%] 2025-12-04T14:35:45.9588928Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [3.1255s] [ 83%] 2025-12-04T14:35:45.9589064Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32 PASSED [3.1225s] [ 83%] 2025-12-04T14:35:45.9589199Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nanmean_cuda_float32 PASSED [3.2130s] [ 83%] 2025-12-04T14:35:45.9589332Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_narrow_cuda_float32 PASSED [3.2626s] [ 84%] 2025-12-04T14:35:45.9589460Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ne_cuda_float32 PASSED [3.4708s] [ 84%] 2025-12-04T14:35:45.9589633Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32 PASSED [3.1322s] [ 84%] 2025-12-04T14:35:45.9589782Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32 PASSED [3.1228s] [ 85%] 2025-12-04T14:35:45.9589941Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardshrink_cuda_float32 PASSED [3.1306s] [ 85%] 2025-12-04T14:35:45.9590093Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_linear_cuda_float32 PASSED [4.2671s] [ 85%] 2025-12-04T14:35:45.9590243Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32 PASSED [3.1323s] [ 85%] 2025-12-04T14:35:45.9590400Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rms_norm_cuda_float32 PASSED [3.1229s] [ 86%] 2025-12-04T14:35:45.9590549Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32 PASSED [6.1314s] [ 86%] 2025-12-04T14:35:45.9590702Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32 PASSED [3.1401s] [ 86%] 2025-12-04T14:35:45.9590849Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32 PASSED [3.1254s] [ 86%] 2025-12-04T14:35:45.9591004Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32 PASSED [3.1283s] [ 87%] 2025-12-04T14:35:45.9591165Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softshrink_cuda_float32 PASSED [3.1306s] [ 87%] 2025-12-04T14:35:45.9591321Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_threshold_cuda_float32 PASSED [3.1442s] [ 87%] 2025-12-04T14:35:45.9591453Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polar_cuda_float32 PASSED [6.2232s] [ 88%] 2025-12-04T14:35:45.9591610Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_0_cuda_float32 PASSED [3.1474s] [ 88%] 2025-12-04T14:35:45.9591795Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_2_cuda_float32 PASSED [3.1548s] [ 88%] 2025-12-04T14:35:45.9591930Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_positive_cuda_float32 PASSED [3.1308s] [ 88%] 2025-12-04T14:35:45.9592061Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32 PASSED [3.2658s] [ 89%] 2025-12-04T14:35:45.9592191Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_real_cuda_float32 PASSED [3.1249s] [ 89%] 2025-12-04T14:35:45.9592330Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_reciprocal_cuda_float32 PASSED [3.1334s] [ 89%] 2025-12-04T14:35:45.9592465Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_remainder_cuda_float32 PASSED [3.5062s] [ 90%] 2025-12-04T14:35:45.9592598Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_cuda_float32 PASSED [3.1270s] [ 90%] 2025-12-04T14:35:45.9592747Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_3_cuda_float32 PASSED [3.1278s] [ 90%] 2025-12-04T14:35:45.9592899Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_neg_3_cuda_float32 PASSED [3.1317s] [ 90%] 2025-12-04T14:35:45.9593030Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_short_cuda_float32 PASSED [3.1417s] [ 91%] 2025-12-04T14:35:45.9593164Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_signbit_cuda_float32 PASSED [3.1060s] [ 91%] 2025-12-04T14:35:45.9593330Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinc_cuda_float32 PASSED [3.1253s] [ 91%] 2025-12-04T14:35:45.9593460Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32 PASSED [3.1368s] [ 91%] 2025-12-04T14:35:45.9593607Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_airy_ai_cuda_float32 PASSED [3.1258s] [ 92%] 2025-12-04T14:35:45.9593756Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32 PASSED [3.1151s] [ 92%] 2025-12-04T14:35:45.9593953Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y0_cuda_float32 PASSED [3.3192s] [ 92%] 2025-12-04T14:35:45.9594121Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_t_cuda_float32 PASSED [3.6676s] [ 93%] 2025-12-04T14:35:45.9594288Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_u_cuda_float32 PASSED [3.4897s] [ 93%] 2025-12-04T14:35:45.9594454Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_chebyshev_polynomial_w_cuda_float32 PASSED [3.7022s] [ 93%] 2025-12-04T14:35:45.9594617Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_h_cuda_float32 PASSED [3.6278s] [ 93%] 2025-12-04T14:35:45.9594781Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_hermite_polynomial_he_cuda_float32 PASSED [3.6246s] [ 94%] 2025-12-04T14:35:45.9594923Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1_cuda_float32 PASSED [3.1221s] [ 94%] 2025-12-04T14:35:45.9595085Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k0_cuda_float32 PASSED [3.2733s] [ 94%] 2025-12-04T14:35:45.9595248Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k1_cuda_float32 PASSED [3.2589s] [ 95%] 2025-12-04T14:35:45.9595389Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtr_cuda_float32 PASSED [3.1183s] [ 95%] 2025-12-04T14:35:45.9595531Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_ndtri_cuda_float32 PASSED [3.1239s] [ 95%] 2025-12-04T14:35:45.9595698Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k1_cuda_float32 PASSED [3.2710s] [ 95%] 2025-12-04T14:35:45.9595875Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_t_cuda_float32 PASSED [3.4941s] [ 96%] 2025-12-04T14:35:45.9596090Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_u_cuda_float32 PASSED [3.5126s] [ 96%] 2025-12-04T14:35:45.9596268Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_v_cuda_float32 PASSED [3.7043s] [ 96%] 2025-12-04T14:35:45.9596445Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_w_cuda_float32 PASSED [3.7218s] [ 96%] 2025-12-04T14:35:45.9596592Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_xlog1py_cuda_float32 PASSED [3.5031s] [ 97%] 2025-12-04T14:35:45.9596735Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_zeta_cuda_float32 PASSED [3.5111s] [ 97%] 2025-12-04T14:35:45.9596881Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_with_sizes_cuda_float32 PASSED [3.1513s] [ 97%] 2025-12-04T14:35:45.9597015Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sum_cuda_float32 PASSED [3.3585s] [ 98%] 2025-12-04T14:35:45.9597142Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tan_cuda_float32 PASSED [3.1452s] [ 98%] 2025-12-04T14:35:45.9597271Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tanh_cuda_float32 PASSED [3.1452s] [ 98%] 2025-12-04T14:35:45.9597409Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_true_divide_cuda_float32 PASSED [3.4896s] [ 98%] 2025-12-04T14:35:45.9597539Z 
test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_trunc_cuda_float32 PASSED [3.1252s] [ 99%] 2025-12-04T14:35:45.9597678Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32 PASSED [3.1772s] [ 99%] 2025-12-04T14:35:45.9597811Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32 PASSED [3.3403s] [ 99%] 2025-12-04T14:35:45.9597965Z test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_nested_tensor_non_contiguous_mutation_cuda PASSED [3.1953s] [100%] 2025-12-04T14:35:45.9597991Z 2025-12-04T14:35:45.9598190Z - generated xml file: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-8372b2d820ac08da.xml - 2025-12-04T14:35:45.9598274Z ======== 273 passed, 174 skipped, 476 deselected in 1271.54s (0:21:11) ========= 2025-12-04T14:35:45.9598447Z The following tests failed consistently: ['test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32'] 2025-12-04T14:35:45.9598449Z 2025-12-04T14:35:45.9598590Z FINISHED PRINTING LOG FILE of test_nestedtensor 2/2 (test/test-reports/test_nestedtensor_2.2_7846e0a7f873b97b_.log) 2025-12-04T14:35:45.9598592Z 2025-12-04T14:35:45.9598697Z Finished test_nestedtensor 2/2 ... [2025-12-04 14:35:45.887834][2262930.154504843], took 22.48min 2025-12-04T14:35:45.9598929Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:35:45.9599021Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:35:45.9599119Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T14:35:45.9599169Z Uploading artifacts took 0.00 seconds 2025-12-04T14:35:45.9599216Z test_nestedtensor 2/2 failed! 2025-12-04T14:35:45.9599338Z Running test_rename_privateuse1_to_existing_device 1/1 ... 
[2025-12-04 14:35:45.894496][2262930.161169544] 2025-12-04T14:35:45.9599387Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:35:45.9599716Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_rename_privateuse1_to_existing_device.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:45.894661] 2025-12-04T14:35:48.0119605Z 2025-12-04T14:35:48.0121137Z test_rename_privateuse1_to_existing_device 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_rename_privateuse1_to_existing_device_1.1_1c55689db0b0487c_.log 2025-12-04T14:35:48.0122224Z Running 1 items in this shard: test/test_rename_privateuse1_to_existing_device.py::TestRenamePrivateuseoneToExistingBackend::test_external_module_register_with_existing_backend 2025-12-04T14:35:48.0122747Z 2025-12-04T14:35:48.0123015Z Finished test_rename_privateuse1_to_existing_device 1/1 ... [2025-12-04 14:35:48.011705][2262932.278376687], took 0.04min 2025-12-04T14:35:48.0131608Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T14:35:48.0182868Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T14:35:48.0184574Z Running test_scaled_matmul_cuda 1/1 ... [2025-12-04 14:35:48.018392][2262932.285065127] 2025-12-04T14:35:48.0184872Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T14:35:48.0186667Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_scaled_matmul_cuda.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 14:35:48.018567] 2025-12-04T14:35:56.7935064Z 2025-12-04T14:35:56.7942397Z test_scaled_matmul_cuda 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scaled_matmul_cuda_1.1_0c26f8e46140dee4_.log 2025-12-04T14:35:56.8139988Z Running 863 items in this shard: test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_error_messages_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_197_240_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_eye_b_eye_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_ones_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_False_65_96_112_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_b_scale_modified_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_ones_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_a_scale_modified_b_ones_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_256_512_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_from_data_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_1025_128_96_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_2_1024_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_512_128_256_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_False_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1023_64_48_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_1025_128_96_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_127_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_128_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_128_256_512_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_224_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_197_240_272_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_256_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_256_512_128_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_2_1024_128_recipe_nvfp4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_31_1024_64_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_45_96_1024_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_512_128_256_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_mxfp8_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_mxfp8_nvfp4_mxfp4_numerics_test_case_name_data_random_scales_one_fast_accum_True_65_96_112_recipe_nvfp4_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_compile_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1023_64_48_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_1025_128_96_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_127_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_128_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_128_256_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_256_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_256_512_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_2_1024_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_31_1024_64_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_45_96_1024_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_blockwise_nvfp4_with_global_scale_512_128_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_error_message_fp8_pre_sm89_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float32_output_errors_with_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_basics_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_bias_relu_edgecase_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_error_messages_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_rowwise_scaling_sanity_use_fast_accum_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_float8_scale_fast_accum_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_honor_sm_carveout_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_16_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_1_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2048_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_nvfp4_scaled_grouped_mm_2d_2d_G_4_M_2049_N_8192_K_16640_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_16_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_1_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_mxfp8_scaled_grouped_mm_2d_3d_G_4_M_16640_N_8192_K_4096_format_mxfp8_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_non_divisible_leading_dim_bias_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_pack_uint4_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_2d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_False_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_2d_3d_fast_accum_True_strided_False_wrap_v2_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_2d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_False_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_grouped_gemm_3d_3d_fast_accum_True_strided_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_bfloat16_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_128_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_128_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_256_N_768_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_modify_scales_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_384_N_128_K_1280_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_data_random_scales_one_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_eye_b_eye_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_calc_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_modify_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_block_wise_numerics_float32_lhs_block_1_rhs_block_1_M_512_N_512_K_512_test_case_x_ones_y_ones_set_scales_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_change_stride_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_256_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_deepseek_error_messages_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_512_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_bfloat16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_bfloat16_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_128_rhs_block_1_M_256_N_256_K_128_cuda, 
test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_128_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_128_K_256_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_block_wise_verify_small_shapes_float32_lhs_block_1_rhs_block_1_M_256_N_256_K_128_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float16_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_float32_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_bfloat16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float16_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_scaled_mm_vs_emulated_row_wise_float32_shapes0_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_0_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_1_use_torch_compile_True_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_False_cuda, test/test_scaled_matmul_cuda.py::TestFP8MatmulCUDA::test_zero_dim_tensorwise_which_dim_zero_2_use_torch_compile_True_cuda 2025-12-04T14:35:56.8323040Z 2025-12-04T14:35:56.8323164Z Finished 
test_scaled_matmul_cuda 1/1 ... [2025-12-04 14:35:56.794144][2262941.060814407], took 0.15min
2025-12-04T14:35:56.8323582Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T14:35:56.8323946Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T14:35:56.8324166Z Running test_scatter_gather_ops 1/1 ... [2025-12-04 14:35:56.800718][2262941.067391337]
2025-12-04T14:35:56.8324353Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T14:35:56.8324739Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_scatter_gather_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:35:56.800886]
2025-12-04T14:36:08.3384965Z
2025-12-04T14:36:08.3386007Z test_scatter_gather_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_scatter_gather_ops_1.1_c93d20e6658170c3_.log
2025-12-04T14:36:08.3401589Z Running 76 items in this shard: test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_bool_cuda_bool, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_cuda_float32,
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_gather_large_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__reductions_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter__scalar_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add__cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_broadcasted_index_deterministic_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_add_mult_index_base_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float32, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_expanded_index_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amax_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_amin_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float16, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_mean_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_prod_cuda_uint8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_bfloat16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex128, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_complex64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float32, 
test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_float64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int16, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int32, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int64, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_int8, test/test_scatter_gather_ops.py::TestScatterGatherCUDA::test_scatter_reduce_sum_cuda_uint8
2025-12-04T14:36:08.3411279Z
2025-12-04T14:36:08.3411393Z Finished test_scatter_gather_ops 1/1 ... [2025-12-04 14:36:08.338304][2262952.604973516], took 0.19min
2025-12-04T14:36:08.3411789Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T14:36:08.3450111Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T14:36:08.3452102Z Running test_schema_check 1/1 ... [2025-12-04 14:36:08.345075][2262952.611748903]
2025-12-04T14:36:08.3452294Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T14:36:08.3453326Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_schema_check.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:36:08.345247]
2025-12-04T15:02:21.0760462Z
2025-12-04T15:02:21.0762523Z test_schema_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_schema_check_1.1_b4afba39c7dfe07d_.log
2025-12-04T15:02:21.1700621Z Running 5996 items in this shard: test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_custom_ops_output_is_input, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_custom_ops_secretly_aliasing, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_custom_ops_secretly_mutating, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_multiple_operators, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_multiple_operators_centered, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_outputs_unexpectedly_aliasing, test/test_schema_check.py::TestSchemaCheck::test_alias_check_fail_simple, test/test_schema_check.py::TestSchemaCheck::test_is_alias_of_basic, test/test_schema_check.py::TestSchemaCheck::test_is_alias_of_empty_container, test/test_schema_check.py::TestSchemaCheck::test_mutation_check_fail, test/test_schema_check.py::TestSchemaCheck::test_mutation_check_fail_multiple_operators, test/test_schema_check.py::TestSchemaCheck::test_overlaps_basic, test/test_schema_check.py::TestSchemaCheck::test_overlaps_empty_container, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_empty_list_input, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_aliasing_inputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_default_replaced, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_device_input, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_kwarg_tensor,
test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_list_input, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_mutable_inputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_nested_training_op, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_training_op, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_wildcard_after, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_with_multiple_outputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_functionality_with_multiple_outputs_aliasing, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_aliasing_inputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_aliasing_outputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_as_strided, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_multiple_outputs, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_mutation, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_none, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_mutated_aliasing_resize_, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_operator_order, test/test_schema_check.py::TestSchemaCheck::test_schema_check_mode_operator_order_without_grad, test/test_schema_check.py::TestSchemaCheck::test_schema_info_bind_basic, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_H_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_T_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___getitem___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___radd___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rand___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rdiv___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmatmul___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmod___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rmul___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___ror___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rpow___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rsub___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness___rxor___cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__batch_norm_with_update_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__batch_norm_with_update_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__batch_norm_with_update_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__batch_norm_with_update_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__chunk_cat_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__native_batch_norm_legit_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__native_batch_norm_legit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__native_batch_norm_legit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__native_batch_norm_legit_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_lengths_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_lengths_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_lengths_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_lengths_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_offsets_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_offsets_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_offsets_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__segment_reduce_offsets_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__softmax_backward_data_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__softmax_backward_data_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__softmax_backward_data_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__softmax_backward_data_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__upsample_bilinear2d_aa_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__upsample_bilinear2d_aa_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__upsample_bilinear2d_aa_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness__upsample_bilinear2d_aa_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_abs_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acos_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_acosh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_add_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addbmm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcdiv_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addcmul_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmm_decomposed_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addmv_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_addr_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_alias_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_all_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_allclose_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_amin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_aminmax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_angle_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_any_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_arange_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argmin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argsort_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_argwhere_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_partial_views_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_as_strided_scatter_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_asinh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atan_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atanh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_1d_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_2d_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_atleast_3d_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_baddbmm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bernoulli_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bernoulli_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bernoulli_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bernoulli_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bfloat16_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bincount_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bincount_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bincount_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bincount_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bincount_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_and_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_left_shift_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_left_shift_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_left_shift_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_left_shift_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_left_shift_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_not_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_or_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_right_shift_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_right_shift_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_right_shift_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_right_shift_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_right_shift_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bitwise_xor_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_block_diag_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bmm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bool_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_shapes_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_tensors_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_broadcast_to_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_bucketize_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_byte_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cartesian_prod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cat_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cauchy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cauchy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cauchy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cauchy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdist_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdist_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cdouble_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ceil_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cfloat_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chalf_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_char_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_inverse_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_inverse_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_inverse_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_inverse_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_solve_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cholesky_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_chunk_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_max_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clamp_min_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_clone_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_column_stack_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_combinations_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_complex_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_complex_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_complex_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_conj_physical_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_constant_pad_nd_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_contiguous_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_copysign_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_corrcoef_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cos_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cosh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_count_nonzero_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cov_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cross_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cummin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumprod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumsum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_cumulative_trapezoid_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_deg2rad_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diag_embed_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagflat_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diagonal_scatter_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_diff_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_digamma_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dist_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_floor_rounding_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_no_rounding_mode_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_div_trunc_rounding_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_double_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dsplit_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_dstack_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_einsum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_like_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_permuted_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_empty_strided_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eq_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_equal_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erf_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfc_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_erfinv_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exp_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_as_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expand_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_expm1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exponential_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exponential_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exponential_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_exponential_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float8_e4m3fn, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float8_e4m3fnuz, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float8_e5m2, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_float8_e5m2fnuz, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_eye_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_fftshift_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_hfftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ifftshift_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_ihfftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_irfftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfft_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fft_rfftn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fill_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flatten_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flip_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fliplr_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_flipud_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_float_power_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_floor_divide_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_fmod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frac_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frac_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frac_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frac_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_frexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_uint16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_uint32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_full_like_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gather_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gcd_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gcd_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gcd_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gcd_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gcd_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ge_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geometric_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geqrf_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geqrf_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geqrf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_geqrf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gradient_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_2d_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_grid_sampler_3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_gt_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_half_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hash_tensor_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_heaviside_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_histc_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hsplit_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hstack_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hypot_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hypot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hypot_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_hypot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_i0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_igamma_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_igamma_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_igammac_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_igammac_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_imag_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_imag_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_imag_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_add_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_fill_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_put_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_amin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_mean_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_reduce_prod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_index_select_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_inner_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_int_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isclose_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isfinite_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isinf_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isnan_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isneginf_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isposinf_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_isreal_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_istft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_istft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_item_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_2inputs_2outputs_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_binary_return_by_ref_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_jiterator_unary_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kron_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_kthvalue_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lcm_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lcm_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lcm_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lcm_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lcm_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ldexp_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_le_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lerp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lgamma_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_ex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_ex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_ex_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cholesky_ex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cond_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cond_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cond_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cond_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_cross_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_det_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_det_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_det_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_det_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_diagonal_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eig_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eig_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eig_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eig_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvals_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvals_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvals_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvals_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvalsh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvalsh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvalsh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_eigvalsh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_householder_product_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_householder_product_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_householder_product_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_householder_product_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_ex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_ex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_ex_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_inv_ex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_ex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_ex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_ex_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_factor_ex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_solve_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_ldl_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_grad_oriented_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_grad_oriented_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_grad_oriented_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lstsq_grad_oriented_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_ex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_ex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_ex_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_factor_ex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_solve_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_lu_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_power_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_power_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_power_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_power_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_hermitian_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_hermitian_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_hermitian_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_matrix_rank_hermitian_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_multi_dot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_norm_subgradients_at_zero_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_hermitian_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_hermitian_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_hermitian_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_hermitian_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_singular_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_singular_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_singular_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_pinv_singular_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_qr_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_qr_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_qr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_qr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_slogdet_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_slogdet_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_slogdet_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_slogdet_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_ex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_ex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_ex_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_ex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_triangular_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_triangular_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_triangular_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_solve_triangular_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svd_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svd_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svd_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svd_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svdvals_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svdvals_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svdvals_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_svdvals_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorinv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorinv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorinv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorinv_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorsolve_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorsolve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorsolve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_tensorsolve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vander_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vecdot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linalg_vector_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_linspace_tensor_overload_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log10_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log1p_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log2_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_normal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_normal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_normal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_normal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_log_softmax_with_dtype_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp2_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp2_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logaddexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logcumsumexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logdet_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logdet_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logdet_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logdet_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_and_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_not_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_or_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logical_xor_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logit_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logspace_tensor_overload_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_logsumexp_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_long_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lt_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_solve_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_unpack_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_unpack_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_unpack_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_lu_unpack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mH_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mT_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_amin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_argmin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumprod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_cumsum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_fill_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_log_softmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_log_softmax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_log_softmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_log_softmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logaddexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logaddexp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logaddexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logaddexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_logsumexp_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_median_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_median_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_median_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_median_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_norm_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_normalize_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_prod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_scatter_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_select_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_softmin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_std_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_sum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_masked_var_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matmul_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_matrix_exp_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_binary_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_pool2d_with_indices_backward_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_pool2d_with_indices_backward_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_pool2d_with_indices_backward_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_no_dim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_max_reduction_with_dim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_maximum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_median_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_list_of_tensors_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_meshgrid_variadic_tensors_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_binary_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_no_dim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_min_reduction_with_dim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_minimum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mode_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_movedim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_msort_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mul_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_multinomial_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_multinomial_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_multinomial_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_multinomial_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mv_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_mvlgamma_mvlgamma_p_5_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nan_to_num_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanmedian_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanquantile_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nanquantile_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nansum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_narrow_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_batch_norm_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_batch_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_batch_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_batch_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_dropout_backward_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_dropout_backward_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_dropout_backward_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_dropout_backward_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_layer_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_layer_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_layer_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_native_layer_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ne_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_neg_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_empty_strided_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_full_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_ones_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_new_zeros_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nextafter_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nextafter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nextafter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nextafter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool2d_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool3d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_alpha_dropout_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_alpha_dropout_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_alpha_dropout_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_alpha_dropout_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool3d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_avg_pool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_bilinear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_bilinear_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_bilinear_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_bilinear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_celu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_celu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_celu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_celu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_channel_shuffle_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_conv_transpose3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_similarity_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_similarity_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_similarity_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cosine_similarity_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cross_entropy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cross_entropy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cross_entropy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_cross_entropy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_ctc_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_ctc_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout3d_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_dropout_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_elu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_elu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_elu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_elu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_bag_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_bag_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_bag_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_bag_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_embedding_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_fractional_max_pool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gaussian_nll_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gaussian_nll_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gelu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gelu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gelu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_gelu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_glu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_glu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_glu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_glu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_grid_sample_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_grid_sample_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_grid_sample_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_grid_sample_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_group_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_group_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_group_norm_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_group_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardshrink_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardshrink_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardshrink_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardshrink_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardsigmoid_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardsigmoid_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardsigmoid_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardsigmoid_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardswish_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardswish_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardswish_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardswish_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hardtanh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hinge_embedding_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hinge_embedding_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_hinge_embedding_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_huber_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_huber_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_huber_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_huber_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_instance_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_instance_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_instance_norm_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_instance_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_area_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_area_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_area_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_area_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bicubic_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bicubic_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bicubic_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bilinear_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bilinear_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_bilinear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_linear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_linear_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_linear_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_linear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest-exact_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_nearest_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_trilinear_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_trilinear_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_interpolate_trilinear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_kl_div_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_kl_div_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_kl_div_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_kl_div_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_l1_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_layer_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_layer_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_layer_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_layer_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_leaky_relu_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_leaky_relu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_leaky_relu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_leaky_relu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_linear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_local_response_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_local_response_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_local_response_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_local_response_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_logsigmoid_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_logsigmoid_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_logsigmoid_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_logsigmoid_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_margin_ranking_loss_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool2d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_pool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_grad_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_grad_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool1d_grad_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_grad_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_grad_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool2d_grad_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_grad_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_grad_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_max_unpool3d_grad_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mish_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mish_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mish_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mish_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mse_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mse_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mse_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_mse_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_head_attention_forward_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_head_attention_forward_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_head_attention_forward_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_margin_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_margin_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_margin_loss_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multi_margin_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_margin_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_margin_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_margin_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_margin_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_nll_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_nll_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_nll_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_nll_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_normalize_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_one_hot_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_circular_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_constant_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_reflect_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pad_replicate_negative_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pairwise_distance_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pdist_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pdist_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_shuffle_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_pixel_unshuffle_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_poisson_nll_loss_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_prelu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_prelu_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_prelu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_prelu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu6_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_relu_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rms_norm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rrelu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rrelu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rrelu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_rrelu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_scaled_dot_product_attention_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_scaled_dot_product_attention_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_selu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_selu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_selu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_selu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_complex_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_complex_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_silu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_smooth_l1_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_smooth_l1_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_smooth_l1_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_soft_margin_loss_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_soft_margin_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_soft_margin_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_soft_margin_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softmin_with_dtype_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softplus_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softplus_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softplus_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softplus_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softshrink_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softshrink_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softshrink_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softshrink_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_softsign_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_tanhshrink_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_threshold_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_loss_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_triplet_margin_with_distance_loss_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_unfold_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_bilinear_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_bilinear_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_bilinear_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_nearest_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_nearest_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_nearest_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_nearest_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nn_functional_upsample_nearest_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_nonzero_static_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_fro_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_inf_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_nuc_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_nuc_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_nuc_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_norm_nuc_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_in_place_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_number_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_number_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_number_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_normal_number_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ones_like_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ormqr_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ormqr_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ormqr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ormqr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_outer_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pca_lowrank_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pca_lowrank_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pca_lowrank_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pca_lowrank_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_permute_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pinverse_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pinverse_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pinverse_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pinverse_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polar_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polar_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_2_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_3_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_polygamma_polygamma_n_4_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_positive_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_pow_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_prod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_put_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_qr_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_qr_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_qr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_qr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_quantile_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_quantile_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rad2deg_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rand_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randint_like_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_randn_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_ravel_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_real_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reciprocal_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_remainder_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_renorm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_repeat_interleave_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_as_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_reshape_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize__cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resize_as__cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_conj_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_resolve_neg_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_roll_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rot90_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_0_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_0_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_3_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_3_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_3_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_3_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_neg_3_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_neg_3_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_neg_3_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_round_decimals_neg_3_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsqrt_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_rsub_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scalar_tensor_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_add_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amax_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_amin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_mean_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_prod_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_scatter_reduce_sum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_searchsorted_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_select_scatter_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sgn_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_short_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sigmoid_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sign_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_bartlett_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_bartlett_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_blackman_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_blackman_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_cosine_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_cosine_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_exponential_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_exponential_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_gaussian_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_gaussian_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_general_cosine_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_general_cosine_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_general_hamming_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_general_hamming_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_hamming_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_hamming_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_hann_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_hann_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_kaiser_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_kaiser_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_nuttall_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signal_windows_nuttall_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_signbit_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sin_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinc_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sinh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_slice_scatter_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_softmax_with_dtype_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sort_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_mm_reduce_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_mm_reduce_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_mm_reduce_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_mm_reduce_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_sampled_addmm_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_sampled_addmm_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_sampled_addmm_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sparse_sampled_addmm_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_airy_ai_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_j1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_bessel_y1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_t_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_u_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_v_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_chebyshev_polynomial_w_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_entr_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_erfcx_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_h_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_hermite_polynomial_he_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i0e_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_i1e_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_laguerre_polynomial_l_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_legendre_polynomial_p_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_log_ndtr_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_i1_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_modified_bessel_k1_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtr_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_ndtri_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_scaled_modified_bessel_k1_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_spherical_bessel_j0_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_xlog1py_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_special_zeta_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_list_args_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_split_with_sizes_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sqrt_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_square_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_squeeze_multiple_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stack_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_mean_unbiased_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_std_unbiased_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stft_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stft_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stft_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_stft_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sub_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_sum_to_size_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_lowrank_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_lowrank_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_lowrank_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_svd_lowrank_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_bool, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_t_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_along_dim_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_take_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tan_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tanh_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensor_split_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tensordot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tile_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_to_sparse_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_topk_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch__scaled_mm_cuda_float8_e4m3fn, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__efficient_attention_forward_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__efficient_attention_forward_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__flash_attention_forward_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__flash_attention_forward_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trace_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_transpose_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_complex64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapezoid_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trapz_cuda_uint8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triangular_solve_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triangular_solve_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triangular_solve_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triangular_solve_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_indices_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_tril_indices_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_indices_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_triu_indices_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_true_divide_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_trunc_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_bfloat16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unbind_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unflatten_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_complex32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unfold_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_uniform_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_consecutive_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_uint16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_uint32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_uint64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unique_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unravel_index_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unravel_index_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unravel_index_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unravel_index_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unravel_index_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_int64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_chunk_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsafe_split_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_float64, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_unsqueeze_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_mean_unbiased_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_var_unbiased_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vdot_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_complex_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_complex_cuda_float32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_complex_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_real_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_as_real_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_copy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_int32, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_view_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vsplit_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_complex128, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_vstack_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_int16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_where_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_xlogy_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_float16, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zero__cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_int8, 
test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_cuda_uint8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_bfloat16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_bool, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_complex128, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_complex32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_complex64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_float16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_float32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_float64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_int16, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_int32, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_int64, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_int8, test/test_schema_check.py::TestSchemaCheckModeOpInfoCUDA::test_schema_correctness_zeros_like_cuda_uint8 2025-12-04T15:02:21.2594778Z 2025-12-04T15:02:21.2594894Z Finished test_schema_check 1/1 ... 
[2025-12-04 15:02:21.080121][2264525.346791646], took 26.21min 2025-12-04T15:02:21.2595276Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:02:21.2595632Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:02:21.2595858Z GITHUB_RUN_ID, GITHUB_RUN_ATTEMPT, or ARTIFACTS_FILE_SUFFIX not set, not uploading 2025-12-04T15:02:21.2596115Z Uploading artifacts took 0.00 seconds 2025-12-04T15:02:21.2596279Z Running test_stateless 1/1 ... [2025-12-04 15:02:21.086652][2264525.353325302] 2025-12-04T15:02:21.2596448Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:02:21.2596824Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_stateless.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:21.086834] 2025-12-04T15:02:26.1099422Z 2025-12-04T15:02:26.1100281Z test_stateless 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_stateless_1.1_1dcaf4d84417af53_.log 2025-12-04T15:02:26.1110335Z Running 50 items in this shard: test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_circular_references_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_batch_norm_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_member_reference_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_multiple_dicts_error, 
test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_tuple_dicts, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_error_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_data_parallel_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_gradient_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_jit_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_functional_call_with_kwargs_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_in_place_operator_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_module_fail_reset_to_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_stateless, 
test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_special_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_some_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrize_tie_weights_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_reparametrized_module_change_parametrization_original_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_strict_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_setattr_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_errors_torch_func, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_no_error_without_flag, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_stateless, test/test_stateless.py::TestStatelessFunctionalAPI::test_tied_weights_warns_torch_func, test/test_stateless.py::TestStatelessDeprecation::test_private_stateless_warns, 
test/test_stateless.py::TestStatelessDeprecation::test_stateless_functional_call_warns, test/test_stateless.py::TestPythonOptimizeMode::test_runs_with_optimize_flag 2025-12-04T15:02:26.1118897Z 2025-12-04T15:02:26.1119007Z Finished test_stateless 1/1 ... [2025-12-04 15:02:26.109741][2264530.376410054], took 0.08min 2025-12-04T15:02:26.1119407Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:02:26.1162333Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:02:26.1164340Z Running test_subclass 1/1 ... [2025-12-04 15:02:26.116337][2264530.383010628] 2025-12-04T15:02:26.1167633Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:02:26.1168421Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_subclass.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:02:26.116523] 2025-12-04T15:02:28.3839636Z 2025-12-04T15:02:28.3840549Z test_subclass 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_subclass_1.1_0312735553bbef1d_.log 2025-12-04T15:02:28.3861231Z Running 100 items in this shard: test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_deepcopy_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_lazy_module_base_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_diag_tensor_below, test/test_subclass.py::TestSubclass::test_lazy_module_logging_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_sparse_tensor, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_lazy_module_wrapper_with_custom_strides, 
test/test_subclass.py::TestSubclass::test_module_optimization_base_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_diag_tensor_below, test/test_subclass.py::TestSubclass::test_module_optimization_logging_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_non_wrapper_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_sparse_tensor, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_sizes, test/test_subclass.py::TestSubclass::test_module_optimization_wrapper_with_custom_strides, test/test_subclass.py::TestSubclass::test_non_rewrapping_torch_dispatch_subclass_as_parameter_throws_for_detach, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_base_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_diag_tensor_below_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_logging_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_non_wrapper_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_sparse_tensor_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_sizes_tensor_requires_grad_True, 
test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_False, test/test_subclass.py::TestSubclass::test_param_invariants_wrapper_with_custom_strides_tensor_requires_grad_True, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_base_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_diag_tensor_below_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_logging_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_non_wrapper_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_sparse_tensor_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_sizes_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_False, test/test_subclass.py::TestSubclass::test_parametrization_wrapper_with_custom_strides_leave_parametrized_True, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_repr_diag_tensor_below_as_param_True, 
test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_repr_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_repr_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_False, 
test/test_subclass.py::TestSubclass::test_serialization_wrapper_with_custom_strides_as_param_True, test/test_subclass.py::TestSubclass::test_tensor_subclass_storage_data_accesses_throw, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_base_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_diag_tensor_below_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_logging_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_non_wrapper_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_sparse_tensor_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_sizes_as_param_True, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_False, test/test_subclass.py::TestSubclass::test_type_propagation_wrapper_with_custom_strides_as_param_True 2025-12-04T15:02:28.3874674Z 2025-12-04T15:02:28.3874801Z Finished test_subclass 1/1 ... 
[2025-12-04 15:02:28.383762][2264532.650432198], took 0.04min 2025-12-04T15:02:28.3875193Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:02:28.3902608Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:02:28.3904620Z Running test_sympy_utils 1/1 ... [2025-12-04 15:02:28.390346][2264532.657018923] 2025-12-04T15:02:28.3904872Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:02:28.3906265Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_sympy_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:28.390528] 2025-12-04T15:02:40.2218809Z 2025-12-04T15:02:40.2219950Z test_sympy_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sympy_utils_1.1_cabaf664b785f6a7_.log 2025-12-04T15:02:40.2255257Z Running 217 items in this shard: test/test_sympy_utils.py::TestNumbers::test_float_cast, test/test_sympy_utils.py::TestNumbers::test_int_infinity, test/test_sympy_utils.py::TestNumbers::test_lt_self, test/test_sympy_utils.py::TestNumbers::test_mixed_oo_int_oo, test/test_sympy_utils.py::TestNumbers::test_relation, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_and_, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_bitwise_xor, test/test_sympy_utils.py::TestValueRanges::test_binary_bool_ref_range_fn_or_, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_add_dtype_int, 
test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_and_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_or_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_xor_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_bitwise_xor_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_floordiv_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_maximum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_minimum_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mod_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_mul_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_by_natural_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_pow_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_sub_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_fn_truediv_dtype_int, 
test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_add, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_bitwise_xor, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_eq, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_floordiv, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ge, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_gt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_le, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_lt, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_maximum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_minimum, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mod, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_mul, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_ne, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_pow_by_natural, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_sub, test/test_sympy_utils.py::TestValueRanges::test_binary_ref_range_fn_truediv, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_and, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_or, test/test_sympy_utils.py::TestValueRanges::test_bitwise_ref_range_fn_bitwise_xor, test/test_sympy_utils.py::TestValueRanges::test_mul_zero_unknown, test/test_sympy_utils.py::TestValueRanges::test_pow_half, test/test_sympy_utils.py::TestValueRanges::test_unary_bool_ref_range_fn_not_, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_float, 
test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_abs_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_ceil_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_exp_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_floor_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_log_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_neg_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_reciprocal_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_sqrt_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_float, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_fn_square_dtype_int, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_abs, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_ceil, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_exp, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_floor, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_log, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_neg, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_reciprocal, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_sqrt, test/test_sympy_utils.py::TestValueRanges::test_unary_ref_range_fn_square, 
test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_bitwise_xor, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_sub, 
test/test_sympy_utils.py::TestSympyInterp::test_interp_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_bitwise_xor, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_ne, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow, 
test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_python_interp_fx_fn_truediv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_abs, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_add, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_and_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_and, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_or, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_bitwise_xor, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ceil, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_eq, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_exp, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floor, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_floordiv, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ge, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_gt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_le, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_log, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_lt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_maximum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_minimum, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mod, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_mul, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_ne, 
test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_neg, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_not_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_or_, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_pow_by_natural, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_reciprocal, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sqrt, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_square, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_sub, test/test_sympy_utils.py::TestSympyInterp::test_tensor_interp_fn_truediv, test/test_sympy_utils.py::TestSympySolve::test_addition, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Equality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_LessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_floordiv_Unequality, test/test_sympy_utils.py::TestSympySolve::test_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympySolve::test_give_up, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Equality, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_Unequality, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_LessThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_multiplication_division_inequality_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Equality, 
test/test_sympy_utils.py::TestSympySolve::test_noop_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_Unequality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Equality, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_GreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_LessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictGreaterThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_StrictLessThan, test/test_sympy_utils.py::TestSympySolve::test_noop_rhs_Unequality, test/test_sympy_utils.py::TestSympySolve::test_simple_floordiv_gcd, test/test_sympy_utils.py::TestSympySolve::test_z3_proof_floordiv_eq_simplify, test/test_sympy_utils.py::TestSympyFunctions::test_pickle, test/test_sympy_utils.py::TestSingletonInt::test_basic, test/test_sympy_utils.py::TestIdentity::test_cast_identity_float, test/test_sympy_utils.py::TestIdentity::test_cast_identity_illegal, test/test_sympy_utils.py::TestIdentity::test_cast_identity_int, test/test_sympy_utils.py::TestIdentity::test_expand_identity, test/test_sympy_utils.py::TestTypedExpr::test_typed_expr 2025-12-04T15:02:40.2275716Z 2025-12-04T15:02:40.2275825Z Finished test_sympy_utils 1/1 ... [2025-12-04 15:02:40.221702][2264544.488370896], took 0.20min 2025-12-04T15:02:40.2276217Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:02:40.2284816Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:02:40.2286848Z Running test_tensorboard 1/1 ... 
[2025-12-04 15:02:40.228548][2264544.495221927] 2025-12-04T15:02:40.2287227Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:02:40.2288643Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_tensorboard.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:02:40.228736] 2025-12-04T15:04:32.2640672Z 2025-12-04T15:04:32.2641385Z test_tensorboard 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_tensorboard_1.1_d02a075c93751a1b_.log 2025-12-04T15:04:32.2648915Z Running 50 items in this shard: test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_autograd_np, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_histogram, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_histogram_raw, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_np, test/test_tensorboard.py::TestTensorBoardPyTorchNumpy::test_pytorch_write, test/test_tensorboard.py::TestTensorBoardUtils::test_convert_to_HWC_dtype_remains_same, test/test_tensorboard.py::TestTensorBoardUtils::test_numpy_vid_uint8, test/test_tensorboard.py::TestTensorBoardUtils::test_prepare_video, test/test_tensorboard.py::TestTensorBoardUtils::test_to_HWC, test/test_tensorboard.py::TestTensorBoardWriter::test_writer, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_pathlib, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_summary_writer_close, test/test_tensorboard.py::TestTensorBoardSummaryWriter::test_summary_writer_ctx, test/test_tensorboard.py::TestTensorBoardEmbedding::test_embedding, test/test_tensorboard.py::TestTensorBoardEmbedding::test_embedding_64, test/test_tensorboard.py::TestTensorBoardSummary::test_audio, test/test_tensorboard.py::TestTensorBoardSummary::test_custom_scalars, test/test_tensorboard.py::TestTensorBoardSummary::test_empty_input, 
test/test_tensorboard.py::TestTensorBoardSummary::test_float32_image, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_auto, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_doane, test/test_tensorboard.py::TestTensorBoardSummary::test_histogram_fd, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_3_channel_batched, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_boxes, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_one_channel, test/test_tensorboard.py::TestTensorBoardSummary::test_image_with_one_channel_batched, test/test_tensorboard.py::TestTensorBoardSummary::test_image_without_channel, test/test_tensorboard.py::TestTensorBoardSummary::test_list_input, test/test_tensorboard.py::TestTensorBoardSummary::test_mesh, test/test_tensorboard.py::TestTensorBoardSummary::test_scalar_new_style, test/test_tensorboard.py::TestTensorBoardSummary::test_text, test/test_tensorboard.py::TestTensorBoardSummary::test_uint8_image, test/test_tensorboard.py::TestTensorBoardSummary::test_video, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_mlp_graph, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_nested_nn_squential, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_pytorch_graph, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_pytorch_graph_dict_input, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_torchvision_smoke, test/test_tensorboard.py::TestTensorBoardPytorchGraph::test_wrong_input_size, test/test_tensorboard.py::TestTensorBoardFigure::test_figure, test/test_tensorboard.py::TestTensorBoardFigure::test_figure_list, test/test_tensorboard.py::TestTensorBoardNumpy::test_pytorch_np_expect_fail, test/test_tensorboard.py::TestTensorBoardNumpy::test_scalar, test/test_tensorboard.py::TestTensorProtoSummary::test_complex_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_empty_tensor_proto, 
test/test_tensorboard.py::TestTensorProtoSummary::test_float_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_half_tensor_proto_bfloat16_proto_type_14, test/test_tensorboard.py::TestTensorProtoSummary::test_half_tensor_proto_float16_proto_type_19, test/test_tensorboard.py::TestTensorProtoSummary::test_int_tensor_proto, test/test_tensorboard.py::TestTensorProtoSummary::test_scalar_tensor_proto 2025-12-04T15:04:32.2654463Z 2025-12-04T15:04:32.2654578Z Finished test_tensorboard 1/1 ... [2025-12-04 15:04:32.263744][2264656.530412162], took 1.87min 2025-12-04T15:04:32.2659798Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:04:32.2706280Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:04:32.2708040Z Running test_utils_config_module 1/1 ... [2025-12-04 15:04:32.270734][2264656.537407571] 2025-12-04T15:04:32.2708235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:04:32.2710124Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_utils_config_module.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:04:32.270923] 2025-12-04T15:04:34.3391021Z 2025-12-04T15:04:34.3391835Z test_utils_config_module 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_utils_config_module_1.1_51789a45d701de93_.log 2025-12-04T15:04:34.3394841Z Running 22 items in this shard: test/test_utils_config_module.py::TestConfigModule::test_alias, test/test_utils_config_module.py::TestConfigModule::test_bad_jk_type, test/test_utils_config_module.py::TestConfigModule::test_base_value_loading, test/test_utils_config_module.py::TestConfigModule::test_codegen_config, test/test_utils_config_module.py::TestConfigModule::test_codegen_config_function, test/test_utils_config_module.py::TestConfigModule::test_dict_copy_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_semantics, test/test_utils_config_module.py::TestConfigModule::test_env_name_string_semantics, test/test_utils_config_module.py::TestConfigModule::test_get_hash, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_float, test/test_utils_config_module.py::TestConfigModule::test_invalid_config_int, test/test_utils_config_module.py::TestConfigModule::test_make_closur_patcher, test/test_utils_config_module.py::TestConfigModule::test_multi_env, test/test_utils_config_module.py::TestConfigModule::test_none_override_semantics, test/test_utils_config_module.py::TestConfigModule::test_overrides, test/test_utils_config_module.py::TestConfigModule::test_patch, test/test_utils_config_module.py::TestConfigModule::test_reference_is_default, test/test_utils_config_module.py::TestConfigModule::test_reference_semantics, test/test_utils_config_module.py::TestConfigModule::test_save_config, test/test_utils_config_module.py::TestConfigModule::test_save_config_portable, test/test_utils_config_module.py::TestConfigModule::test_type_loading, test/test_utils_config_module.py::TestConfigModule::test_unittest_patch 2025-12-04T15:04:34.3397257Z 
2025-12-04T15:04:34.3397374Z Finished test_utils_config_module 1/1 ... [2025-12-04 15:04:34.338762][2264658.605429661], took 0.03min 2025-12-04T15:04:34.3410913Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:04:34.3460008Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:04:34.3461618Z Running test_view_ops 1/1 ... [2025-12-04 15:04:34.346026][2264658.612699376] 2025-12-04T15:04:34.3461809Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:04:34.3463381Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_view_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:04:34.346219] 2025-12-04T15:04:42.3754310Z 2025-12-04T15:04:42.3755035Z test_view_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_view_ops_1.1_d71082cf2b9a45ae_.log 2025-12-04T15:04:42.3782118Z Running 279 items in this shard: test/test_view_ops.py::TestViewOpsCUDA::test_T_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_assignment_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_advanced_indexing_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_gradients_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_as_strided_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_ellipses_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_newaxis_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_basic_indexing_slice_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_chunk_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex128, 
test/test_view_ops.py::TestViewOpsCUDA::test_conj_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_self_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_conj_view_with_shared_memory_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_contiguous_self_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_diagonal_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_expand_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_flatten_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_imag_noncomplex_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_movedim_view_cuda, 
test/test_view_ops.py::TestViewOpsCUDA::test_narrow_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_permute_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_real_imag_view_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_nonview_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_reshape_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_select_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex128_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_bool, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float16, 
test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_float64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int16, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int32, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int64, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_int8, test/test_view_ops.py::TestViewOpsCUDA::test_set_real_imag_cuda_complex64_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_split_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_squeeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_t_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_transpose_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unbind_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unfold_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_inplace_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_unsqueeze_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_complex_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex32, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_real_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_as_view_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_out_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_copy_output_contiguous_cuda, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_bool, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_new_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_dtype_upsize_errors_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex128, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_dsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_hsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_complex64, 
test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_split_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bfloat16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_bool, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex128, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_complex64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_float64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int16, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int32, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int64, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_int8, test/test_view_ops.py::TestViewOpsCUDA::test_view_tensor_vsplit_cuda_uint8, test/test_view_ops.py::TestViewOpsCUDA::test_view_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_T_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_as_strided_overflow_storage_offset_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float16, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_atleast_gradient_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_big_transpose_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_shapes_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_tensors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_broadcast_to_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_chunk_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_conj_neg_view_numpy_error_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_contiguous_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_crow_col_indices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_empty_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_expand_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_flatten_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize__cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_memory_format_resize_as_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_narrow_tensor_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_python_types_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_ravel_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_reshape_view_semantics_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_as_preserves_strides_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_resize_overflow_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_split_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_t_cuda, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_errors_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_indices_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_tensor_split_sections_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_complex128, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_invalid_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transpose_vs_numpy_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bfloat16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_bool, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex128, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_complex64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float32, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_float64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int16, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int32, 
test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int64, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_int8, test/test_view_ops.py::TestOldViewOpsCUDA::test_transposes_errors_cuda_uint8, test/test_view_ops.py::TestOldViewOpsCUDA::test_unsqueeze_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_all_dtypes_and_devices_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_cuda, test/test_view_ops.py::TestOldViewOpsCUDA::test_view_empty_cuda 2025-12-04T15:04:42.3808418Z 2025-12-04T15:04:42.3808520Z Finished test_view_ops 1/1 ... [2025-12-04 15:04:42.375414][2264666.642082992], took 0.13min 2025-12-04T15:04:42.3808903Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:04:42.3822166Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:04:42.3824131Z Running test_xnnpack_integration 1/1 ... [2025-12-04 15:04:42.382323][2264666.648996382] 2025-12-04T15:04:42.3824348Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:04:42.3825995Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'test_xnnpack_integration.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:04:42.382516] 2025-12-04T15:05:07.7757891Z 2025-12-04T15:05:07.7758694Z test_xnnpack_integration 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_xnnpack_integration_1.1_5b2a7d6dc719e1a6_.log 2025-12-04T15:05:07.7760274Z Running 12 items in this shard: test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKOps::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear, test/test_xnnpack_integration.py::TestXNNPACKOps::test_linear_1d_input, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_combined_model, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_conv2d_transpose, test/test_xnnpack_integration.py::TestXNNPACKSerDes::test_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_decomposed_linear, test/test_xnnpack_integration.py::TestXNNPACKRewritePass::test_linear, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_basic, test/test_xnnpack_integration.py::TestXNNPACKConv1dTransformPass::test_conv1d_with_relu_fc 2025-12-04T15:05:07.7761596Z 2025-12-04T15:05:07.7762267Z Finished test_xnnpack_integration 1/1 ... [2025-12-04 15:05:07.775520][2264692.042191268], took 0.42min 2025-12-04T15:05:07.7771336Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:07.7822402Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:07.7824286Z Running torch_np/numpy_tests/lib/test_arraypad 1/1 ... 
[2025-12-04 15:05:07.782322][2264692.048995379] 2025-12-04T15:05:07.7824662Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:07.7826074Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraypad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:07.782503] 2025-12-04T15:05:09.9499816Z 2025-12-04T15:05:09.9501387Z torch_np/numpy_tests/lib/test_arraypad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraypad_1.1_4c224582872a57a2_.log 2025-12-04T15:05:09.9505263Z Running 9 items in this shard: test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float2, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_float3, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_odd_pad_amount, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_pad_2d, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_constant_zeros, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_check_large_integers, test/torch_np/numpy_tests/lib/test_arraypad.py::TestConstant::test_pad_empty_dimension 2025-12-04T15:05:09.9507088Z 2025-12-04T15:05:09.9507308Z Finished torch_np/numpy_tests/lib/test_arraypad 1/1 ... 
[2025-12-04 15:05:09.949680][2264694.21635024], took 0.04min 2025-12-04T15:05:09.9520153Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:09.9570186Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:09.9571819Z Running torch_np/numpy_tests/lib/test_arraysetops 1/1 ... [2025-12-04 15:05:09.957028][2264694.223701653] 2025-12-04T15:05:09.9572077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:09.9573430Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_arraysetops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:09.957208] 2025-12-04T15:05:12.2745426Z 2025-12-04T15:05:12.2746707Z torch_np/numpy_tests/lib/test_arraysetops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_arraysetops_1.1_2b94994a4f44349a_.log 2025-12-04T15:05:12.2755395Z Running 62 items in this shard: test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary0_prepend0_append_nan_expected_to_end, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary1_prepend1_append1_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_forbidden_type_casts_ary2_prepend_nan_append_nan_expected_to_begin, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary0_prepend_65536_append_65540_expected0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary1_prepend1_append1_expected1, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary2_prepend_0_append_0_expected2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_ediff1d_scalar_handling_ary3_prepend_3_append_-9_expected3, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_boolean_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_are_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_both_arrays_have_structured_dtype, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_first_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_hit_alternate_algorithm, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_invert_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_boolean_kind_table, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype10_dtype20_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_mixed_dtype_dtype11_dtype21_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_ravel_kind_table, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_second_array_is_object, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_table_timedelta_fails, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_timedelta_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_in1d_with_arrays_containing_tuples, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_array_like, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_intersect1d_indices, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_sort, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_isin_kind_table, 
test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_manyways, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_char_array, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setdiff1d_unique, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_setxor1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestSetOps::test_union1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_2, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_-1, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_1d_with_axis_axis_0, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_errors, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_list, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_axis_zeros, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_nanequals, test/torch_np/numpy_tests/lib/test_arraysetops.py::TestUnique::test_unique_sort_order_with_axis 2025-12-04T15:05:12.2763707Z 2025-12-04T15:05:12.2763853Z Finished torch_np/numpy_tests/lib/test_arraysetops 1/1 ... [2025-12-04 15:05:12.274268][2264696.540938934], took 0.04min 2025-12-04T15:05:12.2764286Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:12.2812867Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:12.2814840Z Running torch_np/numpy_tests/lib/test_function_base 1/1 ... 
[2025-12-04 15:05:12.281402][2264696.54807513] 2025-12-04T15:05:12.2815062Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:12.2816714Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:12.281578] 2025-12-04T15:05:15.6048953Z 2025-12-04T15:05:15.6050775Z torch_np/numpy_tests/lib/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_function_base_1.1_c0de44b76c2dc954_.log 2025-12-04T15:05:15.6125243Z Running 505 items in this shard: test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_rotation_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_4d, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_lr, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_ud, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_default_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_multiple_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_average_class_without_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x0_axis0_expected_avg0_weights0_expected_wavg0_expected_wsum0, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x1_axis_0_expected_avg1_weights1_expected_wavg1_expected_wsum1, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_returned, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_upcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_broadcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_deprecated_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_many_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_non_bool_deprecation, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_return_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_array_copied, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_-4, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_4, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_multidim, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmax::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmin::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPtp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumsum::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestProd::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumprod::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_append, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_n, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_prepend, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_array_order_preserve, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_fancy, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_[1], test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_array([1]), test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_non_int, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_slices, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_args, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_badargs, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_basic, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_decreasing_unsigned_int_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype1, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_inexact_dtypes, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_second_order_accurate, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_spacing, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_specific_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_decreasing_unsigned_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype1, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestAngle::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_all_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_leading_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_list_to_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_no_trim, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_overflow_arr0, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_size_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_trailing_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_both, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_place, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_casting_error, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_forward, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_decreasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_increasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_monotonic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_ndim, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple_complex, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_dtype_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_bias, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type2, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_extreme, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_non_array, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_rowvar, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_variance, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type2, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_fweights, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_unit_fweights_and_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_wrong_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_int_beta, test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMsort::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_indexing, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_invalid_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_indexing, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_shape, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_no_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_return_type, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_single_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_sparse, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_writeback, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_0d_condition, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_comparison, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_default, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_multidimensional_extrafunc, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_scalar_domains_three_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_two_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_dtype_reference_leaks, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals0, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_incorrect_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_and_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_smaller_than_maxvalue, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_complex_interp, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_exceptions, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_if_len_x_is_small, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-both, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_behavior_exact_x, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-real, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_period, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_right_left_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_scalar_interpolation_point, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_zero_dimensional_interpolation_point, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_2D, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_api, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_exception, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_extrapolation, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_weibull_expected_26, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_hazen_expected_27_5, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_linear_expected_29, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_weibull_expected_26, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_hazen_expected_27_5, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_linear_expected_29, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_f, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_q, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_empty_dim, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_no_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_sequence, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_correct_quantile_value, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_max_ulp, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_hypo, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_averaged_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_closest_observation, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_hazen, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_higher, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_interpolated_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_linear, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_lower, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_median_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_midpoint, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_nearest, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_normal_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_weibull, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_scalar_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_axis_keyword, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis3, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_3, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_overwrite_keyword, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_B_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_H_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_b_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_g_type_out_G, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_h_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_l_type_out_D 2025-12-04T15:05:15.6209129Z 2025-12-04T15:05:15.6209313Z Finished torch_np/numpy_tests/lib/test_function_base 1/1 ... [2025-12-04 15:05:15.605035][2264699.871706183], took 0.06min 2025-12-04T15:05:15.6209744Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:15.6210104Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:15.6210366Z Running torch_np/numpy_tests/lib/test_histograms 1/1 ... 
[2025-12-04 15:05:15.611724][2264699.878397216] 2025-12-04T15:05:15.6210576Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:15.6210996Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_histograms.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:15.611898] 2025-12-04T15:05:19.6925506Z 2025-12-04T15:05:19.6926355Z torch_np/numpy_tests/lib/test_histograms 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_histograms_1.1_3998d439a504ccec_.log 2025-12-04T15:05:19.6934559Z Running 60 items in this shard: test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_arr_weights_mismatch, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_big_arrays, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bin_array_dims, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bin_edge_cases, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_bool_conversion, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_density, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_error_binnum_type, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_exotic_weights, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_f32_rounding, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_finite_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_histogram_bin_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_invalid_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_last_bin_inclusive_range, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_no_side_effects, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_object_array_of_0d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_one_bin, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_outliers, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_precision, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_signed_overflow_bounds, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_signed_overflow_bounds_2, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_some_nan_values, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_type, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_unsigned_monotonicity_check, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogram::test_weights, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_incorrect_methods, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_limited_variance, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_novariance, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_outlier, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_scott_vs_stone, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_auto, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_doane, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_fd, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_rice, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_scott, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_stone, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_signed_integer_data_bins_sturges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_simple_weighted, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramOptimBinNums::test_small, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_array, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_error_2, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_bins_errors, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_density_non_uniform_1d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_density_non_uniform_2d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_edge_dtype, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_empty, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_equal_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_finite_range, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_identical_samples, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_inf_edges, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_large_integers, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_rightmost_binedge, 
test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_shape_3d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_shape_4d, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_simple, test/torch_np/numpy_tests/lib/test_histograms.py::TestHistogramdd::test_weights 2025-12-04T15:05:19.6942272Z 2025-12-04T15:05:19.6942416Z Finished torch_np/numpy_tests/lib/test_histograms 1/1 ... [2025-12-04 15:05:19.692441][2264703.959109198], took 0.07min 2025-12-04T15:05:19.6942847Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:19.6991931Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:19.6995489Z Running torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:05:19.699336][2264703.966007698] 2025-12-04T15:05:19.6995709Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:19.6996371Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_index_tricks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:19.699526] 2025-12-04T15:05:21.8167703Z 2025-12-04T15:05:21.8168739Z torch_np/numpy_tests/lib/test_index_tricks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_index_tricks_1.1_e9fc32c5c7a88521_.log 2025-12-04T15:05:21.8177246Z Running 47 items in this shard: test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_big_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_clipmodes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_dtypes, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_clip, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_raise, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_ravel_mode_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_array_unravel, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_empty_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestRavelUnravelIndex::test_writeability, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_longdouble, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npcomplexfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_accepts_npfloating, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_linspace_equivalence, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start0_stop_10_step0_expected0, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_mgrid_size_none_handling_start_-10_stop_20_step1_expected1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_nd, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestGrid::test_sparse, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_0d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_1d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_2d, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_complex_step, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestConcatenator::test_more_mixed_type, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdenumerate::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIndexExpression::test_simple_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_1d_only, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_bool, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_regression_1, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_repeated_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestIx_::test_shape_and_dtype, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestC::test_c_, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_basic, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_hetero_shape_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_low_dim_handling, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_operate_4d_array, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix, 
test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_tall_matrix_wrap, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestFillDiagonal::test_wide_matrix, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndices::test_diag_indices, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_diag_indices_from, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_shape_mismatch, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestDiagIndicesFrom::test_error_small_input, test/torch_np/numpy_tests/lib/test_index_tricks.py::TestNdIndex::test_ndindex 2025-12-04T15:05:21.8183869Z 2025-12-04T15:05:21.8184026Z Finished torch_np/numpy_tests/lib/test_index_tricks 1/1 ... [2025-12-04 15:05:21.816508][2264706.083179081], took 0.04min 2025-12-04T15:05:21.8184486Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:21.8232577Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:21.8235552Z Running torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 15:05:21.823357][2264706.090029902] 2025-12-04T15:05:21.8235951Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:21.8236716Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_shape_base_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:21.823532] 2025-12-04T15:05:24.0438525Z 2025-12-04T15:05:24.0439490Z torch_np/numpy_tests/lib/test_shape_base_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_shape_base__1.1_705f07c0f244aacf_.log 2025-12-04T15:05:24.0451438Z Running 73 items in this shard: test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_argequivalent, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_invalid, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_replace_max, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_0d_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_3d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion_ma, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_scalar_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple101, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_tuple_func1d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_with_iterable_object, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyOverAxes::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_out_of_range, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_tuple, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_functionality, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_repeated_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_high_bound, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_low_bound, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_0_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_cols, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_default, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows_greater_max_int32, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_equal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_unequal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_1D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_2D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_generator, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_generator, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_3D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic_2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis_handling, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_contiguous, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_type, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a0_shape_b0, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a1_shape_b1, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a2_shape_b2, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a3_shape_b3, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a4_shape_b4, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a5_shape_b5, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_kroncompare, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_tile_one_repetition_on_array_gh4679, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestMayShareMemory::test_basic 2025-12-04T15:05:24.0460862Z 2025-12-04T15:05:24.0461007Z Finished torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 15:05:24.043604][2264708.310275142], took 0.04min 2025-12-04T15:05:24.0461426Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:24.0506065Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:24.0509662Z Running torch_np/numpy_tests/lib/test_twodim_base 1/1 ... [2025-12-04 15:05:24.050680][2264708.317353799] 2025-12-04T15:05:24.0510056Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:24.0510699Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_twodim_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:24.050853] 2025-12-04T15:05:26.9707305Z 2025-12-04T15:05:26.9708221Z torch_np/numpy_tests/lib/test_twodim_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_twodim_base_1.1_bb71d8444770495b_.log 2025-12-04T15:05:26.9714094Z Running 34 items in this shard: test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_bool, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_diag2d, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_eye_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestEye::test_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_diag_bounds, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_failure, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_fortran_order, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_matrix, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestDiag::test_vector, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFliplr::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestFlipud::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_all_outliers, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_asym, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_10_y_len_11, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_bad_length_x_len_20_y_len_19, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_binparameter_combination, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_density, 
test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_empty, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestHistogram2d::test_simple, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_mask_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_dtype, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim2, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_ndim3, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTri::test_tril_triu_with_inf, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndices::test_triu_indices, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTrilIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestTriuIndicesFrom::test_exceptions, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_basic, test/torch_np/numpy_tests/lib/test_twodim_base.py::TestVander::test_dtypes 2025-12-04T15:05:26.9720073Z 2025-12-04T15:05:26.9720297Z Finished torch_np/numpy_tests/lib/test_twodim_base 1/1 ... [2025-12-04 15:05:26.970423][2264711.237093923], took 0.05min 2025-12-04T15:05:26.9720813Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:26.9771917Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:26.9775064Z Running torch_np/numpy_tests/lib/test_type_check 1/1 ... 
[2025-12-04 15:05:26.977304][2264711.243977183] 2025-12-04T15:05:26.9775720Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:26.9776880Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_type_check.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:26.977489] 2025-12-04T15:05:29.1949247Z 2025-12-04T15:05:29.1950572Z torch_np/numpy_tests/lib/test_type_check 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_type_check_1.1_6d529d53452a688c_.log 2025-12-04T15:05:29.1963759Z Running 50 items in this shard: test/torch_np/numpy_tests/lib/test_type_check.py::TestCommonType::test_basic, test/torch_np/numpy_tests/lib/test_type_check.py::TestMintypecode::test_default_1, test/torch_np/numpy_tests/lib/test_type_check.py::TestMintypecode::test_default_2, test/torch_np/numpy_tests/lib/test_type_check.py::TestMintypecode::test_default_3, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsscalar::test_basic, test/torch_np/numpy_tests/lib/test_type_check.py::TestReal::test_cmplx, test/torch_np/numpy_tests/lib/test_type_check.py::TestReal::test_real, test/torch_np/numpy_tests/lib/test_type_check.py::TestImag::test_cmplx, test/torch_np/numpy_tests/lib/test_type_check.py::TestImag::test_real, test/torch_np/numpy_tests/lib/test_type_check.py::TestIscomplex::test_fail, test/torch_np/numpy_tests/lib/test_type_check.py::TestIscomplex::test_pass, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsreal::test_fail, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsreal::test_isreal_real, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsreal::test_pass, test/torch_np/numpy_tests/lib/test_type_check.py::TestIscomplexobj::test_basic, test/torch_np/numpy_tests/lib/test_type_check.py::TestIscomplexobj::test_list, 
test/torch_np/numpy_tests/lib/test_type_check.py::TestIscomplexobj::test_scalar, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsrealobj::test_basic, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_complex, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_complex1, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_goodvalues, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_ind, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_integer, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_neginf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsnan::test_posinf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_complex, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_complex1, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_goodvalues, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_ind, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_integer, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_neginf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsfinite::test_posinf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_goodvalues, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_ind, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_neginf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_neginf_scalar, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_posinf, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsinf::test_posinf_scalar, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsposinf::test_generic, test/torch_np/numpy_tests/lib/test_type_check.py::TestIsneginf::test_generic, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_array, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_complex_bad, 
test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_complex_bad2, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_complex_good, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_do_not_rewrite_previous_keyword, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_float, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_generic, test/torch_np/numpy_tests/lib/test_type_check.py::TestNanToNum::test_integer, test/torch_np/numpy_tests/lib/test_type_check.py::TestRealIfClose::test_basic, test/torch_np/numpy_tests/lib/test_type_check.py::TestArrayConversion::test_asfarray 2025-12-04T15:05:29.1972418Z 2025-12-04T15:05:29.1972617Z Finished torch_np/numpy_tests/lib/test_type_check 1/1 ... [2025-12-04 15:05:29.194701][2264713.461371723], took 0.04min 2025-12-04T15:05:29.1973188Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:29.2023196Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:29.2023776Z Running torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 15:05:29.202195][2264713.468868395] 2025-12-04T15:05:29.2024021Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:29.2025151Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/numpy_tests/linalg/test_linalg.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:29.202376] 2025-12-04T15:05:36.3930413Z 2025-12-04T15:05:36.3931247Z torch_np/numpy_tests/linalg/test_linalg 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.linalg.test_linalg_1.1_4d5ea8c498382225_.log 2025-12-04T15:05:36.3963088Z Running 268 items in this shard: test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_0_size_k, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSolve::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestInv::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_0_size, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvals::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEig::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_identity, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVD::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestSVDHermitian::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_basic_nonsvd, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_nan, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCond::test_stacked_singular, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_nonsq_cases, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinv::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestPinvHermitian::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_generalized_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestDet::test_zero, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_0_n_rhs_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_0_n_4_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_0_n_rhs_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_a_b_m_4_n_2_n_rhs_2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_empty_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_future_rcond, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_incompatible_dims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_nonsq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestLstsq::test_sq_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalshCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_invalid, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigvalsh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_generalized_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEighCases::test_empty_herm_cases, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_UPLO, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_invalid, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestEigh::test_types_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNorm_NonSystematic::test_intmin, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormDouble::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormSingle::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_axis, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_bad_args, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_keepdims, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_2x2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_3x3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_empty, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_matrix_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector, test/torch_np/numpy_tests/linalg/test_linalg.py::TestNormInt64::test_vector_return_type, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_matrix_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_reduced_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMatrixRank::test_symmetric_rank, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_all_but_economic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_mode_raw, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_0_n_3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_qr_empty_m_3_n_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size0_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size0_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size1_outer_size2_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size2_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size1_dt3, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size3_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size0_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size1_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestQR::test_stacked_inputs_size4_outer_size2_dt3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_0_size, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype2, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape0_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape1_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape2_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape3_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestCholesky::test_basic_property_shape4_dtype3, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_byteorder_check, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_generalized_raise_multiloop, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_sdot_bug_8577, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc::test_xerbla_override, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_dynamic_programming_optimization, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_three_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_basic_function_with_two_arguments, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_logic, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_dynamic_programming_optimization_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_three_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_too_few_input_arrays, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_two_arguments_and_out, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_and_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_first_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMultiDot::test_vector_as_last_argument, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_non_square_handling_arr1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_-2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_ind_limit_ind_0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_result, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape0_ind_2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorinv::test_tensorinv_shape_shape1_ind_1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a0_axes0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_non_square_handling_a1_axes1, 
test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape0, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape1, test/torch_np/numpy_tests/linalg/test_linalg.py::TestTensorsolve::test_tensorsolve_result_shape2, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_dot, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_blas64_geqrf_lwork_smoketest, test/torch_np/numpy_tests/linalg/test_linalg.py::TestMisc2::test_unsupported_commontype
2025-12-04T15:05:36.3994175Z
2025-12-04T15:05:36.3994319Z Finished torch_np/numpy_tests/linalg/test_linalg 1/1 ... [2025-12-04 15:05:36.392905][2264720.659574693], took 0.12min
2025-12-04T15:05:36.3994747Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T15:05:36.3998546Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T15:05:36.4002706Z Running torch_np/test_basic 1/1 ... [2025-12-04 15:05:36.400008][2264720.66667949]
2025-12-04T15:05:36.4002895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T15:05:36.4005906Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_basic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:36.400234]
2025-12-04T15:05:40.4728603Z
2025-12-04T15:05:40.4729437Z torch_np/test_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_basic_1.1_37992f663fc69fdb_.log
2025-12-04T15:05:40.4772826Z Running 453 items in this shard: test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func1, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func15, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func20, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func25, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func26, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func30,
test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func36, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func41, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func47, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func51, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func52, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func57, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func62, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func68, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func73, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_array_func9, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func1, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func15, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func20, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func25, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func26, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func30, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func36, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func41, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func47, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func51, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func52, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func57, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func62, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func68, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func73, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_list_func9, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func0, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func1, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func10, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func11, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func12, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func13, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func14, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func15, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func16, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func17, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func18, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func19, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func2, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func20, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func21, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func22, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func23, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func24, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func25, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func26, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func27, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func28, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func29, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func3, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func30, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func31, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func32, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func33, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func34, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func35, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func36, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func37, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func38, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func39, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func4, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func40, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func41, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func42, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func43, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func44, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func45, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func46, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func47, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func48, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func49, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func5, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func50, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func51, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func52, 
test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func53, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func54, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func55, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func56, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func57, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func58, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func59, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func6, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func60, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func61, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func62, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func63, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func64, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func65, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func66, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func67, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func68, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func69, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func7, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func70, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func71, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func72, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func73, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func74, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func8, test/torch_np/test_basic.py::TestOneArr::test_asarray_tensor_func9, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_array_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_list_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func0_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func10_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func1_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func2_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func3_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func4_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func5_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func6_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_-1, 
test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func7_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func8_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis3, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_-1, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_0, test/torch_np/test_basic.py::TestOneArrAndAxis::test_andaxis_tensor_func9_axis_1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_array_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_list_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes0, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes1, test/torch_np/test_basic.py::TestOneArrAndAxesTuple::test_andtuple_tensor_func0_axes2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func0, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_array_func4, 
test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func0, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_list_func4, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func0, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func1, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func2, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func3, test/torch_np/test_basic.py::TestOneArrAndShape::test_andshape_tensor_func4, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_array_func2_np_func2, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_list_func2_np_func2, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func0_np_func0, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func1_np_func1, test/torch_np/test_basic.py::TestOneArrToScalar::test_toscalar_tensor_func2_np_func2, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func0, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func1, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func2, test/torch_np/test_basic.py::TestShapeLikeToArray::test_shape_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func1, 
test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_several_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_array_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_list_func3, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func0, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func1, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func2, test/torch_np/test_basic.py::TestSequenceOfArrays::test_single_tensor_func3, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func0, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func1, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func2, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func3, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func4, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func5, test/torch_np/test_basic.py::TestSequenceOfArraysToSingle::test_several_func6, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_array_func0, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_array_func1, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_list_func0, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_list_func1, test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_tensor_func0, 
test/torch_np/test_basic.py::TestArrayToSequence::test_asarray_tensor_func1, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func0_args0, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func1_args1, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func2_args2, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func3_args3, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func4_args4, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func5_args5, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func6_args6, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func7_args7, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func8_args8, test/torch_np/test_basic.py::TestPythonArgsToArray::test_argstoarray_simple_func9_args9, test/torch_np/test_basic.py::TestNormalizations::test_too_few_args_positional, test/torch_np/test_basic.py::TestNormalizations::test_unknown_args, test/torch_np/test_basic.py::TestNormalizations::test_unknown_args_with_defaults, test/torch_np/test_basic.py::TestCopyTo::test_copyto_basic, test/torch_np/test_basic.py::TestCopyTo::test_copyto_typecast, test/torch_np/test_basic.py::TestCopyTo::test_copytobcast, test/torch_np/test_basic.py::TestDivmod::test_divmod_no_out, test/torch_np/test_basic.py::TestDivmod::test_divmod_out, test/torch_np/test_basic.py::TestDivmod::test_divmod_out_both_pos_and_kw, test/torch_np/test_basic.py::TestDivmod::test_divmod_out_list, test/torch_np/test_basic.py::TestDivmod::test_divmod_pos_only, test/torch_np/test_basic.py::TestSmokeNotImpl::test_nimpl_basic, test/torch_np/test_basic.py::TestDefaultDtype::test_defaultdtype_defaults, test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_dt_float32, 
test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_dt_pytorch, test/torch_np/test_basic.py::TestDefaultDtype::test_set_default_float_float32, test/torch_np/test_basic.py::TestExport::test_exported_objects, test/torch_np/test_basic.py::TestCtorNested::test_arrays_in_lists, test/torch_np/test_basic.py::TestMisc::test_f16_on_cuda, test/torch_np/test_basic.py::TestMisc::test_ndarrays_to_tensors 2025-12-04T15:05:40.4814904Z 2025-12-04T15:05:40.4815016Z Finished torch_np/test_basic 1/1 ... [2025-12-04 15:05:40.472927][2264724.739597592], took 0.07min 2025-12-04T15:05:40.4815408Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:40.4815771Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:40.4815998Z Running torch_np/test_binary_ufuncs 1/1 ... [2025-12-04 15:05:40.480088][2264724.746761428] 2025-12-04T15:05:40.4816188Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:40.4816615Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_binary_ufuncs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... 
[2025-12-04 15:05:40.480266] 2025-12-04T15:05:42.5980259Z 2025-12-04T15:05:42.5981249Z torch_np/test_binary_ufuncs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_binary_ufuncs_1.1_3767f7536efd8e5c_.log 2025-12-04T15:05:42.5988836Z Running 38 items in this shard: test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_add, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_arctan2, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_and, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_or, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_bitwise_xor, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_copysign, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_divide, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_float_power, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_floor_divide, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmax, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmin, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_fmod, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_gcd, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_greater, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_greater_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_heaviside, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_hypot, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_lcm, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_ldexp, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_left_shift, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_less, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_less_equal, 
test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logaddexp, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logaddexp2, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_and, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_or, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_logical_xor, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_matmul, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_maximum, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_minimum, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_multiply, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_nextafter, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_not_equal, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_power, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_remainder, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_right_shift, test/torch_np/test_binary_ufuncs.py::TestBinaryUfuncBasic::test_subtract 2025-12-04T15:05:42.5994683Z 2025-12-04T15:05:42.5994858Z Finished torch_np/test_binary_ufuncs 1/1 ... [2025-12-04 15:05:42.597751][2264726.864420467], took 0.04min 2025-12-04T15:05:42.6000061Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:42.6052757Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:42.6053009Z Running torch_np/test_dtype 1/1 ... 
[2025-12-04 15:05:42.605152][2264726.87182539] 2025-12-04T15:05:42.6053208Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:42.6054821Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_dtype.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:42.605339] 2025-12-04T15:05:44.7226959Z 2025-12-04T15:05:44.7228671Z torch_np/test_dtype 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_dtype_1.1_2de57f4f1658ac73_.log 2025-12-04T15:05:44.7241358Z Running 44 items in this shard: test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'bool_', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_bool, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'bool_', 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex128', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'complex64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'float64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'int8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint16', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint32', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint64', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.'uint8', test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.bool_, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex128, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.complex64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.dtype('bool'), test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.float64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int64, 
test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.int8, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint16, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint32, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint64, test/torch_np/test_dtype.py::TestConvertDType::test_convert_np_dtypes_numpy.uint8 2025-12-04T15:05:44.7249404Z 2025-12-04T15:05:44.7249553Z Finished torch_np/test_dtype 1/1 ... [2025-12-04 15:05:44.722398][2264728.989069205], took 0.04min 2025-12-04T15:05:44.7250065Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:44.7292567Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:44.7294919Z Running torch_np/test_function_base 1/1 ... [2025-12-04 15:05:44.729248][2264728.995921436] 2025-12-04T15:05:44.7295150Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:44.7295594Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:44.729419] 2025-12-04T15:05:46.9469924Z 2025-12-04T15:05:46.9470787Z torch_np/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_function_base_1.1_31dad7fa130ce805_.log 2025-12-04T15:05:46.9471307Z Running 2 items in this shard: test/torch_np/test_function_base.py::TestAppend::test_basic, test/torch_np/test_function_base.py::TestMisc::test_broadcast_shapes 2025-12-04T15:05:46.9471579Z 2025-12-04T15:05:46.9471720Z Finished torch_np/test_function_base 1/1 ... 
[2025-12-04 15:05:46.946693][2264731.213363179], took 0.04min 2025-12-04T15:05:46.9491749Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:46.9545386Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:46.9545626Z Running torch_np/test_indexing 1/1 ... [2025-12-04 15:05:46.954399][2264731.221071937] 2025-12-04T15:05:46.9545819Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:46.9547306Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:46.954573] 2025-12-04T15:05:51.9950446Z 2025-12-04T15:05:51.9951277Z torch_np/test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_indexing_1.1_01d77aabdc8eb747_.log 2025-12-04T15:05:51.9952475Z Running 5 items in this shard: test/torch_np/test_indexing.py::TestAdvancedIndexing::test_advanced_separation_patterns, test/torch_np/test_indexing.py::TestAdvancedIndexing::test_broadcast_and_numpy_compatibility, test/torch_np/test_indexing.py::TestAdvancedIndexing::test_comprehensive_indexing, test/torch_np/test_indexing.py::TestAdvancedIndexing::test_ellipsis, test/torch_np/test_indexing.py::TestAdvancedIndexing::test_special_index_types 2025-12-04T15:05:51.9953171Z 2025-12-04T15:05:51.9953422Z Finished torch_np/test_indexing 1/1 ... 
[2025-12-04 15:05:51.994795][2264736.261466543], took 0.08min 2025-12-04T15:05:51.9965782Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:52.0020166Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:52.0021993Z Running torch_np/test_reductions 1/1 ... [2025-12-04 15:05:52.002048][2264736.268718138] 2025-12-04T15:05:52.0022202Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:52.0024282Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'torch_np/test_reductions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:52.002322] 2025-12-04T15:05:55.8248294Z 2025-12-04T15:05:55.8249173Z torch_np/test_reductions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_reductions_1.1_4da361b57ed34a51_.log 2025-12-04T15:05:55.8391514Z Running 966 items in this shard: test/torch_np/test_reductions.py::TestFlatnonzero::test_basic, test/torch_np/test_reductions.py::TestAny::test_basic, test/torch_np/test_reductions.py::TestAny::test_method_vs_function, test/torch_np/test_reductions.py::TestAny::test_nd, test/torch_np/test_reductions.py::TestAll::test_basic, test/torch_np/test_reductions.py::TestAll::test_method_vs_function, test/torch_np/test_reductions.py::TestAll::test_nd, test/torch_np/test_reductions.py::TestMean::test_mean, test/torch_np/test_reductions.py::TestMean::test_mean_float16, test/torch_np/test_reductions.py::TestMean::test_mean_values, test/torch_np/test_reductions.py::TestMean::test_mean_where, test/torch_np/test_reductions.py::TestSum::test_sum, test/torch_np/test_reductions.py::TestSum::test_sum_boolean, test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt0, 
test/torch_np/test_reductions.py::TestSum::test_sum_complex_1_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt0, test/torch_np/test_reductions.py::TestSum::test_sum_complex_2_dt1, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_2, test/torch_np/test_reductions.py::TestSum::test_sum_dtypes_warnings, test/torch_np/test_reductions.py::TestSum::test_sum_initial, test/torch_np/test_reductions.py::TestSum::test_sum_stability, test/torch_np/test_reductions.py::TestSum::test_sum_where, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_array_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func3, 
test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_bad_tuple_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_axis_empty_generic_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_bad_axis_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis5_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func10, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis6_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis7_func9, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis8_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_-2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func3, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_0_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_1_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func10, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_2_func9, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_generic_axis_none_func9, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_keepdims_out_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func1_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func6_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_False_dtype_int32_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype0_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_0, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func1_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_-2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func3_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-1, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func5_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis8, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func7_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis7, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_float64_func9_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis6, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func0_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func10_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis5, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func11_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func1_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func2_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func3_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func4_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func5_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func6_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func7_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func8_axis_2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_-2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_axis_keepdims_True_dtype_int32_func9_axis_2, 
test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func0, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func1, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func10, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func11, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func2, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func3, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func4, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func5, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func6, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func7, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func8, test/torch_np/test_reductions.py::TestGenericReductions::test_out_scalar_func9, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_array_axis_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_bad_tuple_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_axis_empty_generic_func1, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func0, test/torch_np/test_reductions.py::TestGenericCumSumProd::test_bad_axis_func1 2025-12-04T15:05:55.8522518Z 2025-12-04T15:05:55.8522650Z Finished torch_np/test_reductions 1/1 ... 
[2025-12-04 15:05:55.825522][2264740.092190283], took 0.06min 2025-12-04T15:05:55.8523063Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml 2025-12-04T15:05:55.8523461Z Failed to parse and upload json test reports: Unable to locate credentials 2025-12-04T15:05:55.8523700Z Running typing/test_python_operators 1/1 ... [2025-12-04 15:05:55.832867][2264740.099540147] 2025-12-04T15:05:55.8523902Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set 2025-12-04T15:05:55.8524302Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'typing/test_python_operators.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:55.833050] 2025-12-04T15:05:58.2009903Z 2025-12-04T15:05:58.2011059Z typing/test_python_operators 1/1 was successful, full logs can be found in artifacts with path test/test-reports/typing.test_python_operators_1.1_513fa0270c899855_.log 2025-12-04T15:05:58.2050469Z Running 318 items in this shard: test/typing/test_python_operators.py::TestPythonOperators::test_binary_a100_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a101_op_%_b101, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a102_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a103_op_%_b103, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a104_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a105_op_*_b105, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a106_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a107_op_*_b107, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a108_op_**_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a109_op_**_b109, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a110_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a111_op_**_b111, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a112_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a113_op_+_b113, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a114_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a115_op_+_b115, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a116_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a117_op_-_b117, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a118_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a119_op_-_b119, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a120_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a121_op_/_b121, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a122_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a123_op_/_b123, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a124_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a125_op_//_b125, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a126_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a127_op_//_b127, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a128_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a129_op_&_b129, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a130_op_&_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a131_op_&_b131, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a132_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a133_op_<<_b133, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a134_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a135_op_<<_b135, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a136_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a137_op_>>_b137, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a138_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a139_op_>>_b139, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a140_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a141_op_^_b141, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a142_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a143_op_^_b143, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a144_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a145_op_|_b145, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a146_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a147_op_|_b147, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a148_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a149_op_@_b149, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a150_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a151_op_@_b151, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a228_op_!=_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a229_op_!=_b229, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a230_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a231_op_!=_b231, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a232_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a233_op_<_b233, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a234_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a235_op_<_b235, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a236_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a237_op_<=_b237, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a238_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a239_op_<=_b239, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a240_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a241_op_==_b241, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a242_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a243_op_==_b243, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a244_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a245_op_>_b245, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a246_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a247_op_>_b247, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a248_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a249_op_>=_b249, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a250_op_>=_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a251_op_>=_b251, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a252_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a253_op_%_b253, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a254_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a255_op_%_b255, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a256_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a257_op_*_b257, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a258_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a259_op_*_b259, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a260_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a261_op_**_b261, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a262_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a263_op_**_b263, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a264_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a265_op_+_b265, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a266_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a267_op_+_b267, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a268_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a269_op_-_b269, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a270_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a271_op_-_b271, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a272_op_/_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a273_op_/_b273, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a274_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a275_op_/_b275, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a276_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a277_op_//_b277, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a278_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a279_op_//_b279, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a280_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a281_op_&_b281, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a282_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a283_op_&_b283, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a284_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a285_op_<<_b285, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a286_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a287_op_<<_b287, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a288_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a289_op_>>_b289, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a290_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a291_op_>>_b291, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a292_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a293_op_^_b293, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a294_op_^_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a295_op_^_b295, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a296_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a297_op_|_b297, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a298_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a299_op_|_b299, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a300_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a301_op_@_b301, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a302_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a303_op_@_b303, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a76_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a77_op_!=_b77, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a78_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a79_op_!=_b79, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a80_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a81_op_<_b81, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a82_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a83_op_<_b83, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a84_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a85_op_<=_b85, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a86_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a87_op_<=_b87, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a88_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a89_op_==_b89, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a90_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a91_op_==_b91, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a92_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a93_op_>_b93, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a94_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a95_op_>_b95, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a96_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a97_op_>=_b97, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a98_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a99_op_>=_b99, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b1, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b25, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b27, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b53, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b55, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b33, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b35, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b29, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b31, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b37, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b39, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b41, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b43, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b49, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b51, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b45, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b47, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b57, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b59, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b11, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b9, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b7, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b13, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b15, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b21, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b23, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b61, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b63, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b17, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b19, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b73, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b75, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b65, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b67, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_^_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b69, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b71, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_1_5_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b153, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b155, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_1_5, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_!=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b177, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b179, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_%_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b205, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b207, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_&_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b185, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b187, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_**_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b181, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b183, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_*_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b189, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b191, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_+_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b193, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b195, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_-_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b201, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b203, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_//_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b197, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b199, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_/_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b209, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b211, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b161, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b163, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b157, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b159, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_<_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b165, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b167, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_==_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b173, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b175, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>=_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b213, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b215, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b169, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b171, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_>_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b225, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b227, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_@_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b217, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b219, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_^_b_3, 
test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b221, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b223, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_binary_a_3_op_|_b_3, test/typing/test_python_operators.py::TestPythonOperators::test_operators_are_correct_and_complete, test/typing/test_python_operators.py::TestPythonOperators::test_type_tests_are_complete, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a1, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_+_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a7, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_-_a_3, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a11, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a9, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_1_5, test/typing/test_python_operators.py::TestPythonOperators::test_unary_op_~_a_3 2025-12-04T15:05:58.2085914Z 2025-12-04T15:05:58.2086045Z Finished typing/test_python_operators 1/1 ... 
[2025-12-04 15:05:58.201031][2264742.467701344], took 0.04min
2025-12-04T15:05:58.2086455Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T15:05:58.2086823Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T15:05:58.2087530Z Running xpu/test_conv 1/1 ... [2025-12-04 15:05:58.208560][2264742.475233295]
2025-12-04T15:05:58.2088570Z SCRIBE_GRAPHQL_ACCESS_TOKEN is NOT set
2025-12-04T15:05:58.2089203Z Executing ['/opt/conda/envs/py_3.12/bin/python', '-bb', 'xpu/test_conv.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:05:58.208732]
2025-12-04T15:06:00.4213953Z
2025-12-04T15:06:00.4214801Z xpu/test_conv 1/1 was successful, full logs can be found in artifacts with path test/test-reports/xpu.test_conv_1.1_5d98df06a85a2669_.log
2025-12-04T15:06:00.4215631Z Running 0 items in this shard:
2025-12-04T15:06:00.4215730Z
2025-12-04T15:06:00.4215837Z Finished xpu/test_conv 1/1 ... [2025-12-04 15:06:00.421096][2264744.687765915], took 0.04min
2025-12-04T15:06:00.4233087Z Parsing testcases for test report: /var/lib/jenkins/pytorch/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-5e959589769bafb0.xml
2025-12-04T15:06:00.4283121Z Failed to parse and upload json test reports: Unable to locate credentials
2025-12-04T15:06:02.4460019Z Running test batch 'tests to run' cost 19891.49 seconds
2025-12-04T15:06:02.4467037Z Emitting td_test_failure_stats_v2
2025-12-04T15:06:02.4469615Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764860762_bfbdd7f6d12211f0a62716fde108a3b6
2025-12-04T15:06:04.4642447Z /var/lib/jenkins/pytorch/tools/stats/upload_metrics.py:156: UserWarning: Error uploading metric td_test_failure_stats_v2 to DynamoDB: Unable to locate credentials
2025-12-04T15:06:04.4643522Z warn(f"Error uploading metric {metric_name} to DynamoDB: {e}")
2025-12-04T15:06:04.4646499Z Emitting td_test_failure_stats_v2
2025-12-04T15:06:04.4647081Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764860764_c0f1b9dad12211f0a62716fde108a3b6
2025-12-04T15:06:04.4676817Z Emitting td_test_failure_stats_v2
2025-12-04T15:06:04.4677323Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764860764_c0f226c2d12211f0a62716fde108a3b6
2025-12-04T15:06:04.4697387Z inductor/test_cuda_select_algorithm 1/1 failed!
2025-12-04T15:06:04.4697658Z inductor/test_fp8 1/1 failed!
2025-12-04T15:06:04.4697866Z test_nestedtensor 2/2 failed!
2025-12-04T15:06:05.3425977Z
2025-12-04T15:06:05.3426452Z real 331m36.739s
2025-12-04T15:06:05.3426677Z user 2074m21.737s
2025-12-04T15:06:05.3426798Z sys 203m32.612s
2025-12-04T15:06:05.3426980Z + sccache_epilogue
2025-12-04T15:06:05.3427474Z + echo '::group::Sccache Compilation Log'
2025-12-04T15:06:05.3427954Z ##[group]Sccache Compilation Log
2025-12-04T15:06:05.3428149Z + echo '=================== sccache compilation log ==================='
2025-12-04T15:06:05.3428369Z =================== sccache compilation log ===================
2025-12-04T15:06:05.3428666Z + python /var/lib/jenkins/pytorch/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T15:06:05.3500949Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T15:06:05.3501272Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T15:06:05.3501490Z + sccache --show-stats
2025-12-04T15:06:05.3518741Z Compile requests 8535
2025-12-04T15:06:05.3519722Z Compile requests executed 520
2025-12-04T15:06:05.3519984Z Cache hits 30
2025-12-04T15:06:05.3520195Z Cache hits (C/C++) 30
2025-12-04T15:06:05.3520364Z Cache misses 488
2025-12-04T15:06:05.3520522Z Cache misses (C/C++) 482
2025-12-04T15:06:05.3520671Z Cache misses (HIP) 6
2025-12-04T15:06:05.3520838Z Cache hits rate 5.79 %
2025-12-04T15:06:05.3521001Z Cache hits rate (C/C++) 5.86 %
2025-12-04T15:06:05.3521156Z Cache hits rate (HIP) 0.00 %
2025-12-04T15:06:05.3521309Z Cache timeouts 0
2025-12-04T15:06:05.3521472Z Cache read errors 0
2025-12-04T15:06:05.3533402Z Forced recaches 0
2025-12-04T15:06:05.3533570Z Cache write errors 0
2025-12-04T15:06:05.3533718Z Cache errors 0
2025-12-04T15:06:05.3533856Z Compilations 488
2025-12-04T15:06:05.3533996Z Compilation failures 2
2025-12-04T15:06:05.3534346Z Non-cacheable compilations 0
2025-12-04T15:06:05.3534481Z Non-cacheable calls 272
2025-12-04T15:06:05.3534602Z Non-compilation calls 7743
2025-12-04T15:06:05.3534728Z Unsupported compiler calls 0
2025-12-04T15:06:05.3534857Z Average cache write 0.000 s
2025-12-04T15:06:05.3534989Z Average compiler 2.151 s
2025-12-04T15:06:05.3535136Z Average cache read hit 0.000 s
2025-12-04T15:06:05.3535267Z Failed distributed compilations 0
2025-12-04T15:06:05.3535354Z
2025-12-04T15:06:05.3535399Z Non-cacheable reasons:
2025-12-04T15:06:05.3535513Z unknown source language 231
2025-12-04T15:06:05.3535637Z -E 41
2025-12-04T15:06:05.3535721Z
2025-12-04T15:06:05.3535801Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache"
2025-12-04T15:06:05.3535974Z Use direct/preprocessor mode? yes
2025-12-04T15:06:05.3536105Z Version (client) 0.10.0
2025-12-04T15:06:05.3536231Z Cache size 39 MiB
2025-12-04T15:06:05.3536360Z Max cache size 10 GiB
2025-12-04T15:06:05.3536485Z + sccache --stop-server
2025-12-04T15:06:05.3536609Z Stopping sccache server...
2025-12-04T15:06:05.3539352Z Compile requests 8535
2025-12-04T15:06:05.3539508Z Compile requests executed 520
2025-12-04T15:06:05.3539656Z Cache hits 30
2025-12-04T15:06:05.3539778Z Cache hits (C/C++) 30
2025-12-04T15:06:05.3539896Z Cache misses 488
2025-12-04T15:06:05.3540103Z Cache misses (C/C++) 482
2025-12-04T15:06:05.3540225Z Cache misses (HIP) 6
2025-12-04T15:06:05.3540349Z Cache hits rate 5.79 %
2025-12-04T15:06:05.3540477Z Cache hits rate (C/C++) 5.86 %
2025-12-04T15:06:05.3540604Z Cache hits rate (HIP) 0.00 %
2025-12-04T15:06:05.3540735Z Cache timeouts 0
2025-12-04T15:06:05.3540862Z Cache read errors 0
2025-12-04T15:06:05.3541106Z Forced recaches 0
2025-12-04T15:06:05.3541228Z Cache write errors 0
2025-12-04T15:06:05.3541349Z Cache errors 0
2025-12-04T15:06:05.3541475Z Compilations 488
2025-12-04T15:06:05.3541596Z Compilation failures 2
2025-12-04T15:06:05.3541723Z Non-cacheable compilations 0
2025-12-04T15:06:05.3541845Z Non-cacheable calls 272
2025-12-04T15:06:05.3541971Z Non-compilation calls 7743
2025-12-04T15:06:05.3542100Z Unsupported
compiler calls 0 2025-12-04T15:06:05.3542227Z Average cache write 0.000 s 2025-12-04T15:06:05.3542354Z Average compiler 2.151 s 2025-12-04T15:06:05.3542484Z Average cache read hit 0.000 s 2025-12-04T15:06:05.3542613Z Failed distributed compilations 0 2025-12-04T15:06:05.3542698Z 2025-12-04T15:06:05.3542741Z Non-cacheable reasons: 2025-12-04T15:06:05.3542852Z unknown source language 231 2025-12-04T15:06:05.3542978Z -E 41 2025-12-04T15:06:05.3543057Z 2025-12-04T15:06:05.3543134Z Cache location Local disk: "/var/lib/jenkins/.cache/sccache" 2025-12-04T15:06:05.3543395Z Use direct/preprocessor mode? yes 2025-12-04T15:06:05.3543525Z Version (client) 0.10.0 2025-12-04T15:06:05.3543650Z Cache size 39 MiB 2025-12-04T15:06:05.3543780Z Max cache size 10 GiB 2025-12-04T15:06:05.3543943Z + echo ::endgroup:: 2025-12-04T15:06:05.3544272Z ##[endgroup] 2025-12-04T15:06:05.3590733Z ##[error]Process completed with exit code 1. 2025-12-04T15:06:05.3617497Z ##[group]Run # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-12-04T15:06:05.3617818Z # copy test results back to the mounted workspace, needed sudo, resulting permissions were correct 2025-12-04T15:06:05.3618292Z docker exec -t "155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3" sh -c "cd ../pytorch && sudo cp -R test/test-reports ../workspace/test" 2025-12-04T15:06:05.3622354Z shell: /usr/bin/bash -e {0} 2025-12-04T15:06:05.3622472Z env: 2025-12-04T15:06:05.3622570Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:05.3622709Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:05.3622885Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:05.3623055Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:05.3623679Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt 
seccomp=unconfined --network=host 2025-12-04T15:06:05.3624050Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:05.3624167Z AWS_REGION: us-east-1 2025-12-04T15:06:05.3624335Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:05.3624495Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:05.3626764Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:05.3626936Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:05.3627123Z ##[endgroup] 2025-12-04T15:06:05.4319560Z ##[group]Run docker exec -t "155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3" sh -c "sudo chown -R 1001:1001 test" 2025-12-04T15:06:05.4319959Z docker exec -t "155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3" sh -c "sudo chown -R 1001:1001 test" 2025-12-04T15:06:05.4324446Z shell: /usr/bin/bash -e {0} 2025-12-04T15:06:05.4324558Z env: 2025-12-04T15:06:05.4324648Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:05.4324784Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:05.4324957Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:05.4325120Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:05.4325515Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:05.4325899Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:05.4326016Z AWS_REGION: us-east-1 2025-12-04T15:06:05.4326185Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:05.4326336Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:05.4328621Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:05.4328786Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:05.4328969Z ##[endgroup] 2025-12-04T15:06:05.5059460Z ##[group]Run cat test/**/*_toprint.log || true 2025-12-04T15:06:05.5059647Z cat test/**/*_toprint.log || true 
2025-12-04T15:06:05.5063790Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2025-12-04T15:06:05.5063940Z env: 2025-12-04T15:06:05.5064038Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:05.5064181Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:05.5064357Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:05.5064522Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:05.5064903Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:05.5065272Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:05.5065396Z AWS_REGION: us-east-1 2025-12-04T15:06:05.5065577Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:05.5065752Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:05.5068017Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:05.5068187Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:05.5068372Z ##[endgroup] 2025-12-04T15:06:05.5119555Z cat: 'test/**/*_toprint.log': No such file or directory 2025-12-04T15:06:05.5179533Z Prepare all required actions 2025-12-04T15:06:05.5179917Z Getting action download info 2025-12-04T15:06:05.8787942Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a) 2025-12-04T15:06:06.7660633Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02) 2025-12-04T15:06:07.9196340Z ##[group]Run ./.github/actions/upload-test-artifacts 2025-12-04T15:06:07.9196489Z with: 2025-12-04T15:06:07.9196577Z use-gha: true 2025-12-04T15:06:07.9196723Z file-suffix: test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284 2025-12-04T15:06:07.9196892Z s3-bucket: gha-artifacts 2025-12-04T15:06:07.9196996Z env: 2025-12-04T15:06:07.9197082Z GIT_DEFAULT_BRANCH: main 
2025-12-04T15:06:07.9197214Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:07.9197386Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:07.9197561Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:07.9197939Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:07.9198302Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:07.9198416Z AWS_REGION: us-east-1 2025-12-04T15:06:07.9198568Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:07.9198717Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:07.9200967Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:07.9201199Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:07.9201375Z ##[endgroup] 2025-12-04T15:06:07.9229721Z ##[group]Run actions/upload-artifact@v4 2025-12-04T15:06:07.9229850Z with: 2025-12-04T15:06:07.9230027Z name: test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip 2025-12-04T15:06:07.9230230Z retention-days: 14 2025-12-04T15:06:07.9230338Z if-no-files-found: warn 2025-12-04T15:06:07.9230446Z path: test/**/*.json 2025-12-04T15:06:07.9230546Z compression-level: 6 2025-12-04T15:06:07.9230644Z overwrite: false 2025-12-04T15:06:07.9230745Z include-hidden-files: false 2025-12-04T15:06:07.9230851Z env: 2025-12-04T15:06:07.9230939Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:07.9231072Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:07.9231248Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:07.9231412Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:07.9231794Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin 
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:07.9232163Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:07.9232276Z AWS_REGION: us-east-1 2025-12-04T15:06:07.9232404Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:07.9232554Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:07.9234884Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:07.9235048Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:07.9235225Z ##[endgroup] 2025-12-04T15:06:08.2979088Z With the provided path, there will be 6 files uploaded 2025-12-04T15:06:08.2982239Z Artifact name is valid! 2025-12-04T15:06:08.2982748Z Root directory input is valid! 2025-12-04T15:06:08.5508157Z Beginning upload of artifact content to blob storage 2025-12-04T15:06:08.9406846Z Uploaded bytes 46464 2025-12-04T15:06:09.0123764Z Finished uploading artifact content to blob storage! 2025-12-04T15:06:09.0124911Z SHA256 digest of uploaded artifact zip is 105ece88e35849e64c0c60f4e78d61c91a6723aa05bb0f41922ea019ad8c712c 2025-12-04T15:06:09.0125664Z Finalizing artifact upload 2025-12-04T15:06:09.1864716Z Artifact test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip.zip successfully finalized. Artifact ID 4765686461 2025-12-04T15:06:09.1865974Z Artifact test-jsons-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip has been successfully uploaded! Final size is 46464 bytes. 
Artifact ID is 4765686461 2025-12-04T15:06:09.1867804Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922812470/artifacts/4765686461 2025-12-04T15:06:09.1987706Z ##[group]Run actions/upload-artifact@v4 2025-12-04T15:06:09.1987953Z with: 2025-12-04T15:06:09.1988274Z name: test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip 2025-12-04T15:06:09.1988639Z retention-days: 14 2025-12-04T15:06:09.1988829Z if-no-files-found: ignore 2025-12-04T15:06:09.1989030Z path: test/**/*.xml test/**/*.csv 2025-12-04T15:06:09.1989238Z compression-level: 6 2025-12-04T15:06:09.1989416Z overwrite: false 2025-12-04T15:06:09.1989596Z include-hidden-files: false 2025-12-04T15:06:09.1989787Z env: 2025-12-04T15:06:09.1989940Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:09.1990176Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:09.1990481Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:09.1990750Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:09.1991368Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:09.1991968Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:09.1992164Z AWS_REGION: us-east-1 2025-12-04T15:06:09.1992679Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:09.1992951Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:09.1996675Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:09.1996906Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:09.1997147Z ##[endgroup] 2025-12-04T15:06:09.5773070Z With the provided path, there will be 488 files uploaded 2025-12-04T15:06:09.5775933Z Artifact name is valid! 2025-12-04T15:06:09.5776564Z Root directory input is valid! 
2025-12-04T15:06:09.8173888Z Beginning upload of artifact content to blob storage 2025-12-04T15:06:10.5969340Z Uploaded bytes 1205203 2025-12-04T15:06:10.6640171Z Finished uploading artifact content to blob storage! 2025-12-04T15:06:10.6641598Z SHA256 digest of uploaded artifact zip is bf6a57c7c4ebb94573988c1647b680dec103ebcb6c33a91ee73847b5af1ae012 2025-12-04T15:06:10.6642317Z Finalizing artifact upload 2025-12-04T15:06:10.8269259Z Artifact test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip.zip successfully finalized. Artifact ID 4765686803 2025-12-04T15:06:10.8270683Z Artifact test-reports-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip has been successfully uploaded! Final size is 1205203 bytes. Artifact ID is 4765686803 2025-12-04T15:06:10.8274004Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922812470/artifacts/4765686803 2025-12-04T15:06:10.8409300Z ##[group]Run actions/upload-artifact@v4 2025-12-04T15:06:10.8409459Z with: 2025-12-04T15:06:10.8409637Z name: logs-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip 2025-12-04T15:06:10.8409834Z retention-days: 14 2025-12-04T15:06:10.8409967Z if-no-files-found: ignore 2025-12-04T15:06:10.8410098Z path: usage_log.txt test/**/*.log 2025-12-04T15:06:10.8410230Z compression-level: 6 2025-12-04T15:06:10.8410342Z overwrite: false 2025-12-04T15:06:10.8410456Z include-hidden-files: false 2025-12-04T15:06:10.8410580Z env: 2025-12-04T15:06:10.8410691Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:10.8410836Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:10.8411022Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:10.8411198Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:10.8411713Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin 
--cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:10.8412165Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:10.8412294Z AWS_REGION: us-east-1 2025-12-04T15:06:10.8412481Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:10.8412645Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:10.8414971Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:10.8415150Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:10.8415340Z ##[endgroup] 2025-12-04T15:06:11.2218855Z Multiple search paths detected. Calculating the least common ancestor of all paths 2025-12-04T15:06:11.2219740Z The least common ancestor is /home/runner/_work/pytorch/pytorch. This will be the root directory of the artifact 2025-12-04T15:06:11.2220116Z With the provided path, there will be 94 files uploaded 2025-12-04T15:06:11.2222669Z Artifact name is valid! 2025-12-04T15:06:11.2223168Z Root directory input is valid! 2025-12-04T15:06:11.4505795Z Beginning upload of artifact content to blob storage 2025-12-04T15:06:12.3373630Z Uploaded bytes 1677449 2025-12-04T15:06:12.4064376Z Finished uploading artifact content to blob storage! 2025-12-04T15:06:12.4065178Z SHA256 digest of uploaded artifact zip is b47c25419f65cb5c7fc441819720ef53cf2780934000cc4bc73b684909558e9d 2025-12-04T15:06:12.4065991Z Finalizing artifact upload 2025-12-04T15:06:12.5716039Z Artifact logs-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip.zip successfully finalized. Artifact ID 4765687182 2025-12-04T15:06:12.5717047Z Artifact logs-runattempt1-test-default-2-6-linux.rocm.gpu.gfx942.1.b_57116139284.zip has been successfully uploaded! Final size is 1677449 bytes. 
Artifact ID is 4765687182 2025-12-04T15:06:12.5721018Z Artifact download URL: https://github.com/pytorch/pytorch/actions/runs/19922812470/artifacts/4765687182 2025-12-04T15:06:12.5840229Z ##[group]Run # shellcheck disable=SC2156 2025-12-04T15:06:12.5840442Z # shellcheck disable=SC2156 2025-12-04T15:06:12.5840713Z find . -iname "core.[1-9]*" -exec docker exec "${CONTAINER_NAME}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2025-12-04T15:06:12.5845053Z shell: /usr/bin/bash -e {0} 2025-12-04T15:06:12.5845180Z env: 2025-12-04T15:06:12.5845287Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:12.5845438Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:12.5845635Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:12.5845812Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:12.5846223Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:12.5846610Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:12.5846755Z AWS_REGION: us-east-1 2025-12-04T15:06:12.5846954Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:12.5847129Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:12.5849412Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:12.5849596Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:12.5849790Z ##[endgroup] 2025-12-04T15:06:12.7141728Z ##[group]Run actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 2025-12-04T15:06:12.7141920Z with: 2025-12-04T15:06:12.7142058Z name: coredumps-default-2-6-linux.rocm.gpu.gfx942.1.b 2025-12-04T15:06:12.7142214Z retention-days: 14 2025-12-04T15:06:12.7142329Z if-no-files-found: ignore 2025-12-04T15:06:12.7142454Z path: ./**/core.[1-9]* 2025-12-04T15:06:12.7142568Z compression-level: 6 2025-12-04T15:06:12.7142681Z overwrite: false 
2025-12-04T15:06:12.7142789Z include-hidden-files: false 2025-12-04T15:06:12.7142908Z env: 2025-12-04T15:06:12.7143005Z GIT_DEFAULT_BRANCH: main 2025-12-04T15:06:12.7143151Z RUNNER_ARTIFACT_DIR: /home/runner/_work/_temp/artifacts 2025-12-04T15:06:12.7143513Z RUNNER_TEST_RESULTS_DIR: /home/runner/_work/_temp/test-results 2025-12-04T15:06:12.7143766Z RUNNER_DOCS_DIR: /home/runner/_work/_temp/docs 2025-12-04T15:06:12.7144173Z GPU_FLAG: --device=/dev/mem --device=/dev/kfd --group-add 110 --device /dev/dri/renderD144 --group-add video --group-add 109 --group-add daemon --group-add bin --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --network=host 2025-12-04T15:06:12.7144552Z AWS_DEFAULT_REGION: us-east-1 2025-12-04T15:06:12.7144678Z AWS_REGION: us-east-1 2025-12-04T15:06:12.7144845Z AWS_ACCESS_KEY_ID: *** 2025-12-04T15:06:12.7145010Z AWS_SECRET_ACCESS_KEY: *** 2025-12-04T15:06:12.7147327Z AWS_SESSION_TOKEN: *** 2025-12-04T15:06:12.7147506Z CONTAINER_NAME: 155504386a4130f32fa487db042aa24be2f97deccb7a4078b358cff4f1b39dd3 2025-12-04T15:06:12.7147693Z ##[endgroup] 2025-12-04T15:06:16.5169463Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded. 2025-12-04T15:06:16.5315039Z Post job cleanup. 2025-12-04T15:06:16.5328016Z Post job cleanup. 2025-12-04T15:06:16.5525774Z Logging out of registry 308535385114.dkr.ecr.us-east-1.amazonaws.com 2025-12-04T15:06:16.5709334Z Post job cleanup. 2025-12-04T15:06:16.6326124Z Post job cleanup. 2025-12-04T15:06:16.6346063Z Post job cleanup. 
2025-12-04T15:06:16.6798323Z [command]/usr/bin/git version 2025-12-04T15:06:16.6831149Z git version 2.52.0 2025-12-04T15:06:16.6855003Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/acd44303-3846-4edf-9610-b718179b782a/.gitconfig' 2025-12-04T15:06:16.6860390Z Temporarily overriding HOME='/home/runner/_work/_temp/acd44303-3846-4edf-9610-b718179b782a' before making global git config changes 2025-12-04T15:06:16.6860869Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T15:06:16.6863185Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T15:06:16.6890304Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T15:06:16.6913347Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T15:06:16.7105914Z Entering 'android/libs/fbjni' 2025-12-04T15:06:16.7129920Z Entering 'third_party/FP16' 2025-12-04T15:06:16.7155963Z Entering 'third_party/FXdiv' 2025-12-04T15:06:16.7176820Z Entering 'third_party/NNPACK' 2025-12-04T15:06:16.7202234Z Entering 'third_party/NVTX' 2025-12-04T15:06:16.7230427Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:16.7253749Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:16.7285778Z Entering 'third_party/aiter' 2025-12-04T15:06:16.7310058Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:16.7339134Z Entering 'third_party/benchmark' 2025-12-04T15:06:16.7364899Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:16.7391665Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:16.7416533Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:16.7438117Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:16.7461027Z Entering 'third_party/cutlass' 2025-12-04T15:06:16.7487427Z Entering 'third_party/fbgemm' 2025-12-04T15:06:16.7513102Z 
Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:16.7532990Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:16.7557306Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:16.7580079Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:16.7608679Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:16.7629035Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:16.7651177Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:16.7675138Z Entering 'third_party/flash-attention' 2025-12-04T15:06:16.7699877Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:16.7738465Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:16.7765268Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:16.7788672Z Entering 'third_party/fmt' 2025-12-04T15:06:16.7829688Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:16.7860498Z Entering 'third_party/gloo' 2025-12-04T15:06:16.7885366Z Entering 'third_party/googletest' 2025-12-04T15:06:16.7916379Z Entering 'third_party/ideep' 2025-12-04T15:06:16.7942889Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:16.7988021Z Entering 'third_party/ittapi' 2025-12-04T15:06:16.8012190Z Entering 'third_party/kineto' 2025-12-04T15:06:16.8034755Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:16.8058565Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:16.8079458Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:16.8106596Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:16.8130063Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:16.8171304Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:16.8194628Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:16.8227690Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:16.8259172Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:16.8281645Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:16.8304249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:16.8336252Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:16.8359229Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:16.8385742Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:16.8406734Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:16.8430305Z Entering 'third_party/kleidiai' 2025-12-04T15:06:16.8452408Z Entering 'third_party/mimalloc' 2025-12-04T15:06:16.8474443Z Entering 'third_party/nlohmann' 2025-12-04T15:06:16.8497380Z Entering 'third_party/onnx' 2025-12-04T15:06:16.8525682Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:16.8551439Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:16.8575518Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:16.8595738Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:16.8617159Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:16.8638209Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:16.8659182Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:16.8688740Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:16.8711664Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:16.8735488Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:16.8776693Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:16.8812832Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:16.8846256Z Entering 'third_party/pocketfft' 2025-12-04T15:06:16.8871101Z Entering 'third_party/protobuf' 2025-12-04T15:06:16.8894076Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:06:16.8915882Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:16.8944568Z Entering 'third_party/psimd' 2025-12-04T15:06:16.8969774Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:16.8996973Z Entering 'third_party/pybind11' 2025-12-04T15:06:16.9018739Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:16.9047438Z Entering 'third_party/sleef' 2025-12-04T15:06:16.9069659Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:16.9095237Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:16.9116086Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:16.9136406Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:16.9161754Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:16.9183834Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:16.9238022Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T15:06:16.9258223Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9273444Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2025-12-04T15:06:16.9293907Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 
'http.https://github.com/.extraheader' || :" 2025-12-04T15:06:16.9448801Z Entering 'android/libs/fbjni' 2025-12-04T15:06:16.9464891Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9489395Z Entering 'third_party/FP16' 2025-12-04T15:06:16.9503375Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9527204Z Entering 'third_party/FXdiv' 2025-12-04T15:06:16.9540207Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9561673Z Entering 'third_party/NNPACK' 2025-12-04T15:06:16.9574978Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9599204Z Entering 'third_party/NVTX' 2025-12-04T15:06:16.9613879Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9632598Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:16.9645110Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9662226Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:16.9674970Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9702131Z Entering 'third_party/aiter' 2025-12-04T15:06:16.9720509Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9736527Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:16.9750412Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9772067Z Entering 'third_party/benchmark' 2025-12-04T15:06:16.9786049Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9808484Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:16.9821847Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9840571Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:16.9853578Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9870004Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:16.9884666Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9904006Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:16.9921698Z http.https://github.com/.extraheader 2025-12-04T15:06:16.9941308Z Entering 'third_party/cutlass' 2025-12-04T15:06:16.9954162Z http.https://github.com/.extraheader 
2025-12-04T15:06:16.9978504Z Entering 'third_party/fbgemm' 2025-12-04T15:06:16.9992284Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0009217Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:17.0034673Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0063208Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:17.0075932Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0096491Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:17.0112149Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0132928Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:17.0152211Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0177765Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:17.0191402Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0207062Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:17.0219920Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0236070Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:17.0254258Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0272618Z Entering 'third_party/flash-attention' 2025-12-04T15:06:17.0286053Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0302738Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:17.0314866Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0336177Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:17.0349187Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0370144Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:17.0382574Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0400426Z Entering 'third_party/fmt' 2025-12-04T15:06:17.0419815Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0436250Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:17.0450687Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0467933Z Entering 
'third_party/gloo' 2025-12-04T15:06:17.0480937Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0498741Z Entering 'third_party/googletest' 2025-12-04T15:06:17.0511364Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0528587Z Entering 'third_party/ideep' 2025-12-04T15:06:17.0540840Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0556083Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:17.0581201Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0603401Z Entering 'third_party/ittapi' 2025-12-04T15:06:17.0616821Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0635651Z Entering 'third_party/kineto' 2025-12-04T15:06:17.0649578Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0671674Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:17.0691664Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0709816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:17.0723680Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0743334Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:17.0756417Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0776460Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:17.0789246Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0811184Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:17.0823527Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0838269Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:17.0851674Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0870408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:17.0881563Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0898658Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:17.0909464Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0928715Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:17.0942068Z http.https://github.com/.extraheader 2025-12-04T15:06:17.0976932Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:17.0993982Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1013449Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:17.1027179Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1043602Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.1061081Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1078931Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.1092452Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1111829Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:17.1123974Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1141147Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:17.1154833Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1173517Z Entering 'third_party/kleidiai' 2025-12-04T15:06:17.1186615Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1203084Z Entering 'third_party/mimalloc' 2025-12-04T15:06:17.1215977Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1234558Z Entering 'third_party/nlohmann' 2025-12-04T15:06:17.1247095Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1264031Z Entering 'third_party/onnx' 2025-12-04T15:06:17.1276721Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1300648Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:17.1314190Z 
http.https://github.com/.extraheader 2025-12-04T15:06:17.1333196Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:17.1345807Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1362128Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:17.1376246Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1392460Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:17.1405537Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1420723Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:17.1432218Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1449948Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:17.1464515Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1480650Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:17.1494640Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1514942Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:17.1532621Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1549586Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:17.1562212Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1578111Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.1590069Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1608229Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.1620287Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1638168Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:17.1652077Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1675728Z Entering 'third_party/pocketfft' 2025-12-04T15:06:17.1688942Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1706653Z Entering 
'third_party/protobuf' 2025-12-04T15:06:17.1719564Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1736059Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:06:17.1752515Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1768917Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:17.1781283Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1799236Z Entering 'third_party/psimd' 2025-12-04T15:06:17.1813059Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1833719Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:17.1847364Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1868169Z Entering 'third_party/pybind11' 2025-12-04T15:06:17.1880418Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1901441Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:17.1914643Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1932297Z Entering 'third_party/sleef' 2025-12-04T15:06:17.1944665Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1961213Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:17.1974348Z http.https://github.com/.extraheader 2025-12-04T15:06:17.1999194Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:17.2010631Z http.https://github.com/.extraheader 2025-12-04T15:06:17.2027563Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:17.2040060Z http.https://github.com/.extraheader 2025-12-04T15:06:17.2064897Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:17.2076823Z http.https://github.com/.extraheader 2025-12-04T15:06:17.2097058Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:17.2110710Z http.https://github.com/.extraheader 2025-12-04T15:06:17.2125422Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:17.2137750Z http.https://github.com/.extraheader 2025-12-04T15:06:17.2177082Z [command]/usr/bin/git config --local --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.2199631Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T15:06:17.2360559Z Entering 'android/libs/fbjni' 2025-12-04T15:06:17.2370524Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T15:06:17.2383059Z Entering 'third_party/FP16' 2025-12-04T15:06:17.2395796Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T15:06:17.2404907Z Entering 'third_party/FXdiv' 2025-12-04T15:06:17.2419832Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T15:06:17.2429051Z Entering 'third_party/NNPACK' 2025-12-04T15:06:17.2449550Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T15:06:17.2458358Z Entering 'third_party/NVTX' 2025-12-04T15:06:17.2468175Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T15:06:17.2477506Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:17.2487633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T15:06:17.2496211Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:17.2505633Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T15:06:17.2520521Z Entering 'third_party/aiter' 2025-12-04T15:06:17.2530593Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T15:06:17.2540457Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:17.2548354Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T15:06:17.2563555Z Entering 
'third_party/benchmark' 2025-12-04T15:06:17.2573597Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:17.2585758Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:17.2598876Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T15:06:17.2610451Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:17.2624968Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T15:06:17.2634002Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:17.2646404Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T15:06:17.2655653Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:17.2667475Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T15:06:17.2680658Z Entering 'third_party/cutlass' 2025-12-04T15:06:17.2691781Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T15:06:17.2704061Z Entering 'third_party/fbgemm' 2025-12-04T15:06:17.2714120Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T15:06:17.2725057Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:17.2743403Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T15:06:17.2756709Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:17.2766724Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T15:06:17.2779837Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:17.2792366Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config 
remote.origin.url 2025-12-04T15:06:17.2800036Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:17.2809239Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T15:06:17.2821690Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:17.2835124Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T15:06:17.2847391Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:17.2856785Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T15:06:17.2864624Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:17.2875745Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T15:06:17.2887633Z Entering 'third_party/flash-attention' 2025-12-04T15:06:17.2897911Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T15:06:17.2910429Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:17.2922227Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T15:06:17.2934369Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:17.2944086Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T15:06:17.2957642Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:17.2967466Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T15:06:17.2977785Z Entering 'third_party/fmt' 2025-12-04T15:06:17.2987208Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config 
remote.origin.url 2025-12-04T15:06:17.2996283Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:17.3013631Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T15:06:17.3026272Z Entering 'third_party/gloo' 2025-12-04T15:06:17.3038274Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T15:06:17.3046929Z Entering 'third_party/googletest' 2025-12-04T15:06:17.3057235Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:17.3069041Z Entering 'third_party/ideep' 2025-12-04T15:06:17.3079100Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T15:06:17.3086715Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:17.3097069Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T15:06:17.3110361Z Entering 'third_party/ittapi' 2025-12-04T15:06:17.3121291Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T15:06:17.3129360Z Entering 'third_party/kineto' 2025-12-04T15:06:17.3139613Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T15:06:17.3148443Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:17.3160107Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T15:06:17.3168994Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:17.3178679Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T15:06:17.3187789Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:17.3199402Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T15:06:17.3208109Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:17.3217197Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T15:06:17.3225698Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:17.3244726Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T15:06:17.3253881Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:17.3279028Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T15:06:17.3300128Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:17.3311496Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T15:06:17.3321964Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:17.3334683Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:17.3344841Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:17.3358413Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T15:06:17.3368747Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:17.3379031Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T15:06:17.3393158Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:17.3402937Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T15:06:17.3412124Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.3422043Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T15:06:17.3430977Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.3440286Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T15:06:17.3451932Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:17.3465396Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T15:06:17.3472556Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:17.3482888Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 
2025-12-04T15:06:17.3493094Z Entering 'third_party/kleidiai' 2025-12-04T15:06:17.3503612Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T15:06:17.3513477Z Entering 'third_party/mimalloc' 2025-12-04T15:06:17.3524605Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T15:06:17.3532984Z Entering 'third_party/nlohmann' 2025-12-04T15:06:17.3543915Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T15:06:17.3553680Z Entering 'third_party/onnx' 2025-12-04T15:06:17.3563536Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T15:06:17.3585468Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:17.3602293Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:17.3619136Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:17.3634303Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T15:06:17.3643160Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:17.3658494Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:17.3668945Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:17.3683637Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:17.3693451Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:17.3712317Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T15:06:17.3722237Z Entering 
'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:17.3735343Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T15:06:17.3754441Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:17.3765399Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T15:06:17.3781859Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:17.3792524Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T15:06:17.3802026Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:17.3817207Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T15:06:17.3825630Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.3835737Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T15:06:17.3846003Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.3858018Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T15:06:17.3871584Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:17.3883855Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T15:06:17.3904793Z Entering 'third_party/pocketfft' 2025-12-04T15:06:17.3916318Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T15:06:17.3925995Z Entering 'third_party/protobuf' 2025-12-04T15:06:17.3940253Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T15:06:17.3949866Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:06:17.3961126Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:17.3969751Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:17.3979500Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:17.3990162Z Entering 'third_party/psimd' 2025-12-04T15:06:17.4001744Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T15:06:17.4010577Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:17.4019963Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T15:06:17.4029188Z Entering 'third_party/pybind11' 2025-12-04T15:06:17.4039545Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:17.4048595Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:17.4058136Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T15:06:17.4066884Z Entering 'third_party/sleef' 2025-12-04T15:06:17.4076918Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T15:06:17.4085450Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:17.4099448Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T15:06:17.4107679Z Entering 
'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:17.4121421Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:17.4129128Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:17.4138677Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T15:06:17.4147024Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:17.4156216Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T15:06:17.4164176Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:17.4178498Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:17.4188592Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:17.4199622Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T15:06:17.4227698Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4247134Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4267267Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4282796Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4299267Z 
[command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4314825Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4330957Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4344047Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4358656Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4374346Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4389188Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4402274Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4415997Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4430385Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4444864Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4458630Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4472796Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4487374Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4503664Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4520004Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4535343Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4550003Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4564245Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4576838Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T15:06:17.4590875Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4606121Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4618841Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4632621Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4647203Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4661112Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4677180Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4694435Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4711346Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4728959Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4743756Z [command]/usr/bin/git config 
--file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4761099Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4776264Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4794944Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4809992Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4831205Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4848006Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4862273Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4879771Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4895042Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4910121Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4924932Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4939042Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4953679Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4969052Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4984784Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.4998364Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5012854Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5029084Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5044098Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5059001Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5073904Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5089605Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5105905Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5121770Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5137291Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T15:06:17.5152294Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5167600Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5183411Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5203077Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5218565Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5233691Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5247352Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5263545Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5279436Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T15:06:17.5294010Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5308965Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5323919Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5338865Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5358067Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5372944Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5386393Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5400811Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5416098Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5430757Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp 
^includeIf\.gitdir: 2025-12-04T15:06:17.5448576Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5467536Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:17.5577914Z Post job cleanup. 2025-12-04T15:06:17.6036487Z [command]/usr/bin/git version 2025-12-04T15:06:17.6062995Z git version 2.52.0 2025-12-04T15:06:17.6080170Z Copying '/home/runner/.gitconfig' to '/home/runner/_work/_temp/899ee079-ce35-4ca4-b9b3-90ae7ce1d9ed/.gitconfig' 2025-12-04T15:06:17.6084776Z Temporarily overriding HOME='/home/runner/_work/_temp/899ee079-ce35-4ca4-b9b3-90ae7ce1d9ed' before making global git config changes 2025-12-04T15:06:17.6085333Z Adding repository directory to the temporary git global config as a safe directory 2025-12-04T15:06:17.6086838Z [command]/usr/bin/git config --global --add safe.directory /home/runner/_work/pytorch/pytorch 2025-12-04T15:06:17.6107680Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2025-12-04T15:06:17.6126790Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2025-12-04T15:06:17.6289930Z Entering 'android/libs/fbjni' 2025-12-04T15:06:17.6314198Z Entering 'third_party/FP16' 2025-12-04T15:06:17.6336419Z Entering 'third_party/FXdiv' 2025-12-04T15:06:17.6358888Z Entering 'third_party/NNPACK' 2025-12-04T15:06:17.6384496Z Entering 'third_party/NVTX' 2025-12-04T15:06:17.6408811Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:17.6435654Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:17.6466552Z Entering 'third_party/aiter' 2025-12-04T15:06:17.6497100Z 
Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:17.6545595Z Entering 'third_party/benchmark' 2025-12-04T15:06:17.6572670Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:17.6600131Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:17.6621004Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:17.6646150Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:17.6668901Z Entering 'third_party/cutlass' 2025-12-04T15:06:17.6701896Z Entering 'third_party/fbgemm' 2025-12-04T15:06:17.6733588Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:17.6766746Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:17.6794587Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:17.6821304Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:17.6856163Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:17.6882933Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:17.6905648Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:17.6938625Z Entering 'third_party/flash-attention' 2025-12-04T15:06:17.6964974Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:17.6989829Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:17.7022396Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:17.7053146Z Entering 'third_party/fmt' 2025-12-04T15:06:17.7076112Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:17.7100793Z Entering 'third_party/gloo' 2025-12-04T15:06:17.7132359Z Entering 'third_party/googletest' 2025-12-04T15:06:17.7153333Z Entering 'third_party/ideep' 2025-12-04T15:06:17.7174458Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:17.7200631Z Entering 'third_party/ittapi' 2025-12-04T15:06:17.7226853Z Entering 'third_party/kineto' 2025-12-04T15:06:17.7255369Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:17.7280438Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:17.7304832Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:17.7335740Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:17.7359537Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:17.7387004Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:17.7418160Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:17.7448788Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:17.7473950Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:17.7505196Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:17.7529224Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:17.7553483Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.7577088Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.7607569Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:17.7632143Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:17.7658638Z Entering 'third_party/kleidiai' 2025-12-04T15:06:17.7689999Z Entering 'third_party/mimalloc' 2025-12-04T15:06:17.7716870Z Entering 'third_party/nlohmann' 2025-12-04T15:06:17.7744579Z Entering 'third_party/onnx' 2025-12-04T15:06:17.7776049Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:17.7801599Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:17.7825387Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:17.7850139Z 
Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:17.7877386Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:17.7907100Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:17.7932692Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:17.7959011Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:17.7983555Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:17.8010498Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.8036666Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.8067861Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:17.8107693Z Entering 'third_party/pocketfft' 2025-12-04T15:06:17.8136296Z Entering 'third_party/protobuf' 2025-12-04T15:06:17.8166759Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:06:17.8190855Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:17.8224035Z Entering 'third_party/psimd' 2025-12-04T15:06:17.8247843Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:17.8271077Z Entering 'third_party/pybind11' 2025-12-04T15:06:17.8293723Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:17.8316645Z Entering 'third_party/sleef' 2025-12-04T15:06:17.8339898Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:17.8361562Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:17.8387144Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:17.8408664Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:17.8443580Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:17.8470532Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:17.8514439Z 
[command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2025-12-04T15:06:17.8536691Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2025-12-04T15:06:17.8691001Z Entering 'android/libs/fbjni' 2025-12-04T15:06:17.8724942Z Entering 'third_party/FP16' 2025-12-04T15:06:17.8752967Z Entering 'third_party/FXdiv' 2025-12-04T15:06:17.8779268Z Entering 'third_party/NNPACK' 2025-12-04T15:06:17.8803916Z Entering 'third_party/NVTX' 2025-12-04T15:06:17.8831372Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:17.8853854Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:17.8885296Z Entering 'third_party/aiter' 2025-12-04T15:06:17.8906664Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:17.8933962Z Entering 'third_party/benchmark' 2025-12-04T15:06:17.8956286Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:17.8983122Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:17.9005996Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:17.9027552Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:17.9051004Z Entering 'third_party/cutlass' 2025-12-04T15:06:17.9076698Z Entering 'third_party/fbgemm' 2025-12-04T15:06:17.9103468Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:17.9124550Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:17.9151400Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:17.9178379Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:17.9204339Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:17.9229669Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:17.9254953Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:17.9278041Z Entering 'third_party/flash-attention' 
2025-12-04T15:06:17.9299484Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:17.9329344Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:17.9355431Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:17.9379243Z Entering 'third_party/fmt' 2025-12-04T15:06:17.9405552Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:17.9430581Z Entering 'third_party/gloo' 2025-12-04T15:06:17.9452526Z Entering 'third_party/googletest' 2025-12-04T15:06:17.9478416Z Entering 'third_party/ideep' 2025-12-04T15:06:17.9500797Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:17.9541530Z Entering 'third_party/ittapi' 2025-12-04T15:06:17.9566476Z Entering 'third_party/kineto' 2025-12-04T15:06:17.9593115Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:17.9624170Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:17.9650547Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:17.9678816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:17.9700763Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:17.9726186Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:17.9761282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:17.9782706Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:17.9803424Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:17.9824913Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:17.9852089Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:17.9881948Z Entering 
'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:17.9906250Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:17.9941452Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:17.9967410Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:17.9994574Z Entering 'third_party/kleidiai' 2025-12-04T15:06:18.0026466Z Entering 'third_party/mimalloc' 2025-12-04T15:06:18.0063454Z Entering 'third_party/nlohmann' 2025-12-04T15:06:18.0095287Z Entering 'third_party/onnx' 2025-12-04T15:06:18.0131735Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:18.0158278Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:18.0191059Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:18.0215884Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:18.0240421Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:18.0263083Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:18.0288109Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:18.0311273Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:18.0332521Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:18.0359122Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:18.0383062Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:18.0418739Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:18.0449234Z Entering 'third_party/pocketfft' 2025-12-04T15:06:18.0470928Z Entering 'third_party/protobuf' 2025-12-04T15:06:18.0493572Z Entering 'third_party/protobuf/third_party/benchmark' 
2025-12-04T15:06:18.0515688Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:18.0539666Z Entering 'third_party/psimd' 2025-12-04T15:06:18.0574386Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:18.0597885Z Entering 'third_party/pybind11' 2025-12-04T15:06:18.0619177Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:18.0639384Z Entering 'third_party/sleef' 2025-12-04T15:06:18.0661033Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:18.0685682Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:18.0712601Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:18.0734881Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:18.0762520Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:18.0783131Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:18.0829205Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.0847862Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2025-12-04T15:06:18.1005850Z Entering 'android/libs/fbjni' 2025-12-04T15:06:18.1021375Z file:/home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2025-12-04T15:06:18.1029504Z Entering 'third_party/FP16' 2025-12-04T15:06:18.1043569Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2025-12-04T15:06:18.1050440Z Entering 'third_party/FXdiv' 2025-12-04T15:06:18.1061789Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 2025-12-04T15:06:18.1071880Z Entering 'third_party/NNPACK' 2025-12-04T15:06:18.1089678Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2025-12-04T15:06:18.1101028Z Entering 'third_party/NVTX' 2025-12-04T15:06:18.1112835Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config remote.origin.url 2025-12-04T15:06:18.1128204Z Entering 'third_party/VulkanMemoryAllocator' 2025-12-04T15:06:18.1140769Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url 2025-12-04T15:06:18.1152251Z Entering 'third_party/XNNPACK' 2025-12-04T15:06:18.1162700Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2025-12-04T15:06:18.1178068Z Entering 'third_party/aiter' 2025-12-04T15:06:18.1188105Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config remote.origin.url 2025-12-04T15:06:18.1197107Z Entering 'third_party/aiter/3rdparty/composable_kernel' 2025-12-04T15:06:18.1205962Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config remote.origin.url 2025-12-04T15:06:18.1221841Z Entering 'third_party/benchmark' 2025-12-04T15:06:18.1232308Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:18.1240757Z Entering 'third_party/composable_kernel' 2025-12-04T15:06:18.1251438Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config remote.origin.url 2025-12-04T15:06:18.1262478Z Entering 'third_party/cpp-httplib' 2025-12-04T15:06:18.1273229Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config remote.origin.url 2025-12-04T15:06:18.1281657Z Entering 'third_party/cpuinfo' 2025-12-04T15:06:18.1293582Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2025-12-04T15:06:18.1303191Z Entering 'third_party/cudnn_frontend' 2025-12-04T15:06:18.1313786Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2025-12-04T15:06:18.1324160Z Entering 'third_party/cutlass' 2025-12-04T15:06:18.1334551Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url 2025-12-04T15:06:18.1351904Z Entering 'third_party/fbgemm' 2025-12-04T15:06:18.1361899Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2025-12-04T15:06:18.1379635Z Entering 'third_party/fbgemm/external/asmjit' 2025-12-04T15:06:18.1390795Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config remote.origin.url 2025-12-04T15:06:18.1400761Z Entering 'third_party/fbgemm/external/composable_kernel' 2025-12-04T15:06:18.1410866Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config remote.origin.url 2025-12-04T15:06:18.1421738Z Entering 'third_party/fbgemm/external/cpuinfo' 2025-12-04T15:06:18.1435378Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config remote.origin.url 2025-12-04T15:06:18.1444573Z Entering 'third_party/fbgemm/external/cutlass' 2025-12-04T15:06:18.1453692Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config remote.origin.url 2025-12-04T15:06:18.1467323Z Entering 'third_party/fbgemm/external/googletest' 2025-12-04T15:06:18.1478021Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config remote.origin.url 2025-12-04T15:06:18.1486756Z Entering 'third_party/fbgemm/external/hipify_torch' 2025-12-04T15:06:18.1496044Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config remote.origin.url 2025-12-04T15:06:18.1504267Z Entering 'third_party/fbgemm/external/json' 2025-12-04T15:06:18.1513588Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config remote.origin.url 2025-12-04T15:06:18.1523546Z Entering 'third_party/flash-attention' 2025-12-04T15:06:18.1535754Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config remote.origin.url 2025-12-04T15:06:18.1548781Z Entering 'third_party/flash-attention/csrc/composable_kernel' 2025-12-04T15:06:18.1568406Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config remote.origin.url 2025-12-04T15:06:18.1578694Z Entering 'third_party/flash-attention/csrc/cutlass' 2025-12-04T15:06:18.1594980Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config remote.origin.url 2025-12-04T15:06:18.1614147Z Entering 'third_party/flatbuffers' 2025-12-04T15:06:18.1625248Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2025-12-04T15:06:18.1634930Z Entering 'third_party/fmt' 2025-12-04T15:06:18.1644434Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2025-12-04T15:06:18.1653470Z Entering 'third_party/gemmlowp/gemmlowp' 2025-12-04T15:06:18.1664292Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2025-12-04T15:06:18.1678288Z Entering 'third_party/gloo' 2025-12-04T15:06:18.1687786Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2025-12-04T15:06:18.1697149Z Entering 'third_party/googletest' 2025-12-04T15:06:18.1707446Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.1716540Z Entering 'third_party/ideep' 2025-12-04T15:06:18.1726068Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2025-12-04T15:06:18.1735245Z Entering 'third_party/ideep/mkl-dnn' 2025-12-04T15:06:18.1748250Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2025-12-04T15:06:18.1768308Z Entering 
'third_party/ittapi' 2025-12-04T15:06:18.1781208Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url 2025-12-04T15:06:18.1790382Z Entering 'third_party/kineto' 2025-12-04T15:06:18.1802023Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2025-12-04T15:06:18.1811755Z Entering 'third_party/kineto/libkineto/third_party/dynolog' 2025-12-04T15:06:18.1826541Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config remote.origin.url 2025-12-04T15:06:18.1835667Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM' 2025-12-04T15:06:18.1848055Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config remote.origin.url 2025-12-04T15:06:18.1857970Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr' 2025-12-04T15:06:18.1867960Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config remote.origin.url 2025-12-04T15:06:18.1876220Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt' 2025-12-04T15:06:18.1886178Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config remote.origin.url 2025-12-04T15:06:18.1895374Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags' 2025-12-04T15:06:18.1906851Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config remote.origin.url 2025-12-04T15:06:18.1915107Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc' 2025-12-04T15:06:18.1928053Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config remote.origin.url 2025-12-04T15:06:18.1940290Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog' 2025-12-04T15:06:18.1961007Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config remote.origin.url 2025-12-04T15:06:18.1969129Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest' 2025-12-04T15:06:18.1978788Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.1987764Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json' 2025-12-04T15:06:18.1996771Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config remote.origin.url 2025-12-04T15:06:18.2006061Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs' 2025-12-04T15:06:18.2021258Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config remote.origin.url 2025-12-04T15:06:18.2030058Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp' 2025-12-04T15:06:18.2044792Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T15:06:18.2054743Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:18.2064717Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T15:06:18.2074308Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:18.2083893Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T15:06:18.2095496Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2025-12-04T15:06:18.2111414Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2025-12-04T15:06:18.2121030Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2025-12-04T15:06:18.2132071Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.2142475Z Entering 'third_party/kleidiai' 2025-12-04T15:06:18.2154140Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config remote.origin.url 2025-12-04T15:06:18.2164270Z Entering 'third_party/mimalloc' 2025-12-04T15:06:18.2174516Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config remote.origin.url 2025-12-04T15:06:18.2183972Z Entering 'third_party/nlohmann' 2025-12-04T15:06:18.2193958Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url 2025-12-04T15:06:18.2204015Z Entering 'third_party/onnx' 2025-12-04T15:06:18.2214279Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2025-12-04T15:06:18.2229661Z Entering 'third_party/onnx/third_party/pybind11' 2025-12-04T15:06:18.2240507Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:18.2252132Z Entering 'third_party/opentelemetry-cpp' 2025-12-04T15:06:18.2266014Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config remote.origin.url 2025-12-04T15:06:18.2276443Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark' 2025-12-04T15:06:18.2291615Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:18.2301673Z Entering 'third_party/opentelemetry-cpp/third_party/googletest' 2025-12-04T15:06:18.2315463Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.2324937Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl' 2025-12-04T15:06:18.2337271Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config remote.origin.url 2025-12-04T15:06:18.2349593Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json' 2025-12-04T15:06:18.2367894Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config remote.origin.url 2025-12-04T15:06:18.2378048Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto' 2025-12-04T15:06:18.2390074Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config remote.origin.url 2025-12-04T15:06:18.2399721Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp' 2025-12-04T15:06:18.2411649Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config remote.origin.url 2025-12-04T15:06:18.2420692Z Entering 
'third_party/opentelemetry-cpp/third_party/prometheus-cpp' 2025-12-04T15:06:18.2433599Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config remote.origin.url 2025-12-04T15:06:18.2447887Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb' 2025-12-04T15:06:18.2461724Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config remote.origin.url 2025-12-04T15:06:18.2473668Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest' 2025-12-04T15:06:18.2491284Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config remote.origin.url 2025-12-04T15:06:18.2502721Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg' 2025-12-04T15:06:18.2514648Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config remote.origin.url 2025-12-04T15:06:18.2533122Z Entering 'third_party/pocketfft' 2025-12-04T15:06:18.2545905Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2025-12-04T15:06:18.2555615Z Entering 'third_party/protobuf' 2025-12-04T15:06:18.2569278Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2025-12-04T15:06:18.2579573Z Entering 'third_party/protobuf/third_party/benchmark' 2025-12-04T15:06:18.2588631Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2025-12-04T15:06:18.2597275Z Entering 'third_party/protobuf/third_party/googletest' 2025-12-04T15:06:18.2608617Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.2619937Z Entering 
'third_party/psimd' 2025-12-04T15:06:18.2630364Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2025-12-04T15:06:18.2641359Z Entering 'third_party/pthreadpool' 2025-12-04T15:06:18.2651064Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2025-12-04T15:06:18.2659384Z Entering 'third_party/pybind11' 2025-12-04T15:06:18.2671671Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:18.2679584Z Entering 'third_party/python-peachpy' 2025-12-04T15:06:18.2691666Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2025-12-04T15:06:18.2701297Z Entering 'third_party/sleef' 2025-12-04T15:06:18.2712031Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2025-12-04T15:06:18.2720591Z Entering 'third_party/tensorpipe' 2025-12-04T15:06:18.2730210Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2025-12-04T15:06:18.2741265Z Entering 'third_party/tensorpipe/third_party/googletest' 2025-12-04T15:06:18.2751210Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2025-12-04T15:06:18.2759790Z Entering 'third_party/tensorpipe/third_party/libnop' 2025-12-04T15:06:18.2773810Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2025-12-04T15:06:18.2782463Z Entering 'third_party/tensorpipe/third_party/libuv' 2025-12-04T15:06:18.2792010Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2025-12-04T15:06:18.2800056Z Entering 'third_party/tensorpipe/third_party/pybind11' 2025-12-04T15:06:18.2811096Z 
file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2025-12-04T15:06:18.2825253Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2025-12-04T15:06:18.2837960Z file:/home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2025-12-04T15:06:18.2870145Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2891251Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2910360Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2928131Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2947276Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2966785Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.2991544Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3009424Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3025694Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3045187Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3061506Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3080164Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3101200Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3117416Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3135048Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3154779Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3170776Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3189746Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3206381Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3230385Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3249876Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3268733Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3284993Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3301012Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3319775Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3337968Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3354569Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3373721Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T15:06:18.3391330Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3412171Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3428499Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3445192Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3463600Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3478661Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3497918Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3513450Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3529733Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3546851Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only 
--get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3569576Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3586674Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3612958Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3631224Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3647678Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3662579Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3686359Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3715069Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3742906Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3760530Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3777600Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3798481Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3819649Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3840375Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3857692Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3880482Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3897865Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 
2025-12-04T15:06:18.3921000Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3950165Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3970449Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.3990742Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4017534Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4040230Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4070576Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4092574Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4113840Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config 
--name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4132306Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4150714Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4172263Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4189003Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4209969Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4229805Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4247916Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4266036Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4283496Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4300409Z [command]/usr/bin/git config --file 
/home/runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4316797Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4334909Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4352833Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4369913Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4387651Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4407486Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4425953Z [command]/usr/bin/git config --file /home/runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir: 2025-12-04T15:06:18.4548287Z Cleaning up orphan processes